Verses to Visions:

Poetry and Generative AI Images - Midori Davis

ESSAY

Upon receiving short poems as prompts, can AI (DALL-E) fill in the blanks and generate images similar to what the writer envisioned when writing the poems? Can images generated by AI in this manner be used to gauge the efficacy of poems in articulating the artistic visions of writers?

These questions intrigue me because the boundaries of what AI is and is not have been expanding as the technology rapidly develops. Writers generally have a vision for their work, and this is especially true for poets. Because AI can be used as a collaborative partner, engaging in a call-and-response conversation to develop ideas, I think ChatGPT’s DALL-E can assist tremendously with the creative process: it can give writers a sense of what other people may see as they read their work, and even help them find the words that evoke imagery truest to their creative vision. To test this, I came up with three short poems: one written entirely on my own, one written with the help of ChatGPT, and one written entirely by ChatGPT from a simple prompt about the topic. Brevity was key, as DALL-E can get overwhelmed by too much information in a single prompt.

The first poem, written entirely by me, is about deer and how they have adapted to control the growth of bone cancer into what we recognize today as antlers:

Over a bed of ferns softly trodden,
Morning mist embraces the mountainside.

The echo alone remains
Of the deer's crown once untamed.

Millennia of once-unbridled malign growth;
Bone chiseled, hewn, whittled, ground

Year after year
To keep death at bay.

The second poem was started by me and finished with the help of AI, which suggested alliterative words that still made sense:

Synthetic synapses snap simultaneously as nimble neural networks navigate nearby.

Daring data dreams dart downward, minding magnificent machine learning model magic.

The third poem was generated entirely by AI after I asked it to generate a poem about sugar-loving dragons and chocolate coins in iambic pentameter:

In lands where sugar mountains touch the skies,
A dragon with a sweet tooth grandly lies.
It dreams of streams where chocolate coins do flow,
And nibbles on the nuggets, row by row.

Its scales shimmer with candied, glossy sheen,
In hues of caramel and tangerine.
Each day it feasts 'neath candy-flossed trees,
Savoring the sugared breeze with ease.

At night, it counts its hoard of chocolate gold,
In caves of velvet cake, so rich and bold.
This dragon, unlike any fearsome beast,
Finds joy in sweets, on which it loves to feast.

After writing and generating these poems, I had DALL-E generate images based on each of them. The images can be viewed on the Gallery page of this website.

The image results for the first poem were much better than I had expected. While DALL-E did not seem to catch on to the topic of bone cancer, it understood that the poem was describing bones. The first image showed the deer and the mist around the mountain, while the second image showed a deer made out of ferns. I was pleasantly surprised by how close the first image was to my artistic vision.

The image results for the second poem were not at all what I had expected. I had more of an organic-looking brain with wires attached to it in mind, but DALL-E seemed to focus on the synapses and data. This poem was the shortest and lacked clarifying words that would have provided more context about the scene to DALL-E.

The third poem produced the best image results, judging by how accurately most things are depicted compared with the descriptions in the poem. The dragon is consistently orange and brown and eating the chocolate coins, the landscape is made of similar if not the same kinds of candy, and there is a stream. But this poem was also the longest, and the simplest in sentence structure and layout. ChatGPT wrote this poem, and what I imagined it to look like was indeed different from DALL-E’s generations.
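As a rough check on the length comparison above, the words and lines in each poem can be counted with a short Python sketch. The dictionary keys and the function name are illustrative labels, not part of the original experiment, and word count is only a crude stand-in for how much context a prompt gives an image model.

```python
# Count non-blank lines and words in each poem.
# Keys ("deer", "alliteration", "dragon") are illustrative labels only.
POEMS = {
    "deer": """Over a bed of ferns softly trodden,
Morning mist embraces the mountainside.

The echo alone remains
Of the deer's crown once untamed.

Millennia of once-unbridled malign growth;
Bone chiseled, hewn, whittled, ground

Year after year
To keep death at bay.""",
    "alliteration": """Synthetic synapses snap simultaneously as nimble neural networks navigate nearby.

Daring data dreams dart downward, minding magnificent machine learning model magic.""",
    "dragon": """In lands where sugar mountains touch the skies,
A dragon with a sweet tooth grandly lies.
It dreams of streams where chocolate coins do flow,
And nibbles on the nuggets, row by row.

Its scales shimmer with candied, glossy sheen,
In hues of caramel and tangerine.
Each day it feasts 'neath candy-flossed trees,
Savoring the sugared breeze with ease.

At night, it counts its hoard of chocolate gold,
In caves of velvet cake, so rich and bold.
This dragon, unlike any fearsome beast,
Finds joy in sweets, on which it loves to feast.""",
}


def poem_stats(text):
    """Return the number of non-blank lines and whitespace-separated words."""
    lines = [line for line in text.splitlines() if line.strip()]
    words = text.split()
    return {"lines": len(lines), "words": len(words)}


stats = {name: poem_stats(text) for name, text in POEMS.items()}

# Print the poems from shortest to longest by word count.
for name, s in sorted(stats.items(), key=lambda item: item[1]["words"]):
    print(f"{name}: {s['lines']} lines, {s['words']} words")
```

Counting this way is consistent with the observation that the second poem gave DALL-E the least text to work with and the third the most, though it says nothing about how concrete or visual the words are.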

From this experiment, I concluded that ChatGPT and DALL-E can be very helpful in refining or getting closer to an artistic vision. As for whether they can gauge how close the words actually are to the vision, answering that will require repeated experiments on a much larger scale with more variables.

My strengths are almost entirely in writing, especially when compared to my abilities with coding or digital art. I have spent the semester learning how to use AI tools to generate many different things from text prompts: images, code, poems, and short stories. I have seen firsthand, as a novice in both coding and digital art, how these AI tools allow people who lack medium-specific skills and tools to create works too, democratizing the creative process. But now, AI is no longer just a tool. The development of ChatGPT allows users to communicate back and forth with the AI to synthesize ideas and continuously build upon them, effectively making it a creative collaborative partner, as we have learned through Mark Amerika’s work with AI. This role is elaborated on further in "Artificial Intelligence and the Arts: Toward Computational Creativity," where Ramón López de Mántaras states that “Computational creativity is the study of building software that exhibits behavior that would be deemed creative in humans…However, computational creativity studies also enable us to understand human creativity and to produce programs for creative people to use, where the software acts as a creative collaborator rather than a mere tool.” Simply put, AI has developed into an entity that can be creative in many ways that are on par with humans.

With any new and developing technology, there will be ethical concerns that must be addressed. Many artists have had their work taken and used without their consent as training data for generative AI models. The concept of copyright has already evolved once with the development of the internet, and it will inevitably have to evolve again as AI technology continues to develop and spread.

Training on artists' work in this way enables people to generate whatever they want from AI and sell it as their own handmade work, with no obligation to credit the artists whose work fed the AI model. Understandably, this has angered many artists, and they are calling for regulations to protect their work from being used by AI without their consent. A recent development in this issue is the release of tools such as Glaze and Nightshade, developed at the University of Chicago, which can “pollute” art and make it difficult for AI models to use it. Glaze makes subtle changes to the pixels of an artwork so that AI models have more difficulty mimicking the artist's style, while the artwork still looks nearly the same to the human eye. Nightshade goes a step further: it also alters pixels in ways that are hard to see, but with the aim that a model trained on the image associates it with a different concept than the one it actually depicts, which corrupts what the model learns.

The potential misuse of generative AI poses significant risks, including the creation of deepfakes, which can spread misinformation and cause public harm. The ease with which convincing fake content can be produced makes it a powerful tool for malicious activities. It is necessary to invest in the development of detection technologies that can identify AI-generated content to protect individuals as well as corporations. In addition, legal frameworks must be established to deter misuse and provide recourse for victims of AI-related abuse. Generative AI holds incredible potential to revolutionize various industries by enhancing creativity and efficiency. However, this technology must be developed and used within a strong ethical framework to prevent harm and ensure it benefits society as a whole.

Sources cited:

López de Mántaras, Ramón. "Artificial Intelligence and the Arts: Toward Computational Creativity." In The Next Step. Exponential Life. Madrid: BBVA, 2016.