From Dataset to Output: A LoRA Training Pipeline
By Luke Auretto-Piper
Frame image: intro

Introduction

From the start of this semester, I have been very interested in developing a deeper understanding of how AI is trained and calibrated to produce the wide variety of content and services that are becoming common in our daily lives. So when our last group project came around and we set out to explore the concepts of Data Ethics in image generation, I felt it was an ideal time to delve into the open source world of AI and take a crack at building my own image generator to better understand the process.


Using the open source version of Stable Diffusion, I troubleshot my way through Python scripts and command prompts to link it with Kohya, another open source program that specializes in training image datasets. By experimenting with different custom-trained datasets and variables injected into the base Stable Diffusion model, I was not only able to manipulate outcomes to explore Data Ethics concepts such as dataset influence, generative behavior, and prompt weighting, but also to get a glimpse into the inner workings of many of our other course concepts.


I hope you enjoy my gallery experiments!

How it works

AI image generation models work by being trained on very large datasets of images. Each image is assigned metadata tags, or "captions," that identify key elements inside the composition. These tags help the model decide how much influence particular training images have on a given generation; that degree of influence is called a weight. By training a small, niche dataset into a lightweight add-on model called a LoRA (Low-Rank Adaptation) and injecting it into a much larger base model, it becomes possible to drastically change the output in unique and specialized ways.
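
To make the captioning step concrete, below is a minimal sketch of how a training folder might be prepared in the style Kohya expects, where each image gets a matching .txt caption file. The folder name, image names, and captions are made up for illustration, and the "10_" prefix sets the repeats per epoch in Kohya's folder-naming convention.

    # Minimal sketch: pair each training image with a sidecar .txt caption
    # file, the format Kohya reads when training a LoRA. All names here are
    # hypothetical examples, not the actual dataset.
    from pathlib import Path

    dataset = Path("train/10_androidjones_style")
    dataset.mkdir(parents=True, exist_ok=True)

    captions = {
        "dragon_01.png": "androidjones style, dragon, fractal patterns, vivid color",
        "seascape_02.png": "androidjones style, psychedelic seascape, glowing sky",
    }

    # Each .txt file carries the tags the trainer associates with that image.
    for image_name, caption in captions.items():
        (dataset / image_name).with_suffix(".txt").write_text(caption)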


Through the course of our group project and my continued work for the final, I developed two well-trained LoRAs: one based on the copyright-free work of J.M.W. Turner, the British painter famous for his impressive skies and seascapes, and the other on the copyrighted work of Android Jones, a contemporary digital artist.


These became essential to my further research into the intricacies of creating an image using AI, and a great way to test the ethical considerations surrounding copyrighted material.

Testing LoRA quality with neutral prompts

My first experiment after establishing a running generator was a test to determine how well the LoRA had been trained: I combined a neutral text prompt with a command to apply the LoRA trained on Android Jones at different weights. This was accomplished using a tag in Stable Diffusion that allows the LoRA weight to be adjusted between generations: <lora:AndroidXL:1>, where the trailing 1 is the weight.


By generating sets of images at LoRA weights starting from 0 and stepping upward, I could easily reproduce observable alterations to the normal Stable Diffusion output. This turned into a very important test, as it demonstrated that the LoRA was strong enough to influence the base model even when it was not directly prompted through keywords in the text.
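
For anyone who would rather script this sweep than click through the web UI, here is a rough sketch using the Hugging Face diffusers library; the LoRA file path, prompt, and seed are stand-ins, and the web UI's <lora:AndroidXL:1> tag does the equivalent job of the scale parameter below.

    # Sketch of a LoRA weight sweep with diffusers. The model ID is the
    # public SDXL base; the LoRA path, prompt, and seed are assumptions.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("loras/AndroidXL.safetensors")  # hypothetical path

    prompt = "a quiet mountain lake at sunrise"  # deliberately neutral, no style keywords

    for scale in (0.0, 0.25, 0.5, 0.75, 1.0):
        image = pipe(
            prompt,
            cross_attention_kwargs={"scale": scale},  # 0 = pure base model, 1 = full LoRA
            generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for comparison
        ).images[0]
        image.save(f"neutral_lora_{scale:.2f}.png")

Holding the seed fixed means any difference between images comes from the LoRA weight alone, which is what makes the sets comparable.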


Synopsis
A small open source LoRA trained on copyrighted material can drastically influence an AI model without imitating the material closely enough to infringe copyright. This potentially exposes a problem with the lack of transparency in larger models: most major models, outside of open source ones, do not disclose their dataset origins, training methods, or weightings.

Testing improbable LoRA influence conditions

Once I had established that the AndroidXL LoRA was sufficiently trained to influence the base model, I crafted a series of difficult prompts in an attempt to find situations where the LoRA would have little or no influence. After a bit of research and back-and-forth with ChatGPT, I settled on a large run of generations of black-and-white sketches of humans at varying weights. I chose this because I had noticed that much of the LoRA's learned influence was triggered by color, and I chose a human subject because most of the Android Jones dataset consisted of dragons and other creatures, with very few human shapes.


These results ended up being foundational for future generation runs, as they left me confident that this LoRA would exert visible influence even at low weights.


Synopsis
While this furthers considerations around the data ethics of non-transparency in large models, it is also an exciting example of a way to put at least some agency back in human hands in an AI-driven environment. Curating your own datasets and adjusting values manually allows you to take ownership of a larger share of the creativity in your generations.

Prompting towards LoRA influence

After confirming that the LoRA would influence prompts that were not directly related to its training dataset, I wanted to see if the LoRA could replicate imagery from the dataset. To conduct this experiment, I worked with ChatGPT to develop unique prompts designed to produce imagery reminiscent of the Android Jones artwork used in the dataset.


Since the most common figure in the dataset images was a dragon, I used it as the focal point of the experiment and began removing words from the text prompt to isolate Stable Diffusion's base definition of a dragon. Next, I reloaded that prompt with enough keywords from the AndroidXL LoRA to heavily influence the generation and began running generations at different weights and other values.
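
A rough sketch of that comparison in code might look like the following; the keyword list is a reconstruction of tags drawn from the training captions, not the exact prompts used, and the paths and seed are again illustrative.

    # Sketch of the prompt ablation: a bare prompt to isolate the base
    # model's idea of a dragon, versus a prompt loaded with LoRA-associated
    # keywords, each run with no LoRA influence and with full LoRA influence.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("loras/AndroidXL.safetensors")  # hypothetical path

    prompts = {
        "bare": "dragon",  # stripped down to the base definition
        "keywords": "dragon, androidjones style, fractal patterns, "
                    "psychedelic color, intricate symmetry",  # assumed caption tags
    }

    for label, prompt in prompts.items():
        for scale in (0.0, 1.0):  # base model only vs. full LoRA
            image = pipe(
                prompt,
                cross_attention_kwargs={"scale": scale},
                generator=torch.Generator("cuda").manual_seed(7),  # fixed seed
            ).images[0]
            image.save(f"dragon_{label}_{scale:.0f}.png")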


Synopsis
Surprising results were achieved by weighting generations heavily in favor of the LoRA over the base model. The dragons exploded into psychedelic and fractal patterns, rarely retaining the shape or design of the base model's dragon. This provided an intriguing example of another of our class topics, human-machine entanglement, and offers a glimpse of what the workflow in many creative workspaces might look like in the future.


Ethics come back into consideration here as well. Many of these high weightings copied enough of Android Jones's style to produce very interesting and unique results, but this increases the risk of further saturating an already crowded creative market. Even if AI can produce unique results, they are built on the ideas of real human creatives past and present, whose work may become less valuable in a world of instant mass generation.

Frame image: culture

Closing Thoughts

It's no secret that AI is rapidly changing the world around us. As it becomes more entangled in our daily routines and augments our future, I feel it is important for any tech-minded person to develop an understanding of how its various forms operate. This being my first foray into the inner workings of such a system, I finished this project with more questions than answers. A lack of transparency and of widely available educational material makes it a difficult subject to understand. It is my hope that as the technology grows, more information will become widely available.


At the least, I can gather from my experiments that with open source tools and a novice level of experience, a model can be drastically changed, so I can only imagine the capabilities of a team working with extremely high-end, specialized equipment. If the internet is being scraped to teach larger models, then checks and balances need to be adopted at a national level to ensure that work by artists like Android Jones is not being used to subtly influence datasets for corporate profit.