Generative Art Using Real-World Data
By Hailey Gebhart
April 25, 2024

Generative art, a fascinating domain within the art world, employs algorithms and data to create art that is not entirely directed by the human hand but rather generated through autonomous systems. This art form often leverages randomness and algorithmic processes to produce unique, abstract visuals that showcase the artistry of natural and random processes. Generative art invites us to ask questions about forces beyond our understanding by visualizing the random in ways where the outcomes sometimes seem not-so-random, revealing the beauty of our natural world in a deep and abstract way.

Unlike traditional generative art, which often relies purely on algorithmic randomness to produce unpredictability and abstraction, my goal is to integrate real-world data sets to explore how "random noise" and inherent patterns in data can influence artistic outputs. By using data as the main driver of the art, the project challenges the conventional randomness in generative art, introducing a structured, yet unpredictable, element through natural data patterns that can morph and grow as the data changes.

We as humans need visualization in order to wrap our heads around data. Just looking at numbers is not enough; we sometimes need to see data in abstract ways in order to understand it. This is one reason why math and art are not enemies, but close allies. However, I want to make it clear that the purpose of this project is NOT to perform any sort of exploratory analysis or to visualize patterns in the data in a useful way. The primary goal is to create a visually interesting piece, like any other work of generative art. My whole point is that generative art has a wonderful place in the way it makes observations about the world around us. I want to show that real collected data can have the same effect as randomness at a macro level. At the same time, I am curious to explore the ways in which a structured dataset with ingrained patterns makes the process different. In addition, I will be exploring how machine learning techniques and other algorithmic approaches to data transformation can impact the end result of the piece.

I selected the "Individual Household Electric Power Consumption" data set from the University of California, Irvine Machine Learning Repository. This choice was strategic, meant to ensure that the visualization would reveal complex patterns that could translate into captivating visual narratives. It is a good fit for this task for two reasons. The first is that the dataset consists mostly of numerical, continuous columns; for the purposes of this project, I want to stay away from categorical data to keep as much variation in the output as possible. Secondly, this data is tagged as a data set suitable for clustering.
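As a quick sanity check on that first point, a few lines of pandas are enough to confirm the column types (a small sketch, assuming the file sits at the same local path used in the scripts below):

import pandas as pd

# Peek at a small sample of rows rather than loading the whole file
sample = pd.read_csv(
    'data/household_power_consumption.txt',
    delimiter=';',
    na_values='?',
    nrows=1000
)
print(sample.dtypes)  # everything except Date and Time should come back numeric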

Clustering is a machine learning method used in data analysis to group a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. It is often used in exploratory data analysis to find natural groupings within data or to uncover underlying patterns. For this project, clustering is relevant because data with distinct clusters is more likely to have those clusters reflected in the final art piece. I want to see if patterns like these will somehow show up in the visualization, even though exploratory data analysis is not the purpose of this project.
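For the curious, here is a minimal sketch of what clustering this data might look like, using scikit-learn's KMeans. This is illustrative only: the column names come from the dataset, but the choice of three clusters and the subsample size are arbitrary assumptions on my part.

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load a few numeric columns and subsample to keep k-means quick
df = pd.read_csv(
    'data/household_power_consumption.txt',
    delimiter=';',
    na_values='?',
    usecols=['Global_active_power', 'Voltage', 'Global_intensity'],
    dtype=float
).dropna().sample(50_000, random_state=0)

# Standardize so no single column dominates the distance calculations
features = StandardScaler().fit_transform(df)

# k=3 is an arbitrary choice for illustration
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
df['cluster'] = kmeans.fit_predict(features)
print(df['cluster'].value_counts())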

Here's how I prepared the data using Python:


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Specify the data types to improve load time and memory efficiency
dtype_dict = {
    'Global_active_power': float,
    'Global_reactive_power': float,
    'Voltage': float,
    'Global_intensity': float,
    'Sub_metering_1': float,
    'Sub_metering_2': float,
    'Sub_metering_3': float
}

# Columns to use
cols_to_use = ['Date', 'Time', 'Global_active_power', 'Global_reactive_power',
               'Voltage', 'Global_intensity', 'Sub_metering_1',
               'Sub_metering_2', 'Sub_metering_3']

# Load the data with specified types, parsing dates, and handling missing values
data = pd.read_csv(
    'data/household_power_consumption.txt',
    delimiter=';',
    parse_dates=[['Date', 'Time']],
    infer_datetime_format=True,
    na_values='?',
    usecols=cols_to_use,
    dtype=dtype_dict
)

# Drop rows with any missing values
data.dropna(inplace=True)

# Extract hour and minute from Date_Time to use as x and y coordinates
data['hour'] = data['Date_Time'].dt.hour
data['minute'] = data['Date_Time'].dt.minute
data['x'] = data['hour'] * 60 + data['minute']  # Total minutes in the day as x
data['y'] = np.random.rand(len(data)) * 600  # Random spread on the y-axis

# Setup the plot
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_facecolor('black')

# Colors
colors = ['#6a0dad', '#ff6347', '#3cb371', '#00bfff', '#ff1493']

# Vectorized calculations mapping data columns to visual parameters
data['size'] = np.interp(data['Global_active_power'], [0, 10], [10, 100])
data['color_index'] = np.interp(data['Voltage'], [230, 240], [0, 4]).astype(int)
data['alpha'] = np.interp(data['Global_intensity'], [0, 30], [0.2, 1])
data['total_sub_metering'] = (data['Sub_metering_1'] + data['Sub_metering_2']
                              + data['Sub_metering_3'])
data['edge_width'] = np.interp(data['total_sub_metering'], [0, 50], [1, 10])

# Plotting all points at once
sc = ax.scatter(
    data['x'],
    data['y'],
    s=data['size'],
    c=[colors[i] for i in data['color_index']],
    alpha=data['alpha'],
    edgecolors='white',
    linewidth=data['edge_width']
)

ax.axis('off')
plt.show()

For this piece, the x-axis of the image represents the time of day, measured in minutes since midnight and derived from the dataset's date and time columns. The placement of each data point on the y-axis is determined at random. The remaining parameters of each data point (size, color, opacity, and edge width) are controlled by numbers from the other columns. I am not going to pretend to know what the specific columns mean, as it does not matter for the purposes of this project. However, you can read more about them in the documentation for the data set, which is cited at the bottom of this page.
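As an aside, each of those np.interp calls follows the same clamp-and-rescale pattern, so the column-to-visual-parameter mapping could be factored into one small helper (a hypothetical refactor, not part of the script above):

import numpy as np

def map_column(series, data_range, visual_range):
    """Linearly map a data column onto a visual parameter range;
    values outside data_range are clamped to the endpoints."""
    return np.interp(series, data_range, visual_range)

# For example, the size mapping from the script above becomes:
# data['size'] = map_column(data['Global_active_power'], [0, 10], [10, 100])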

You will see in the final result that the art experiment was a success. Even though finding explicit patterns to utilize from a data analytic perspective was not the goal, the patterns of the data still yield an interesting outcome that has the potential to tell a story.

However, this method did not use a lot of AI in and of itself. ChatGPT helped me format the code used to create this work of generative art, but there is not much AI in the program itself. I decided to try some modifications in order to increase the amount of actual AI used in this project.

First, I performed Principal Component Analysis (PCA) on the data, which is an actual machine learning algorithm. Principal Component Analysis is a form of dimensionality reduction, which is useful for data sets with many variables. PCA is often used to let data analysts visualize complex data, because it allows data with many variables to be plotted in just two dimensions (that is, by two derived variables). It does this by breaking the data down to its most important parts: the directions along which the data varies the most. Unfortunately for this project, this produced the least interesting and most organized result, which does not surprise me, as being organized is the entire purpose of Principal Component Analysis in the first place.


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Specify the data types to improve load time and memory efficiency
dtype_dict = {
    'Global_active_power': float,
    'Global_reactive_power': float,
    'Voltage': float,
    'Global_intensity': float,
    'Sub_metering_1': float,
    'Sub_metering_2': float,
    'Sub_metering_3': float
}

# Columns to use
cols_to_use = ['Date', 'Time', 'Global_active_power', 'Global_reactive_power',
               'Voltage', 'Global_intensity', 'Sub_metering_1',
               'Sub_metering_2', 'Sub_metering_3']

data = pd.read_csv(
    'data/household_power_consumption.txt',
    delimiter=';',
    parse_dates=[['Date', 'Time']],
    infer_datetime_format=True,
    na_values='?',
    usecols=cols_to_use,
    dtype=dtype_dict
)
data.dropna(inplace=True)

# PCA for dimensionality reduction: project three columns down to two
pca = PCA(n_components=2)
data_pca = pca.fit_transform(data[['Global_active_power', 'Voltage', 'Global_intensity']])
data['x'] = data_pca[:, 0]  # first principal component
data['y'] = data_pca[:, 1]  # second principal component

# Setup the plot
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_facecolor('black')

# Using more creative color mapping and sizes
data['size'] = np.interp(data['Global_active_power'], [0, 10], [10, 100])
data['color'] = np.interp(data['Voltage'], [230, 240], [0, 4]).astype(int)

# Plotting with enhanced aesthetics
sc = ax.scatter(data['x'], data['y'], s=data['size'], c=data['color'],
                cmap='viridis', alpha=0.7)
ax.axis('off')
plt.show()

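One way to quantify how "organized" this PCA result is: check how much of the data's variance the two components capture (a small follow-up sketch, reusing the fitted pca object from the script above):

# Fraction of total variance captured by each principal component
print(pca.explained_variance_ratio_)

# If the first component dominates, most points spread along a single
# direction, which helps explain the tidy look of this piece.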

The next experiment I performed was doing nonlinear transformations on the data. Nonlinear transformations are mathematical operations applied to data that convert values in a way that does not follow a straight line. Unlike linear transformations, which preserve the operations of addition and scalar multiplication, nonlinear transformations can alter the structure and relationships within the data in more complex ways. These transformations can help expose hidden patterns or structures that are not apparent under linear assumptions, or they can simply serve to adjust the scale or distribution of the data to better meet the assumptions of certain analytical methods or algorithms. This also produced a very interesting piece of art.


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Specify the data types to improve load time and memory efficiency
dtype_dict = {
    'Global_active_power': float,
    'Global_reactive_power': float,
    'Voltage': float,
    'Global_intensity': float,
    'Sub_metering_1': float,
    'Sub_metering_2': float,
    'Sub_metering_3': float
}

# Columns to use
cols_to_use = ['Date', 'Time', 'Global_active_power', 'Global_reactive_power',
               'Voltage', 'Global_intensity', 'Sub_metering_1',
               'Sub_metering_2', 'Sub_metering_3']

data = pd.read_csv(
    'data/household_power_consumption.txt',
    delimiter=';',
    parse_dates=[['Date', 'Time']],
    infer_datetime_format=True,
    na_values='?',
    usecols=cols_to_use,
    dtype=dtype_dict
)
data.dropna(inplace=True)

# Extract hour and minute from Date_Time to use as x and y coordinates
data['hour'] = data['Date_Time'].dt.hour
data['minute'] = data['Date_Time'].dt.minute
data['x'] = data['hour'] * 60 + data['minute']  # Total minutes in the day as x
data['y'] = np.random.rand(len(data)) * 600  # Random spread on the y-axis

# Setup the plot
fig, ax = plt.subplots(figsize=(10, 8))
ax.set_facecolor('black')

# Using more creative color mapping and sizes
data['size'] = np.exp(data['Global_active_power'])  # Exponential scaling for size
data['color'] = np.log1p(data['Voltage'])  # Logarithmic scaling for color values

# Plotting with enhanced aesthetics
sc = ax.scatter(data['x'], data['y'], s=data['size'], c=data['color'],
                cmap='viridis', alpha=0.7)
ax.axis('off')
plt.show()


In conclusion, I showed that interesting results can be produced by using data analysis and data visualization techniques to create generative pieces of art. Instead of the gravity of a pendulum swing, it was the shape of the data, and the noise residing within it, that gave me my art. I want to stress that I have most likely only scratched the surface of what can be done with this type of generative art. There is so much more exploration, experimentation, and fun to be had when this kind of data science and art intermingle.

This exploration highlights the dynamic interface between data science and art, suggesting a multitude of pathways for future artistic endeavors. It underscores the potential of data, when viewed through a creative lens, to enrich our aesthetic experience and enhance our understanding of complex data through visual representation. I hope this opens the door to explore a broader array of data sets and complex algorithms to further challenge the boundaries between data visualization and generative art, continually pushing towards new artistic advancements.

Hebrail, Georges and Berard, Alice. (2012). Individual Household Electric Power Consumption. UCI Machine Learning Repository. https://doi.org/10.24432/C58K54.

A majority of the code used in this project was developed with ChatGPT.