What is Latent Space, for artists?

Aslan French
8 min read · Mar 27, 2023

I’m a traditionally trained studio artist and design technologist who has been experimenting with neural networks as an artistic medium since 2015. I’ve written about my process in the past, often with a direct focus on explaining technical subjects to artists. I’m just a layman, speaking from a self-taught and simplified perspective, but I figured this would be a good topic for a mini-essay. Buckle up.

If you’ve been following the discussion around AI, neural networks, and/or machine learning, you may have heard of “latent space”.

What is latent space?

Let’s start with a space you’re probably familiar with: physical space

Let’s think about a city grid:

Image created using Midjourney and images from this article as prompts: https://www.archdaily.com/949094/orthogonal-grids-and-their-variations-in-17-cities-viewed-from-above

I could tell you how to get to the library by saying “go two blocks east and three blocks south”.

A mathey way to say this would be with a linear equation.

This can be written a couple of ways depending on how you rearrange the equation, for example 3x+2y=0 or y=-(3/2)x:

Visualization created using: https://www.mathpapa.com/algebra-calculator.html
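To put numbers on the library example, say I start at (0, 0). Let x count blocks east and y count blocks north. These axis conventions are an assumption for illustration. Under them, “two blocks east and three blocks south” is the point (2, -3), and the straight line from my corner to the library is 3x+2y=0. A small Python sketch with a hypothetical helper `on_line` checks this:

```python
def on_line(x, y):
    """True if (x, y) satisfies 3x + 2y = 0, the straight line through
    the starting corner (0, 0) and the library."""
    return 3 * x + 2 * y == 0

# Assumed convention: x counts blocks east, y counts blocks north,
# so "two blocks east, three blocks south" is (2, -3).
library = (2, -3)

print(on_line(*library))  # True: 3*2 + 2*(-3) = 6 - 6 = 0

# Halfway there and twice as far both sit on the same line.
for t in (0.5, 2):
    x, y = library[0] * t, library[1] * t
    print((x, y), on_line(x, y))  # (1.0, -1.5) True, then (4, -6) True

# Solving for y gives the other way of writing it: y = -(3/2)x
print(-3 / 2 * library[0])  # -3.0, the library's y coordinate
```

Every point you could reach by walking “some amount of two-east-three-south” lands on that one line. That is all a linear equation is saying.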

You probably already know all this, but it bears repeating: XY math wasn’t always in our vocabulary. Legend has it that the philosopher and mathematician Descartes thought up the idea while watching a fly buzz around the corner of his home, and the system is named after him: the Cartesian coordinate system. Not all cities were built on grids, either. Our language and the tools we have to talk about these things have changed over time.

You know physical space, and you understand XY space, but what does this have to do with latent space?

I’m getting there, but let’s up the complexity a little.

Two dimensions is easy. Why not 3? XYZ space

A linear equation for this might look something like: 3x+2y+4z=0

You can see here that the equation 3x+2y+0z=0 describes a vertical plane that, seen from above (reducing things down to just 2 dimensions), looks like our 2D line above. When we give z a nonzero coefficient, as in 3x+2y+4z=0, we get a tilted plane that intersects the first one along that same line (at z=0) but also slopes in the Z direction, something that can only be described with 3 variables.
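One way to see this is to plug points into each equation. A point lies on a plane when a·x + b·y + c·z comes out to 0. In the sketch below, the point names are hypothetical and the helper `plane_value` is my own. The vertical plane contains both the library and a point directly above it. The tilted plane contains only the library on the ground. So the two planes meet along the original line at z=0.

```python
def plane_value(coeffs, point):
    """Plug a point into a*x + b*y + c*z; the point is on the plane when this is 0."""
    return sum(c * p for c, p in zip(coeffs, point))

vertical = (3, 2, 0)  # 3x+2y+0z=0: the 2D line extruded straight up
tilted = (3, 2, 4)    # 3x+2y+4z=0: shares that line at z=0, but slopes in z

library_on_ground = (2, -3, 0)
library_rooftop = (2, -3, 1)  # hypothetical point one unit above the library

for name, coeffs in (("vertical", vertical), ("tilted", tilted)):
    print(name,
          plane_value(coeffs, library_on_ground) == 0,
          plane_value(coeffs, library_rooftop) == 0)
# vertical True True
# tilted True False
```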

Okay, so we have XY or XYZ space that we can use to describe physical space. Adding any more dimensions gets a little weird because our brains are oriented towards existing in 3-dimensional space.

Here’s something neat though, and maybe more familiar to artists.

You have 3 variables describing a space, right? But XYZ isn’t the only thing you could describe as a mathematical space. You could also talk about RGB space.

https://en.wikipedia.org/wiki/RGB_color_spaces#/media/File:RGB_Cube_Show_lowgamma_cutout_b.png

Every color on your screen is described as a mixture of Red, Green, and Blue. In the standard 8-bit encoding, each channel ranges from 0 to a maximum of 255.

Or some mixture:

You can play around with this here: https://www.w3schools.com/css/css_colors_rgb.asp
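Because every color is just three numbers, each one is a point inside a cube whose axes run from 0 to 255. This minimal sketch writes the same point in the two notations CSS accepts, `rgb()` and hex. The `to_css` helper is hypothetical and does not come from any library.

```python
def to_css(rgb):
    """Format an (R, G, B) point in the 0-255 cube as CSS rgb() and hex strings."""
    r, g, b = rgb
    for channel in rgb:
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be in 0..255 for 8-bit color")
    # Hex notation is just each channel written as two base-16 digits.
    return f"rgb({r}, {g}, {b})", f"#{r:02x}{g:02x}{b:02x}"

print(to_css((255, 0, 0)))    # ('rgb(255, 0, 0)', '#ff0000'): a corner of the cube
print(to_css((255, 128, 0)))  # ('rgb(255, 128, 0)', '#ff8000'): red plus some green
```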

Brilliant sculpture by Tauba Auerbach demonstrating this in physical form: https://taubaauerbach.com/view.php?id=286&alt=699

RGB space is a space just like XYZ, and you can use the same kind of math to manipulate it. That’s how a gradient works! A gradient is a path from one color to another through RGB space. Different algorithms can be used to produce different kinds of gradients. This is a much-discussed issue among UI designers, since simple linear interpolation produces gray “dead zones” between complementary colors.

https://www.learnui.design/tools/gradient-generator.html

Here the RGB cube is squashed into a circle for visualization purposes. Game designer Bennett Foddy created a tool for better gradients in Photoshop that does something similar: https://www.foddy.net/2010/10/gentle-gradient/
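Here is the dead-zone problem in a few lines of Python. The sketch assumes the naive approach: move each channel in a straight line and round to whole numbers. Halfway between blue and its complement, yellow, all three channels meet in the middle, which is plain gray.

```python
def lerp_rgb(a, b, t):
    """Naive linear interpolation in RGB: t=0 gives color a, t=1 gives color b."""
    # Move each channel the same fraction t of the way, then round to an integer.
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

blue = (0, 0, 255)
yellow = (255, 255, 0)  # blue's complement

# Five evenly spaced stops along the gradient.
for t in (0, 0.25, 0.5, 0.75, 1):
    print(t, lerp_rgb(blue, yellow, t))
# 0 (0, 0, 255)
# 0.25 (64, 64, 191)
# 0.5 (128, 128, 128)   <- equal R, G, and B: the gray dead zone
# 0.75 (191, 191, 64)
# 1 (255, 255, 0)
```

Tools that produce nicer gradients avoid this in one of two ways. Some interpolate in other color spaces. Others steer the path around the gray center of the cube instead of straight through it.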

Okay, so what’s my point here? XYZ and RGB are just different sets of dimensions, and you could have lots of other kinds. For instance, normal maps in video games (a form of bump mapping) encode 3D angular information in RGB: https://en.wikipedia.org/wiki/Normal_mapping

Source: https://commons.wikimedia.org/wiki/File:Normal_map_example_with_scene_and_result.png

In this case, the direction of each surface normal (its X, Y, and Z components) is stored in the R, G, and B channels. This lets a video game show much higher-resolution surface texture than would be practical to compute with the full polygon count. A “flat” image containing 3-dimensional information! Woah!

Old 3D model I made of Bowser where I created detailed skin texture using normal maps

https://sketchfab.com/3d-models/bowser-heavyweight-champ-7092e8b3c8264aecb355670c1ec9ea5c

As you can see, there are not enough polygons to describe the microtexture of the skin. The normal map fakes it well enough for what a video game needs, at a much lower computational cost.
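A common convention for tangent-space normal maps maps each component of a unit normal from the range -1..1 to 0..255. That is why “flat” areas of a normal map look lavender-blue: the normal (0, 0, 1) becomes roughly (128, 128, 255). This is a minimal sketch assuming that convention. Engines differ in details, for example whether green means up or down. The function names are my own.

```python
import math

def normal_to_rgb(nx, ny, nz):
    """Pack a surface normal into 8-bit RGB, mapping each component from -1..1 to 0..255."""
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    unit = (nx / length, ny / length, nz / length)  # normals must be unit length
    return tuple(round((c + 1) / 2 * 255) for c in unit)

def rgb_to_normal(r, g, b):
    """Unpack 8-bit RGB back into an approximate normal (8 bits lose a little precision)."""
    return tuple(c / 255 * 2 - 1 for c in (r, g, b))

print(normal_to_rgb(0, 0, 1))        # (128, 128, 255): flat surface facing straight out
print(normal_to_rgb(1, 0, 1))        # a spot tilted 45 degrees toward +x
print(rgb_to_normal(128, 128, 255))  # close to (0, 0, 1)
```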

Right so what’s latent space again???

If you can have 2 dimensions, you can have 3 dimensions. In fact, you could have 4, or 5, or 6, and so on.

We can easily visualize 2D and 3D, but it’s hard to go beyond that. You could create a 3D graph that maps points to XYZ space and then tack on additional dimensions using things like color:

https://www.mathworks.com/help/matlab/visualize/visualizing-four-dimensional-data.html

Here the four dimensions are latitude, longitude, percentage of rural population, and fatalities per 100 million vehicle-miles.

Our brains and visual systems just don’t grok higher dimensions intuitively, so this is a bit of a hack. In machine learning, more advanced dimensionality-reduction techniques (such as PCA or t-SNE) are used to squash many dimensions down to 2 or 3 so that clusters become visible.
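PCA (principal component analysis) is the simplest of those techniques to show in a few lines. This minimal NumPy sketch runs on made-up data: two clusters of points in 50 dimensions get squashed down to 2D, and the clusters stay visibly separated. Real projects often reach for t-SNE or UMAP instead. PCA is used here only because it is short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: two clusters of 100 points each, living in 50 dimensions.
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(100, 50))
cluster_b = rng.normal(loc=3.0, scale=1.0, size=(100, 50))
points = np.vstack([cluster_a, cluster_b])  # shape (200, 50)

# PCA step 1: center the data so the cloud of points sits around the origin.
centered = points - points.mean(axis=0)

# PCA step 2: the rows of vt are the directions of greatest spread, biggest first.
_, _, vt = np.linalg.svd(centered, full_matrices=False)

# PCA step 3: project each 50-D point onto the top two directions -> 2-D coordinates.
xy = centered @ vt[:2].T  # shape (200, 2), ready for an ordinary scatter plot

print(xy.shape)  # (200, 2)
# The clusters end up far apart along the first axis (its sign is arbitrary).
print(round(abs(xy[:100, 0].mean() - xy[100:, 0].mean()), 1))
```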

Maybe I could have skipped all this and just linked this video Google put out in 2016: https://www.youtube.com/watch?v=wvsE8jm1GzE

In mathematics, a point in a space like this is a vector, and vectors can have any number of dimensions. The more general term you’ll hear in machine learning is “tensor”. The same kind of math you do in XYZ or RGB space can be done in these higher-dimensional spaces. Instead of having Redness, Greenness, and Blueness, you can measure hundreds of spectrums like “gender”, “age”, or “happiness”.
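For example, “how far apart are two colors?” and “how far apart are two points in a 512-dimensional space?” use the same formula. You square the difference along each dimension, add them up, and take the square root. Machine learning also often uses cosine similarity, which asks whether two points lie in the same direction from the origin. In this minimal sketch, the 512 dimensions and the example vectors are arbitrary assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance: the same formula in 2, 3, or 512 dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """1.0 means same direction, 0.0 means perpendicular, -1.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(distance((0, 0), (3, 4)))             # 5.0 on a 2D grid
print(distance((255, 0, 0), (255, 64, 0)))  # 64.0 between two reds in RGB space

# Two made-up 512-dimensional points; b is a scaled-up copy of a.
a = [math.sin(i) for i in range(512)]
b = [2 * x for x in a]
print(distance(a, b) > 0, round(cosine_similarity(a, b), 6))  # True 1.0
```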

In fact, the online tool Art Breeder lets you do exactly that:

A sampling along the age spectrum. And now manipulating the gender spectrum:

https://www.artbreeder.com/create/portraits

Art Breeder gives you many “genes” to manipulate an image. Each of these genes is its own dimension: gender, age, race, different clothing, whatever. Generative neural networks like Stable Diffusion or Midjourney have been trained on images paired with text descriptions and other metadata. A trained neural network encodes a mathematical space built from these many dimensions. A picture gets described by gender and age, but also by things like whether it has a cat in it or is in a pointillist art style.
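Here is roughly what dragging a single gene slider could look like as math. This comes with loud assumptions: I don’t know Art Breeder’s internals, and everything below is invented. That includes the 512-dimensional space, the random starting point, and the random “age” direction. The idea is that one gene is a direction in latent space, and the slider adds more or less of it.

```python
import random

DIMS = 512  # hypothetical size of the latent space

rng = random.Random(42)
portrait = [rng.gauss(0, 1) for _ in range(DIMS)]  # a made-up starting point

# A made-up "age" direction. In a real model a direction like this would have to be
# learned or discovered, e.g. from the difference between older and younger faces.
age_direction = [rng.gauss(0, 1) for _ in range(DIMS)]

def slide(point, direction, amount):
    """Move a latent point along one direction, like dragging a single gene slider."""
    return [p + amount * d for p, d in zip(point, direction)]

# A sampling along the "age" spectrum; a generator network (not shown)
# would turn each of these points into an image.
samples = [slide(portrait, age_direction, amount) for amount in (-2, -1, 0, 1, 2)]
print(len(samples), len(samples[0]))  # 5 512
```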

Here’s a visualization of this stuff I made last year using the v1 model of Midjourney.

The text prompt “genie”, the anglicized form of the Arabic word jinn (جن):

Now an image created with the romanized prompt “djinn”:

Now let’s do some math. Midjourney allows you to use negative prompts. The idea for this series was inspired by 538’s “Subreddit Algebra” project: https://fivethirtyeight.com/features/dissecting-trumps-most-rabid-online-following/

What is a genie minus the djinn? What do we get when we invoke genieness but ask the neural network to subtract djinness from it?

And a djinn minus the genie?

You can see a lot of similarities between the original djinn and genie images, but also some subtle differences. The genie is a lighter blue, rounder, and more cartoony. The djinn seems more adult in its proportions, and its blue is generally a darker shade. When we subtract djinness from genie, this distinction gets heightened. Genie is a term we associate these days with specific Disney cartoons, while the romanized djinn is more associated with folkloric depictions. Subtract the family-friendly genie from the djinn and you get something foreboding and exotic. The result is more in keeping with the mythological reputation of djinn as dangerous and potentially evil spirits. There are also potential racialized or gendered interpretations of these distinctions.
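Here is a toy illustration of the “algebra”. It is a hypothetical sketch, not a description of how Midjourney handles negative prompts internally. Imagine scoring each word on three made-up vibe spectrums. Subtracting one vector from the other leaves whatever makes them different, and the sign tells you which side it leans toward.

```python
# Made-up scores on three hypothetical vibe spectrums.
VIBES = ("cartoony", "menacing", "light blue")
genie = (0.9, 0.2, 0.8)
djinn = (0.3, 0.8, 0.4)

def subtract(a, b):
    """Vector subtraction, rounded to hide floating-point noise."""
    return tuple(round(x - y, 2) for x, y in zip(a, b))

print(dict(zip(VIBES, subtract(genie, djinn))))
# {'cartoony': 0.6, 'menacing': -0.6, 'light blue': 0.4}
print(dict(zip(VIBES, subtract(djinn, genie))))
# {'cartoony': -0.6, 'menacing': 0.6, 'light blue': -0.4}
```

Genie minus djinn keeps the cartoony, light-blue vibe. Djinn minus genie keeps the menace, which matches the direction of the image results above.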

So this is latent space???

Okay, latent space has a precise and rigorous technical definition, but for artists it can be thought of as basically “semantic/meaning space”. Or, if you like, “vibe space”. It’s using math to manipulate hundreds or thousands of vibe spectrums at once. It’s a way to refer to a general mathematical space defined by a bunch of dimensions.

BONUS:

I also wasn’t aware that the djinn/genie difference came down to romanization versus anglicization until I talked with someone while exhibiting these images. So I tried the original Arabic “جن”, the transliteration “jinn”, the anglicization “genie”, and the romanization “djinn”. These were all done with Midjourney v5 (the latest version at the time of writing).

If I redo djinn minus genie with model v5, you get something rather bizarre:

I am guessing that “Genie” is some kind of kitchen appliance that has found its way into the dataset since my original experiment. I originally assumed the more feminine and apparently blonde result of genie minus djinn was due to the ’60s TV show “I Dream of Jeannie”. Now I think it may just be an effect of genie being associated with Disney princesses, as the MJ v5 model seems to indicate.


Aslan French

Design technologist. Civic hacker. I talk too much. Sometimes I write it down. Sometimes I publish it here.