NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.
Input a short phrase or sentence into a neural network and see what image it creates.
Phil used the code/models from Ryan Murdock (@advadnoun). Ryan has a blog post explaining the basics of how all the parts connect up here. Ryan has some newer Text-to-Image experiments but they are behind a Patreon paywall, so I have not played with them. Hopefully he (or anyone) releases the colabs publicly sometime in the future. I don’t want to experiment with a Text-to-Image system that I cannot share with everyone, otherwise it is just a tease.
The most simple explanation is that BigGAN generates images that try to satisfy CLIP which rates how closely the image matches the input phrase. BigGAN creates an image and CLIP looks at it and says “sorry, that does not look like a cat to me, try again”. As each repeated iteration is performed BigGAN gets better at generating an image that matches the desired phrase text.
Big Sleep Examples
Big Sleep uses a seed number which means you can have thousands/millions of different outputs for the same input phrase. Note there is an issue with the seed not always being able to create the same images though. From my testing, even with the torch_deterministic flag set to True and setting the CUDA envirnmental variable does not help. Every time Big Sleep is called it will generate a different image with the same seed. That means you will never be able to reproduce the same output in the future.
These images are 512×512 pixels square (the largest resolution Big Sleep supports) and took 4 minutes each to generate on an RTX 3090 GPU. The same code takes 6 minutes 45 seconds per image on an older 2080 Super GPU.
Also note that these are the “cherry picked” best results. Big Sleep is not going to create awesome art every time. For these examples or when experimenting with new phrases I usually run a batch of multiple images and then manually select the best 4 or 8 to show off (4 or 8 because that satisfies one or two tweets).
To start, these next four images were created from the prompt phrase “Gandalf and the Balrog”
Here are results from “disturbing flesh”. These are like early David Cronenberg nightmare visuals.
A suggestion from @MatthewKafker on Twitter “spatially ambiguous water lillies painting”
After experimenting with acrylic pour painting in the past I wanted to see what BigSleep could generate from “acrylic pour painting”
I have always enjoyed David Lynch movies so let’s see what “david lynch visuals” results in. This one got a lot of surprises and worked great. These images really capture the feeling of a Lynchian cinematic look. A lot of these came out fairly dark so I have tweaked exposure in GIMP.
More from “david lynch visuals” but these are more portraits. The famous hair comes through.
I have now added a simple GUI front end for Big Sleep into Visions of Chaos, so once you have installed all the pre-requisites you can run these models on any prompt phrase you feed into them. The following images shows Big Sleep in the process of generating an image for the prompt text “cyberpunk aesthetic”.
After spending a lot of time experimenting with Big Sleep over the last few days, I highly encourage anyone with a decent GPU to try these. The results are truly fascinating. This page says at least a 2070 8GB or greater is required, but Martin in the comments managed to generate a 128×128 image on a 1060 6GB GPU after 26 (!!) minutes.