Input a short phrase or sentence into a neural network and see what image it creates.
The simplest explanation is that BigGAN generates images that try to satisfy CLIP, which rates how closely each image matches the input phrase. BigGAN creates an image, and CLIP looks at it and says “sorry, that does not look like a cat to me, try again”. With each iteration, BigGAN gets better at generating an image that matches the prompt text.
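The loop can be sketched as gradient ascent on a generator's input. This is a toy illustration and not BigGAN or CLIP. The hypothetical `generate` is a fixed random linear map standing in for the generator. The hypothetical `score` is cosine similarity against a made-up `TARGET` vector standing in for CLIP's embedding of the prompt. The gradient is estimated with finite differences rather than backpropagation. The generator's weights never change, and only its input vector is nudged toward a higher score. As I understand it, Big Sleep works the same way: it optimizes the inputs of a frozen, pretrained BigGAN instead of retraining it.

```python
import math
import random

# Toy stand-ins (hypothetical, NOT the real BigGAN/CLIP models):
# "generate" maps a latent vector z to an "image" vector via a fixed random
# linear map, and "score" is the cosine similarity between that image and a
# target vector standing in for the text prompt's embedding.

DIM_Z, DIM_IMG = 8, 16
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(DIM_Z)] for _ in range(DIM_IMG)]
TARGET = [rng.gauss(0, 1) for _ in range(DIM_IMG)]


def generate(z):
    """Fixed 'generator': its weights W are never updated."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def score(image):
    """Cosine similarity between the image and the target 'prompt'."""
    dot = sum(a * b for a, b in zip(image, TARGET))
    norm = math.sqrt(sum(a * a for a in image)) * math.sqrt(sum(b * b for b in TARGET))
    return dot / norm if norm else 0.0


def optimize(z, steps=200, lr=0.1, eps=1e-4):
    """Gradient ascent on z only, using central finite differences."""
    z = list(z)
    for _ in range(steps):
        grad = []
        for i in range(len(z)):
            z_plus = list(z)
            z_plus[i] += eps
            z_minus = list(z)
            z_minus[i] -= eps
            grad.append((score(generate(z_plus)) - score(generate(z_minus))) / (2 * eps))
        # Move z a small step in the direction that raises the score.
        z = [x + lr * g for x, g in zip(z, grad)]
    return z


z0 = [rng.gauss(0, 1) for _ in range(DIM_Z)]
z_final = optimize(z0)
print(f"score before: {score(generate(z0)):.3f}")
print(f"score after:  {score(generate(z_final)):.3f}")
```

The "try again" step in the description above corresponds to one pass of the `for` loop in `optimize`: score the current output, work out which direction improves it, and adjust the input slightly.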
BigSleep seems to generate clearer images more quickly than DeepDaze, so I have concentrated on BigSleep.
BigSleep uses a seed number, which means you can get thousands or millions of different outputs for the same input phrase. Note, though, that the seed does not reliably reproduce the same image. In my testing, setting the torch_deterministic flag to True and setting the CUDA environment variable did not help: every time BigSleep is called, it generates a different image even with the same seed. That means you will never be able to reproduce the same output in the future.
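A seed only fixes the random starting point. The plain-Python sketch below uses a hypothetical `initial_latent` helper, not part of BigSleep. It shows that the same seed reproduces the same starting vector, while different seeds give different ones, which is where the many variations per phrase come from. It then shows one commonly cited source of run-to-run drift: floating-point addition is not associative. If a GPU sums the same numbers in a different order, as parallel reductions can, the results differ in the last bits, and over hundreds of optimization steps such tiny differences can grow into visibly different images. Whether this is the actual cause of the BigSleep behavior described above is my assumption, not something I have verified.

```python
import random

DIM_Z = 4  # hypothetical latent size, kept tiny for display


def initial_latent(seed, dim=DIM_Z):
    """Hypothetical helper: the seed fixes the random starting latent vector."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


# Same seed -> identical starting point; different seed -> different one.
a = initial_latent(42)
b = initial_latent(42)
c = initial_latent(43)
print(a == b)  # True
print(a == c)  # False

# Floating-point addition is not associative: the same three numbers summed
# in a different order give different results in the last bits.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print(left_first, right_first)    # 0.6000000000000001 0.6
print(left_first == right_first)  # False
```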
These images are 512×512 pixels (the largest resolution BigSleep supports) and took 4 minutes each to generate on an RTX 3090 GPU. The same code takes 6 minutes 45 seconds per image on an older RTX 2080 Super GPU.
Also note that these are the “cherry-picked” best results. BigSleep is not going to create awesome art every time. For these examples, or when experimenting with new phrases, I usually run a batch of images and then manually select the best 4 or 8 to show off (4 or 8 because that fills one or two tweets).
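Using the per-image timings above, a quick back-of-the-envelope calculation shows what a batch costs. The batch size of 16 is a hypothetical example. The per-image times are the ones measured above, and the sketch assumes images are generated one after another with no overhead between runs.

```python
# Per-image generation times at 512x512, as measured above (in seconds).
SECONDS_PER_IMAGE = {
    "RTX 3090": 4 * 60,             # 4 minutes
    "RTX 2080 Super": 6 * 60 + 45,  # 6 minutes 45 seconds
}

BATCH_SIZE = 16  # hypothetical batch, from which the best 4 or 8 are picked


def batch_minutes(gpu, batch_size=BATCH_SIZE):
    """Total wall-clock minutes for a sequential batch (no overhead assumed)."""
    return SECONDS_PER_IMAGE[gpu] * batch_size / 60


for gpu in SECONDS_PER_IMAGE:
    print(f"{gpu}: {batch_minutes(gpu):.0f} minutes for {BATCH_SIZE} images")

speedup = SECONDS_PER_IMAGE["RTX 2080 Super"] / SECONDS_PER_IMAGE["RTX 3090"]
print(f"The 3090 is about {speedup:.2f}x faster per image")
```

With these numbers, a batch of 16 takes 64 minutes on the 3090 and 108 minutes on the 2080 Super, so the 3090 is roughly 1.69 times faster per image.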
To start, these next four images were created from the prompt phrase “Gandalf and the Balrog”.
Here are results from “disturbing flesh”. These are like early David Cronenberg nightmare visuals.
A suggestion from @MatthewKafker on Twitter: “spatially ambiguous water lillies painting”.
After experimenting with acrylic pour painting in the past, I wanted to see what BigSleep could generate from “acrylic pour painting”.
I have always enjoyed David Lynch movies, so let’s see what “david lynch visuals” results in. This prompt produced a lot of surprises and worked great. These images really capture the Lynchian cinematic look. Many of them came out fairly dark, so I tweaked the exposure in GIMP.
More from “david lynch visuals”, but these lean more toward portraits. The famous hair comes through.
I have now added a simple GUI front end for DeepDaze and BigSleep to Visions of Chaos, so once you have installed all the prerequisites, you can run these models on any prompt phrase you feed them. The following images show BigSleep in the process of generating an image for the prompt text “cyberpunk aesthetic”.
After spending a lot of time experimenting with BigSleep over the last few days, I highly encourage anyone with a decent GPU to try them. The results are truly fascinating. This page says an RTX 2070 with 8GB or better is required, but Martin in the comments managed to generate 128×128 images on a GTX 1060 6GB GPU, taking 26 (!!) minutes per image.