Text-to-Image Machine Learning

Text-to-Image

Input a short phrase or sentence into a neural network and see what image it creates.

I am using DeepDaze and BigSleep from Phil Wang (@lucidrains).

Phil used the code/models from Ryan Murdock (@advadnoun). Ryan has a blog post explaining the basics of how all the parts connect up here.

The simplest explanation is that BigGAN generates images that try to satisfy CLIP, which rates how closely each image matches the input phrase. BigGAN creates an image and CLIP looks at it and says “sorry, that does not look like a cat to me, try again”. With each iteration BigGAN gets better at generating an image that matches the desired phrase.
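
As a rough illustration of that generate-then-rate loop, here is a toy sketch in plain Python. It is not BigSleep's code: the `generate` function stands in for BigGAN, the `score` function stands in for CLIP, and the "image" and "phrase" are just short lists of numbers. All names and values here are made up for illustration. The only point is that repeatedly nudging the generator's input in whatever direction raises the score makes the output match the target better.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def generate(latent):
    """Toy stand-in for the generator: turns a latent vector into an 'image'."""
    return [math.tanh(v) for v in latent]


def score(image, phrase_embedding):
    """Toy stand-in for the rater: similarity of image to phrase, from -1 to 1."""
    return cosine(image, phrase_embedding)


def optimise(phrase_embedding, steps=300, lr=0.2, seed=0, eps=1e-4):
    """Nudge the latent vector uphill on the score, one small step at a time."""
    rng = random.Random(seed)
    latent = [rng.uniform(-1.0, 1.0) for _ in phrase_embedding]
    history = []
    for _ in range(steps):
        base = score(generate(latent), phrase_embedding)
        history.append(base)
        # Estimate the gradient by finite differences (real systems use autograd).
        grad = []
        for i in range(len(latent)):
            nudged = list(latent)
            nudged[i] += eps
            grad.append((score(generate(nudged), phrase_embedding) - base) / eps)
        latent = [v + lr * g for v, g in zip(latent, grad)]
    history.append(score(generate(latent), phrase_embedding))
    return latent, history


# Hypothetical 4-number "embedding" of the phrase; a real one has hundreds of values.
phrase = [0.9, -0.4, 0.1, 0.7]
latent, history = optimise(phrase)
print(f"score at start: {history[0]:.3f}, score at end: {history[-1]:.3f}")
```

The score printed at the end should be much higher than at the start, which is the toy version of the rater eventually agreeing that the image does look like a cat.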

BigSleep Examples

BigSleep seems to generate clearer images more quickly than DeepDaze does, so I have concentrated on BigSleep.

BigSleep uses a seed number, which means you can have thousands or millions of different outputs for the same input phrase. Note that there is an issue with the seed not always reproducing the same image, though. From my testing, neither setting the torch_deterministic flag to True nor setting the CUDA environment variable helps. Every time BigSleep is called it generates a different image for the same seed. That means you will never be able to reproduce the same output in the future.
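
For context, this is what a seed is normally expected to guarantee. The sketch below uses only Python's standard `random` module, and the `initial_latent` helper is a hypothetical name, not part of BigSleep. It shows that a seeded generator hands back the same starting numbers every time. A seed can only fix that random starting point, though. Some GPU operations are not guaranteed to give bit-identical results from run to run, and tiny differences can compound over hundreds of iterations, which may be part of what is happening here.

```python
import random


def initial_latent(seed, size=8):
    """Draw a starting latent vector from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


first_run = initial_latent(42)
second_run = initial_latent(42)
other_seed = initial_latent(43)

print(first_run == second_run)  # same seed, same starting point
print(first_run == other_seed)  # different seed, different starting point
```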

These images are 512×512 pixels square (the largest resolution BigSleep supports) and took 4 minutes each to generate on an RTX 3090 GPU. The same code takes 6 minutes 45 seconds per image on an older 2080 Super GPU.
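
To put those two timings side by side, here is the arithmetic as a few lines of Python. The only inputs are the two per-image times quoted above. The helper names are my own.

```python
def to_seconds(minutes, seconds=0):
    """Convert a minutes-and-seconds timing to seconds."""
    return minutes * 60 + seconds


def images_per_hour(seconds_per_image):
    """How many images fit in one hour at a given per-image time."""
    return 3600 / seconds_per_image


rtx_3090 = to_seconds(4)            # 4 minutes per image
rtx_2080_super = to_seconds(6, 45)  # 6 minutes 45 seconds per image

slowdown = rtx_2080_super / rtx_3090
print(f"2080 Super takes {slowdown:.2f}x as long per image")
print(f"3090: {images_per_hour(rtx_3090):.1f} images/hour")
print(f"2080 Super: {images_per_hour(rtx_2080_super):.1f} images/hour")
```

So a batch that finishes in an hour on the 3090 needs roughly an hour and 40 minutes on the 2080 Super.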

Also note that these are the “cherry picked” best results. BigSleep is not going to create awesome art every time. For these examples or when experimenting with new phrases I usually run a batch of multiple images and then manually select the best 4 or 8 to show off (4 or 8 because that satisfies one or two tweets).

To start, these next four images were created from the prompt phrase “Gandalf and the Balrog”

BigSleep - Gandalf and the Balrog

BigSleep - Gandalf and the Balrog

BigSleep - Gandalf and the Balrog

BigSleep - Gandalf and the Balrog

Here are results from “disturbing flesh”. These are like early David Cronenberg nightmare visuals.

BigSleep - Disturbing Flesh

BigSleep - Disturbing Flesh

BigSleep - Disturbing Flesh

BigSleep - Disturbing Flesh

A suggestion from @MatthewKafker on Twitter “spatially ambiguous water lillies painting”

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

BigSleep - Spatially Ambiguous Water Lillies Painting

“stormy seascape”

BigSleep - Stormy Seascape

BigSleep - Stormy Seascape

BigSleep - Stormy Seascape

BigSleep - Stormy Seascape

After experimenting with acrylic pour painting in the past I wanted to see what BigSleep could generate from “acrylic pour painting”

BigSleep - Acrylic Pour Painting

BigSleep - Acrylic Pour Painting

BigSleep - Acrylic Pour Painting

BigSleep - Acrylic Pour Painting

“beautiful sunset”

BigSleep - Beautiful Sunset

BigSleep - Beautiful Sunset

BigSleep - Beautiful Sunset

BigSleep - Beautiful Sunset

I have always enjoyed David Lynch movies so let’s see what “david lynch visuals” results in. This one produced a lot of surprises and worked great. These images really capture the feeling of a Lynchian cinematic look. A lot of these came out fairly dark, so I have tweaked the exposure in GIMP.

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

More from “david lynch visuals” but these are more portraits. The famous hair comes through.

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

BigSleep - David Lynch Visuals

“H.R.Giger”

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

BigSleep - H.R.Giger

“metropolis”

BigSleep - Metropolis

BigSleep - Metropolis

BigSleep - Metropolis

BigSleep - Metropolis

“surrealism”

BigSleep - Surrealism

BigSleep - Surrealism

BigSleep - Surrealism

BigSleep - Surrealism

Availability

I have now added a simple GUI front end for DeepDaze and BigSleep into Visions of Chaos, so once you have installed all the prerequisites you can run these models on any prompt phrase you feed into them. The following image shows BigSleep in the process of generating an image for the prompt text “cyberpunk aesthetic”.

Text-To-Image GUI

After spending a lot of time experimenting with BigSleep over the last few days, I highly encourage anyone with a decent GPU to try these. The results are truly fascinating. This page says at least a 2070 8GB or greater is required, but Martin in the comments managed to generate 128×128 images on a 1060 6GB GPU, at 26 (!!) minutes per image.
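
For anyone on a smaller card who wants to try the low-memory settings quoted in the comments, here is a small sketch that assembles the command line. It assumes the `dream` command that the big-sleep package installs, and it uses the two flags exactly as quoted there (the person who posted them had not tested them on a 4GB card). The `low_memory_command` helper is a hypothetical name of my own, and this is not how the Visions of Chaos GUI calls the model.

```python
import shlex


def low_memory_command(prompt, image_size=128, num_cutouts=16):
    """Build a shell command string with the reduced-memory settings."""
    args = [
        "dream",  # assumed command-line entry point from the big-sleep package
        prompt,
        f"--image-size={image_size}",
        f"--num-cutouts={num_cutouts}",
    ]
    # shlex.join quotes the prompt so spaces survive the shell.
    return shlex.join(args)


print(low_memory_command("cyberpunk aesthetic"))
```

The printed line can be pasted into a terminal. Smaller values for either setting should lower VRAM use further, at the cost of smaller images.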

Jason.

8 responses to “Text-to-Image Machine Learning”

  1. hello
    dear professor, your work is so great and beautiful, thank you very much. I want to know how to export the 3D image as bmp format pictures in the form of slices  along a direction (ex z axis)
    thanks very much.
    Warm regards
    jmk2021

  2. Thank you. These are absolutely fascinating! Now if only I could put my hands on a decent GPU (RTX 3080 or +).

  3. In the “2070 8GB or greater required” link, one post said:

    I’ve been able to get VRAM usage down near 6GB and even 4GB by lowering image_size and num_cutouts parameters.
    --num-cutouts=16 and --image-size=128 should work on a 4GB card, but I haven’t tested yet.

    Is this something you can do with your implementation?

    • I do provide the option for image size so you can try 128×128. If that fails I could add a “low memory” checkbox that adds the num cutouts option.

  4. Finally got it all working. I can confirm that a 10x0-class GPU card with 6GB is able to create pictures at the 128×128 size. My particular 1060 card takes about 26 mins per image. They are small but perfectly formed.
