Text-to-Image Summary

What Are Text-to-Image Systems

Text-to-Image systems (variously called models, scripts or networks) are machine learning based models that take a descriptive phrase as input and attempt to generate images that match that phrase. No other training is required by the end user.
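All of the scripts covered below share the same basic recipe: a generator network produces an image from some latent parameters, CLIP scores how well that image matches the prompt text, and gradient ascent on the latents pushes the score higher. As a toy illustration only (the real scripts use PyTorch, a GAN or SIREN generator and OpenAI's CLIP; the single-number "latent" and quadratic "similarity" below are stand-ins for those):

```python
# Toy sketch of the CLIP-guided optimization loop these scripts share.
# A stand-in "latent" is nudged by gradient ascent until a stand-in
# "similarity" score (playing the role of CLIP's image/text match) peaks.

def toy_similarity(latent, target=3.0):
    # stand-in for CLIP's image/text similarity score (higher is better)
    return -(latent - target) ** 2

def toy_gradient(latent, target=3.0):
    # derivative of the toy similarity with respect to the latent
    return -2.0 * (latent - target)

def optimize(steps=100, lr=0.1):
    latent = 0.0  # stand-in for the generator's latent vector
    for _ in range(steps):
        latent += lr * toy_gradient(latent)  # gradient ascent on similarity
    return latent

print(round(optimize(), 3))  # prints 3.0: the latent converges to the "prompt"
```

The per-script differences below (washed out vs vibrant colors, tiled regions, variety between runs) mostly come down to which generator sits inside that loop and how its output is regularized.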

GUI Interface For Text-to-Image Scripts

Text-to-Image GUI

As long as you have a decent GPU and the prerequisites installed you will be able to run these locally from Visions of Chaos on your PC without needing to use colab.

Text-to-Image Scripts Implemented In Visions of Chaos

These are the 15 (so far) Text-to-Image colabs and githubs that I have been able to edit to work under PyTorch 1.8 and Visions of Chaos.

If you are the author of one of these scripts then many thanks to you for sharing the code publicly. If you are a creator of a script I do not include here, please leave a comment with a link or send me an email so I can try it out. If you are a better coder than I am and improve any of these also let me know and I will share your fixes with the world.

I have shown a few sample image outputs for each script. Each of the sample images is generated at the default settings from Visions of Chaos at the maximum supported resolution of 512×512 to give an idea of each script's output. These are “cherry picked” best results from a batch of output images. The displayed times are for a single image at 512×512 resolution running on a 3090 RTX GPU.

Name: Deep Daze
Author: Phil Wang
Original script: https://github.com/lucidrains/deep-daze
Time: 2 minutes 3 seconds.
Description: This is the first Text-to-Image script I ever found and tested. It tends to give washed out, pastel shaded images (I include an option to deepen the colors, so untweaked results are even more washed out than these are).

'H R Giger' Deep Daze Text-to-Image
H R Giger

'surrealism' Deep Daze Text-to-Image

'rainforest' Deep Daze Text-to-Image

Name: Big Sleep
Author: Phil Wang
Original script: https://github.com/lucidrains/big-sleep
Time: 4 minutes 0 seconds
Description: I still consider Big Sleep to be the best Text-to-Image system I have tried so far. It always gives a good variety of images for any prompt text and does not suffer from the coloring or tiled image issues some of the other methods do. If I only had one of these to choose from it would be Big Sleep. See here for my older post with a lot of Big Sleep examples.

'H R Giger' Big Sleep Text-to-Image
H R Giger

'surrealism' Big Sleep Text-to-Image

'colorful surrealism' Big Sleep Text-to-Image
colorful surrealism

Name: CLIP-GLaSS
Author: Frederico Galatolo
Original script: https://github.com/galatolofederico/clip-glass
Time: 4 minutes 51 seconds
Description: This one is different. It is difficult to get it to output anything resembling the input prompt. Rather than slowly converging to the final image it seems to jump around a lot, or maybe I converted something wrong. Needs a closer look when I get more time.

'H R Giger' CLIP-GLaSS Text-to-Image
H R Giger

Name: VQGAN+CLIP z-quantize
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN
Time: 2 minutes 28 seconds
Description: The outputs tend to be divided up into rectangular regions, but the resulting imagery can be interesting.

'H R Giger' VQGAN+CLIP z-quantize Text-to-Image
H R Giger

'seascape painting' VQGAN+CLIP z-quantize Text-to-Image
seascape painting

'flowing water' VQGAN+CLIP z-quantize Text-to-Image
flowing water

Name: VQGAN+CLIP codebook
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/15UwYDsnNeldJFHJ9NdgYBYeo6xPmSelP
Time: 3 minutes 19 seconds
Description: VQGAN+CLIP codebook seems to give very similar images for the same prompt phrase, so repeatedly running the script (with different seed values) does not give a wide variety of resulting images. It still gives interesting results though.

'H R Giger' VQGAN+CLIP codebook Text-to-Image
H R Giger

'seascape painting' VQGAN+CLIP codebook Text-to-Image
seascape painting

'flowing water' VQGAN+CLIP codebook Text-to-Image
flowing water

Name: Aleph2Image Gamma
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1VAO22MNQekkrVq8ey2pCRznz4A0_jY29
Time: 2 minutes 1 second
Description: This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast. Where Deep Daze's results were too washed out, this one is too “contrasty”. If they could both be pushed towards that “sweet spot” they would both look much better.

'H R Giger' Aleph2Image Gamma Text-to-Image
H R Giger

'surrealism' Aleph2Image Gamma Text-to-Image

'seascape painting' Aleph2Image Gamma Text-to-Image
seascape painting

Name: Aleph2Image Delta
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1oA1fZP7N1uPBxwbGIvOEXbTsq2ORa9vb
Time: 2 minutes 1 second
Description: A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors and that may be a good or bad point depending on your preferences.

'H R Giger' Aleph2Image Delta Text-to-Image
H R Giger

'surrealism' Aleph2Image Delta Text-to-Image

'seascape painting' Aleph2Image Delta Text-to-Image
seascape painting

Name: Aleph2Image Delta v2
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1NGM9L8qP0gwl5z5GAuB_bd0wTNsxqclG
Time: 3 minutes 42 seconds
Description: A newer revision of Aleph2Image Delta that gives much sharper results. The resulting images tend to be similar to each other for each prompt text, so there is not a lot of variety.

'H R Giger' Aleph2Image Delta v2 Text-to-Image
H R Giger

'surrealism' Aleph2Image Delta v2 Text-to-Image

'seascape painting' Aleph2Image Delta v2 Text-to-Image
seascape painting

Name: Deep Daze Fourier
Author: Vadim Epstein
Original script: https://colab.research.google.com/gist/afiaka87/e018dfa86d8a716662d30c543ce1b78e/text2image-siren.ipynb
Time: 4 minutes 54 seconds
Description: Compared to the original Deep Daze, which generated washed out pastel shaded results, Deep Daze Fourier creates images with sharp, crisp, bright colors.

'H R Giger' Deep Daze Fourier Text-to-Image
H R Giger

'Shrek eating pizza' Deep Daze Fourier Text-to-Image
Shrek eating pizza

'surrealist Homer Simpson' Deep Daze Fourier Text-to-Image
surrealist Homer Simpson

Name: Text2Image v2
Author: Denis Malimonov
Original script: https://colab.research.google.com/github/tg-bomze/collection-of-notebooks/blob/master/Text2Image_v2.ipynb
Time: 1 minute 48 seconds
Description: This one is good. Colors and details are sharp. Good variety of output for each input phrase. Definitely worth a try. Gives the original Big Sleep a challenge for the best system shown here.

'H R Giger' Text2Image v2 Text-to-Image
H R Giger

'surrealism' Text2Image v2 Text-to-Image

'seascape painting' Text2Image v2 Text-to-Image
seascape painting

Name: The Big Sleep Customized
Author: NMKD
Original script: https://colab.research.google.com/drive/1Q2DIeMqYm_Sc5mlurnnurMMVqlgXpZNO
Time: 1 minute 45 seconds
Description: Another good one. Worth exploring further.

'H R Giger' The Big Sleep Customized Text-to-Image
H R Giger

'surrealism' The Big Sleep Customized Text-to-Image

'seascape painting' The Big Sleep Customized Text-to-Image
seascape painting

Name: Big Sleep Minmax
Author: @!goose
Original script: https://colab.research.google.com/drive/12CnlS6lRGtieWujXs3GQ_OlghmFyl8ch
Time: 1 minute 45 seconds
Description: Another interesting Big Sleep variation.

'H R Giger' Big Sleep Minmax Text-to-Image
H R Giger

'surrealism' Big Sleep Minmax Text-to-Image

'seascape painting' Big Sleep Minmax Text-to-Image
seascape painting

Name: CLIP Pseudo Slime Mold
Author: hotgrits
Original script: https://discord.com/channels/729741769192767510/730484623028519072/850857930881892372
Time: 2 minutes 57 seconds
Description: This one gives unique output compared to the others. Really nicely defined sharp details. The colors come from any color palette you select (currently all the 3,479 palettes within Visions of Chaos can be used) so you can “tint” the resulting images with color shades you prefer.

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger with a different color palette

'Shrek eating pizza' CLIP Pseudo Slime Mold Text-to-Image
Shrek eating pizza

'seascape painting' CLIP Pseudo Slime Mold Text-to-Image
seascape painting

Name: Aleph2Image Dall-E Remake
Author: danielrussruss
Original script: https://colab.research.google.com/drive/17ZSyxCyHUnwI1BgZG22-UFOtCWFvqQjy
Time: 3 minutes 42 seconds
Description: Another Aleph2Image variant.

'H R Giger' Aleph2Image Dall-E Remake Text-to-Image
H R Giger

'surrealism' Aleph2Image Dall-E Remake Text-to-Image

'seascape painting' Aleph2Image Dall-E Remake Text-to-Image
seascape painting

Name: VQGAN+CLIP v3
Author: Eleiber
Original script: https://colab.research.google.com/drive/1go6YwMFe5MX6XM9tv-cnQiSTU50N9EeT
Time: 2 minutes 52 seconds
Description: The best of the VQGAN systems so far. “v3” because it is the third VQGAN system I have tried and it didn’t have a unique specific name. Gives clear sharp images. Equal first place with Big Sleep for best results. Can give very painterly results with visible brush strokes if you use “a painting of” before the prompt subject.

'H R Giger' VQGAN+CLIP v3 Text-to-Image
H R Giger

'seascape painting' VQGAN+CLIP v3 Text-to-Image

'seascape painting' VQGAN+CLIP v3 Text-to-Image
seascape painting

'flowing water' VQGAN+CLIP v3 Text-to-Image
flowing water

Getting these CLIP based Text-to-Image scripts working under PyTorch 1.8

I have seen people thinking that CLIP will not work under the latest PyTorch 1.8. It will (at least as far as the CLIP functionality required to get these Text-to-Image systems working).
If you get an error like “RuntimeError: Method ‘forward’ is not defined.” then add jit=False to the clip.load command, ie change

model, preprocess = clip.load(model_name)

to

model, preprocess = clip.load(model_name, jit=False)
Then CLIP seems to work fine with PyTorch 1.8.
All of the Text-to-Image systems in this post are working fine under PyTorch 1.8 using the latest CLIP with only that single change.

Deterministic Results Impossible?

Setting the random seed does not seem to work with these systems.

Even if you declare

import random
import numpy as np
import torch
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

the outputs still differ from run to run.

The line

torch.use_deterministic_algorithms(True)

sets torch to use deterministic functions (and may be slower), but when this is set other CUDA based routines complain that they have no deterministic version, so the scripts fail to run.

If anyone works out a way to get these scripts to be deterministic (ie multiple runs with the same seed give identical outputs) let me know.
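For reference, the behaviour being chased here, identical outputs from identical seeds, is exactly what plain Python's random module already provides; the problem is purely that some of the CUDA kernels these scripts rely on have no deterministic implementation. A minimal sketch of the goal:

```python
import random

def sample(seed, n=5):
    # an explicitly seeded, independent generator
    rng = random.Random(seed)
    return [rng.randint(0, 99) for _ in range(n)]

# same seed, same sequence: the reproducibility these scripts can't yet offer
assert sample(42) == sample(42)
# a different seed (almost certainly) gives a different sequence
assert sample(42) != sample(43)
```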

Using These Scripts Outside Visions of Chaos

If you want to see the changes I made to the scripts or want to run the scripts from a command line, you can see my edits of the scripts under the folder

C:\Users\YOURUSERNAME\AppData\Roaming\Visions of Chaos\Examples\MachineLearning\Text To Image\

once Visions of Chaos is installed.

They can all be run from the command line. To see example command lines for each script you can run them once in Visions of Chaos. The command line will be shown in the progress dialog box. That will allow you to copy/paste the command(s) into a command prompt window if you do not want the GUI front end.

If you do not have a powerful enough GPU you can always resort to running them in the colabs linked above.

Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.