Text-to-Image Summary – Part 5

This is Part 5. There is also Part 1, Part 2, Part 3 and Part 4.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Description: The winner for the longest name so far. Needs tweaking, as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion script. Still shows a lot of potential.

'a 3D render of Robocop' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a 3D render of Robocop

'a futuristic city IMAX' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a futuristic city IMAX

'a matte painting of trypophobia' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a matte painting of trypophobia

'a renaissance painting of a cloudy sunset trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a renaissance painting of a cloudy sunset trending on ArtStation

'a woman 4K photo' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a woman 4K photo

'an evil clown Flickr' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an evil clown Flickr

'an oil painting of a nightmare creature by Louis Janmot' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare creature by Louis Janmot

'Indiana Jones' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
Indiana Jones

'reflective spheres' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
reflective spheres

'zombies filmic' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
zombies filmic


Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a babbling brook by Zhou Wenjing' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a babbling brook by Zhou Wenjing

'a bedroom by Francesco Furini' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a bedroom by Francesco Furini

'a computer by Édouard Detaille' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a computer by Édouard Detaille

'a cross stitch of a landscape vivid colors' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a cross stitch of a landscape vivid colors

'a kitchen filmic' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a kitchen filmic

'a matte painting of halloween' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a matte painting of halloween

'a pastel of a peacock' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a pastel of a peacock

'a storybook illustration of a kitchen by Lena Alexander' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a storybook illustration of a kitchen by Lena Alexander

'an oil on canvas painting of a zombie made of voxels' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
an oil on canvas painting of a zombie made of voxels

'vector art of Darth Vader' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
vector art of Darth Vader


Name: 360Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: A new diffusion based script. Capable of some interesting results.

'a bronze sculpture of a crying person by Auguste BaudBovy' 360Diffusion Text-to-Image
a bronze sculpture of a crying person by Auguste BaudBovy

'a flemish baroque of a bouquet of flowers' 360Diffusion Text-to-Image
a flemish baroque of a bouquet of flowers

'a haunted house trending on ArtStation' 360Diffusion Text-to-Image
a haunted house trending on ArtStation

'a hyperrealistic painting of trypophobia by Xia Gui' 360Diffusion Text-to-Image
a hyperrealistic painting of trypophobia by Xia Gui

'a nightmare creature' 360Diffusion Text-to-Image
a nightmare creature

'a space nebula rendered in Cinema4D' 360Diffusion Text-to-Image
a space nebula rendered in Cinema4D

'a tentacle monster 4K HD realism' 360Diffusion Text-to-Image
a tentacle monster 4K HD realism

'an oil on canvas painting of Danny Trejo by Pablo Rey' 360Diffusion Text-to-Image
an oil on canvas painting of Danny Trejo by Pablo Rey

'Frankenstein' 360Diffusion Text-to-Image
Frankenstein

'heaven 8K 3D' 360Diffusion Text-to-Image
heaven 8K 3D


Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of Gandalf' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a bronze sculpture of Gandalf

'a clown made of clay' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a clown made of clay

'a detailed painting of a desert oasis' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a detailed painting of a desert oasis

'a house by Kathleen Guthrie' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a house by Kathleen Guthrie

'a peacock made of metal' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a peacock made of metal

'a tilt shift photo of the Las Vegas strip' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a tilt shift photo of the Las Vegas strip

'a watercolor painting of reflective spheres 8K 3D' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a watercolor painting of reflective spheres 8K 3D

'an art deco painting of an amusement park' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
an art deco painting of an amusement park

'lineart of Big Bird by Alesso Baldovinetti' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
lineart of Big Bird by Alesso Baldovinetti

'vector art of a forest fire' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
vector art of a forest fire


Name: FuseDream
Author: Xingchao Liu et al.
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Description: Gives some unique outputs compared to all the previous scripts.

'a clown' FuseDream Text-to-Image
a clown

'a king' FuseDream Text-to-Image
a king

'a matte painting of New York City by Robin Guthrie' FuseDream Text-to-Image
a matte painting of New York City by Robin Guthrie

'a portrait of a young girl' FuseDream Text-to-Image
a portrait of a young girl

'a rough seascape' FuseDream Text-to-Image
a rough seascape

'a sea monster' FuseDream Text-to-Image
a sea monster

'a teddy bear' FuseDream Text-to-Image
a teddy bear

'a werewolf' FuseDream Text-to-Image
a werewolf

'an airbrush painting of an angry woman' FuseDream Text-to-Image
an airbrush painting of an angry woman

'an attractive woman' FuseDream Text-to-Image
an attractive woman


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 4

This is Part 4. There is also Part 1, Part 2, Part 3 and Part 5.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: PixelDraw
Author: dribnet
Original script: https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/PixelDrawer.ipynb
Time for 512×512 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Generates “pixel art” images. I had a lot of requests to add support for this one.

'a cartoon of a peacock' PixelDraw Text-to-Image
a cartoon of a peacock

'a cloudy sunset' PixelDraw Text-to-Image
a cloudy sunset

'a gorilla' PixelDraw Text-to-Image
a gorilla

'a morning landscape' PixelDraw Text-to-Image
a morning landscape

'a watercolor painting of a castle' PixelDraw Text-to-Image
a watercolor painting of a castle

'an art deco painting of Al Pacino' PixelDraw Text-to-Image
an art deco painting of Al Pacino

'Hell' PixelDraw Text-to-Image
Hell

'Shrek' PixelDraw Text-to-Image
Shrek


Name: DirectVisions
Author: Jens Goldberg
Original script: https://colab.research.google.com/drive/127lKSsQjx-UDDUSvIkLL6mREfZ0KQu5D
Time for 512×512 on a 3090: 2 minutes 39 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Interesting detailed images. Can create huge resolution results.

'a color pencil sketch of a western town' DirectVisions Text-to-Image
a color pencil sketch of a western town

'a detailed painting of a cephalopod' DirectVisions Text-to-Image
a detailed painting of a cephalopod

'a digital rendering of an ugly face' DirectVisions Text-to-Image
a digital rendering of an ugly face

'a pencil sketch of Buzz Lightyear' DirectVisions Text-to-Image
a pencil sketch of Buzz Lightyear

'a rough seascape by Pinchus Kremegne' DirectVisions Text-to-Image
a rough seascape by Pinchus Kremegne

'a stock photo of a president' DirectVisions Text-to-Image
a stock photo of a president

'a sunset' DirectVisions Text-to-Image
a sunset

'an alien city' DirectVisions Text-to-Image
an alien city

'an alien forest by Helen Berman' DirectVisions Text-to-Image
an alien forest by Helen Berman

'an evening landscape' DirectVisions Text-to-Image
an evening landscape


Name: Pixel Direct
Author: Unknown
Original script: https://colab.research.google.com/drive/1F9ZOZnpV3uBPRDSESaAXYwzNZJQRJT75
Time for 512×512 on a 3090: 1 minute 03 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Another “Pixel Art” script. More abstract results than the PixelDraw script above.

'a bronze sculpture of a nightmare creature' Pixel Direct Text-to-Image
a bronze sculpture of a nightmare creature

'a cartoon of Al Pacino' Pixel Direct Text-to-Image
a cartoon of Al Pacino

'a nightclub' Pixel Direct Text-to-Image
a nightclub

'a silk screen of a bouquet of flowers' Pixel Direct Text-to-Image
a silk screen of a bouquet of flowers

'an etching of a worried woman' Pixel Direct Text-to-Image
an etching of a worried woman

'an illustration of of a thunder storm' Pixel Direct Text-to-Image
an illustration of of a thunder storm


Name: FourierVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1nGNBjhbYnDHSumGPjpFHjDOsaZFAqGgF
Time for 512×512 on a 3090: 1 minute 40 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Detailed images. The default script generates washed out pastel images, but with some gamma and brightness tweaks they can be improved (still not ideal, but better). Allows very large resolution images.
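The gamma and brightness tweaks mentioned above could look something like the following minimal sketch. It is not the code used in Visions of Chaos; the function names and the default gamma of 1.5 and brightness of 0.9 are assumptions picked purely for illustration, and the image is represented as plain nested lists so the example has no dependencies.

```python
def adjust_pixel(value, gamma=1.5, brightness=0.9):
    """Apply a gamma curve and a brightness scale to one 0-255 channel value."""
    normalized = value / 255.0
    # A gamma above 1 darkens the midtones, which counteracts a washed out look.
    corrected = (normalized ** gamma) * brightness
    # Clamp to the valid range and convert back to an 8-bit integer.
    return max(0, min(255, round(corrected * 255)))


def adjust_image(pixels, gamma=1.5, brightness=0.9):
    """pixels is a list of rows, each row a list of (r, g, b) tuples."""
    return [
        [tuple(adjust_pixel(c, gamma, brightness) for c in px) for px in row]
        for row in pixels
    ]
```

With these assumed defaults, pale midtones are pulled down while pure black stays black. The right values would need to be found by eye for each script.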

'a cathedral' FourierVisions Text-to-Image
a cathedral

'a charcoal drawing of zombies' FourierVisions Text-to-Image
a charcoal drawing of zombies

'a detailed painting of a sunset by Thomas Cantrell Dugdale' FourierVisions Text-to-Image
a detailed painting of a sunset by Thomas Cantrell Dugdale

'a ghost made of mist' FourierVisions Text-to-Image
a ghost made of mist

'a kitchen' FourierVisions Text-to-Image
a kitchen

'a movie monster' FourierVisions Text-to-Image
a movie monster

'a pencil sketch of a sad clown' FourierVisions Text-to-Image
a pencil sketch of a sad clown

'a werewolf' FourierVisions Text-to-Image
a werewolf

'an evil clown by Viktor Oliva' FourierVisions Text-to-Image
an evil clown by Viktor Oliva

'an ink drawing of an ugly monster' FourierVisions Text-to-Image
an ink drawing of an ugly monster


Name: PyramidVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1dpAS_wK34y7c6s-CatAFmBtbkjGT_erM
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Very detailed images. Not the fastest script, but gives some very nice results. Lower VRAM requirements so good for lesser spec GPUs. Definitely one of the better scripts worth exploring.

'a desert oasis' PyramidVisions Text-to-Image
a desert oasis

'a lush rainforest' PyramidVisions Text-to-Image
a lush rainforest

'a marble sculpture of an angry person' PyramidVisions Text-to-Image
a marble sculpture of an angry person

'a minimalist painting of the Amazon Rainforest' PyramidVisions Text-to-Image
a minimalist painting of the Amazon Rainforest

'a nightmare creature' PyramidVisions Text-to-Image
a nightmare creature

'a pastel of a computer made of paper' PyramidVisions Text-to-Image
a pastel of a computer made of paper

'an abstract sculpture of a sad clown' PyramidVisions Text-to-Image
an abstract sculpture of a sad clown

'an acrylic painting of an alien forest | vivid colors' PyramidVisions Text-to-Image
an acrylic painting of an alien forest | vivid colors

'Medusa' PyramidVisions Text-to-Image
Medusa

'vector art of an ugly woman' PyramidVisions Text-to-Image
vector art of an ugly woman


Name: Visions of AI v1
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 1 minute 32 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Description: My first attempt at actually creating a Text-to-Image script. Based on the excellent example from Jonathan Whitaker’s AIAIArt Lesson 3 tutorial. Gives some very nice fine detail in some areas, but suffers from the same lack of coherence as other scripts in that it creates multiple copies of the subject throughout the image. After actually trying to write my own script I only have more respect for those who can do this. Hopefully I can improve these results for a version 2. In the meantime, here are some samples from the current Visions of AI script.

'a cartoon of the human condition by Judy Takács' Visions of AI Text-to-Image
a cartoon of the human condition by Judy Takács

'a cubist painting of an evening landscape' Visions of AI Text-to-Image
a cubist painting of an evening landscape

'a digital rendering of frogs' Visions of AI Text-to-Image
a digital rendering of frogs

'a fire breathing dragon' Visions of AI Text-to-Image
a fire breathing dragon

'a hyperrealistic painting of a movie monster' Visions of AI Text-to-Image
a hyperrealistic painting of a movie monster

'a morning landscape' Visions of AI Text-to-Image
a morning landscape

'a shark' Visions of AI Text-to-Image
a shark

'a woodcut of an ugly man' Visions of AI Text-to-Image
a woodcut of an ugly man

'an airbrush painting of C-3PO' Visions of AI Text-to-Image
an airbrush painting of C-3PO

'Frankenstein' Visions of AI Text-to-Image
Frankenstein


Name: Visions of AI v2
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 2 minutes 35 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Description: An attempt to improve the coherency of the previous script. The first 30 iterations zoom into the image every 10 frames. This results in larger shapes/blobs for the rest of the script to work from. The idea is that it will give larger subjects compared to the v1 script. Kind of works. Gives blurrier results. To be fixed in the next version?
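As a rough illustration of the zoom schedule described above, here is a minimal sketch. It is not the actual Visions of AI v2 code: the names `should_zoom` and `center_zoom`, the zoom factor of 2, and the reading that zooms happen on iterations 10, 20 and 30 are all assumptions. The image is a plain 2D list so the example runs without any libraries.

```python
def should_zoom(iteration, zoom_until=30, zoom_every=10):
    """True on iterations 10, 20 and 30 with the default settings."""
    return 0 < iteration <= zoom_until and iteration % zoom_every == 0


def center_zoom(image, factor=2):
    """Nearest-neighbour zoom into the centre of a 2D list, keeping its size."""
    height, width = len(image), len(image[0])
    crop_h, crop_w = height // factor, width // factor
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    # Each output pixel samples from the central crop, stretched by `factor`.
    return [
        [image[top + y // factor][left + x // factor] for x in range(width)]
        for y in range(height)
    ]


# Toy 4x4 "image" whose values identify each pixel's position.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
zoom_iterations = [i for i in range(1, 51) if should_zoom(i)]
zoomed = center_zoom(image)
```

Each zoom enlarges whatever shapes have formed so far, which is why the later iterations start from bigger blobs. The repeated resampling is also a plausible source of the blurrier results.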

'a morning landscape by William Gear' Visions of AI v2 Text-to-Image
a morning landscape by William Gear

'a raytraced image of a nightclub lens flare' Visions of AI v2 Text-to-Image
a raytraced image of a nightclub lens flare

'a tentacle monster by Carlo Crivelli' Visions of AI v2 Text-to-Image
a tentacle monster by Carlo Crivelli

'a woodcut of a worried woman by Li Keran' Visions of AI v2 Text-to-Image
a woodcut of a worried woman by Li Keran

'an illustration of of a cave made of cheese' Visions of AI v2 Text-to-Image
an illustration of of a cave made of cheese

'Cthulhu' Visions of AI v2 Text-to-Image
Cthulhu

'cyberpunk art of a futuristic city' Visions of AI v2 Text-to-Image
cyberpunk art of a futuristic city

'goldfish' Visions of AI v2 Text-to-Image
goldfish

'reflective spheres' Visions of AI v2 Text-to-Image
reflective spheres

'the Australian outback' Visions of AI v2 Text-to-Image
the Australian outback


Name: Multi-Perceptor CLIP Guided Diffusion
Author: Varkarrus
Original script: https://colab.research.google.com/drive/1y3Vt39A5KSNFRa6Z2bCqDHxteZSVH9NC
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 1152×384 (dimensions must be divisible by 128).
Description: Builds upon previous CLIP Guided Diffusion scripts. Like the previous script by Dango233 it uses three CLIP models simultaneously to “rate” the generated images, and I have added options to use up to six different CLIP models. The resulting image accuracy compared to the prompt, and the resulting image coherence seem to be much better than previous CLIP Guided Diffusion scripts that could almost have random outputs sometimes. This script is superb and highly recommended. Great lighting, textures and brushstrokes. Normally with these blog posts I do a batch run of random prompts overnight and then pick the best 10 images. In this case I had nearly 50 images in my “good” folder after going through the batch results. So, for this script I am showing 20 sample images.
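The maximum resolution entry for this script notes that dimensions must be divisible by 128. A small helper like the one below could snap a requested size to a valid one. This is only a sketch: the function names are hypothetical, and rounding down (rather than up or to the nearest multiple) is an assumption made to stay within VRAM limits.

```python
def snap_down(value, multiple=128):
    """Round value down to a multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def snap_size(width, height, multiple=128):
    """Return a (width, height) pair where both sides are valid multiples."""
    return snap_down(width, multiple), snap_down(height, multiple)


requested = (900, 520)
valid = snap_size(*requested)
```

Sizes that are already valid, such as 1152×384, pass through unchanged.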

'a cute creature | TriX 400 TX' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a cute creature | TriX 400 TX

'a digital painting of Frankenstein by Kanzan Shimomura' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a digital painting of Frankenstein by Kanzan Shimomura

'a morning landscape by János SaxonSzász' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a morning landscape by János SaxonSzász

'a nightmare creature' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a nightmare creature

'a photorealistic painting of a teddy bear' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a photorealistic painting of a teddy bear

'a portrait of a young girl' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a portrait of a young girl

'a space nebula | IMAX' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a space nebula | IMAX

'a worried man' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a worried man

'a zombie by Nathaniel Hone' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
a zombie by Nathaniel Hone

'an acrylic painting of a spider by Abram Arkhipov' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
an acrylic painting of a spider by Abram Arkhipov

'an airbrush painting of a monkey by Jeremy Henderson' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
an airbrush painting of a monkey by Jeremy Henderson

'an alien landscape' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
an alien landscape

'an ugly creature made of insects' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
an ugly creature made of insects

'an ultrafine detailed painting of a sad person | ZBrush' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
an ultrafine detailed painting of a sad person | ZBrush

'Arnold Schwarzenegger | trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
Arnold Schwarzenegger | trending on ArtStation

'concept art of Robocop' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
concept art of Robocop

'dinosaurs' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
dinosaurs

'Dracula | CGSociety' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
Dracula | CGSociety

'flesh made of insects' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
flesh made of insects

'God by William Simpson' Multi-Perceptor CLIP Guided Diffusion Text-to-Image
God by William Simpson


Name: Pixel MultiColors
Author: Remi Durant
Original script: https://colab.research.google.com/drive/17c-13cl_VQKpHq2rDrnFVi6ZT-CHeZNn
Time for 512×512 on a 3090: 0 minutes 44 seconds
Maximum resolution on a 24 GB 3090: 4096×4096.
Description: Very noisy/pixelated/abstract results. The default script gives dark images, which some brightness and contrast tweaks can help. Maybe a little bit of blur could help too in a future revision. It is fast though, and can support huge image sizes.

'a charcoal drawing of a cute creature made of metal' Pixel MultiColors Text-to-Image
a charcoal drawing of a cute creature made of metal

'a farm' Pixel MultiColors Text-to-Image
a farm

'a forest path by Walter Leighton Clark' Pixel MultiColors Text-to-Image
a forest path by Walter Leighton Clark

'a lighthouse' Pixel MultiColors Text-to-Image
a lighthouse

'a surrealist painting of a beachside resort' Pixel MultiColors Text-to-Image
a surrealist painting of a beachside resort

'a well kept garden' Pixel MultiColors Text-to-Image
a well kept garden

'an abstract sculpture of Pikachu' Pixel MultiColors Text-to-Image
an abstract sculpture of Pikachu

'an art deco painting of a volcano' Pixel MultiColors Text-to-Image
an art deco painting of a volcano

'an ink drawing of tentacles' Pixel MultiColors Text-to-Image
an ink drawing of tentacles

'an octopus Rendered in Cinema4D' Pixel MultiColors Text-to-Image
an octopus Rendered in Cinema4D


Name: Ultraquick CLIP Guided Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_AlphaTesting.ipynb
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: Locked to either 256×256 or 512×512.
Description: Another CLIP Guided Diffusion script. Can give some interesting results.

'a cave' Ultraquick CLIP Guided Diffusion Text-to-Image
a cave

'a color pencil sketch of Cthulhu' Ultraquick CLIP Guided Diffusion Text-to-Image
a color pencil sketch of Cthulhu

'a detailed painting of Shrek' Ultraquick CLIP Guided Diffusion Text-to-Image
a detailed painting of Shrek

'a flemish baroque of the human condition by George Barret Jr' Ultraquick CLIP Guided Diffusion Text-to-Image
a flemish baroque of the human condition by George Barret Jr

'a low poly render of halloween' Ultraquick CLIP Guided Diffusion Text-to-Image
a low poly render of halloween

'a photorealistic painting of a worried woman made of paper by Ann Thetis Blacker' Ultraquick CLIP Guided Diffusion Text-to-Image
a photorealistic painting of a worried woman made of paper by Ann Thetis Blacker

'a surrealist painting of a worried man' Ultraquick CLIP Guided Diffusion Text-to-Image
a surrealist painting of a worried man

'a surrealist sculpture of an angry man 8K 3D' Ultraquick CLIP Guided Diffusion Text-to-Image
a surrealist sculpture of an angry man 8K 3D

'Robocop' Ultraquick CLIP Guided Diffusion Text-to-Image
Robocop

'zombies' Ultraquick CLIP Guided Diffusion Text-to-Image
zombies


Name: ruDALL-E
Author: @sadly_existent
Original script: https://colab.research.google.com/drive/1wGE-046et27oHvNlBNPH07qrEQNE04PQ
Optimized script: https://colab.research.google.com/drive/1euIMG8E6kSFA2nU58LqrVsq6nbXjqELY
Time for 256×256 on a 3090: 1 minute 05 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Description: Russian version of DALL-E. Only takes text prompts in Russian, so I do some auto English to Russian translations. Locked to small 256×256 images at this stage, but can create some interesting results.

'a hyperrealistic painting of Chewbacca by Edith Grace Wheatley' ruDALL-E Text-to-Image
a hyperrealistic painting of Chewbacca by Edith Grace Wheatley

'a low poly render of Pikachu' ruDALL-E Text-to-Image
a low poly render of Pikachu

'a man' ruDALL-E Text-to-Image
a man

'a rose' ruDALL-E Text-to-Image
a rose

'a stock photo of puppies' ruDALL-E Text-to-Image
a stock photo of puppies

'egyptian art of a portrait of a woman' ruDALL-E Text-to-Image
egyptian art of a portrait of a woman

'Harry Potter' ruDALL-E Text-to-Image
Harry Potter

'Indiana Jones' ruDALL-E Text-to-Image
Indiana Jones

'Robocop made of gold' ruDALL-E Text-to-Image
Robocop made of gold

'Yoda' ruDALL-E Text-to-Image
Yoda


Name: ruVQGAN+CLIP
Author: nev
Original script: https://colab.research.google.com/drive/1wAnIHocDYFAbWtA7rk8C7cFEUdRyLzwZ
Time for 512×512 on a 3090: 1 minute 28 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Creates fairly blurry results, even with post process sharpening. If anyone could get these results crisper it would really improve the output.

'a 3D render of a wizard by Gertrude Greene' ruVQGAN+CLIP Text-to-Image
a 3D render of a wizard by Gertrude Greene

'a cubist painting of a Pokemon character' ruVQGAN+CLIP Text-to-Image
a cubist painting of a Pokemon character

'a cute creature' ruVQGAN+CLIP Text-to-Image
a cute creature

'a matte painting of halloween by Carlos Trillo Name' ruVQGAN+CLIP Text-to-Image
a matte painting of halloween by Carlos Trillo Name

'a photorealistic painting of an alien landscape by Jacob Ochtervelt' ruVQGAN+CLIP Text-to-Image
a photorealistic painting of an alien landscape by Jacob Ochtervelt

'a rough seascape filmic' ruVQGAN+CLIP Text-to-Image
a rough seascape filmic

'a sea monster' ruVQGAN+CLIP Text-to-Image
a sea monster

'a woodcut of a skull by Gu Hongzhong trending on ArtStation' ruVQGAN+CLIP Text-to-Image
a woodcut of a skull by Gu Hongzhong trending on ArtStation

'Cthulhu' ruVQGAN+CLIP Text-to-Image
Cthulhu

'trypophobia' ruVQGAN+CLIP Text-to-Image
trypophobia


Name: Multi-Perceptor VQGAN+CLIP
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 30 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: As with the previous Multi-Perceptor CLIP Guided Diffusion scripts, this one allows two different CLIP models to be used to rate the VQGAN output images. VQGAN is not going to beat diffusion for image coherence, but this script can give some very nice lighting and fine details in images.
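To give a feel for how two CLIP models might jointly rate an image, here is a toy sketch. It is not taken from the original script. Short hand-written vectors stand in for the embeddings that real CLIP models would produce, and combining the per-model losses with a weighted mean is an assumption made for illustration. The names `cosine_similarity` and `combined_loss` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def combined_loss(image_embeddings, text_embeddings, weights=None):
    """Weighted mean of (1 - cosine similarity) across perceptors.

    Both arguments hold one vector per perceptor, in the same order.
    """
    weights = weights or [1.0] * len(image_embeddings)
    losses = [
        1.0 - cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Toy case: the first "model" sees a perfect match, the second sees none.
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
text_vecs = [[1.0, 0.0], [1.0, 0.0]]
loss = combined_loss(image_vecs, text_vecs)
```

A lower combined loss means the models agree that the image matches the prompt. An optimizer would then adjust the image to reduce that single number, so no one model's quirks dominate the result.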

'a bronze sculpture of an evil clown made of clay by Dionisio Baixeras Verdaguer' Multi-Perceptor VQGAN+CLIP Text-to-Image
a bronze sculpture of an evil clown made of clay by Dionisio Baixeras Verdaguer

'a fantasy land by Shigeru Aoki' Multi-Perceptor VQGAN+CLIP Text-to-Image
a fantasy land by Shigeru Aoki

'a hyperrealistic painting of puppies' Multi-Perceptor VQGAN+CLIP Text-to-Image
a hyperrealistic painting of puppies

'a midnineteenth century engraving of the Sydney Opera House' Multi-Perceptor VQGAN+CLIP Text-to-Image
a midnineteenth century engraving of the Sydney Opera House

'a statue of reflective spheres' Multi-Perceptor VQGAN+CLIP Text-to-Image
a statue of reflective spheres

'a surrealist painting of a tropical beach' Multi-Perceptor VQGAN+CLIP Text-to-Image
a surrealist painting of a tropical beach

'an alien city CGSociety' Multi-Perceptor VQGAN+CLIP Text-to-Image
an alien city CGSociety

'an oil painting of a fire breathing dragon' Multi-Perceptor VQGAN+CLIP Text-to-Image
an oil painting of a fire breathing dragon

'computer rendering of a well kept garden by Norman Garstin ZBrush' Multi-Perceptor VQGAN+CLIP Text-to-Image
computer rendering of a well kept garden by Norman Garstin ZBrush

'war CryEngine' Multi-Perceptor VQGAN+CLIP Text-to-Image
war CryEngine


Name: Hypertron
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 2 minutes 00 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Another VQGAN based script. Has various “flavors” to give different results. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Still worth a try.

'a black and white photo of a fireman' Hypertron Text-to-Image
a black and white photo of a fireman

'a cute monster by Józef Mehoffer' Hypertron Text-to-Image
a cute monster by Józef Mehoffer

'a matte painting of a forest clearing' Hypertron Text-to-Image
a matte painting of a forest clearing

'a pop art painting of a human' Hypertron Text-to-Image
a pop art painting of a human

'a renaissance painting of a ghost by Jan van de Cappelle film' Hypertron Text-to-Image
a renaissance painting of a ghost by Jan van de Cappelle film

'a sea monster made of metal' Hypertron Text-to-Image
a sea monster made of metal

'a tattoo of a zombie' Hypertron Text-to-Image
a tattoo of a zombie

'a watercolor painting of a dragon Flickr' Hypertron Text-to-Image
a watercolor painting of a dragon Flickr

'an art deco painting of a haunted house by Mary Cameron' Hypertron Text-to-Image
an art deco painting of a haunted house by Mary Cameron

'concept art of a mountainscape by Maximilian Cercha' Hypertron Text-to-Image
concept art of a mountainscape by Maximilian Cercha


Name: CLIP Guided Diffusion Secondary Model Method
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1mpkrhOjoyzPeSWy2r7T8EYRaU7amYOOi
Time for 512×512 on a 3090: 2 minutes 28 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Description: A new diffusion based script from Katherine Crowson including a new “secondary model” she trained. Capable of some unique results with good textures and lighting.

'a detailed painting of Fozzy Bear by LeConte Stewart' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a detailed painting of Fozzy Bear by LeConte Stewart

'a flemish baroque of a happy person trending on pixiv' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a flemish baroque of a happy person trending on pixiv

'a flock of birds' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a flock of birds

'a Ghostbuster CGSociety' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a Ghostbuster CGSociety

'a kitchen made of cheese' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a kitchen made of cheese

'a nightmare creature' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a nightmare creature

'a photorealistic painting of The Grinch' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a photorealistic painting of The Grinch

'a portrait of a woman' CLIP Guided Diffusion Secondary Model Method Text-to-Image
a portrait of a woman

'an art deco painting of a sad clown' CLIP Guided Diffusion Secondary Model Method Text-to-Image
an art deco painting of a sad clown

'an oil painting of a nightmare' CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare



Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 3

This is Part 3. There is also Part 1, Part 2, Part 4 and Part 5.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: CLIP Guided Diffusion v4
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 3 minutes 05 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Another CLIP Guided Diffusion script. Locked to 512×512 resolution. Like the other CLIP Diffusion scripts, some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that reliably matches the input phrase. When it gets a “hit” it can create very detailed impressive results, but the number of “misses” stops it from getting a great rating. Still worth a try if you have the patience to run a large batch of images waiting for the best results. The following samples were hand picked from a large batch run of random prompt phrases.

'a forest clearing' CLIP Guided Diffusion v4 Text-to-Image
a forest clearing

'a storybook illustration of a nightmare' CLIP Guided Diffusion v4 Text-to-Image
a storybook illustration of a nightmare

'an impressionist painting of a cemetery' CLIP Guided Diffusion v4 Text-to-Image
an impressionist painting of a cemetery

'Harry Potter in the style of Rembrandt' CLIP Guided Diffusion v4 Text-to-Image
Harry Potter in the style of Rembrandt

'a detailed painting of a witch' CLIP Guided Diffusion v4 Text-to-Image
a detailed painting of a witch

'a babbling brook' CLIP Guided Diffusion v4 Text-to-Image
a babbling brook

'a desert oasis' CLIP Guided Diffusion v4 Text-to-Image
a desert oasis

'a hyperrealistic painting of an android' CLIP Guided Diffusion v4 Text-to-Image
a hyperrealistic painting of an android

'eyeballs' CLIP Guided Diffusion v4 Text-to-Image
eyeballs

'a cross stitch of Buzz Lightyear' CLIP Guided Diffusion v4 Text-to-Image
a cross stitch of Buzz Lightyear


Name: CLIP Guided Decision Transformer
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 1 minute 13 seconds
Maximum resolution on a 24 GB 3090: Locked to 384×384
Description: Another one from Katherine Crowson. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss whether a result matches the input phrase. When it gets a “hit” it can create very detailed, impressive results, but the number of “misses” stops it from getting a great rating. The following samples were hand picked from a large batch run of random prompt phrases.
Another good point for CLIP Guided Decision Transformer is that it will generate a batch of images from each run. So rather than a single image for the prompt text you can specify (for example) 8 images to be generated from the prompt. This allows a much larger set of images to be quickly generated in which to find those great outputs.
For these images I have enhanced the resolution 4x using Real-ESRGAN (the thumbnails are the original output images and the clicked images are resized 4x).

a detailed painting of a palace by Thomas Kinkade
a detailed painting of a palace by Thomas Kinkade

a drawing of Chewbacca
a drawing of Chewbacca

a forest path
a forest path

a renaissance painting of a mountain range
a renaissance painting of a mountain range

a rough seascape
a rough seascape

a rough seascape
a rough seascape

a spooky forest
a spooky forest

an oil on canvas painting of a western town
an oil on canvas painting of a western town

Frankenstein
Frankenstein

The Grand Canyon
The Grand Canyon


Name: CLIPIT
Author: dribnet
Original script: https://github.com/dribnet/clipit
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Another GAN+CLIP script. Gives nice results that tend to match the prompt text more closely. This one is heavy on VRAM usage.

'a happy family by Piet Mondiran' CLIPIT
a happy family by Piet Mondiran

'a landscape' CLIPIT
a landscape

'a peacock' CLIPIT
a peacock

'a tropical beach by Thomas Kinkade' CLIPIT
a tropical beach by Thomas Kinkade

'a woodcut of Dracula' CLIPIT
a woodcut of Dracula

'an ambient occlusion render of a zombie' CLIPIT
an ambient occlusion render of a zombie

'eyeballs in the style of Claude Monet' CLIPIT
eyeballs in the style of Claude Monet


Name: Art Machine
Author: Hillel Wayne
Original script: https://colab.research.google.com/drive/1n_xrgKDlGQcCF6O-eL3NOd_x4NSqAUjK
Time for 512×512 on a 3090: 4 minutes 04 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Another VQGAN+CLIP script.

'a charcoal drawing of a kitchen' Art Machine
a charcoal drawing of a kitchen

'a mosaic of a mountain path | CryEngine' Art Machine
a mosaic of a mountain path | CryEngine

'a silk screen of a tropical beach in the style of Kandinsky' Art Machine
a silk screen of a tropical beach in the style of Kandinsky

'a woodcut of a nightmare creature' Art Machine
a woodcut of a nightmare creature

'an illustration of of a mountainscape' Art Machine
an illustration of of a mountainscape

'an ultrafine detailed painting of a green tree frog as created by Craig Mullins' Art Machine
an ultrafine detailed painting of a green tree frog as created by Craig Mullins

'Dracula' Art Machine
Dracula

'Planets' Art Machine
Planets


Name: VQGAN+CLIP v5
Author: Max Woolf
Original script: https://colab.research.google.com/drive/1wkF67ThUz37T2_oPIuSwuO4e_-0vjaLs
Time for 512×512 on a 3090: 2 minutes 13 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Another VQGAN+CLIP script. More abstract results from this one.

'a desert oasis in the style of Salvador Dali' VQGAN+CLIP v5
a desert oasis in the style of Salvador Dali

'a hyperrealistic painting of a dragon' VQGAN+CLIP v5
a hyperrealistic painting of a dragon

'Big Bird' VQGAN+CLIP v5
Big Bird

'Cthulhu' VQGAN+CLIP v5
Cthulhu

'Robert DeNiro' VQGAN+CLIP v5
Robert DeNiro

'Yoda' VQGAN+CLIP v5
Yoda “hmmm, abstract I am”


Name: Zoetrope 5.5
Author: Bearsharktopusdev
Original script: https://colab.research.google.com/drive/1LpEbICv1mmta7Qqic1IcRTsRsq7UKRHM
Time for 512×512 on a 3090: 3 minutes 27 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Updated version of Zoetrope 5. Supports more VQGAN models, CLIP models and optimizers compared to Zoetrope 5.

'a cephalopod' Zoetrope 5.5 Text-to-Image
a cephalopod

'a flemish baroque of a demon' Zoetrope 5.5 Text-to-Image
a flemish baroque of a demon

'a photo of a submarine in the style of Vincent van Gogh' Zoetrope 5.5 Text-to-Image
a photo of a submarine in the style of Vincent van Gogh

'a snail' Zoetrope 5.5 Text-to-Image
a snail

'Cthulhu' Zoetrope 5.5 Text-to-Image
Cthulhu

'flesh' Zoetrope 5.5 Text-to-Image
flesh


Name: Zeta Quantize
Author: afiaka87
Original script: https://colab.research.google.com/gist/afiaka87/a97cca3b54c02209b94ff805224f9eb5/zeta_quantize.ipynb
Time for 512×512 on a 3090: 4 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Another VQGAN+CLIP script.

'a cute creature made of silver' Zeta Quantize
a cute creature made of silver

'a detailed painting of a cephalopod' Zeta Quantize
a detailed painting of a cephalopod

'a detailed painting of a ghost' Zeta Quantize
a detailed painting of a ghost

'a forest fire made of copper' Zeta Quantize
a forest fire made of copper

'a peacock' Zeta Quantize
a peacock

'a sketch of a Pokemon character in the style of Odilon Redon' Zeta Quantize
a sketch of a Pokemon character in the style of Odilon Redon

'a watercolor painting of dense woodland' Zeta Quantize
a watercolor painting of dense woodland


Name: Experimental VQGAN
Author: Various
Original script: https://colab.research.google.com/drive/1jx3klUxlGbYUwvtqzC9SYl4XZKHL3R81
Time for 512×512 on a 3090: 1 minute 12 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Very nice smooth results from this one.

'a desert oasis in the style of Craig Mullins' Experimental VQGAN
a desert oasis in the style of Craig Mullins

'a dragon' Experimental VQGAN
a dragon

'a manga drawing of a happy alien' Experimental VQGAN
a manga drawing of a happy alien

'a nightmare' Experimental VQGAN
a nightmare

'a surrealist painting of love' Experimental VQGAN
a surrealist painting of love

'a watercolor painting of a lighthouse' Experimental VQGAN
a watercolor painting of a lighthouse

'an airbrush painting of a well kept garden by Piet Mondiran' Experimental VQGAN
an airbrush painting of a well kept garden by Piet Mondiran

'Cookie Monster' Experimental VQGAN
Cookie Monster


Name: SlideShowVisions
Author: Active Galaxy
Original script: https://colab.research.google.com/drive/1IihC4ZJvCh_tOgBVd900BzHX-ulPEFsa
Time for 512×512 on a 3090: 2 minutes 25 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Tends to give more abstract paper cutout looks.

'a happy child' SlideShowVisions
a happy child

'a house vivid colors' SlideShowVisions
a house vivid colors

'a sea monster' SlideShowVisions
a sea monster

'a thunder storm' SlideShowVisions
a thunder storm

'a tree' SlideShowVisions
a tree

'a woodcut of war' SlideShowVisions
a woodcut of war

'an engraving of zombies' SlideShowVisions
an engraving of zombies

'Han Solo' SlideShowVisions
Han Solo


Name: Quick CLIP Guided Diffusion
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1FuOobQOmDJuG7rGsMWfQa883A9r4HxEO
Time for 512×512 on a 3090: 43 seconds
Maximum resolution on a 24 GB 3090: 512×512
Description: Modified version of CLIP Guided Diffusion that gets results quicker. Option for 256×256 or 512×512 sized images. Still very hit and miss when getting images that resemble the input prompt. The following samples came from a large overnight batch run of random prompts.

'a cathedral' Quick CLIP Guided Diffusion
a cathedral

'a digital painting of a space nebula' Quick CLIP Guided Diffusion
a digital painting of a space nebula

'a lounge room' Quick CLIP Guided Diffusion
a lounge room

'a monkey | lens flare' Quick CLIP Guided Diffusion
a monkey | lens flare

'a nightmare creature' Quick CLIP Guided Diffusion
a nightmare creature

'a rough seascape' Quick CLIP Guided Diffusion
a rough seascape

'a landscape' Quick CLIP Guided Diffusion
a landscape

'an android' Quick CLIP Guided Diffusion
an android

'an attractive woman' Quick CLIP Guided Diffusion
an attractive woman

'an oil on canvas painting of a cloudy sunset' Quick CLIP Guided Diffusion
an oil on canvas painting of a cloudy sunset


Name: CLIP Guided Diffusion v5
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1QBsaDAZv8np29FPbvjffbE1eytoJcsgA
Time for 512×512 on a 3090: 3 minutes 48 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Another CLIP Guided Diffusion script. Locked to 512×512 resolution. Needs less VRAM than the previous versions. The following samples came hand picked from a large batch run of random prompt phrases.

'a cityscape' CLIP Guided Diffusion v5 Text-to-Image
a cityscape

'a gorilla' CLIP Guided Diffusion v5 Text-to-Image
a gorilla

'Cthulhu by Craig Mullins' CLIP Guided Diffusion v5 Text-to-Image
Cthulhu by Craig Mullins

'computer rendering of Emporer Palpatine made of cheese by Evan Charlton' CLIP Guided Diffusion v5 Text-to-Image
computer rendering of Emporer Palpatine made of cheese by Evan Charlton

'digital art of a mountainscape as created by Persis Goodale Thurston Taylor' CLIP Guided Diffusion v5 Text-to-Image
digital art of a mountainscape as created by Persis Goodale Thurston Taylor

'a digital rendering of Chewbacca' CLIP Guided Diffusion v5 Text-to-Image
a digital rendering of Chewbacca

'an ugly person' CLIP Guided Diffusion v5 Text-to-Image
an ugly person

See this tweet for an example of using CLIP Guided Diffusion to stylize a portrait.


Name: MSE Regulized Modified
Author: jbusted
Original script: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj
Time for 512×512 on a 3090: 3 minutes 02 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Modified and updated version of the previous “MSE Regulized VQGAN+CLIP” script. Less likely to suffer the previous script’s issue of subjects floating in a purple void.

'a bronze sculpture of a planet' MSE Regulized Modified Text-to-Image
a bronze sculpture of a planet

'a cave by Asher Brown Durand' MSE Regulized Modified Text-to-Image
a cave by Asher Brown Durand

'a charcoal drawing of Emporer Palpatine' MSE Regulized Modified Text-to-Image
a charcoal drawing of Emporer Palpatine

'a cozy den' MSE Regulized Modified Text-to-Image
a cozy den

'a detailed drawing of a heart made of string by William MacTaggart' MSE Regulized Modified Text-to-Image
a detailed drawing of a heart made of string by William MacTaggart

'a digital rendering of Arnold Schwarzenegger made of metal by Muriel Brandt' MSE Regulized Modified Text-to-Image
a digital rendering of Arnold Schwarzenegger made of metal by Muriel Brandt

'a lounge room' MSE Regulized Modified Text-to-Image
a lounge room

'a palace by Jules Joseph Lefebvre' MSE Regulized Modified Text-to-Image
a palace by Jules Joseph Lefebvre

'an oil on canvas painting of a lush rainforest' MSE Regulized Modified Text-to-Image
an oil on canvas painting of a lush rainforest

'an oil on canvas painting of Cookie Monster' MSE Regulized Modified Text-to-Image
an oil on canvas painting of Cookie Monster


Name: Pixray
Author: dribnet
Original script: https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/Start_Here.ipynb
Time for 512×512 on a 3090: 1 minute 44 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Description: Updated version of the previous “CLIPIT” script.

'a bronze sculpture of a nightmare creature' Pixray Text-to-Image
a bronze sculpture of a nightmare creature

'a fire breathing dragon by Jan Baptist Weenix' Pixray Text-to-Image
a fire breathing dragon by Jan Baptist Weenix

'a morning landscape' Pixray Text-to-Image
a morning landscape

'a surrealist sculpture of an elephant' Pixray Text-to-Image
a surrealist sculpture of an elephant

'a watercolor painting of an astronaut' Pixray Text-to-Image
a watercolor painting of an astronaut

'an oil painting of a worried woman | Rendered in Cinema4D' Pixray Text-to-Image
an oil painting of a worried woman | Rendered in Cinema4D

'an ugly creature' Pixray Text-to-Image
an ugly creature

'Dracula' Pixray Text-to-Image
Dracula

'Frankenstein' Pixray Text-to-Image
Frankenstein

'vector art of a forest clearing' Pixray Text-to-Image
vector art of a forest clearing


Name: CLIP Guided Diffusion v6
Author: Dango233
Original script: https://colab.research.google.com/drive/14xBm1aSxQLbq26-jmDJi8I1HJ4ti5ybt
Time for 512×512 on a 3090: 3 minutes 10 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Latest CLIP Guided Diffusion script. The best one yet. Capable of some very nice results.

'a hyperrealistic painting of a human' CLIP Guided Diffusion v6 Text-to-Image
a hyperrealistic painting of a human

'a sketch of planets' CLIP Guided Diffusion v6 Text-to-Image
a sketch of planets

'a storybook illustration of a cloudy sunset' CLIP Guided Diffusion v6 Text-to-Image
a storybook illustration of a cloudy sunset

'a wizard | vivid colors' CLIP Guided Diffusion v6 Text-to-Image
a wizard | vivid colors

'an art deco sculpture of a planet' CLIP Guided Diffusion v6 Text-to-Image
an art deco sculpture of a planet

'an attractive man by John Linnell' CLIP Guided Diffusion v6 Text-to-Image
an attractive man by John Linnell

'an oil on canvas painting of satan' CLIP Guided Diffusion v6 Text-to-Image
an oil on canvas painting of satan

'an oil painting of a clown' CLIP Guided Diffusion v6 Text-to-Image
an oil painting of a clown

'digital art of an ugly person by Avigdor Arikha' CLIP Guided Diffusion v6 Text-to-Image
digital art of an ugly person by Avigdor Arikha

'princess in sanctuary trending on artstation photorealistic portrait of a young princess' CLIP Guided Diffusion v6 Text-to-Image
princess in sanctuary trending on artstation photorealistic portrait of a young princess


Name: CLIPDraw
Author: Kevin Frans
Original script: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
Time for 512×512 on a 3090: 7 minutes 10 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Description: Generates images by a series of lines. Very abstract results.

'a cloudy sunset' CLIPDraw Text-to-Image
a cloudy sunset

'a digital painting of a rose' CLIPDraw Text-to-Image
a digital painting of a rose

'a sad clown' CLIPDraw Text-to-Image
a sad clown

'an abstract painting of Yoda' CLIPDraw Text-to-Image
an abstract painting of Yoda

'an etching of a library' CLIPDraw Text-to-Image
an etching of a library

'The Sydney Harbour Bridge' CLIPDraw Text-to-Image
The Sydney Harbour Bridge



Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 2

This is Part 2. There is also Part 1, Part 3, Part 4 and Part 5.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: VQGAN Gumbel
Author: Eleiber
Original script: https://colab.research.google.com/drive/1tim3xTsZXafK-A2rOUsevckdl4OitIiw
Time for 512×512 on a 3090: 3 minutes 27 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Variation using the gumbel-8192 model. Results are a bit rougher than others.

'a childs drawing of a space nebula' VQGAN Gumbel Text-to-Image
a childs drawing of a space nebula

'a movie monster in the style of Edvard Munch' VQGAN Gumbel Text-to-Image
a movie monster in the style of Edvard Munch

'a raytraced image of the Amazon Rainforest' VQGAN Gumbel Text-to-Image
a raytraced image of the Amazon Rainforest

'a tropical beach in the style of Polock' VQGAN Gumbel Text-to-Image
a tropical beach in the style of Polock

'digital art of a rose' VQGAN Gumbel Text-to-Image
digital art of a rose


Name: OpenAI DVAE+CLIP
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/10DzGECHlEnL4oeqsN-FWCkIe_sq3wVqt
Time for 512×512 on a 3090: 3 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Results are very colorful and more abstract. By default it gives more noisy output images but this can be disabled if you prefer.

'a dragon' OpenAI DVAE+CLIP Text-to-Image
a dragon

'a hyperrealistic painting of planets' OpenAI DVAE+CLIP Text-to-Image
a hyperrealistic painting of planets

'a mountain cabin' OpenAI DVAE+CLIP Text-to-Image
a mountain cabin

'a woodcut of a mountain range in the style of Marvel Comics' OpenAI DVAE+CLIP Text-to-Image
a woodcut of a mountain range in the style of Marvel Comics

'an angry person' OpenAI DVAE+CLIP Text-to-Image
an angry person


Name: Aphantasia
Author: Vadim Epstein
Original script: https://github.com/eps696/aphantasia
Time for 512×512 on a 3090: 1 minute 5 seconds
Maximum resolution on a 24 GB 3090: 4096×4096 or 2520×1080
Description: Different, messier, pastel abstract “Turneresque” output. I spent a few hours trying many different combinations of settings to get more coherent output and deeper colors. The following samples are as good as I could push it. I give up for now. If you can do better let me know. It does support creating larger 1280×720 resolution images on a 3090 GPU.

'a marble sculpture of a computer' Aphantasia Text-to-Image
a marble sculpture of a computer

'an eyeball' Aphantasia Text-to-Image
an eyeball

'an octopus' Aphantasia Text-to-Image
an octopus

'digital art of frogs in the style of Dr Seuss' Aphantasia Text-to-Image
digital art of frogs in the style of Dr Seuss

'medusa' Aphantasia Text-to-Image
medusa


Name: Text2Image VQGAN
Author: Vadim Epstein
Original script: https://colab.research.google.com/github/eps696/aphantasia/blob/master/CLIP_VQGAN.ipynb
Time for 512×512 on a 3090: 2 minutes 8 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Allows larger sized 480p images (854×480) on a 3090 GPU.

'a digital painting of the Las Vegas strip' Text2Image VQGAN Text-to-Image
a digital painting of the Las Vegas strip

'a midnineteenth century engraving of a cute monster' Text2Image VQGAN Text-to-Image
a midnineteenth century engraving of a cute monster

'a skeleton' Text2Image VQGAN Text-to-Image
a skeleton

'an ultrafine detailed painting of a crying person' Text2Image VQGAN Text-to-Image
an ultrafine detailed painting of a crying person

'puppies' Text2Image VQGAN Text-to-Image
puppies


Name: MSE VQGAN+CLIP z+quantize
Author: jbusted
Original script: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj
Time for 512×512 on a 3090: 6 minutes 19 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Awesome crisp results. Allows larger sized 480p images (854×480) on a 3090 GPU. One of the best scripts in this list and well worth exploring.

'a charcoal drawing of a country town' MSE VQGAN+CLIP z+quantize Text-to-Image
a charcoal drawing of a country town

'a hyperrealistic painting of an ugly creature' MSE VQGAN+CLIP z+quantize Text-to-Image
a hyperrealistic painting of an ugly creature

'a landscape made of mist' MSE VQGAN+CLIP z+quantize Text-to-Image
a landscape made of mist

'a mosaic of christmas' MSE VQGAN+CLIP z+quantize Text-to-Image
a mosaic of christmas

'an octopus in the style of Vincent van Gogh' MSE VQGAN+CLIP z+quantize Text-to-Image
an octopus in the style of Vincent van Gogh

MSE VQGAN+CLIP z+quantize allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input, with some minor image stretching each frame, you get a movie zooming into the Text-to-Image output. There is no blending of frames or optical flow for this one, just the 854×480 resolution frames combined directly into a movie. The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”. The prompts for this movie were “hyperrealistic homer simpson”, “hyperrealistic marge simpson”, “hyperrealistic bart simpson”, “hyperrealistic lisa simpson” and “hyperrealistic maggie simpson”. The original 480p upload looked terrible after YouTube compressed it, so I upscaled the 480p to 2160p (4K) in DaVinci Resolve and reuploaded to YouTube. Their compression did a better encoding job on the 4K version, so the movie is now watchable.
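
The feedback zoom described above comes down to picking, each frame, a slightly smaller centered region of the previous output and stretching it back to full size before using it as the next input. The following is only a rough sketch of that arithmetic, not the code Visions of Chaos uses. The function names and the zoom factor of 1.02 per frame are assumptions for illustration.

```python
def zoom_crop_box(width, height, zoom):
    """Return the centered (left, top, right, bottom) region that fills
    the whole frame once it is scaled up by `zoom`."""
    if zoom <= 1.0:
        raise ValueError("zoom must be greater than 1.0 to zoom in")
    crop_w = width / zoom
    crop_h = height / zoom
    left = (width - crop_w) / 2.0
    top = (height - crop_h) / 2.0
    return (left, top, left + crop_w, top + crop_h)


def total_magnification(zoom, frames):
    """Overall magnification after feeding the output back `frames` times."""
    return zoom ** frames


# Assumed example: an 854x480 frame zoomed by 2% per frame.
box = zoom_crop_box(854, 480, 1.02)
print("crop box:", box)
print("magnification after 35 frames:", total_magnification(1.02, 35))
```

With these assumed numbers the movie roughly doubles its magnification every 35 frames, which gives a feel for how small the per-frame stretch needs to be.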

This next example is how MSE VQGAN+CLIP z+quantize interprets various common human phobias. Text prompts were “a hyperrealistic painting depicting acrophobia” etc. To try to smooth out the “flickering” when zooming I started using ImageMagick for the zooming, as it supports sub-pixel image resizing. This movie was also originally 480p and upsized to 4K in DaVinci Resolve before uploading.
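
As a rough illustration of why sub-pixel resizing can matter for slow zooms, the sketch below compares the zoom you ask for with the zoom you actually get if the crop margin has to be rounded to whole pixels. This is my own arithmetic, not taken from ImageMagick or Visions of Chaos, and the zoom factors are assumed values.

```python
def effective_zoom(size, zoom):
    """Zoom actually applied along one axis when the crop margin on each
    side is rounded to a whole number of pixels."""
    margin = (size - size / zoom) / 2.0
    whole_margin = round(margin)
    cropped = size - 2 * whole_margin
    return size / cropped


# Assumed per-frame zoom factors for an 854x480 frame.
for zoom in (1.002, 1.005, 1.01, 1.02):
    horizontal = effective_zoom(854, zoom)
    vertical = effective_zoom(480, zoom)
    print(f"requested {zoom}: horizontal {horizontal:.5f}, vertical {vertical:.5f}")
```

When the margin is only a pixel or two, rounding changes the zoom noticeably and by different amounts horizontally and vertically, and a small enough zoom can round away to no zoom at all on one axis. Sub-pixel resizing avoids that rounding.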

I have also added some basic scripting (as in automating a series of steps rather than a Python py script) support to Visions of Chaos. Scripting allows the prompt, zoom speed, rotation and panning to be changed during the movie with smooth interpolations between them each frame.

Text-to-Image Script GUI
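
The smooth interpolation between scripted settings mentioned above can be pictured as keyframes with straight-line blending in between. This is a minimal sketch under that assumption. The keyframe layout, the values and the `interpolate` helper are hypothetical and are not the Visions of Chaos script format. It only covers the numeric settings (zoom speed, rotation and panning), not prompt text.

```python
from bisect import bisect_right

# Hypothetical keyframes: (frame, zoom speed, rotation in degrees, pan x, pan y)
KEYFRAMES = [
    (0, 1.02, 0.0, 0.0, 0.0),
    (100, 1.05, 2.0, 4.0, 0.0),
    (250, 1.01, -1.0, 0.0, -3.0),
]


def interpolate(keyframes, frame):
    """Linearly blend the settings of the two keyframes around `frame`."""
    frames = [k[0] for k in keyframes]
    if frame <= frames[0]:
        return tuple(keyframes[0][1:])
    if frame >= frames[-1]:
        return tuple(keyframes[-1][1:])
    i = bisect_right(frames, frame)      # index of the next keyframe
    f0, *start = keyframes[i - 1]
    f1, *end = keyframes[i]
    t = (frame - f0) / (f1 - f0)         # 0.0 at f0, 1.0 at f1
    return tuple(a + (b - a) * t for a, b in zip(start, end))


for frame in (0, 50, 100, 175, 250):
    print(frame, interpolate(KEYFRAMES, frame))
```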

The following video is a test of the scripting. This video is a Powers of Ten homage with zooming in from the largest scales to the smallest scales.

Another recent addition is the ability to use a series of images as “seed images” that are processed one at a time and then combined into a movie. The following GIF of the Alien chestburster scene is an example of this. The Text-to-Image prompt was “impasto oil painting”.

This next example movie is showing a “Self-Driven” zoom movie. As in a regular zoom movie the output frames are slightly stretched and fed back into the system each frame. The self-driven difference with this movie is that the Text-to-Image prompt text is automatically changed every 2 seconds by CLIP detecting what it “sees” in the current frame. This way the movie subjects are automatically changed and steered in new directions in a totally automated way. There is no human control except me setting the initial “Rainbow colored blobs” prompt. After that it was fully automated.
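
The prompt switching in a self-driven movie can be sketched as a simple schedule: keep the current prompt for 2 seconds worth of frames, then ask a captioner what it sees and use that as the new prompt. The code below is only an outline of that idea. The `describe` callback stands in for the CLIP captioning step and the frame rate of 30 is an assumption, since the post does not state it.

```python
def self_driven_prompts(initial_prompt, total_frames, fps, describe,
                        seconds_per_prompt=2):
    """Return a list of (frame, prompt) pairs, asking `describe` for a
    new prompt every `seconds_per_prompt` seconds of movie time."""
    interval = fps * seconds_per_prompt
    prompt = initial_prompt
    schedule = []
    for frame in range(total_frames):
        if frame > 0 and frame % interval == 0:
            # Hypothetical captioning step: look at the current frame
            # and return a phrase describing it.
            prompt = describe(frame, prompt)
        schedule.append((frame, prompt))
    return schedule


def fake_describe(frame, previous_prompt):
    """Stand-in captioner so the sketch runs without any models."""
    return f"caption at frame {frame}"


schedule = self_driven_prompts("Rainbow colored blobs", 150, 30, fake_describe)
changes = [entry for i, entry in enumerate(schedule)
           if i == 0 or entry[1] != schedule[i - 1][1]]
print(changes)
```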

By default the CLIP Image Captioning script is very good at detecting what is in an image. Using the default accuracy resulted in zoom movies that got stuck on a single topic or subject. One got stuck on slight variations of a prompt dealing with kites, so as the zoom movie went deeper it only showed kites. Luckily, after tweaking and decreasing the accuracy of the CLIP captioning, the predictions allow the resulting subjects to drift to new topics during the movie.
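
The post does not say how the captioning accuracy is lowered. One generic way to get a similar drifting effect is to stop always taking the best scoring caption and instead sample from the candidates, with a temperature setting that controls how often a weaker match wins. The sketch below shows that idea only. The captions, scores and `pick_caption` helper are made up for illustration.

```python
import math
import random


def pick_caption(scored_captions, temperature, rng):
    """Sample one caption from (caption, score) pairs. A low temperature
    almost always returns the top score, a high one flattens the odds."""
    captions = [caption for caption, _ in scored_captions]
    scores = [score for _, score in scored_captions]
    top = max(scores)
    weights = [math.exp((score - top) / temperature) for score in scores]
    return rng.choices(captions, weights=weights, k=1)[0]


# Made-up similarity scores for one frame.
candidates = [("a kite", 0.9), ("a sailboat", 0.5), ("a bird", 0.4)]
rng = random.Random(0)

strict = {pick_caption(candidates, 0.001, rng) for _ in range(200)}
loose = {pick_caption(candidates, 10.0, rng) for _ in range(200)}
print("strict picks:", strict)
print("loose picks:", loose)
```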


Name: Monster Maker
Author: P_Hoep
Original script: https://colab.research.google.com/drive/1ZbLnt5fLS_BDfpQY-9Dh_T40pLjfqSAC
Time for 512×512 on a 3090: 2 minutes 01 seconds
Description: No longer available. I was contacted by the author who does not want it shared publicly. The colab link no longer works.

'a black and white photo of a library in the style of Rembrandt' Monster Maker Text-to-Image
a black and white photo of a library in the style of Rembrandt

'a forest fire' Monster Maker Text-to-Image
a forest fire

'a forest path' Monster Maker Text-to-Image
a forest path

'a heart made of feathers' Monster Maker Text-to-Image
a heart made of feathers

'a surrealist painting of the Las Vegas strip' Monster Maker Text-to-Image
a surrealist painting of the Las Vegas strip


Name: CLIP Guided Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj
Time for 256×256 on a 3090: 1 minute 35 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Description: This one gives unique results compared to the other scripts. Locked to 256×256 resolution. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss whether a result matches the input phrase. The following samples were hand picked from a large batch run of random phrases.

'a clown' CLIP Guided Diffusion Text-to-Image
a clown

'a hyperrealistic painting of a witch' CLIP Guided Diffusion Text-to-Image
a hyperrealistic painting of a witch

'a sea monster' CLIP Guided Diffusion Text-to-Image
a sea monster

'a surrealist sculpture of an android' CLIP Guided Diffusion Text-to-Image
a surrealist sculpture of an android

'Brad Pitt' CLIP Guided Diffusion Text-to-Image
Brad Pitt

'New York City' CLIP Guided Diffusion Text-to-Image
New York City


Name: CLIP Guided Diffusion v2
Author: afiaka87
Original script: https://colab.research.google.com/github/afiaka87/clip-guided-diffusion/blob/main/colab_clip_guided_diff_hq.ipynb
Time for 256×256 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Description: Modified CLIP Guided Diffusion with more options. This one gives unique results compared to the other scripts. Locked to 256×256 resolution. Hopefully larger resolution versions of this script will appear in the future. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss whether a result matches the input phrase. The following samples were hand picked from a large batch run of random phrases.

'a digital painting of a crying person' CLIP Guided Diffusion v2 Text-to-Image
a digital painting of a crying person

'a fine art painting of heaven in the style of Edvard Munch' CLIP Guided Diffusion v2 Text-to-Image
a fine art painting of heaven in the style of Edvard Munch

'a flemish baroque of an angry person' CLIP Guided Diffusion v2 Text-to-Image
a flemish baroque of an angry person

'a flemish baroque of hell' CLIP Guided Diffusion v2 Text-to-Image
a flemish baroque of hell

'a surrealist painting of a witch' CLIP Guided Diffusion v2 Text-to-Image
a surrealist painting of a witch

'the australian outback' CLIP Guided Diffusion v2 Text-to-Image
the australian outback


Name: CLIPRGB
Author: Jonathan Whitaker
Original script: https://colab.research.google.com/drive/1MiKaFFgau6V5QhIed5tpNdLUiSbof4nI
Time for 512×512 on a 3090: 4 minutes 51 seconds
Maximum resolution on a 24 GB 3090: 4096×4096
Description: Very early 0.1 version shows a lot of potential. Can render huge resolution images up to 4096×4096 on a 3090 so I am really looking forward to future versions of this code with sharper details.

'a digital painting of a wizard' CLIPRGB
a digital painting of a wizard

'a forest path' CLIPRGB
a forest path

'a tattoo of planets' CLIPRGB
a tattoo of planets

'a vampire' CLIPRGB
a vampire


Name: CLIP Guided Diffusion v3
Author: Michael Friesen
Original script: https://colab.research.google.com/drive/1Fl2SZvLv23MVSAHxkoiNdxPeAZwibvu1
Time for 512×512 on a 3090: 2 minutes 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Modified CLIP Guided Diffusion that generates larger 512×512 images. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss whether a result matches the input phrase. The following samples were hand picked from a large batch run of random phrases.

'a cubist painting of a castle' CLIP Guided Diffusion v3 Text-to-Image
a cubist painting of a castle

'a human made of vines' CLIP Guided Diffusion v3 Text-to-Image
a human made of vines

'a rough seascape' CLIP Guided Diffusion v3 Text-to-Image
a rough seascape

'frogs' CLIP Guided Diffusion v3 Text-to-Image
frogs

'h r giger' CLIP Guided Diffusion v3 Text-to-Image
h r giger

'a matte painting of a landscape' CLIP Guided Diffusion v3 Text-to-Image
a matte painting of a landscape


Name: Zoetrope 5
Author: Bearsharktopusdev
Original script: https://colab.research.google.com/drive/1LpEbICv1mmta7Qqic1IcRTsRsq7UKRHM
Time for 512×512 on a 3090: 2 minutes 36 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1280×720
Description: Nice crisp results. Can generate up to 720p (1280×720) resolution images on a 3090. Includes a lot of new ideas from multiple people to help improve the outputs.

'a detailed painting of a Pixar character' Zoetrope 5 Text-to-Image
a detailed painting of a Pixar character

'a futuristic city' Zoetrope 5 Text-to-Image
a futuristic city

'a planet' Zoetrope 5 Text-to-Image
a planet

'a surrealist sculpture of a sea monster' Zoetrope 5 Text-to-Image
a surrealist sculpture of a sea monster

'an art deco scultpture of a policeman' Zoetrope 5 Text-to-Image
an art deco scultpture of a policeman

'cyberpunk art of a forest fire in the style of Edvard Munch' Zoetrope 5 Text-to-Image
cyberpunk art of a forest fire in the style of Edvard Munch


Name: CLIP RGB Optimization
Author: hotgrits
Original script: https://cdn.discordapp.com/attachments/730484623028519072/871624258260987934/CLIP__RGB_Optimization_v0_3.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 4096×4096
Description: Another CLIP RGB based script without the pixelated artefacts of the CLIPRGB script. Can render huge resolution images up to 4096×4096 on a 3090. This script gives more impressionistic textures. By default the output was a bit too dark for my liking so I have added options to tweak the gamma and contrast of the output images in the script. The gamma and contrast tweaks are only at the display stage and do not change the internal image being generated.
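
To show what a display-only gamma and contrast tweak means, here is a small sketch that maps an internal pixel value to a brighter display value while leaving the internal value alone. The formula and the default settings are assumptions of mine, not the ones used in the script.

```python
def display_value(value, gamma=1.0, contrast=1.0):
    """Map an internal pixel value in [0, 1] to a display value in [0, 1]."""
    v = min(max(value, 0.0), 1.0)
    v = v ** (1.0 / gamma)            # gamma above 1 brightens the midtones
    v = (v - 0.5) * contrast + 0.5    # contrast above 1 spreads values around mid-gray
    return min(max(v, 0.0), 1.0)


# The internal values being optimized are never modified,
# only the copy that gets shown or saved.
internal = [0.0, 0.1, 0.25, 0.5, 1.0]
shown = [display_value(v, gamma=1.8, contrast=1.1) for v in internal]
print("internal:", internal)
print("shown:", shown)
```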

'a babbling brook' CLIP RGB Optimization
a babbling brook

'a movie monster' CLIP RGB Optimization
a movie monster

'an amusement park' CLIP RGB Optimization
an amusement park

'Chewbacca' CLIP RGB Optimization
Chewbacca

'Freddy Kruger in the style of Rembrandt' CLIP RGB Optimization
Freddy Kruger in the style of Rembrandt


Name: MSE Regulized VQGAN+CLIP
Author: jbusted
Original script: https://colab.research.google.com/drive/1hf1seGOZctOJUznkhJNblLluXHbWLKZh
Time for 512×512 on a 3090: 3 minutes 16 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Generates good images but they tend to be inside a grey/purple border void.

'a bronze sculpture of a heart' MSE Regulized VQGAN+CLIP
a bronze sculpture of a heart

'a cubist painting of Buzz Lightyear' MSE Regulized VQGAN+CLIP
a cubist painting of Buzz Lightyear

'a house made of string' MSE Regulized VQGAN+CLIP
a house made of string

'an art deco sculpture of a vampire' MSE Regulized VQGAN+CLIP
an art deco sculpture of a vampire

'chalk art of C-3PO' MSE Regulized VQGAN+CLIP
chalk art of C-3PO


Name: Sequential VQGAN+CLIP
Author: Jakeukalane and Avengium
Original script: https://colab.research.google.com/drive/1CcibxlLDng2yzcjLwwwSADRcisc1qVCs
Time for 512×512 on a 3090: 1 minute 41 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Really nice results and fast.

'a campfire in the style of Vincent van Gogh' Sequential VQGAN+CLIP
a campfire in the style of Vincent van Gogh

'a colorful parrot' Sequential VQGAN+CLIP
a colorful parrot

'a hyperrealistic painting of C-3PO' Sequential VQGAN+CLIP
a hyperrealistic painting of C-3PO

'an impressionist painting of Buzz Lightyear made of paper' Sequential VQGAN+CLIP
an impressionist painting of Buzz Lightyear made of paper

'New York City' Sequential VQGAN+CLIP
New York City


Name: CLIPRGB ImStack
Author: Jonathan Whitaker
Original script: https://colab.research.google.com/drive/1MCC2IwAaRNCTBUzghuG41ypAkxjJvGtq
Time for 512×512 on a 3090: 2 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 2048×2048
Description: Another CLIP RGB variation. Nice results after some brightness, contrast and sharpness tweaks to the generated images. Could still be a bit sharper.

'a fine art painting of an angry person' CLIPRGB ImStack
a fine art painting of an angry person

'a fireplace in the style of Claude Monet' CLIPRGB ImStack
a fireplace in the style of Claude Monet

'a frog in the style of Beksinski' CLIPRGB ImStack
a frog in the style of Beksinski

'a nightmare creature in the style of H R Giger' CLIPRGB ImStack
a nightmare creature in the style of H R Giger

'a pointalism painting of a vampire made of copper' CLIPRGB ImStack
a pointalism painting of a vampire made of copper



Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 1

This is Part 1. There is also Part 2, Part 3, Part 4 and Part 5.

What Are Text-to-Image Systems

Text-to-Image systems/models/scripts/networks (what is the official correct term for these?) are machine learning based models that take a descriptive phrase as input and attempt to generate images that match the input phrase.

Requirements

You do need a decent NVIDIA GPU. 3090 recommended for 768×768 resolution, 2080 for smaller 256×256 images, 10xx possibly for tiny images or if you want to try reduced settings and wait ages for results. If you have a commercial grade GPU with more memory you will be able to push these resolutions higher. VRAM matters more than GPU model, i.e. you can get 3090s with only 16GB of VRAM and others with 24GB. You may see a laptop with an advertised 3080 GPU, but the total VRAM will likely be much smaller than a desktop 3080.

To run these scripts from Visions of Chaos you need to have installed these prerequisites. Once you get all the prerequisites set up it really is as simple as typing your prompt text and clicking a button. I do include a lot of other settings so you can tweak the script parameters as you do more experimentation.

Text-to-Image GUI

Visions of Chaos Text-to-Image Tutorial

You can watch the following tutorial video to get an idea of how the Text-to-Image mode works in Visions of Chaos.

Text-to-Image Scripts Included With Visions of Chaos

The rest of this blog post (and other parts) lists the 65 (so far) Text-to-Image scripts that I have been able to get working with Visions of Chaos.

If you are the author of one of these scripts then many thanks to you for sharing the code publicly. If you are a creator of a script I do not include here, please leave a comment with a link or send me an email so I can try it out. If you are a better coder than I am and improve any of these also let me know and I will share your fixes with the world.

I have included sample image outputs from each script. Most of the text prompts for these samples come from a prompt builder I include with Visions of Chaos that randomly combines subjects, adjectives, styles and artists.
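The idea behind such a prompt builder can be sketched in a few lines of Python. This is only an illustration and not the prompt builder from Visions of Chaos: the word lists, the function name and the way the parts are joined are all made up for this example.

```python
import random

# Hypothetical word lists; the real prompt builder's lists are much larger.
STYLES = ["a watercolor painting of", "a pencil sketch of", "digital art of"]
ADJECTIVES = ["colorful", "spooky", "happy"]
SUBJECTS = ["parrot", "forest", "alien"]
ARTISTS = ["", "in the style of Kandinsky", "in the style of Rembrandt"]


def build_prompt(rng):
    """Pick one entry from each list and join them into a prompt phrase."""
    style = rng.choice(STYLES)
    adjective = rng.choice(ADJECTIVES)
    subject = rng.choice(SUBJECTS)
    artist = rng.choice(ARTISTS)   # may be empty, meaning no artist
    return f"{style} a {adjective} {subject} {artist}".strip()


rng = random.Random(1234)   # fixed seed so the same batch can be repeated
prompts = [build_prompt(rng) for _ in range(5)]
```

Passing in a seeded `random.Random` means a batch of prompts can be regenerated later, which is handy when comparing the same prompts across different scripts.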

Note also that these samples all use the default settings for GAN and CLIP models. Most of the included scripts allow tweaking of settings and different models to alter the outputs. There is a much wider range of output images possible. Download Visions of Chaos to experiment with all the combinations of scripts, models, prompts and settings.


Name: Deep Daze
Author: Phil Wang
Original script: https://github.com/lucidrains/deep-daze
Time for 512×512 on a 3090: 1 minute 53 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Description: This was the first Text-to-Image script I ever found and tested. The output images from the original script are very washed out and pastel shaded, but after adding some torchvision transforms for brightness, contrast and sharpness tweaks they are a little better. Very abstract output compared to the other scripts.

'a bronze sculpture of a colorful parrot in the style of Kandinsky' Deep Daze Text-to-Image
a bronze sculpture of a colorful parrot in the style of Kandinsky

'a crying person' Deep Daze Text-to-Image
a crying person

'a desert oasis' Deep Daze Text-to-Image
a desert oasis

'a surrealist painting of the Terminator made of silver' Deep Daze Text-to-Image
a surrealist painting of the Terminator made of silver

'a zombie in the style of Turner' Deep Daze Text-to-Image
a zombie in the style of Turner


Name: Big Sleep
Author: Phil Wang
Original script: https://github.com/lucidrains/big-sleep
Time for 512×512 on a 3090: 4 minutes 0 seconds
Maximum resolution on a 24 GB 3090: 512×512
Description: Can give a good variety of images for any prompt text and does not suffer from the coloring or tiled image issues some of the other methods do. See here for my older post with a lot of Big Sleep examples. If you give it a chance and run repeated batches of the same prompt you can get some very nice results.

'H R Giger' Big Sleep Text-to-Image
H R Giger

'surrealism' Big Sleep Text-to-Image
surrealism

'colorful surrealism' Big Sleep Text-to-Image
colorful surrealism

'a charcoal drawing of a landscape' Big Sleep Text-to-Image
a charcoal drawing of a landscape


Name: VQGAN+CLIP z-quantize
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN
Time for 512×512 on a 3090: 2 minutes 28 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: The outputs tend to be divided up into rectangular regions, but the resulting imagery can be interesting.

'a drawing of a bouquet of flowers made of cardboard' VQGAN+CLIP z-quantize Text-to-Image
a drawing of a bouquet of flowers made of cardboard

'a rose made of silver' VQGAN+CLIP z-quantize Text-to-Image
a rose made of silver

'a tilt shift photo of traffic' VQGAN+CLIP z-quantize Text-to-Image
a tilt shift photo of traffic

'an abstract painting of a house made of crystals' VQGAN+CLIP z-quantize Text-to-Image
an abstract painting of a house made of crystals

'an abstract painting of a skull' VQGAN+CLIP z-quantize Text-to-Image
an abstract painting of a skull

VQGAN+CLIP z-quantize allows specifying an image as the input starting point. If you take the output, stretch it very slightly, and then feed it back into the system each frame, you get a movie zooming in. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”. The prompts were the seven deadly sins, i.e. “a watercolor painting depicting pride”, “a watercolor painting depicting greed” etc.
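The per-frame "stretch" can be thought of as cropping a slightly smaller centered rectangle and resizing it back to full size before using it as the next input image. The sketch below only does the arithmetic for that crop; the 1.02 zoom factor and both function names are assumptions for illustration, not values taken from the script.

```python
import math


def zoom_crop_box(width, height, zoom=1.02):
    """Return the centered (left, top, right, bottom) box that, once resized
    back to width x height, magnifies the frame by `zoom`."""
    crop_w = width / zoom
    crop_h = height / zoom
    left = (width - crop_w) / 2
    top = (height - crop_h) / 2
    return (left, top, left + crop_w, top + crop_h)


def frames_until(target, zoom=1.02):
    """Number of fed-back frames before the total magnification reaches `target`."""
    return math.ceil(math.log(target) / math.log(zoom))


box = zoom_crop_box(512, 512, zoom=1.02)
frames = frames_until(2.0, zoom=1.02)
```

A smaller zoom factor gives a slower, smoother zoom and leaves the model more frames to repaint detail before it scrolls out of view.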

The more astute viewers among you will notice there are only 6 of the sins in the previous video. What happened to “lust”? A while back one of my uploads was flagged as porn by the YouTube robots. Their (what I assume is) machine learning based system detected my upload as porn when there was no porn in it. An appeal was met with instant denial and so I now have a permanent “warning” on my channel with no way to talk to a person who could spend 1 minute looking at the video to tell it isn’t porn. Another warning would lead to a strike, so I am being overly cautious and omitting the lust part from the YouTube video. Those who want to see the full 7 part movie can click the following link to watch it on my LBRY channel.

https://open.lbry.com/@Softology:5/Seven-Deadly-Sins:6

Thanks LBRY!


Name: VQGAN+CLIP codebook
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/15UwYDsnNeldJFHJ9NdgYBYeo6xPmSelP
Time for 512×512 on a 3090: 3 minutes 19 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: VQGAN-CLIP codebook seems to give very similar images for the same prompt phrase, so repeatedly running the script (with different seed values) does not give a wide variety of resulting images. Still gives interesting results.

'a happy alien' VQGAN+CLIP codebook Text-to-Image
a happy alien

'a library' VQGAN+CLIP codebook Text-to-Image
a library

'a teddy bear' VQGAN+CLIP codebook Text-to-Image
a teddy bear

'digital art of a colorful parrot' VQGAN+CLIP codebook Text-to-Image
digital art of a colorful parrot

'digital art of an amusement park' VQGAN+CLIP codebook Text-to-Image
digital art of an amusement park


Name: Aleph2Image Gamma
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1VAO22MNQekkrVq8ey2pCRznz4A0_jY29
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast.

'H R Giger' Aleph2Image Gamma Text-to-Image
H R Giger

'surrealism' Aleph2Image Gamma Text-to-Image
surrealism

'seascape painting' Aleph2Image Gamma Text-to-Image
seascape painting


Name: Aleph2Image Delta
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1oA1fZP7N1uPBxwbGIvOEXbTsq2ORa9vb
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors and that may be a good or bad point depending on your preferences.

'a sketch of an angry person' Aleph2Image Delta Text-to-Image
a sketch of an angry person

'a spooky forest' Aleph2Image Delta Text-to-Image
a spooky forest

'a sunset in the style of Rembrandt' Aleph2Image Delta Text-to-Image
a sunset in the style of Rembrandt

'a surrealist painting of a forest path' Aleph2Image Delta Text-to-Image
a surrealist painting of a forest path

'a tropical beach' Aleph2Image Delta Text-to-Image
a tropical beach


Name: Aleph2Image Delta v2
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1NGM9L8qP0gwl5z5GAuB_bd0wTNsxqclG
Time for 512×512 on a 3090: 3 minutes 42 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: A newer revision of Aleph2Image Delta that gives much sharper results. The resulting images tend to be similar to each other for each prompt text so not a lot of variety.

'a cartoon of love in the style of Claude Monet' Aleph2Image Delta v2 Text-to-Image
a cartoon of love in the style of Claude Monet

'a detailed painting of a rose' Aleph2Image Delta v2 Text-to-Image
a detailed painting of a rose

'a drawing of a volcano' Aleph2Image Delta v2 Text-to-Image
a drawing of a volcano

'a house' Aleph2Image Delta v2 Text-to-Image
a house

'a submarine' Aleph2Image Delta v2 Text-to-Image
a submarine


Name: Deep Daze Fourier
Author: Vadim Epstein
Original script: https://colab.research.google.com/gist/afiaka87/e018dfa86d8a716662d30c543ce1b78e/text2image-siren.ipynb
Time for 512×512 on a 3090: 4 minutes 54 seconds
Maximum resolution on a 24 GB 3090: 512×512 or 640×360
Description: Creates more collaged images with sharp, crisp bright colors.

'a pencil sketch of a vampire made of bones' Deep Daze Fourier Text-to-Image
a pencil sketch of a vampire made of bones

'H R Giger' Deep Daze Fourier Text-to-Image
H R Giger

'medusa made of wood' Deep Daze Fourier Text-to-Image
medusa made of wood

'Shrek eating pizza' Deep Daze Fourier Text-to-Image
Shrek eating pizza

'surrealist Homer Simpson' Deep Daze Fourier Text-to-Image
surrealist Homer Simpson


Name: Text2Image v2
Author: Denis Malimonov
Original script: https://colab.research.google.com/github/tg-bomze/collection-of-notebooks/blob/master/Text2Image_v2.ipynb
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Can give more abstract results of the input phrase. Colors and details can be sharp, but not always. Good variety of output for each input phrase. Definitely worth a try.

'a fireplace made of voxels' Text2Image v2 Text-to-Image
a fireplace made of voxels

'a green tree frog in the style of M C Escher' Text2Image v2 Text-to-Image
a green tree frog in the style of M C Escher

'a pencil sketch of an evil alien' Text2Image v2 Text-to-Image
a pencil sketch of an evil alien

'a sea monster' Text2Image v2 Text-to-Image
a sea monster

'The Incredible Hulk made of silver' Text2Image v2 Text-to-Image
The Incredible Hulk made of silver


Name: The Big Sleep Customized
Author: NMKD
Original script: https://colab.research.google.com/drive/1Q2DIeMqYm_Sc5mlurnnurMMVqlgXpZNO
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Another good one. Worth exploring further.

'a forest path' The Big Sleep Customized Text-to-Image
a forest path

'a watercolor painting of a colorful parrot in the style of Kandinsky' The Big Sleep Customized Text-to-Image
a watercolor painting of a colorful parrot in the style of Kandinsky

'a western town' The Big Sleep Customized Text-to-Image
a western town

'Christmas' The Big Sleep Customized Text-to-Image
Christmas

'medusa made of vines' The Big Sleep Customized Text-to-Image
medusa made of vines


Name: Big Sleep Minmax
Author: @!goose
Original script: https://colab.research.google.com/drive/12CnlS6lRGtieWujXs3GQ_OlghmFyl8ch
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: Another interesting Big Sleep variation. Allows a second phrase to be specified that is minimized in the output. For example, if your prompt for a landscape painting gives too many clouds you could specify clouds as the minimize prompt so the system outputs fewer clouds in the resulting image.
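One common way to express a "minimize" prompt is to reward similarity to the main prompt and penalize similarity to the second phrase. The sketch below shows that idea with toy 2D vectors standing in for CLIP embeddings. It is an assumption about the general approach, not the loss from the Big Sleep Minmax notebook, and `minmax_loss` and `min_weight` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def minmax_loss(image_emb, max_emb, min_emb, min_weight=1.0):
    """Lower is better: close to the main prompt, far from the minimize prompt."""
    return (-cosine_similarity(image_emb, max_emb)
            + min_weight * cosine_similarity(image_emb, min_emb))


# Toy embeddings: axis 0 means "landscape", axis 1 means "clouds".
landscape = [1.0, 0.0]
clouds = [0.0, 1.0]
clear_image = [1.0, 0.0]
cloudy_image = [1.0, 1.0]

clear_loss = minmax_loss(clear_image, landscape, clouds)
cloudy_loss = minmax_loss(cloudy_image, landscape, clouds)
```

With these toy vectors the cloudy image scores a higher (worse) loss than the clear one, so an optimizer following this loss would be pushed away from clouds.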

'a charcoal drawing of an eyeball' Big Sleep Minmax Text-to-Image
a charcoal drawing of an eyeball

'an ultrafine detailed painting of a crying person made of voxels' Big Sleep Minmax Text-to-Image
an ultrafine detailed painting of a crying person made of voxels

'dense woodland' Big Sleep Minmax Text-to-Image
dense woodland

'King Kong made of wrought iron in the style of Frida Kahlo' Big Sleep Minmax Text-to-Image
King Kong made of wrought iron in the style of Frida Kahlo

'Michael Myers' Big Sleep Minmax Text-to-Image
Michael Myers


Name: CLIP Pseudo Slime Mold
Author: hotgrits
Original script: https://discord.com/channels/729741769192767510/730484623028519072/850857930881892372
Time for 512×512 on a 3090: 2 minutes 57 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Description: This one gives unique output compared to the others. Really nicely defined sharp details. The colors come from any color palette you select (currently all the 3,479 palettes within Visions of Chaos can be used) so you can “tint” the resulting images with color shades you prefer.
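Tinting with a palette can be pictured as mapping each pixel's brightness to a position along a list of colors. The sketch below shows one simple way to do that with linear interpolation. How the script itself applies the palette is not described here, so treat the `tint` function and the three-color palette as illustrative assumptions only.

```python
def tint(value, palette):
    """Map a brightness in 0..1 to an RGB tuple by interpolating along
    `palette`, a list of at least two (r, g, b) tuples."""
    position = min(1.0, max(0.0, value)) * (len(palette) - 1)
    # Clamp the index so index + 1 is always a valid palette entry.
    index = min(int(position), len(palette) - 2)
    blend = position - index
    low, high = palette[index], palette[index + 1]
    return tuple(round(l + (h - l) * blend) for l, h in zip(low, high))


# Hypothetical palette: black to red to white.
palette = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]
dark = tint(0.0, palette)
middle = tint(0.5, palette)
bright = tint(1.0, palette)
```

Swapping in a different palette changes the colors of the whole image without changing its structure, which is the "tint" effect described above.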

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger with a different color palette

'Shrek eating pizza' CLIP Pseudo Slime Mold Text-to-Image
Shrek eating pizza

'seascape painting' CLIP Pseudo Slime Mold Text-to-Image
seascape painting


Name: Aleph2Image Dall-E Remake
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/17ZSyxCyHUnwI1BgZG22-UFOtCWFvqQjy
Time for 512×512 on a 3090: 3 minutes 42 seconds
Maximum resolution on a 24 GB 3090: 768×768
Description: Another Aleph2Image variant.

'a color pencil sketch of Jason Vorhees made of plastic' Aleph2Image Dall-E Remake Text-to-Image
a color pencil sketch of Jason Vorhees made of plastic

'a cubist painting of a science laboratory' Aleph2Image Dall-E Remake Text-to-Image
a cubist painting of a science laboratory

'a green tree frog in the style of Kandinsky' Aleph2Image Dall-E Remake Text-to-Image
a green tree frog in the style of Kandinsky

'a watercolor painting of Godzilla' Aleph2Image Dall-E Remake Text-to-Image
a watercolor painting of Godzilla

'an octopus' Aleph2Image Dall-E Remake Text-to-Image
an octopus


Name: VQGAN+CLIP v3
Author: Eleiber
Original script: https://colab.research.google.com/drive/1go6YwMFe5MX6XM9tv-cnQiSTU50N9EeT
Time for 512×512 on a 3090: 2 minutes 52 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: “v3” because it is the third VQGAN system I have tried and it didn’t have a unique specific name. Gives clear sharp images. Can give very painterly results with visible brush strokes if you use “a painting of” before the prompt subject.

'a pencil sketch of a campfire in the style of Da Vinci' VQGAN+CLIP v3 Text-to-Image
a pencil sketch of a campfire in the style of Da Vinci

'a pop art painting of a lush rainforest' VQGAN+CLIP v3 Text-to-Image
a pop art painting of a lush rainforest

'a storybook illustration of a cityscape' VQGAN+CLIP v3 Text-to-Image
a storybook illustration of a cityscape

'an airbrush painting of frogs' VQGAN+CLIP v3 Text-to-Image
an airbrush painting of frogs

'the Amazon Rainforest' VQGAN+CLIP v3 Text-to-Image
the Amazon Rainforest

VQGAN+CLIP v3 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.

This next example movie shows a “Self-Driven” zoom movie. As in a regular zoom movie the output frames are slightly stretched and fed back into the system each frame. The self-driven difference with this movie is that the Text-to-Image prompt text is automatically changed every 2 seconds by CLIP detecting what it “sees” in the current frame. This way the movie subjects are automatically changed and steered in new directions in a totally automated way. There is no human control except me setting the initial “A landscape” prompt. After that it was fully automated.

By default the CLIP Image Captioning script is very good at detecting what is in an image. Using the default accuracy resulted in a zoom movie that got stuck with a single topic or subject. One got stuck on a slight variation of a prompt dealing with kites, so as the zoom movie went deeper it only showed kites. Luckily, after tweaking and decreasing the accuracy of the CLIP captioning, the predictions allow the resulting subjects to drift to new topics during the movie.
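One way to picture "decreasing the accuracy" is to stop always taking the single best caption and instead pick at random from the few best candidates. The sketch below shows that idea. It is an assumption for illustration and not the tweak actually used in the captioning script; the captions, scores and `pick_caption` name are made up.

```python
import random


def pick_caption(scored_captions, top_k, rng):
    """Choose the next prompt at random from the top_k best-scoring captions."""
    ranked = sorted(scored_captions, key=lambda item: item[1], reverse=True)
    return rng.choice(ranked[:top_k])[0]


# Hypothetical caption candidates with similarity scores for one frame.
candidates = [
    ("a kite in the sky", 0.31),
    ("a colorful flag", 0.29),
    ("a sailing boat", 0.27),
    ("a brick wall", 0.05),
]

rng = random.Random(0)
always_best = pick_caption(candidates, top_k=1, rng=rng)
looser = pick_caption(candidates, top_k=3, rng=rng)
```

With `top_k=1` the prompt is always the best match, which is how a movie ends up stuck on kites. A larger `top_k` lets near misses through so the subject can wander.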


Name: VQGAN+CLIP v4
Author: crimeacs
Original script: https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Description: Another improved VQGAN system utilizing pooling. “v4” because it is the fourth VQGAN system I have tried and it didn’t have a unique specific name.

'a fine art painting of a cozy den' VQGAN+CLIP v4 Text-to-Image
a fine art painting of a cozy den

'a king in the style of Kandinsky' VQGAN+CLIP v4 Text-to-Image
a king in the style of Kandinsky

'a nurse in the style of Edward Hopper' VQGAN+CLIP v4 Text-to-Image
a nurse in the style of Edward Hopper

'a pastel of a demon' VQGAN+CLIP v4 Text-to-Image
a pastel of a demon

'a watercolor painting of a mountain path' VQGAN+CLIP v4 Text-to-Image
a watercolor painting of a mountain path

VQGAN+CLIP v4 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.

The text prompts for each part came from an idea in a YouTube comment to try more non-specific terms to see what happens, so here are the results of “an image of fear”, “an image of humanity”, “an image of knowledge”, “an image of love”, “an image of morality” and “an image of serenity”.

Here is another example. This time the prompts were various directors, i.e. “Stanley Kubrick imagery”, “David Lynch imagery” etc. No super resolution this time. Super Slo-Mo was used for optical flow. I wasn’t sure if YouTube would accept the potentially unsettling horror visuals and I do not want to risk the hassle of a strike, so being on the safe side I am hosting this one on my LBRY channel only. Click the following image to open the movie in a new window. Note that LBRY can be a lot slower to buffer, so you may need to pause it for a while to let the movie load in.

Directors Text-to-Image

If you find that too slow to buffer/load I also have a copy on my BitChute channel here.



Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Deep Daze Fourier Text-to-Image

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

More Fascinating Text-to-Image

This time “Deep Daze Fourier” from Vadim Epstein. Code available in this notebook.

Compared to the last Deep Daze, which generated washed out and pastel shaded results, this Deep Daze creates images with sharp, crisp bright colors.

Sample results

“Shrek eating pizza”

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

“H R Giger”

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

“Freddy Krueger”

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

“Surrealist Homer Simpson”

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

“rose bush”

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Availability

This and the previous Text-to-Image systems I have experimented with (here, here and here) are now supported by a GUI front end in Visions of Chaos. As long as you install these prerequisites and have a decent GPU you will be able to run these systems yourself.

Text-to-Image GUI

For those who love to tinker I have now added a bunch more of the script parameters so you no longer have to edit the Python source code outside Visions of Chaos.

Other Text-to-Image

If you know of any other Text-to-Image systems (with sharable open-source code) then please let me know. The Text-to-Image systems I have tested so far all have their own unique behaviors and outputs so I will always be on the lookout for more new variations.

Jason.

Aleph2Image Text-to-Image

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

Previously I experimented with Big Sleep and other Text-to-Image systems.

This post covers variations of Aleph2Image Text-to-Image, originally coded by Ryan Murdock.


Aleph2Image “Gamma”

Code from this colab. This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast. Previous results from Deep Daze were too washed out; this one is too “contrasty”. If they could both be pushed towards that “sweet spot” they would both look much better.

“surrealism”

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

“H R Giger”

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

“seascape oil painting”

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain


Aleph2Image “Delta”

Code from this colab. A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors.

“surrealism”

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

“H R Giger”

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

“seascape oil painting”

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain


Improved Aleph2Image “Delta” v2

Code from this colab. A newer revision of Aleph2Image Delta that gives much better results, although the results tend to be similar to each other for each prompt text. This and Big Sleep would be the best 2 Text-to-Image systems I have experimented with so far.

“surrealism”

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

“H R Giger”

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

“seascape oil painting”

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain


Easy GUI Front End

I include a simple GUI dialog front end for these Text-to-Image systems in Visions of Chaos. As long as you have the prerequisites installed you will be able to convert text prompts into single or multiple images.

Text-to-Image GUI

You do need a GPU with lots of VRAM for these to work (especially the 512×512 image models).

Jason.

Further Explorations Into Text-to-Image Machine Learning

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

After my initial experiments with Big Sleep Text-to-Image generation I looked around for some more examples to play with. I was really impressed with Big Sleep and you can see some examples of Big Sleep output in that original post. I still think Big Sleep is the best Text-to-Image code I have used so far and better than what is in this post.


Deep Daze

Deep Daze is by Phil Wang and the source code is available here.

Deep Daze tends to generate collage-like images. As the first example image shows, the resulting images have a washed out or faded look. I put the rest of the example Deep Daze images through a quick Auto White Balance pass in GIMP.

“H R Giger”

DeepDaze - H R Giger

DeepDaze - H R Giger

“Rainforest”

DeepDaze - Rainforest

“night club”

DeepDaze - Night Club

“seascape painting”

DeepDaze - Seascape Painting

“flowing water”

DeepDaze - Flowing Water


VQGAN-CLIP z+quantize

VQGAN-CLIP using a z+quantize method is from Katherine Crowson. Source code is available here.

This method also has the option to use an image to seed the initial model rather than just random noise, but the following examples were all seeded with noise. The resulting images tend to be divided up into rectangular regions, but the resulting imagery is interesting.

“H R Giger”

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

“rainforest”

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

“night club”

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

“seascape painting”

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

“flowing water”

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water


VQGAN-CLIP codebook

VQGAN-CLIP using a codebook method is also from Katherine Crowson. Source code is available here.

VQGAN-CLIP codebook seems to give very similar images for different seeds, so I have only shown two examples for each phrase.

“H R Giger”

VQGAN-CLIP codebook - H R Giger

VQGAN-CLIP codebook - H R Giger

“rainforest”

VQGAN-CLIP codebook - Rainforest

VQGAN-CLIP codebook - Rainforest

“night club”

VQGAN-CLIP codebook - Night Club

VQGAN-CLIP codebook - Night Club

“seascape painting”

VQGAN-CLIP codebook - Seascape Painting

VQGAN-CLIP codebook - Seascape Painting

“flowing water”

VQGAN-CLIP codebook - Flowing Water

VQGAN-CLIP codebook - Flowing Water


Other Text-to-Image Models?

If you know of any other available Text-to-Image systems (that are freely available and shareable) let me know.


Availability

You can follow the above links and download the Python code yourself if you are so inclined.

I do include a basic GUI front-end for these Text-to-Image generators in Visions of Chaos. As long as you have the prerequisites installed (which you would need to install to run these outside Visions of Chaos) then you can experiment with these models yourself without needing to use the command line.

Text-to-Image GUI

Jason.

Super Resolution

The Dream

For years now, TV shows like CSI and movies like Blade Runner have shown software with an “enhance” function that brings out details in images that are only a blur or a few pixels in size. In Blade Runner, Deckard’s system even allowed him to look around corners.

The Reality

I have recently been testing machine learning neural network enhancer (aka super resolution) models. They enlarge an image while trying to maintain or enhance details, ideally without losing any detail, or at least losing a lot less than if the image was zoomed in an image editing tool using linear or bicubic zoom.
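For comparison, this is roughly what a plain "linear" zoom does. The sketch below is a minimal bilinear upscale of a tiny greyscale image stored as a list of rows. It is written for illustration only (the function name and the pixel-center mapping are my own choices) and is not code from any of the models tested here.

```python
def bilinear_upscale(image, factor):
    """Upscale a 2D list of grey values by an integer factor using
    bilinear interpolation between the nearest source pixels."""
    src_h, src_w = len(image), len(image[0])
    result = []
    for y in range(src_h * factor):
        # Map the output pixel center back into source coordinates.
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(src_w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend horizontally on both rows, then blend vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


small = [[0.0, 1.0],
         [1.0, 0.0]]
big = bilinear_upscale(small, 2)
```

Every output value is just a blend of existing neighbors, so nothing new is added and edges go soft. The super resolution models instead try to predict plausible detail learned from training images.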

Some of my results with these models follow. I am using the following test image from here.

Unprocessed Test Image

To best see the differences between the algorithms I recommend you open the x4 zoomed images in new tabs and switch between them.

SRCNN – Super-Resolution Convolutional Neural Network

To see the original paper on SRCNN, click here.
I am using the PyTorch script by Mirwaisse Djanbaz here.

SRCNN x4

SRCNN x4

SRRESNET

To see the original paper on SRRESNET, click here.
I am using the PyTorch script by Sagar Vinodababu here.

SRRESNET x4

SRRESNET x4

SRGAN – Super Resolution Generative Adversarial Network

To see the original paper on SRGAN, click here.
I am using the PyTorch script by Sagar Vinodababu here.

SRGAN x4

SRGAN x4

ESRGAN – Enhanced Super Resolution Generative Adversarial Network

I am using the PyTorch script by Xintao Wang et al here.

ESRGAN x4

ESRGAN x4

PSNR

I am using the PyTorch script by Xintao Wang et al here.

PSNR x4

PSNR x4

Real-ESRGAN

This is the best super sampler here. I am using the executable by Xintao Wang et al here.

Real-ESRGAN x4

Real-ESRGAN x4

Real-ESRNET

I am using the executable by Xintao Wang et al here.

Real-ESRNET x4

Real-ESRNET x4

SwinIR

Very nice results. May be equal to or better than Real-ESRGAN depending on the input image. I am using the code from this colab.

SwinIR x4

SwinIR x4

SPSR

Another method from here.

SPSR x4

SPSR x4

Differences

Each of the algorithms gives different results. For an unknown source image it would probably be best to run it through them all and then see which gives you the best result. These are not the Hollywood or TV enhance magic fix just yet.

If you know of any other PyTorch implementations of super resolution I missed, let me know.

Availability

You can follow the links to the original GitHub repositories to get the software, but I have also added a simple GUI front end for these scripts in Visions of Chaos. That allows you to try the above algorithms on single images or batch process a directory of images.

Jason.

Text-to-Image Machine Learning

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

Text-to-Image

Input a short phrase or sentence into a neural network and see what image it creates.

I am using Big Sleep from Phil Wang (@lucidrains).

Phil used the code/models from Ryan Murdock (@advadnoun). Ryan has a blog post explaining the basics of how all the parts connect up here. Ryan has some newer Text-to-Image experiments but they are behind a Patreon paywall, so I have not played with them. Hopefully he (or anyone) releases the colabs publicly sometime in the future. I don’t want to experiment with a Text-to-Image system that I cannot share with everyone, otherwise it is just a tease.

The simplest explanation is that BigGAN generates images that try to satisfy CLIP, which rates how closely the image matches the input phrase. BigGAN creates an image and CLIP looks at it and says “sorry, that does not look like a cat to me, try again”. With each iteration the generated image gets closer to matching the desired phrase.
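The generate, score, adjust loop can be sketched in a few lines of plain Python. This is a toy under loud assumptions, not Big Sleep's code: toy_generator stands in for BigGAN, toy_score stands in for CLIP, the "image" is just three numbers, and the gradient is estimated by nudging each latent value rather than by backpropagation through the real networks.

```python
import math


def toy_generator(latent):
    """Stand-in for BigGAN: turns a latent vector into an 'image' (3 numbers)."""
    return [math.tanh(v) for v in latent]


def toy_score(image, target):
    """Stand-in for CLIP: higher means the image matches the phrase better."""
    return -sum((i - t) ** 2 for i, t in zip(image, target))


def optimize_latent(latent, target, steps=200, lr=0.1, eps=1e-4):
    """Repeatedly nudge the latent in the direction that raises the score."""
    latent = list(latent)
    for _ in range(steps):
        base = toy_score(toy_generator(latent), target)
        grad = []
        for k in range(len(latent)):
            # Finite-difference estimate of how the score reacts to latent[k].
            nudged = list(latent)
            nudged[k] += eps
            grad.append((toy_score(toy_generator(nudged), target) - base) / eps)
        latent = [v + lr * g for v, g in zip(latent, grad)]
    return latent


target = [0.8, -0.3, 0.5]   # pretend this encodes the input phrase
start = [0.0, 0.0, 0.0]     # initial latent vector
before = toy_score(toy_generator(start), target)
after = toy_score(toy_generator(optimize_latent(start, target)), target)
print(before, after)
```

Note that only the latent vector changes between iterations. The generator and the scorer stay fixed, which is why the same pretrained models can be reused for any prompt.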

Big Sleep Examples

Big Sleep uses a seed number, which means you can have thousands/millions of different outputs for the same input phrase. Note that there is an issue with the seed not reliably reproducing the same image though. From my testing, setting the torch_deterministic flag to True and setting the CUDA environment variable does not help. Every time Big Sleep is called it generates a different image for the same seed. That means you will never be able to reproduce the same output in the future.
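For reference, these are the kind of general PyTorch reproducibility settings usually tried in this situation. This is a hedged sketch rather than Big Sleep's own code: seed_everything is a hypothetical helper name, and CUBLAS_WORKSPACE_CONFIG is my assumption for which CUDA-related environment variable is meant, since it is the one named in PyTorch's reproducibility notes. As described above, settings like these did not make Big Sleep repeatable in my testing.

```python
import os
import random


def seed_everything(seed):
    """Apply common reproducibility settings (hypothetical helper).

    Returns True if PyTorch was found and seeded, False otherwise.
    """
    # Assumed variable: cuBLAS reads this when CUDA is initialized,
    # so it has to be set before any GPU work starts.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed; only Python's RNG was seeded
    torch.manual_seed(seed)                    # seeds the PyTorch generators
    torch.backends.cudnn.benchmark = False     # no autotuned kernel selection
    torch.backends.cudnn.deterministic = True  # prefer deterministic cuDNN kernels
    return True


seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Python's own random numbers repeat exactly after reseeding. Some GPU operations have no deterministic implementation, so even with all of these settings a GPU pipeline is not guaranteed to give identical results from run to run.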

These images are 512×512 pixels square (the largest resolution Big Sleep supports) and took 4 minutes each to generate on an RTX 3090 GPU. The same code takes 6 minutes 45 seconds per image on an older 2080 Super GPU.

Also note that these are the “cherry picked” best results. Big Sleep is not going to create awesome art every time. For these examples or when experimenting with new phrases I usually run a batch of multiple images and then manually select the best 4 or 8 to show off (4 or 8 because that satisfies one or two tweets).

To start, these next four images were created from the prompt phrase “Gandalf and the Balrog”

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Here are results from “disturbing flesh”. These are like early David Cronenberg nightmare visuals.

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

A suggestion from @MatthewKafker on Twitter “spatially ambiguous water lillies painting”

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

“stormy seascape”

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

After experimenting with acrylic pour painting in the past I wanted to see what Big Sleep could generate from “acrylic pour painting”

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

I have always enjoyed David Lynch movies so let’s see what “david lynch visuals” results in. This one produced a lot of surprises and worked great. These images really capture the feeling of a Lynchian cinematic look. A lot of these came out fairly dark so I have tweaked the exposure in GIMP.

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

More from “david lynch visuals” but these are more portraits. The famous hair comes through.

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

“H.R.Giger”

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

“metropolis”

Big Sleep - Metropolis

Big Sleep - Metropolis

Big Sleep - Metropolis

Big Sleep - Metropolis

“surrealism”

Big Sleep - Surrealism

Big Sleep - Surrealism

Big Sleep - Surrealism

Big Sleep - Surrealism

“colorful surrealism”

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Availability

I have now added a simple GUI front end for Big Sleep into Visions of Chaos, so once you have installed all the prerequisites you can run these models on any prompt phrase you feed into them. The following image shows Big Sleep in the process of generating an image for the prompt text “cyberpunk aesthetic”.

Text-to-Image GUI

After spending a lot of time experimenting with Big Sleep over the last few days, I highly encourage anyone with a decent GPU to try it. The results are truly fascinating. This page says a 2070 8GB or better is required, but Martin in the comments managed to generate a 128×128 image on a 1060 6GB GPU after 26 (!!) minutes.

Jason.