Text-to-Image Summary – Part 8

This is Part 8. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 6 and Part 7.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Deforum Stable Diffusion v0.4
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.4 is the latest version.

'a canal' Deforum Stable Diffusion v0.4
a canal

'a forest path' Deforum Stable Diffusion v0.4
a forest path

'a loft' Deforum Stable Diffusion v0.4
a loft

'a matte painting of a river hyperdetailed and CryEngine' Deforum Stable Diffusion v0.4
a matte painting of a river hyperdetailed and CryEngine

'a painting of the tropics' Deforum Stable Diffusion v0.4
a painting of the tropics

'a pastel of a nightmare 4K HD realism and trending on Flickr' Deforum Stable Diffusion v0.4
a pastel of a nightmare 4K HD realism and trending on Flickr

'a photorealistic painting of Cookie Monster rendered in unreal engine and CGSociety' Deforum Stable Diffusion v0.4
a photorealistic painting of Cookie Monster rendered in unreal engine and CGSociety

'a tropical beach by Karl Hagedorn and Michalis Oikonomou' Deforum Stable Diffusion v0.4
a tropical beach by Karl Hagedorn and Michalis Oikonomou

'an etching of King Kong' Deforum Stable Diffusion v0.4
an etching of King Kong

'concept art of Gandalf CGSociety and 4K HD realism' Deforum Stable Diffusion v0.4
concept art of Gandalf CGSociety and 4K HD realism


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 6

This is Part 6. There is also Part 1, Part 2, Part 3, Part 4, Part 5, Part 7 and Part 8.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Augmented CLIP Guided Diffusion
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 1 minute 16 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 256×256 57 seconds
Description: Another CLIP Guided Diffusion script. Fast. Gives unique textured results.

'a detailed painting of people by Nicolette Macnamara' Augmented CLIP Guided Diffusion
a detailed painting of people by Nicolette Macnamara

'a diagram of a nightmare creature made of gold' Augmented CLIP Guided Diffusion
a diagram of a nightmare creature made of gold

'a nightmare creature' Augmented CLIP Guided Diffusion
a nightmare creature

'a painting of a cabin next to a stream in a secluded forest' Augmented CLIP Guided Diffusion
a painting of a cabin next to a stream in a secluded forest

'a storybook illustration of Jabba The Hutt by Carle Hessay' Augmented CLIP Guided Diffusion
a storybook illustration of Jabba The Hutt by Carle Hessay

'a werewolf by A R Middleton Todd' Augmented CLIP Guided Diffusion
a werewolf by A R Middleton Todd

'an oil painting of Big Bird' Augmented CLIP Guided Diffusion
an oil painting of Big Bird

'Gandalf trending on pixiv' Augmented CLIP Guided Diffusion
Gandalf trending on pixiv

'Lovecraftian horror' Augmented CLIP Guided Diffusion
Lovecraftian horror

'poster art of the Las Vegas strip by George Passantino' Augmented CLIP Guided Diffusion
poster art of the Las Vegas strip by George Passantino


Name: Princess Generator
Author: Dango233
Original script: https://colab.research.google.com/drive/1QgH9TvQMXR3PpEGBcHnghtEcwFDXLaYE
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: The latest update to “CLIP Guided Diffusion v6” from Dango233. Can give some superb results. Worth exploring and experimenting with further.

'a cloudy sunset' Princess Generator
a cloudy sunset

'a fireplace by Jacob More' Princess Generator
a fireplace by Jacob More

'a happy alien by James Jarvaise' Princess Generator
a happy alien by James Jarvaise

'a mountain path by Stephen Pace' Princess Generator
a mountain path by Stephen Pace

'a raytraced image of a western town' Princess Generator
a raytraced image of a western town

'a teddy bear' Princess Generator
a teddy bear

'Charmander made of wood by Hua Yan' Princess Generator
Charmander made of wood by Hua Yan

'dense woodland by Marie Angel' Princess Generator
dense woodland by Marie Angel

'paranoia by Floris van Dyck' Princess Generator
paranoia by Floris van Dyck

'portrait of Princess Victoria trending on artstation' Princess Generator
portrait of Princess Victoria trending on artstation


Name: Disco Diffusion v4.1
Author: @Somnai
Original script: https://colab.research.google.com/drive/1sHfRn5Y0YKYKi1k-ifUSBFRNJ8_1sa39
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 39 seconds.
Description: The latest update to Disco Diffusion. Really nice detailed outputs. Low VRAM requirements allow huge images. I didn’t realise I had 3 zombie-themed results in this random batch.

'a bronze sculpture of a zombie' Disco Diffusion v4.1
a bronze sculpture of a zombie

'a fantasy land' Disco Diffusion v4.1
a fantasy land

'a pencil sketch of Cthulhu by Rudolf Koller' Disco Diffusion v4.1
a pencil sketch of Cthulhu by Rudolf Koller

'a pop art painting of zombies' Disco Diffusion v4.1
a pop art painting of zombies

'a portrait of a young boy by Hendrick Cornelisz. van Vliet' Disco Diffusion v4.1
a portrait of a young boy by Hendrick Cornelisz. van Vliet

'a tree by Philips Wouwerman' Disco Diffusion v4.1
a tree by Philips Wouwerman

'a western town' Disco Diffusion v4.1
a western town

'a zombie' Disco Diffusion v4.1
a zombie

'Han Solo psychedelic' Disco Diffusion v4.1
Han Solo psychedelic

'vector art of the Amazon Rainforest' Disco Diffusion v4.1
vector art of the Amazon Rainforest


Name: Hypertron v2
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 18 seconds
Description: Version 2 of Hypertron. More models, more flavors. Works OK. Can give the “image in a sea of purple/grey” problem that previous MSE-based scripts suffered from. Can give good results if you let it run a large random batch overnight.

'a bronze sculpture of a spooky forest by Herb Aach' Hypertron v2
a bronze sculpture of a spooky forest by Herb Aach

'a diamond made of flowers' Hypertron v2
a diamond made of flowers

'a gouache of an android by Wu Bin' Hypertron v2
a gouache of an android by Wu Bin

'a photo of a kitchen' Hypertron v2
a photo of a kitchen

'a photorealistic painting of a cemetery' Hypertron v2
a photorealistic painting of a cemetery

'a sketch of a haunted house' Hypertron v2
a sketch of a haunted house

'a tattoo of Squirtle made of clay' Hypertron v2
a tattoo of Squirtle made of clay

'an art deco painting of a human by Nicolas Lancret 8K 3D' Hypertron v2
an art deco painting of a human by Nicolas Lancret 8K 3D

'goldfish by Elfriede Lohse-Wächtler' Hypertron v2
goldfish by Elfriede Lohse-Wächtler

'Lovecraftian horror by Aileen Eagleton' Hypertron v2
Lovecraftian horror by Aileen Eagleton


Name: CC12M Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1TBo4saFn1BCSfgXsmREFrUl3zSQFg6CC
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: 832×512 2 minutes 59 seconds
Description: Can support higher resolutions, but the coherence really falls apart with anything over 256×256. It handles multiple images at once, so these examples are four 256×256 results.
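
For anyone wanting to combine a batch of four 256×256 results into a single sample image, here is a minimal sketch of the layout arithmetic. The 2×2 arrangement and the helper names grid_offsets and sheet_size are assumptions for illustration only. The post does not say how the sample images were assembled, and this is not code from the CC12M script.

```python
def grid_offsets(count, tile_w, tile_h, cols):
    """Return the (x, y) top-left offset of each tile, filling rows left to right."""
    return [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(count)]


def sheet_size(count, tile_w, tile_h, cols):
    """Return the (width, height) of a sheet large enough to hold every tile."""
    rows = -(-count // cols)  # ceiling division
    return (min(count, cols) * tile_w, rows * tile_h)


# Hypothetical layout: four 256x256 results in a 2x2 grid.
print(grid_offsets(4, 256, 256, cols=2))
print(sheet_size(4, 256, 256, cols=2))
```

The offsets can then be passed to whatever imaging library does the actual pasting. Setting cols=4 would give a single 1024×256 strip instead.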

'a beachside resort' CC12M Diffusion
a beachside resort

'a bouquet of flowers' CC12M Diffusion
a bouquet of flowers

'a castle' CC12M Diffusion
a castle

'a cemetery' CC12M Diffusion
a cemetery

'a cephalopod by Walter Stuempfig super detailed' CC12M Diffusion
a cephalopod by Walter Stuempfig super detailed

'a color pencil sketch of a bedroom super detailed' CC12M Diffusion
a color pencil sketch of a bedroom super detailed

'a kitchen' CC12M Diffusion
a kitchen

'a mountainscape' CC12M Diffusion
a mountainscape

'a nightclub' CC12M Diffusion
a nightclub

'a vast city' CC12M Diffusion
a vast city


Name: Disco Diffusion v5
Authors: @Somnai and @Gandamu
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 02 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 43 seconds.
Description: The latest update to Disco Diffusion.

'a cloudy sunset' Disco Diffusion v5
a cloudy sunset

'a crying person made of wrought iron by František Jakub Prokyš psychedelic' Disco Diffusion v5
a crying person made of wrought iron by František Jakub Prokyš psychedelic

'a flemish baroque of a school of tropical fish' Disco Diffusion v5
a flemish baroque of a school of tropical fish

'a low poly render of puppies' Disco Diffusion v5
a low poly render of puppies

'a morning landscape' Disco Diffusion v5
a morning landscape

'a mosaic of a worried man by Paul Lohse' Disco Diffusion v5
a mosaic of a worried man by Paul Lohse

'a thunder storm by Cornelis Claesz van Wieringen' Disco Diffusion v5
a thunder storm by Cornelis Claesz van Wieringen

'a tropical beach' Disco Diffusion v5
a tropical beach

'computer rendering of an evil alien 4K HD realism' Disco Diffusion v5
computer rendering of an evil alien 4K HD realism

'the human condition Flickr' Disco Diffusion v5
the human condition Flickr


Name: Disco Diffusion v5 Turbo Smooth
Author: Chris Allen
Original script: https://colab.research.google.com/github/zippy731/disco-diffusion-turbo/blob/main/Disco_Diffusion_v5_Turbo_%5Bw_3D_animation%5D.ipynb
Time for 512×512 on a 3090: 1 minute 14 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 832×384. 2 minutes 21 seconds.
Description: An updated version of Disco Diffusion v5 that gives fast and smooth movie outputs.

'a black and white photo of a lush rainforest trending on Flickr' Disco Diffusion v5 Turbo Smooth
a black and white photo of a lush rainforest trending on Flickr

'a detailed matte painting of a factory' Disco Diffusion v5 Turbo Smooth
a detailed matte painting of a factory

'a hacker by Mykola Burachek' Disco Diffusion v5 Turbo Smooth
a hacker by Mykola Burachek

'a sea monster CGSociety' Disco Diffusion v5 Turbo Smooth
a sea monster CGSociety

'a surrealist painting of a happy person' Disco Diffusion v5 Turbo Smooth
a surrealist painting of a happy person

'a tardigrade by Cosmo Alexander' Disco Diffusion v5 Turbo Smooth
a tardigrade by Cosmo Alexander

'an anime drawing of an evening landscape by Daphne Fedarb photorealistic' Disco Diffusion v5 Turbo Smooth
an anime drawing of an evening landscape by Daphne Fedarb photorealistic

'an art deco painting of a happy person by John Uzzell Edwards' Disco Diffusion v5 Turbo Smooth
an art deco painting of a happy person by John Uzzell Edwards

'chalk art of a bouquet of flowers' Disco Diffusion v5 Turbo Smooth
chalk art of a bouquet of flowers

'the human condition' Disco Diffusion v5 Turbo Smooth
the human condition


Name: Augmented CLIP Guided Diffusion v2
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 2 minutes 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 512×512 4 minutes 56 seconds
Description: Updated version of the Augmented CLIP Guided Diffusion script.

'a bungalow 4K HD realism' Augmented CLIP Guided Diffusion v2
a bungalow 4K HD realism

'a forest fire' Augmented CLIP Guided Diffusion v2
a forest fire

'a lush rainforest CryEngine' Augmented CLIP Guided Diffusion v2
a lush rainforest CryEngine

'a painting of a kitchen by Betye Saar' Augmented CLIP Guided Diffusion v2
a painting of a kitchen by Betye Saar

'a portrait of a princess trending on artstation' Augmented CLIP Guided Diffusion v2
a portrait of a princess trending on artstation

'a spooky forest' Augmented CLIP Guided Diffusion v2
a spooky forest

'a tattoo of a zombie' Augmented CLIP Guided Diffusion v2
a tattoo of a zombie

'a werewolf by David Cooke Gibson' Augmented CLIP Guided Diffusion v2
a werewolf by David Cooke Gibson

'an oil painting of a lake' Augmented CLIP Guided Diffusion v2
an oil painting of a lake

'an ugly man' Augmented CLIP Guided Diffusion v2
an ugly man


Name: v-diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: Updated version of Velocity-Diffusion. Tends to make incoherent collage images over 256×256.

'a black and white photo of a portrait of a young girl' v-diffusion Text-to-Image
a black and white photo of a portrait of a young girl

'a cityscape by Lujo Bezeredi' v-diffusion Text-to-Image
a cityscape by Lujo Bezeredi

'a cloudy sunset' v-diffusion Text-to-Image
a cloudy sunset

'a hologram of a sad face by Josef Šíma' v-diffusion Text-to-Image
a hologram of a sad face by Josef Šíma

'a lounge room by Riad Beyrouti IMAX' v-diffusion Text-to-Image
a lounge room by Riad Beyrouti IMAX

'a mountain path' v-diffusion Text-to-Image
a mountain path

'a portrait of a young boy made of metal' v-diffusion Text-to-Image
a portrait of a young boy made of metal

'a portrait of a young girl' v-diffusion Text-to-Image
a portrait of a young girl

'a space nebula' v-diffusion Text-to-Image
a space nebula

'an acrylic painting of a mountain range' v-diffusion Text-to-Image
an acrylic painting of a mountain range


Name: GLID-3
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3
Time for 512×512 on a 3090: 35 seconds
Maximum resolution on a 24 GB 3090: 768×768.
Maximum resolution on an 8GB 2080: 512×512 50 seconds
Description: Great textures and lighting. Poor image coherency.

'a cemetery' GLID-3 Text-to-Image
a cemetery

'a drawing of a cloudy sunset' GLID-3 Text-to-Image
a drawing of a cloudy sunset

'a drawing of a human lens flare' GLID-3 Text-to-Image
a drawing of a human lens flare

'a lake' GLID-3 Text-to-Image
a lake

'a large waterfall made of silver' GLID-3 Text-to-Image
a large waterfall made of silver

'a marina' GLID-3 Text-to-Image
a marina

'a minimalist painting of a teddy bear by Johann Ludwig Bleuler' GLID-3 Text-to-Image
a minimalist painting of a teddy bear by Johann Ludwig Bleuler

'a renaissance painting of paranoia made of vines' GLID-3 Text-to-Image
a renaissance painting of paranoia made of vines

'an abbey by Cornelis Pietersz' GLID-3 Text-to-Image
an abbey by Cornelis Pietersz

'an art deco painting of a rose' GLID-3 Text-to-Image
an art deco painting of a rose


Name: Disco Diffusion v5.1
Authors: @Somnai, @Gandamu and Chris Allen
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 05 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 40 seconds.
Description: Latest version of Disco Diffusion incorporating the “Turbo” features of v5 that give fast and smooth movie outputs.

'a flemish baroque of a sunset' Disco Diffusion v5.1
a flemish baroque of a sunset

'a marsh' Disco Diffusion v5.1
a marsh

'a mid-nineteenth century engraving of New York City' Disco Diffusion v5.1
a mid-nineteenth century engraving of New York City

'a minimalist painting of a cephalopod' Disco Diffusion v5.1
a minimalist painting of a cephalopod

'a photo of Dracula' Disco Diffusion v5.1
a photo of Dracula

'a watercolor painting of a knight' Disco Diffusion v5.1
a watercolor painting of a knight

'an ugly person by Samuel Colman trending on ArtStation' Disco Diffusion v5.1
an ugly person by Samuel Colman trending on ArtStation

'chalk art of Gandalf' Disco Diffusion v5.1
chalk art of Gandalf

'lineart of a zombie' Disco Diffusion v5.1
lineart of a zombie

'the Amazon Rainforest 4K HD realism' Disco Diffusion v5.1
the Amazon Rainforest 4K HD realism


Name: Latent Diffusion LAION_400M
Author: @multimodalart
Original script: https://colab.research.google.com/github/multimodalart/latent-diffusion-notebook/blob/main/Latent_Diffusion_LAION_400M_model_text_to_image.ipynb
Time for 512×512 on a 3090: 57 seconds
Maximum resolution on a 24 GB 3090: 1152×512 or 768×768
Maximum resolution on an 8GB 2080: 256×256. 1 minute 12 seconds.
Description: A new script based on the newly trained LAION_400M model. Impressive results at 256×256. Loses coherency at larger sizes. These examples are four 256×256 images of each prompt.

'a black and white photo of a nightmare creature' Latent Diffusion LAION_400M
a black and white photo of a nightmare creature

'a futuristic city' Latent Diffusion LAION_400M
a futuristic city

'a hyperrealistic painting of a queen made of flowers' Latent Diffusion LAION_400M
a hyperrealistic painting of a queen made of flowers

'a painting of a happy clown' Latent Diffusion LAION_400M
a painting of a happy clown

'a skeleton' Latent Diffusion LAION_400M
a skeleton

'a stained glass window 4K HD realism' Latent Diffusion LAION_400M
a stained glass window 4K HD realism

'a watercolor painting of a lounge room' Latent Diffusion LAION_400M
a watercolor painting of a lounge room

'an eagle' Latent Diffusion LAION_400M
an eagle

'an ultrafine detailed painting of Harry Potter' Latent Diffusion LAION_400M
an ultrafine detailed painting of Harry Potter

'vector art of a zombie by Oskar Kokoschka' Latent Diffusion LAION_400M
vector art of a zombie by Oskar Kokoschka


Name: JAX CLIP Guided Diffusion v2.7
Author: nshepperd
Original script: https://colab.research.google.com/drive/1nmtcbQsE8sTjfLJ1u3Y4d6vi9ZTAvQph
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 512×512. 3 minutes 59 seconds.
Description: Another diffusion-based script. Can give very nice high detail results.

'a Dalek made of feathers' JAX CLIP Guided Diffusion v2.7
a Dalek made of feathers

'a haunted house' JAX CLIP Guided Diffusion v2.7
a haunted house

'a picture of a chateau by Odhise Paskali' JAX CLIP Guided Diffusion v2.7
a picture of a chateau by Odhise Paskali

'a refinery' JAX CLIP Guided Diffusion v2.7
a refinery

'a studio by Allan Ramsay trending on ArtStation' JAX CLIP Guided Diffusion v2.7
a studio by Allan Ramsay trending on ArtStation

'a sunset' JAX CLIP Guided Diffusion v2.7
a sunset

'a thunder storm' JAX CLIP Guided Diffusion v2.7
a thunder storm

'a watercolor painting of a fire breathing dragon' JAX CLIP Guided Diffusion v2.7
a watercolor painting of a fire breathing dragon

'a witch made of mist' JAX CLIP Guided Diffusion v2.7
a witch made of mist

'the tropics by Thomas de Keyser' JAX CLIP Guided Diffusion v2.7
the tropics by Thomas de Keyser


Name: GLID-3-XL
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3-xl
Time for 512×512 on a 3090: 1 minute 04 seconds
Maximum resolution on a 24 GB 3090: 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Improved/updated version of GLID-3. Uses CLIP for better accuracy. Great textures and lighting. Poor image coherency when over 256×256.

'a demon' GLID-3-XL Text-to-Image
a demon

'a detailed matte painting of a bouquet of flowers' GLID-3-XL Text-to-Image
a detailed matte painting of a bouquet of flowers

'a kitchen' GLID-3-XL Text-to-Image
a kitchen

'a photorealistic painting of a movie monster hyperrealistic' GLID-3-XL Text-to-Image
a photorealistic painting of a movie monster hyperrealistic

'a picture of The Incredible Hulk by Kazimir Malevich' GLID-3-XL Text-to-Image
a picture of The Incredible Hulk by Kazimir Malevich

'a pop art painting of an angry woman' GLID-3-XL Text-to-Image
a pop art painting of an angry woman

'a spooky forest' GLID-3-XL Text-to-Image
a spooky forest

'an abbey' GLID-3-XL Text-to-Image
an abbey

'New York City by Marie Courtois' GLID-3-XL Text-to-Image
New York City by Marie Courtois

'poster art of Gandalf vivid colors' GLID-3-XL Text-to-Image
poster art of Gandalf vivid colors


Name: ruDALL-E Aspect Ratio
Author: Alex Shonenkov
Original script: https://github.com/shonenkov-AI/rudalle-aspect-ratio
Time for 512×512 on a 3090: N/A
Maximum resolution on a 24 GB 3090: N/A
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Version of ruDALL-E that generates wide and/or tall aspect ratio images. The shorter side is limited to 256 pixels. Results can be very nice. Will generate multiple images at once, so these sample images have 4 results per prompt.
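
To get a feel for what output sizes a 256 pixel shorter side allows, here is a rough sketch of the arithmetic. The function name aspect_size is hypothetical, and plain rounding to the nearest pixel is an assumption. The real script may snap sizes to its own token grid, so treat these numbers as approximate.

```python
def aspect_size(ratio_w, ratio_h, short_side=256):
    """Return (width, height) for a ratio with the shorter side fixed at short_side."""
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    if ratio_w >= ratio_h:
        # Wide or square image: the height is the shorter side.
        return (round(short_side * ratio_w / ratio_h), short_side)
    # Tall image: the width is the shorter side.
    return (short_side, round(short_side * ratio_h / ratio_w))


print(aspect_size(2, 1))   # wide
print(aspect_size(1, 2))   # tall
print(aspect_size(16, 9))  # widescreen
```

Under these assumptions a 2:1 request comes out at 512×256 and a 1:2 request at 256×512.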

'a black and white photo of a werewolf' ruDALL-E Aspect Ratio Text-to-Image
a black and white photo of a werewolf

'a cartoon of a swamp' ruDALL-E Aspect Ratio Text-to-Image
a cartoon of a swamp

'a large waterfall made of metal' ruDALL-E Aspect Ratio Text-to-Image
a large waterfall made of metal

'a lounge room' ruDALL-E Aspect Ratio Text-to-Image
a lounge room

'a matte painting of a townhouse' ruDALL-E Aspect Ratio Text-to-Image
a matte painting of a townhouse

'a palace made of mist' ruDALL-E Aspect Ratio Text-to-Image
a palace made of mist

'a photo of an ugly woman' ruDALL-E Aspect Ratio Text-to-Image
a photo of an ugly woman

'a tropical beach' ruDALL-E Aspect Ratio Text-to-Image
a tropical beach

'an evil clown' ruDALL-E Aspect Ratio Text-to-Image
an evil clown

'dense woodland' ruDALL-E Aspect Ratio Text-to-Image
dense woodland


Text-to-Image Summary – Part 5

This is Part 5. There is also Part 1, Part 2, Part 3, Part 4, Part 6, Part 7 and Part 8.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The winner for the longest name so far. Needs tweaking as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion. Still shows a lot of potential.

'a 3D render of Robocop' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a 3D render of Robocop

'a futuristic city IMAX' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a futuristic city IMAX

'a matte painting of trypophobia' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a matte painting of trypophobia

'a renaissance painting of a cloudy sunset trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a renaissance painting of a cloudy sunset trending on ArtStation

'a woman 4K photo' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a woman 4K photo

'an evil clown Flickr' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an evil clown Flickr

'an oil painting of a nightmare creature by Louis Janmot' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare creature by Louis Janmot

'Indiana Jones' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
Indiana Jones

'reflective spheres' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
reflective spheres

'zombies filmic' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
zombies filmic


Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a babbling brook by Zhou Wenjing' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a babbling brook by Zhou Wenjing

'a bedroom by Francesco Furini' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a bedroom by Francesco Furini

'a computer by Édouard Detaille' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a computer by Édouard Detaille

'a cross stitch of a landscape vivid colors' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a cross stitch of a landscape vivid colors

'a kitchen filmic' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a kitchen filmic

'a matte painting of halloween' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a matte painting of halloween

'a pastel of a peacock' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a pastel of a peacock

'a storybook illustration of a kitchen by Lena Alexander' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a storybook illustration of a kitchen by Lena Alexander

'an oil on canvas painting of a zombie made of voxels' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
an oil on canvas painting of a zombie made of voxels

'vector art of Darth Vader' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
vector art of Darth Vader


Name: 360Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 28 seconds
Description: A new diffusion-based script. Capable of some interesting results.

'a bronze sculpture of a crying person by Auguste BaudBovy' 360Diffusion Text-to-Image
a bronze sculpture of a crying person by Auguste BaudBovy

'a flemish baroque of a bouquet of flowers' 360Diffusion Text-to-Image
a flemish baroque of a bouquet of flowers

'a haunted house trending on ArtStation' 360Diffusion Text-to-Image
a haunted house trending on ArtStation

'a hyperrealistic painting of trypophobia by Xia Gui' 360Diffusion Text-to-Image
a hyperrealistic painting of trypophobia by Xia Gui

'a nightmare creature' 360Diffusion Text-to-Image
a nightmare creature

'a space nebula rendered in Cinema4D' 360Diffusion Text-to-Image
a space nebula rendered in Cinema4D

'a tentacle monster 4K HD realism' 360Diffusion Text-to-Image
a tentacle monster 4K HD realism

'an oil on canvas painting of Danny Trejo by Pablo Rey' 360Diffusion Text-to-Image
an oil on canvas painting of Danny Trejo by Pablo Rey

'Frankenstein' 360Diffusion Text-to-Image
Frankenstein

'heaven 8K 3D' 360Diffusion Text-to-Image
heaven 8K 3D


Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of Gandalf' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a bronze sculpture of Gandalf

'a clown made of clay' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a clown made of clay

'a detailed painting of a desert oasis' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a detailed painting of a desert oasis

'a house by Kathleen Guthrie' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a house by Kathleen Guthrie

'a peacock made of metal' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a peacock made of metal

'a tilt shift photo of the Las Vegas strip' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a tilt shift photo of the Las Vegas strip

'a watercolor painting of reflective spheres 8K 3D' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a watercolor painting of reflective spheres 8K 3D

'an art deco painting of an amusement park' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
an art deco painting of an amusement park

'lineart of Big Bird by Alesso Baldovinetti' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
lineart of Big Bird by Alesso Baldovinetti

'vector art of a forest fire' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
vector art of a forest fire


Name: FuseDream
Author: Xingchao Liu et al.
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Gives some unique outputs compared to all the previous scripts.

'a clown' FuseDream Text-to-Image
a clown

'a king' FuseDream Text-to-Image
a king

'a matte painting of New York City by Robin Guthrie' FuseDream Text-to-Image
a matte painting of New York City by Robin Guthrie

'a portrait of a young girl' FuseDream Text-to-Image
a portrait of a young girl

'a rough seascape' FuseDream Text-to-Image
a rough seascape

'a sea monster' FuseDream Text-to-Image
a sea monster

'a teddy bear' FuseDream Text-to-Image
a teddy bear

'a werewolf' FuseDream Text-to-Image
a werewolf

'an airbrush painting of an angry woman' FuseDream Text-to-Image
an airbrush painting of an angry woman

'an attractive woman' FuseDream Text-to-Image
an attractive woman


Name: Looking Glass
Author: bearsharktopus
Original script: https://colab.research.google.com/drive/11vdS9dpcZz2Q2efkOjcwyax4oob6N40G
Time for 256×256 on a 3090: 1 minute 19 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 03 seconds
Description: A variation on ruDALL-E that added support for training the output with a single image or directory of images. It does seem to create better results than the raw ruDALL-E scripts (starting from a single image of random Perlin noise).

'a cemetery trending on pixiv' Looking Glass Text-to-Image
a cemetery trending on pixiv

'a colorful parrot' Looking Glass Text-to-Image
a colorful parrot

'a photo of a house' Looking Glass Text-to-Image
a photo of a house

'a rough seascape' Looking Glass Text-to-Image
a rough seascape

'an alien city' Looking Glass Text-to-Image
an alien city

'an angry person by Eric Auld' Looking Glass Text-to-Image
an angry person by Eric Auld

'an angry woman' Looking Glass Text-to-Image
an angry woman

'an ugly woman' Looking Glass Text-to-Image
an ugly woman

'monkeys' Looking Glass Text-to-Image
monkeys

'Yoda' Looking Glass Text-to-Image
Yoda


Name: Velocity Diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: The latest script from Katherine Crowson. Unique results compared to her previous diffusion based scripts. Worth experimenting with further.

'a detailed matte painting of traffic' Velocity Diffusion Text-to-Image
a detailed matte painting of traffic

'a detailed painting of Jason Vorhees' Velocity Diffusion Text-to-Image
a detailed painting of Jason Vorhees

'a Ghostbuster' Velocity Diffusion Text-to-Image
a Ghostbuster

'a manga drawing of a lounge room by Yayoi Kusama' Velocity Diffusion Text-to-Image
a manga drawing of a lounge room by Yayoi Kusama

'a mountain range CryEngine' Velocity Diffusion Text-to-Image
a mountain range CryEngine

'a portrait of a young girl made of feathers rendered in unreal engine' Velocity Diffusion Text-to-Image
a portrait of a young girl made of feathers rendered in unreal engine

'a zombie' Velocity Diffusion Text-to-Image
a zombie

'lineart of a Rubiks cube' Velocity Diffusion Text-to-Image
lineart of a Rubiks cube

'The Grinch' Velocity Diffusion Text-to-Image
The Grinch

'vector art of Emporer Palpatine' Velocity Diffusion Text-to-Image
vector art of Emporer Palpatine


Name: ruDALL-E Arbitrary Resolution v1
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 16 minutes 34 seconds
Description: Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a color pencil sketch of a werewolf' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a color pencil sketch of a werewolf

'a colorful parrot' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a colorful parrot

'a gorilla' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a gorilla

'a painting of a cabin next to a stream in a secluded forest' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'a portrait of a girl with a dragon tattoo' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a portrait of a girl with a dragon tattoo

'a rose vivid colors' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a rose vivid colors

'a sketch of an ugly man' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a sketch of an ugly man

'a surrealist sculpture of a submarine' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a surrealist sculpture of a submarine

'dense woodland' ruDALL-E Arbitrary Resolution v1 Text-to-Image
dense woodland

'medusa' ruDALL-E Arbitrary Resolution v1 Text-to-Image
medusa


Name: ruDALL-E Arbitrary Resolution v2
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 15 minutes 48 seconds
Description: v2 of the ruDALL-E Arbitrary Resolution script. Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a bouquet of flowers' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a bouquet of flowers

'a cross stitch of a well kept garden' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a cross stitch of a well kept garden

'a futuristic city' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a futuristic city

'a large waterfall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a large waterfall

'a minimalist painting of a castle in the mountains' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a minimalist painting of a castle in the mountains

'a photocopy of a monkey vivid colors' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a photocopy of a monkey vivid colors

'a spooky forest by Laura Muntz Lyall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a spooky forest by Laura Muntz Lyall

'a teddy bear made of wrought iron' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a teddy bear made of wrought iron

'dense woodland' ruDALL-E Arbitrary Resolution v2 Text-to-Image
dense woodland

'God' ruDALL-E Arbitrary Resolution v2 Text-to-Image
God


Name: GLIDE
Author: OpenAI
Original script: https://colab.research.google.com/github/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
Time for 256×256 on a 3090: 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Locked to 256×256
Description: Images are rendered tiny at 64×64 and then upscaled internally within the script to 256×256 for output. The model has been “trimmed” so it cannot do anything human related and only does well for subjects it knows about. Hopefully they release the full model and/or train a larger resolution model in the future. Nothing to get excited about yet.

'a cathedral' GLIDE Text-to-Image
a cathedral

'a color pencil sketch of a fire breathing dragon by Erwin Bowien' GLIDE Text-to-Image
a color pencil sketch of a fire breathing dragon by Erwin Bowien

'a gorilla' GLIDE Text-to-Image
a gorilla

'a library' GLIDE Text-to-Image
a library

'a mosaic of monkeys' GLIDE Text-to-Image
a mosaic of monkeys

'a painting of a cabin next to a stream in a secluded forest' GLIDE Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'an elephant' GLIDE Text-to-Image
an elephant

'dinosaurs' GLIDE Text-to-Image
dinosaurs

'goldfish' GLIDE Text-to-Image
goldfish

'the Sydney Harbour Bridge lens flare' GLIDE Text-to-Image
the Sydney Harbour Bridge lens flare


Name: Disco Diffusion
Author: @Somnai
Original script: https://colab.research.google.com/drive/1bItz4NdhAPHg5-u87KcH-MmJZjK-XqHN
Time for 512×512 on a 3090: 3 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 2496×1088 11 minutes 50 seconds
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Diffusion script that includes all the latest features. Capable of rendering some very nice large resolution images (it may even do better at larger sized images than smaller resolutions like these samples).

'a cute creature' Disco Diffusion Text-to-Image
a cute creature

'a detailed matte painting of a morning landscape' Disco Diffusion Text-to-Image
a detailed matte painting of a morning landscape

'a peacock made of mist by Reinier Nooms' Disco Diffusion Text-to-Image
a peacock made of mist by Reinier Nooms

'a Pokemon character by William Etty' Disco Diffusion Text-to-Image
a Pokemon character by William Etty

'a polaroid photo of an angry woman' Disco Diffusion Text-to-Image
a polaroid photo of an angry woman

'a rough seascape' Disco Diffusion Text-to-Image
a rough seascape

'a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D' Disco Diffusion Text-to-Image
a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D

'an attractive woman' Disco Diffusion Text-to-Image
an attractive woman

'computer rendering of a desert oasis rendered in unreal engine' Disco Diffusion Text-to-Image
computer rendering of a desert oasis rendered in unreal engine

'the Amazon Rainforest by Qian Du' Disco Diffusion Text-to-Image
the Amazon Rainforest by Qian Du


Name: Infinite Diffusion
Author: https://github.com/crowsonkb/v-diffusion-pytorch
Original script: https://colab.research.google.com/drive/1VJrfInU5RbciXXD_8jzY-FntFqiyj6au
Time for 512×512 on a 3090: 3 minutes 32 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: 256×256 3 minutes 15 seconds
Description: Diffusion-based script. Very VRAM hungry. Renders some unique images compared to the other methods.

'cookie monster eating a cookie' Infinite Diffusion Text-to-Image
cookie monster eating a cookie

'a renaissance painting of a farm by Bernardo Strozzi' Infinite Diffusion Text-to-Image
a renaissance painting of a farm by Bernardo Strozzi

'a silk screen of God' Infinite Diffusion Text-to-Image
a silk screen of God

'a storybook illustration of a cute monster trending on pixiv' Infinite Diffusion Text-to-Image
a storybook illustration of a cute monster trending on pixiv

'a surrealist painting of Frankenstein' Infinite Diffusion Text-to-Image
a surrealist painting of Frankenstein

'a watercolor painting of Yoda' Infinite Diffusion Text-to-Image
a watercolor painting of Yoda

'a worried woman made of clay lens flare' Infinite Diffusion Text-to-Image
a worried woman made of clay lens flare

'an art deco painting of Luke Skywalker' Infinite Diffusion Text-to-Image
an art deco painting of Luke Skywalker

'an oil painting of Buzz Lightyear' Infinite Diffusion Text-to-Image
an oil painting of Buzz Lightyear

'Chewbacca' Infinite Diffusion Text-to-Image
Chewbacca


Name: minDALL-E
Author: Kakao Brain Corp
Original script: https://github.com/kakaobrain/minDALL-E
Time for 256×256 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: 256×256 1 minute 59 seconds
Description: Another DALL-E variation script. Locked to 256×256 but can generate multiple images each run.

'a cozy den' minDALL-E Text-to-Image
a cozy den

'a digital painting of Chewbacca by Willem van de Velde the Elder' minDALL-E Text-to-Image
a digital painting of Chewbacca by Willem van de Velde the Elder

'a sad person' minDALL-E Text-to-Image
a sad person

'a skull' minDALL-E Text-to-Image
a skull

'a storybook illustration of a happy clown by Gwen Barnard' minDALL-E Text-to-Image
a storybook illustration of a happy clown by Gwen Barnard

'a tree by Colin Gill' minDALL-E Text-to-Image
a tree by Colin Gill

'Bugs Bunny' minDALL-E Text-to-Image
Bugs Bunny

'fireworks by Károly Lotz' minDALL-E Text-to-Image
fireworks by Károly Lotz

'The Grand Canyon' minDALL-E Text-to-Image
The Grand Canyon

'Yoda' minDALL-E Text-to-Image
Yoda


Name: ruDOLPH
Author: SBER AI
Original script: https://github.com/sberbank-ai/ru-dolph
Time for 128×128 on a 3090: 1 minute 15 seconds
Maximum resolution on a 24 GB 3090: Locked to 128×128
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another ruDALL-E variation script. Locked to a tiny 128×128 resolution for now until they train the larger models. These examples were 4x upscaled with Real ESRGAN.

'a castle' ruDOLPH Text-to-Image
a castle

'a colorful parrot' ruDOLPH Text-to-Image
a colorful parrot

'a fine art painting of an ugly woman' ruDOLPH Text-to-Image
a fine art painting of an ugly woman

'a kitchen' ruDOLPH Text-to-Image
a kitchen

'a pastel of spirals made of plastic' ruDOLPH Text-to-Image
a pastel of spirals made of plastic

'a photorealistic painting of a cityscape' ruDOLPH Text-to-Image
a photorealistic painting of a cityscape

'a portrait of a woman' ruDOLPH Text-to-Image
a portrait of a woman

'a sad person by Ramon Casas i Carbó' ruDOLPH Text-to-Image
a sad person by Ramon Casas i Carbó

'kittens' ruDOLPH Text-to-Image
kittens

'vector art of a woman' ruDOLPH Text-to-Image
vector art of a woman


Name: CLIP Guided Deep Image Prior
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1_oqIK8A67EgtJDdfsuJojc5ukNzirdle
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 or 1680×720
Maximum resolution on an 8GB 2080: 512×512 (5 minutes 7 seconds) or 640×360
Description: Interesting script that has decent coherency. If only the output was slightly sharper and the colors slightly richer it would be a winner. Still good for unique outputs that the other methods cannot achieve.

'a flemish baroque of a shrine' CLIP Guided Deep Image Prior
a flemish baroque of a shrine

'a statue of a tardigrade made of clay' CLIP Guided Deep Image Prior
a statue of a tardigrade made of clay

'a surrealist painting of a Pixar character' CLIP Guided Deep Image Prior
a surrealist painting of a Pixar character

'a surrealist painting of an evening landscape 4K photo' CLIP Guided Deep Image Prior
a surrealist painting of an evening landscape 4K photo

'an abstract sculpture of an evil clown by Han Gan' CLIP Guided Deep Image Prior
an abstract sculpture of an evil clown by Han Gan

'an ambient occlusion render of Bugs Bunny made of wood' CLIP Guided Deep Image Prior
an ambient occlusion render of Bugs Bunny made of wood

'Cookie Monster' CLIP Guided Deep Image Prior
Cookie Monster

'Jabba The Hutt by Shūbun Tenshō' CLIP Guided Deep Image Prior
Jabba The Hutt by Shūbun Tenshō

'tentacles by Johanna Marie Fosie' CLIP Guided Deep Image Prior
tentacles by Johanna Marie Fosie

'vector art of heaven' CLIP Guided Deep Image Prior
vector art of heaven


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Text-to-Image Summary – Part 1

This is Part 1. There is also Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8.

What Are Text-to-Image Systems

Text-to-Image systems/models/scripts/networks (what is the official correct term for these?) are machine learning based models that take a descriptive phrase as input and attempt to generate images that match the input phrase.

Requirements

You do need a decent NVIDIA GPU. 3090 recommended for 768×768 resolution, 2080 for smaller 256×256 images, 10xx possibly for tiny images or if you want to try reduced settings and wait ages for results. If you have a commercial grade GPU with more memory you will be able to push these resolutions higher. VRAM matters more than GPU model, ie you can get 3090s with only 16GB of VRAM and others with 24GB. You may see a laptop with an advertised 3080 GPU, but the total VRAM will likely be much smaller than a desktop 3080. I have now updated these posts with the maximum resolution and times for a 2080 SUPER with 8GB VRAM to give people an idea of what an 8GB VRAM GPU will do.

To run these scripts from Visions of Chaos you need to have installed these prerequisites. Once you get all the prerequisites set up it really is as simple as typing your prompt text and clicking a button. I do include a lot of other settings so you can tweak the script parameters as you do more experimentation.

Text-to-Image GUI

Visions of Chaos Text-to-Image Tutorial

You can watch the following tutorial video to get an idea of how the Text-to-Image mode works in Visions of Chaos.

Text-to-Image Scripts Included With Visions of Chaos

The rest of this blog post (and other parts) lists the 106 (so far) Text-to-Image scripts that I have been able to get working with Visions of Chaos.

If you are the author of one of these scripts then many thanks to you for sharing the code publicly. If you are a creator of a script I do not include here, please leave a comment with a link or send me an email so I can try it out. If you are a better coder than I am and improve any of these also let me know and I will share your fixes with the world.

I have included sample image outputs from each script. Most of the text prompts for these samples come from a prompt builder I include with Visions of Chaos that randomly combines subjects, adjectives, styles and artists.
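
To give a rough idea of how a prompt builder like this can work, here is a minimal Python sketch. It is not the actual Visions of Chaos code. The word lists and the build_prompt function are hypothetical, with entries borrowed from the sample captions in these posts.

```python
import random

# Hypothetical word lists; the real prompt builder's lists are far larger.
STYLES = ["a painting of", "a pencil sketch of", "a watercolor painting of"]
ADJECTIVES = ["colorful", "spooky", "cute"]
SUBJECTS = ["parrot", "forest", "monster"]
ARTISTS = ["Kandinsky", "M C Escher", "Claude Monet"]


def build_prompt(rng):
    """Randomly pick one entry from each list and join them into a prompt."""
    style = rng.choice(STYLES)
    adjective = rng.choice(ADJECTIVES)
    subject = rng.choice(SUBJECTS)
    artist = rng.choice(ARTISTS)
    # All sample adjectives start with a consonant, so "a" is always correct.
    return f"{style} a {adjective} {subject} in the style of {artist}"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the same batch can be repeated
    for _ in range(5):
        print(build_prompt(rng))
```

Passing in a seeded random.Random makes a batch of prompts repeatable, which helps when comparing the same prompts across different scripts.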

Note also that these samples all use the default settings for GAN and CLIP models. Most of the included scripts allow tweaking of settings and different models to alter the outputs. There is a much wider range of output images possible. Download Visions of Chaos to experiment with all the combinations of scripts, models, prompts and settings.


Name: Deep Daze
Author: Phil Wang
Original script: https://github.com/lucidrains/deep-daze
Time for 512×512 on a 3090: 1 minute 53 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 256×256 1 minute 9 seconds
Description: This was the first Text-to-Image script I ever found and tested. The output images from the original script are very washed out and pastel shaded, but after adding some torchvision transforms for brightness, contrast and sharpness tweaks they are a little better. Very abstract output compared to the other scripts.
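
As an illustration of what brightness and contrast tweaks do to a washed out image, here is a small dependency-free Python sketch. The tweaks mentioned above use torchvision transforms; this sketch only mimics the same idea on a plain list of grayscale values in the 0 to 1 range. The function names and factor values are assumptions for the example, and sharpness is left out for brevity.

```python
def clamp01(value):
    """Keep a pixel value inside the valid 0..1 range."""
    return max(0.0, min(1.0, value))


def adjust_brightness(pixels, factor):
    """Scale every pixel; factor > 1 brightens, factor < 1 darkens."""
    return [clamp01(p * factor) for p in pixels]


def adjust_contrast(pixels, factor):
    """Push pixels away from (factor > 1) or toward (factor < 1) the mean."""
    mean = sum(pixels) / len(pixels)
    return [clamp01(mean + (p - mean) * factor) for p in pixels]


if __name__ == "__main__":
    washed_out = [0.45, 0.50, 0.55, 0.60]  # narrow range = low contrast
    print(adjust_contrast(washed_out, 3.0))
    print(adjust_brightness(washed_out, 1.2))
```

A washed out image has pixel values bunched close together, so a contrast factor above 1 spreads them back out across the available range.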

'a bronze sculpture of a colorful parrot in the style of Kandinsky' Deep Daze Text-to-Image
a bronze sculpture of a colorful parrot in the style of Kandinsky

'a crying person' Deep Daze Text-to-Image
a crying person

'a desert oasis' Deep Daze Text-to-Image
a desert oasis

'a surrealist painting of the Terminator made of silver' Deep Daze Text-to-Image
a surrealist painting of the Terminator made of silver

'a zombie in the style of Turner' Deep Daze Text-to-Image
a zombie in the style of Turner


Name: Big Sleep
Author: Phil Wang
Original script: https://github.com/lucidrains/big-sleep
Time for 512×512 on a 3090: 4 minutes 0 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: 512×512 6 minutes 39 seconds
Description: Can give a good variety of images for any prompt text and does not suffer from the coloring or tiled image issues some of the other methods do. See here for my older post with a lot of Big Sleep examples. If you give it a chance and run repeated batches of the same prompt you can get some very nice results.

'H R Giger' Big Sleep Text-to-Image
H R Giger

'surrealism' Big Sleep Text-to-Image
surrealism

'colorful surrealism' Big Sleep Text-to-Image
colorful surrealism

'a charcoal drawing of a landscape' Big Sleep Text-to-Image
a charcoal drawing of a landscape


Name: VQGAN+CLIP z-quantize
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN
Time for 512×512 on a 3090: 3 minutes 11 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 6 minutes 39 seconds
Description: The outputs tend to be divided up into rectangular regions, but the resulting imagery can be interesting.

'a drawing of a bouquet of flowers made of cardboard' VQGAN+CLIP z-quantize Text-to-Image
a drawing of a bouquet of flowers made of cardboard

'a rose made of silver' VQGAN+CLIP z-quantize Text-to-Image
a rose made of silver

'a tilt shift photo of traffic' VQGAN+CLIP z-quantize Text-to-Image
a tilt shift photo of traffic

'an abstract painting of a house made of crystals' VQGAN+CLIP z-quantize Text-to-Image
an abstract painting of a house made of crystals

'an abstract painting of a skull' VQGAN+CLIP z-quantize Text-to-Image
an abstract painting of a skull

VQGAN+CLIP z-quantize allows specifying an image as the input starting point. If you take the output, stretch it very slightly, and then feed it back into the system each frame you get a movie zooming in. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”. The prompts were the seven deadly sins, ie “a watercolor painting depicting pride”, “a watercolor painting depicting greed” etc.

The more astute viewers among you will notice there are only 6 of the sins in the previous video. What happened to “lust”? A while back one of my uploads was flagged as porn by the YouTube robots. Their (what I assume is) machine learning based system detected my upload as porn when there was no porn in it. An appeal was met with instant denial and so I now have a permanent “warning” on my channel with no way to talk to a person who could spend 1 minute looking at the video to tell it isn’t porn. Another warning would lead to a strike, so I am being overly cautious and omitting the lust part from the YouTube video. Those who want to see the full 7 part movie can click the following link to watch it on my LBRY channel.

https://open.lbry.com/@Softology:5/Seven-Deadly-Sins:6

Thanks LBRY!
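
The zoom movie feedback loop described above can be sketched in a few lines of Python. This is only an outline of the idea. The generate and stretch callables are hypothetical stand-ins for the Text-to-Image script and the image resize step, and the 1.02 zoom factor is an arbitrary example value.

```python
def zoom_crop_box(width, height, zoom):
    """Return the centered (left, top, right, bottom) box that, once resized
    back to width x height, magnifies the frame by the zoom factor."""
    crop_w = width / zoom
    crop_h = height / zoom
    left = (width - crop_w) / 2
    top = (height - crop_h) / 2
    return (left, top, left + crop_w, top + crop_h)


def render_zoom_movie(generate, stretch, first_frame, frame_count, zoom=1.02):
    """Stretch each output slightly and feed it back in as the next input."""
    frames = []
    current = first_frame
    for _ in range(frame_count):
        current = generate(stretch(current, zoom))
        frames.append(current)
    return frames


if __name__ == "__main__":
    # A 2% zoom on a 512x512 frame crops away about 5 pixels from each edge.
    print(zoom_crop_box(512, 512, 1.02))
```

In practice stretch would crop the frame to the box from zoom_crop_box and resize it back to full size, and generate would run the script for a number of iterations using that image as its starting point.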


Name: VQGAN+CLIP codebook
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/15UwYDsnNeldJFHJ9NdgYBYeo6xPmSelP
Time for 512×512 on a 3090: 3 minutes 19 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 3 minutes 46 seconds
Description: VQGAN+CLIP codebook seems to give very similar images for the same prompt phrase, so repeatedly running the script (with different seed values) does not give a wide variety of resulting images. Still gives interesting results.

'a happy alien' VQGAN+CLIP codebook Text-to-Image
a happy alien

'a library' VQGAN+CLIP codebook Text-to-Image
a library

'a teddy bear' VQGAN+CLIP codebook Text-to-Image
a teddy bear

'digital art of a colorful parrot' VQGAN+CLIP codebook Text-to-Image
digital art of a colorful parrot

'digital art of an amusement park' VQGAN+CLIP codebook Text-to-Image
digital art of an amusement park


Name: Aleph2Image Gamma
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1VAO22MNQekkrVq8ey2pCRznz4A0_jY29
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast.

'H R Giger' Aleph2Image Gamma Text-to-Image
H R Giger

'surrealism' Aleph2Image Gamma Text-to-Image
surrealism

'seascape painting' Aleph2Image Gamma Text-to-Image
seascape painting


Name: Aleph2Image Delta
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1oA1fZP7N1uPBxwbGIvOEXbTsq2ORa9vb
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors and that may be a good or bad point depending on your preferences.

'a sketch of an angry person' Aleph2Image Delta Text-to-Image
a sketch of an angry person

'a spooky forest' Aleph2Image Delta Text-to-Image
a spooky forest

'a sunset in the style of Rembrandt' Aleph2Image Delta Text-to-Image
a sunset in the style of Rembrandt

'a surrealist painting of a forest path' Aleph2Image Delta Text-to-Image
a surrealist painting of a forest path

'a tropical beach' Aleph2Image Delta Text-to-Image
a tropical beach


Name: Aleph2Image Delta v2
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1NGM9L8qP0gwl5z5GAuB_bd0wTNsxqclG
Time for 512×512 on a 3090: 3 minutes 42 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 7 minutes 05 seconds
Description: A newer revision of Aleph2Image Delta that gives much sharper results. The resulting images tend to be similar to each other for each prompt text so not a lot of variety.

'a cartoon of love in the style of Claude Monet' Aleph2Image Delta v2 Text-to-Image
a cartoon of love in the style of Claude Monet

'a detailed painting of a rose' Aleph2Image Delta v2 Text-to-Image
a detailed painting of a rose

'a drawing of a volcano' Aleph2Image Delta v2 Text-to-Image
a drawing of a volcano

'a house' Aleph2Image Delta v2 Text-to-Image
a house

'a submarine' Aleph2Image Delta v2 Text-to-Image
a submarine


Name: Deep Daze Fourier
Author: Vadim Epstein
Original script: https://colab.research.google.com/gist/afiaka87/e018dfa86d8a716662d30c543ce1b78e/text2image-siren.ipynb
Time for 512×512 on a 3090: 4 minutes 54 seconds
Maximum resolution on a 24 GB 3090: 512×512 or 640×360
Maximum resolution on an 8GB 2080: 128×128 2 minutes 59 seconds
Description: Creates more collaged images with sharp, crisp bright colors.

'a pencil sketch of a vampire made of bones' Deep Daze Fourier Text-to-Image
a pencil sketch of a vampire made of bones

'H R Giger' Deep Daze Fourier Text-to-Image
H R Giger

'medusa made of wood' Deep Daze Fourier Text-to-Image
medusa made of wood

'Shrek eating pizza' Deep Daze Fourier Text-to-Image
Shrek eating pizza

'surrealist Homer Simpson' Deep Daze Fourier Text-to-Image
surrealist Homer Simpson


Name: Text2Image v2
Author: Denis Malimonov
Original script: https://colab.research.google.com/github/tg-bomze/collection-of-notebooks/blob/master/Text2Image_v2.ipynb
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 3 minutes 12 seconds
Description: Can give more abstract results of the input phrase. Colors and details can be sharp, but not always. Good variety of output for each input phrase. Definitely worth a try.

'a fireplace made of voxels' Text2Image v2 Text-to-Image
a fireplace made of voxels

'a green tree frog in the style of M C Escher' Text2Image v2 Text-to-Image
a green tree frog in the style of M C Escher

'a pencil sketch of an evil alien' Text2Image v2 Text-to-Image
a pencil sketch of an evil alien

'a sea monster' Text2Image v2 Text-to-Image
a sea monster

'The Incredible Hulk made of silver' Text2Image v2 Text-to-Image
The Incredible Hulk made of silver


Name: The Big Sleep Customized
Author: NMKD
Original script: https://colab.research.google.com/drive/1Q2DIeMqYm_Sc5mlurnnurMMVqlgXpZNO
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 3 minutes 09 seconds
Description: Another good one. Worth exploring further.

'a forest path' The Big Sleep Customized Text-to-Image
a forest path

'a watercolor painting of a colorful parrot in the style of Kandinsky' The Big Sleep Customized Text-to-Image
a watercolor painting of a colorful parrot in the style of Kandinsky

'a western town' The Big Sleep Customized Text-to-Image
a western town

'Christmas' The Big Sleep Customized Text-to-Image
Christmas

'medusa made of vines' The Big Sleep Customized Text-to-Image
medusa made of vines


Name: Big Sleep Minmax
Author: @!goose
Original script: https://colab.research.google.com/drive/12CnlS6lRGtieWujXs3GQ_OlghmFyl8ch
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 3 minutes 10 seconds
Description: Another interesting Big Sleep variation. Allows a second phrase to be specified that is minimized in the output. For example, if your prompt for a landscape painting gives too many clouds you could specify clouds as the minimize prompt so the system outputs fewer clouds in the resulting image.
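
The general idea behind a minimize prompt can be shown with a toy loss function. This is a sketch under assumptions, not the code from the Minmax script: the embeddings here are made-up 2D vectors standing in for CLIP embeddings, and minmax_loss and min_weight are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def minmax_loss(image_emb, max_emb, min_emb, min_weight=1.0):
    """Lower is better: reward similarity to the main prompt and
    penalize similarity to the minimize prompt."""
    return (-cosine_similarity(image_emb, max_emb)
            + min_weight * cosine_similarity(image_emb, min_emb))


if __name__ == "__main__":
    landscape = [1.0, 0.0]  # toy embedding for the main prompt
    clouds = [0.0, 1.0]     # toy embedding for the minimize prompt
    cloudy_image = [0.7, 0.7]
    clear_image = [1.0, 0.1]
    print(minmax_loss(cloudy_image, landscape, clouds))
    print(minmax_loss(clear_image, landscape, clouds))
```

An optimizer that lowers this loss is pushed toward images that match the main prompt while moving away from whatever the minimize prompt describes.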

'a charcoal drawing of an eyeball' Big Sleep Minmax Text-to-Image
a charcoal drawing of an eyeball

'an ultrafine detailed painting of a crying person made of voxels' Big Sleep Minmax Text-to-Image
an ultrafine detailed painting of a crying person made of voxels

'dense woodland' Big Sleep Minmax Text-to-Image
dense woodland

'King Kong made of wrought iron in the style of Frida Kahlo' Big Sleep Minmax Text-to-Image
King Kong made of wrought iron in the style of Frida Kahlo

'Michael Myers' Big Sleep Minmax Text-to-Image
Michael Myers


Name: CLIP Pseudo Slime Mold
Author: hotgrits
Original script: https://discord.com/channels/729741769192767510/730484623028519072/850857930881892372
Time for 512×512 on a 3090: 2 minutes 57 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: This one gives unique output compared to the others. Really nicely defined sharp details. The colors come from any color palette you select (currently all the 3,479 palettes within Visions of Chaos can be used) so you can “tint” the resulting images with color shades you prefer.
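
One simple way a color palette can tint an image is to map each pixel's intensity to a position along the palette. The Python sketch below shows that idea only. How the script itself applies the palette is not covered here, so treat the palette_color function and the three color sample palette as assumptions.

```python
def palette_color(palette, intensity):
    """Map an intensity in 0..1 to an RGB color by linear interpolation
    between neighboring palette entries. Needs at least two colors."""
    intensity = max(0.0, min(1.0, intensity))
    position = intensity * (len(palette) - 1)
    index = min(int(position), len(palette) - 2)
    fraction = position - index
    low, high = palette[index], palette[index + 1]
    return tuple(round(a + (b - a) * fraction) for a, b in zip(low, high))


if __name__ == "__main__":
    # Hypothetical palette running from black through red to white.
    palette = [(0, 0, 0), (255, 0, 0), (255, 255, 255)]
    for intensity in (0.0, 0.3, 0.5, 0.8, 1.0):
        print(intensity, palette_color(palette, intensity))
```

Swapping in a different palette changes every color in the output without changing the underlying shapes.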

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger

'H R Giger' CLIP Pseudo Slime Mold Text-to-Image
H R Giger with a different color palette

'Shrek eating pizza' CLIP Pseudo Slime Mold Text-to-Image
Shrek eating pizza

'seascape painting' CLIP Pseudo Slime Mold Text-to-Image
seascape painting


Name: Aleph2Image Dall-E Remake
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/17ZSyxCyHUnwI1BgZG22-UFOtCWFvqQjy
Time for 512×512 on a 3090: 3 minutes 42 seconds
Maximum resolution on a 24 GB 3090: 768×768
Maximum resolution on an 8GB 2080: 256×256 3 minutes 02 seconds
Description: Another Aleph2Image variant.

'a color pencil sketch of Jason Vorhees made of plastic' Aleph2Image Dall-E Remake Text-to-Image
a color pencil sketch of Jason Vorhees made of plastic

'a cubist painting of a science laboratory' Aleph2Image Dall-E Remake Text-to-Image
a cubist painting of a science laboratory

'a green tree frog in the style of Kandinsky' Aleph2Image Dall-E Remake Text-to-Image
a green tree frog in the style of Kandinsky

'a watercolor painting of Godzilla' Aleph2Image Dall-E Remake Text-to-Image
a watercolor painting of Godzilla

'an octopus' Aleph2Image Dall-E Remake Text-to-Image
an octopus


Name: VQGAN+CLIP v3
Author: Eleiber
Original script: https://colab.research.google.com/drive/1go6YwMFe5MX6XM9tv-cnQiSTU50N9EeT
Time for 512×512 on a 3090: 2 minutes 52 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 3 minutes 53 seconds
Description: “v3” because it is the third VQGAN system I have tried and it didn’t have a unique specific name. Gives clear sharp images. Can give very painterly results with visible brush strokes if you use “a painting of” before the prompt subject.

'a pencil sketch of a campfire in the style of Da Vinci' VQGAN+CLIP v3 Text-to-Image
a pencil sketch of a campfire in the style of Da Vinci

'a pop art painting of a lush rainforest' VQGAN+CLIP v3 Text-to-Image
a pop art painting of a lush rainforest

'a storybook illustration of a cityscape' VQGAN+CLIP v3 Text-to-Image
a storybook illustration of a cityscape

'an airbrush painting of frogs' VQGAN+CLIP v3 Text-to-Image
an airbrush painting of frogs

'the Amazon Rainforest' VQGAN+CLIP v3 Text-to-Image
the Amazon Rainforest

VQGAN+CLIP v3 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.

This next example movie is showing a “Self-Driven” zoom movie. As in a regular zoom movie the output frames are slightly stretched and fed back into the system each frame. The self-driven difference with this movie is that the Text-to-Image prompt text is automatically changed every 2 seconds by CLIP detecting what it “sees” in the current frame. This way the movie subjects are automatically changed and steered in new directions in a totally automated way. There is no human control except me setting the initial “A landscape” prompt. After that it was fully automated.

By default the CLIP Image Captioning script is very good at detecting what is in an image. Using the default accuracy resulted in a zoom movie that got stuck on a single topic or subject. One got stuck on slight variations of a prompt dealing with kites, so as the zoom movie went deeper it only showed kites. Luckily, after tweaking and decreasing the accuracy of the CLIP captioning, the predictions allow the resulting subjects to drift to new topics during the movie.
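
The self-driven prompt switching can be outlined as follows. This is a sketch under assumptions rather than the real implementation. caption_frame is a hypothetical stand-in for the CLIP captioning step, and picking randomly from the top few captions is just one assumed way of "decreasing the accuracy" so the subject can drift.

```python
import random


def pick_caption(scored_captions, top_k, rng):
    """Choose randomly among the top_k best-scoring captions.
    top_k=1 always returns the best match; larger values allow drift."""
    ranked = sorted(scored_captions, key=lambda item: item[1], reverse=True)
    return rng.choice(ranked[:top_k])[0]


def self_driven_prompts(caption_frame, frame_count, fps, first_prompt,
                        seconds_per_prompt=2):
    """Return the prompt used for each frame, asking caption_frame for a
    new prompt every seconds_per_prompt seconds of movie time."""
    prompts = []
    prompt = first_prompt
    frames_per_prompt = fps * seconds_per_prompt
    for frame in range(frame_count):
        if frame > 0 and frame % frames_per_prompt == 0:
            prompt = caption_frame(frame)
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up caption scores for one frame.
    scores = [("kites", 0.9), ("a beach", 0.6), ("a mountain", 0.5)]
    captioner = lambda frame: pick_caption(scores, 3, rng)
    print(self_driven_prompts(captioner, 8, 1, "A landscape"))
```

With top_k set to 1 the loop would keep returning to the single best caption, which is the "stuck on kites" behavior described above.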


Name: VQGAN+CLIP v4
Author: crimeacs
Original script: https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 3 minutes 05 seconds
Description: Another improved VQGAN system utilizing pooling. “v4” because it is the fourth VQGAN system I have tried and it didn’t have a unique specific name.

'a fine art painting of a cozy den' VQGAN+CLIP v4 Text-to-Image
a fine art painting of a cozy den

'a king in the style of Kandinsky' VQGAN+CLIP v4 Text-to-Image
a king in the style of Kandinsky

'a nurse in the style of Edward Hopper' VQGAN+CLIP v4 Text-to-Image
a nurse in the style of Edward Hopper

'a pastel of a demon' VQGAN+CLIP v4 Text-to-Image
a pastel of a demon

'a watercolor painting of a mountain path' VQGAN+CLIP v4 Text-to-Image
a watercolor painting of a mountain path

VQGAN+CLIP v4 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.

The text prompts for each part came from an idea in a YouTube comment to try more non-specific terms to see what happens, so here are the results of “an image of fear”, “an image of humanity”, “an image of knowledge”, “an image of love”, “an image of morality” and “an image of serenity”.

Here is another example. This time the prompts were based on various directors, i.e. “Stanley Kubrick imagery”, “David Lynch imagery” etc. No super resolution this time. Super Slo-Mo was used for optical flow. I wasn’t sure if YouTube would accept the potentially unsettling horror visuals and I do not want to risk the hassle of a strike, so to be on the safe side I am hosting this one on my LBRY channel only. Click the following image to open the movie in a new window. Note that LBRY can be a lot slower to buffer, so you may need to pause it for a while to let the movie load in.

Directors Text-to-Image

If you find that too slow to buffer/load I also have a copy on my BitChute channel here.



Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

Deep Daze Fourier Text-to-Image

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

More Fascinating Text-to-Image

This time “Deep Daze Fourier” from Vadim Epstein. Code available in this notebook.

Compared to the last Deep Daze that generated washed out and pastel shaded results this Deep Daze creates images with sharp, crisp bright colors.

Sample results

“Shrek eating pizza”

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

Deep Daze Fourier - Shrek Eating Pizza

“H R Giger”

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

Deep Daze Fourier - H R Giger

“Freddy Krueger”

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

Deep Daze Fourier - Freddy Krueger

“Surrealist Homer Simpson”

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

Deep Daze Fourier - Surrealist Homer Simpson

“rose bush”

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Deep Daze Fourier - Rose Bush

Availability

This and the previous Text-to-Image systems I have experimented with (here, here and here) are now supported by a GUI front end in Visions of Chaos. As long as you install these prerequisites and have a decent GPU you will be able to run these systems yourself.

Text-to-Image GUI

For those who love to tinker I have now added a bunch more of the script parameters so you no longer have to edit the Python source code outside Visions of Chaos.

Other Text-to-Image

If you know of any other Text-to-Image systems (with sharable open-source code) then please let me know. The Text-to-Image systems I have tested so far all have their own unique behaviors and outputs, so I will always be on the lookout for more new variations.

Jason.

Aleph2Image Text-to-Image

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

Previously I experimented with Big Sleep and other Text-to-Image systems.

This post covers variations of Aleph2Image Text-to-Image, originally coded by Ryan Murdock.


Aleph2Image “Gamma”

Code from this colab. This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast. Previous results from Deep Daze were too washed out, this one is too “contrasty”. If they could both be pushed towards that “sweet spot” they would both look much better.

“surrealism”

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

Aleph2Image Gamma - Surrealism

“H R Giger”

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

Aleph2Image Gamma - H R Giger

“seascape oil painting”

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

Aleph2Image Gamma - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain

Aleph2Image Gamma - Frogs In The Rain


Aleph2Image “Delta”

Code from this colab. A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors.

“surrealism”

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

Aleph2Image Delta - Surrealism

“H R Giger”

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

Aleph2Image Delta - H R Giger

“seascape oil painting”

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

Aleph2Image Delta - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain

Aleph2Image Delta - Frogs In The Rain


Improved Aleph2Image “Delta” v2

Code from this colab. A newer revision of Aleph2Image Delta that gives much better results, although the results tend to be similar to each other for each prompt text. This and Big Sleep would be the best 2 Text-to-Image systems I have experimented with so far.

“surrealism”

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

Aleph2Image Delta v2 - Surrealism

“H R Giger”

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

Aleph2Image Delta v2 - H R Giger

“seascape oil painting”

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

Aleph2Image Delta v2 - Seascape Oil Painting

“frogs in the rain”

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain

Aleph2Image Delta v2 - Frogs In The Rain


Easy GUI Front End

I include a simple GUI dialog front end for these Text-to-Image systems in Visions of Chaos. As long as you have the prerequisites installed you will be able to convert text prompts into single or multiple images.

Text-to-Image GUI

You do need a GPU with lots of VRAM for these to work (especially the 512×512 image models).

Jason.

Further Explorations Into Text-to-Image Machine Learning

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

After my initial experiments with Big Sleep Text-to-Image generation I looked around for some more examples to play with. I was really impressed with Big Sleep and you can see some examples of Big Sleep output in that original post. I still think Big Sleep is the best Text-to-Image code I have used so far and better than what is in this post.


Deep Daze

Deep Daze is by Phil Wang and the source code is available here.

Deep Daze tends to generate collage-like images. As the first example image shows the resulting images have a washed out or faded look. I put the rest of the example Deep Daze images through a quick Auto White Balance pass in GIMP.

“H R Giger”

DeepDaze - H R Giger

DeepDaze - H R Giger

“Rainforest”

DeepDaze - Rainforest

“night club”

DeepDaze - Night Club

“seascape painting”

DeepDaze - Seascape Painting

“flowing water”

DeepDaze - Flowing Water


VQGAN-CLIP z+quantize

VQGAN-CLIP using a z+quantize method is from Katherine Crowson. Source code is available here.

This method also has the option to use an image to seed the initial model rather than just random noise, but the following examples were all seeded with noise. The resulting images tend to be divided up into rectangular regions, but the resulting imagery is interesting.

“H R Giger”

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

VQGAN-CLIP z+quantize - H R Giger

“rainforest”

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

VQGAN-CLIP z+quantize - Rainforest

“night club”

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

VQGAN-CLIP z+quantize - Night Club

“seascape painting”

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

VQGAN-CLIP z+quantize - Seascape Painting

“flowing water”

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water

VQGAN-CLIP z+quantize - Flowing Water


VQGAN-CLIP codebook

VQGAN-CLIP using a codebook method is also from Katherine Crowson. Source code is available here.

VQGAN-CLIP codebook seems to give very similar images for different seeds, so I have only shown two examples for each phrase.

“H R Giger”

VQGAN-CLIP codebook - H R Giger

VQGAN-CLIP codebook - H R Giger

“rainforest”

VQGAN-CLIP codebook - Rainforest

VQGAN-CLIP codebook - Rainforest

“night club”

VQGAN-CLIP codebook - Night Club

VQGAN-CLIP codebook - Night Club

“seascape painting”

VQGAN-CLIP codebook - Seascape Painting

VQGAN-CLIP codebook - Seascape Painting

“flowing water”

VQGAN-CLIP codebook - Flowing Water

VQGAN-CLIP codebook - Flowing Water


Other Text-to-Image Models?

If you know of any other available Text-to-Image systems (that are freely available and shareable) let me know.


Availability

You can follow the above links and download the Python code yourself if you are so inclined.

I do include a basic GUI front-end for these Text-to-Image generators in Visions of Chaos. As long as you have the prerequisites installed (which you would need to install to run these outside Visions of Chaos) then you can experiment with these models yourself without needing to use the command line.

Text-to-Image GUI

Jason.

Super Resolution

The Dream

For years now you will have seen the “enhance” functionality in TV shows like CSI or movies like Blade Runner: software that recovers detail from images that are only a blur or a few pixels in size. In Blade Runner, Deckard’s system even allowed him to look around corners.

The Reality

I have recently been testing machine learning neural network enhancer (aka super resolution) models. They enlarge an image while trying to maintain or enhance its details (or at least lose a lot less detail than zooming the image in an image editing tool with linear or bicubic interpolation).
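For comparison, this is roughly what the plain linear zoom baseline does: every new pixel is just a weighted average of its nearest source pixels, so no new detail can appear. A minimal sketch for a grayscale image stored as nested lists; `bilinear_upscale` is a hypothetical helper name and real image editors will differ in edge handling.

```python
def bilinear_upscale(image, factor):
    """Enlarge a grayscale image (list of rows) by an integer factor
    using bilinear interpolation."""
    src_h, src_w = len(image), len(image[0])
    out = []
    for y in range(src_h * factor):
        # Map the centre of the output pixel back into source coordinates,
        # clamping so the edges repeat the border pixels.
        fy = min(max((y + 0.5) / factor - 0.5, 0.0), src_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, src_h - 1)
        wy = fy - y0
        row = []
        for x in range(src_w * factor):
            fx = min(max((x + 0.5) / factor - 0.5, 0.0), src_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, src_w - 1)
            wx = fx - x0
            # Blend horizontally on the two rows, then vertically.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

A hard edge in the source becomes a smooth ramp in the output, which is the blur the super resolution models try to avoid.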

Some of my results with these models follow. I am using the following test image from here.

Unprocessed Test Image

To best see the differences between the algorithms I recommend you open the x4 zoomed images in new tabs and switch between them.

SRCNN – Super-Resolution Convolutional Neural Network

To see the original paper on SRCNN, click here.
I am using the PyTorch script by Mirwaisse Djanbaz here.

SRCNN x4

SRCNN x4

SRRESNET

To see the original paper on SRRESNET, click here.
I am using the PyTorch script by Sagar Vinodababu here.

SRRESNET x4

SRRESNET x4

SRGAN – Super Resolution Generative Adversarial Network

To see the original paper on SRGAN, click here.
I am using the PyTorch script by Sagar Vinodababu here.

SRGAN x4

SRGAN x4

ESRGAN – Enhanced Super Resolution Generative Adversarial Network

I am using the PyTorch script by Xintao Wang et al here.

ESRGAN x4

ESRGAN x4

PSNR

I am using the PyTorch script by Xintao Wang et al here.

PSNR x4

PSNR x4

Real-ESRGAN

This is the best super sampler here. I am using the executable by Xintao Wang et al here.

Real-ESRGAN x4

Real-ESRGAN x4

Real-ESRNET

I am using the executable by Xintao Wang et al here.

Real-ESRNET x4

Real-ESRNET x4

SwinIR

Very nice results. May be equal to or better than Real-ESRGAN depending on the input image. I am using the code from this colab.

SwinIR x4

SwinIR x4

SPSR

Another method from here.

SPSR x4

SPSR x4

ruDALL-E Real-ESRGAN

From here.

ruDALL-E Real-ESRGAN x4

ruDALL-E Real-ESRGAN x4

Differences

Each of the algorithms gives different results. For an unknown source image it would probably be best to run it through them all and then see which gives you the best result. These are not the Hollywood or TV enhance magic fix just yet.

If you know of any other PyTorch implementations of super resolution I missed, let me know.

Availability

You can follow the links to the original GitHub repositories to get the software, but I have also added a simple GUI front end for these scripts in Visions of Chaos. That allows you to try the above algorithms on single images or batch process a directory of images.
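Batch processing a directory amounts to something like the following sketch. This is not the Visions of Chaos code; `batch_process`, the `_x4` output naming and the `process_file` callback (standing in for whichever super resolution script is selected) are all assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image types to pick up; adjust to suit.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp"}


def batch_process(input_dir, output_dir, process_file):
    """Run `process_file(source, target)` on every image in a directory.

    `process_file` is a hypothetical callback that reads `source`,
    upscales it and writes the result to `target`.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.iterdir()):
        if source.is_file() and source.suffix.lower() in IMAGE_SUFFIXES:
            target = output_dir / f"{source.stem}_x4{source.suffix}"
            process_file(source, target)
            written.append(target)
    return written
```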

Jason.

Text-to-Image Machine Learning

NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.

Text-to-Image

Input a short phrase or sentence into a neural network and see what image it creates.

I am using Big Sleep from Phil Wang (@lucidrains).

Phil used the code/models from Ryan Murdock (@advadnoun). Ryan has a blog post explaining the basics of how all the parts connect up here. Ryan has some newer Text-to-Image experiments but they are behind a Patreon paywall, so I have not played with them. Hopefully he (or anyone) releases the colabs publicly sometime in the future. I don’t want to experiment with a Text-to-Image system that I cannot share with everyone, otherwise it is just a tease.

The simplest explanation is that BigGAN generates images that try to satisfy CLIP, which rates how closely each image matches the input phrase. BigGAN creates an image and CLIP looks at it and says “sorry, that does not look like a cat to me, try again”. With each repeated iteration BigGAN gets better at generating an image that matches the desired phrase text.
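The loop can be illustrated with a toy version. To be clear, this is not BigGAN or CLIP: `generate` is a trivial stand-in for the image generator, `score` is a stand-in for CLIP's rating, and the gradients are estimated by nudging each latent value rather than by backpropagation. Only the generate, rate, adjust, repeat structure carries over.

```python
def generate(latent):
    """Toy stand-in for the generator: maps a latent vector to an 'image'."""
    return [2.0 * v for v in latent]


def score(image, target):
    """Toy stand-in for CLIP: higher means a closer match to the phrase.

    `target` plays the role of the encoded text prompt.
    """
    return -sum((p - t) ** 2 for p, t in zip(image, target))


def optimise(latent, target, steps=200, lr=0.05, eps=1e-4):
    """Repeatedly adjust the latent so the generated image scores higher."""
    latent = list(latent)
    history = []
    for _ in range(steps):
        base = score(generate(latent), target)
        history.append(base)
        # Estimate how the score changes when each latent value is nudged.
        grads = []
        for i in range(len(latent)):
            nudged = list(latent)
            nudged[i] += eps
            grads.append((score(generate(nudged), target) - base) / eps)
        # Step in the direction that improves the score.
        latent = [v + lr * g for v, g in zip(latent, grads)]
    return latent, history
```

Calling `optimise([0.0, 0.0, 0.0], [1.0, -2.0, 0.5])` drives the generated "image" towards the target, with the score in `history` rising as it goes.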

Big Sleep Examples

Big Sleep uses a seed number, which means you can have thousands/millions of different outputs for the same input phrase. Note that there is an issue with the seed: it does not reliably reproduce the same image. From my testing, neither setting the torch_deterministic flag to True nor setting the CUDA environment variable helps. Every time Big Sleep is called it generates a different image for the same seed. That means you will never be able to reproduce the same output in the future.
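For reference, these are the usual PyTorch reproducibility switches, as a minimal sketch assuming PyTorch 1.8 or newer. `seed_everything` is a hypothetical helper, not part of Big Sleep, and as noted above settings like these did not make Big Sleep repeatable in my testing.

```python
import os
import random


def seed_everything(seed):
    """Apply the common determinism settings; returns True if torch was found."""
    # cuBLAS needs this set before CUDA work starts for deterministic results.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return False
    torch.manual_seed(seed)  # seeds the CPU and CUDA generators
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Later operations without a deterministic implementation will raise
    # an error once this is enabled.
    torch.use_deterministic_algorithms(True)
    return True
```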

These images are 512×512 pixels square (the largest resolution Big Sleep supports) and took 4 minutes each to generate on an RTX 3090 GPU. The same code takes 6 minutes 45 seconds per image on an older 2080 Super GPU.

Also note that these are the “cherry picked” best results. Big Sleep is not going to create awesome art every time. For these examples or when experimenting with new phrases I usually run a batch of multiple images and then manually select the best 4 or 8 to show off (4 or 8 because that satisfies one or two tweets).

To start, these next four images were created from the prompt phrase “Gandalf and the Balrog”

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Big Sleep - Gandalf and the Balrog

Here are results from “disturbing flesh”. These are like early David Cronenberg nightmare visuals.

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

Big Sleep - Disturbing Flesh

A suggestion from @MatthewKafker on Twitter “spatially ambiguous water lillies painting”

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

Big Sleep - Spatially Ambiguous Water Lillies Painting

“stormy seascape”

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

Big Sleep - Stormy Seascape

After experimenting with acrylic pour painting in the past I wanted to see what Big Sleep could generate from “acrylic pour painting”

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

Big Sleep - Acrylic Pour Painting

I have always enjoyed David Lynch movies so let’s see what “david lynch visuals” results in. This one got a lot of surprises and worked great. These images really capture the feeling of a Lynchian cinematic look. A lot of these came out fairly dark so I have tweaked exposure in GIMP.

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

More from “david lynch visuals” but these are more portraits. The famous hair comes through.

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

Big Sleep - David Lynch Visuals

“H.R.Giger”

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

Big Sleep - H.R.Giger

“metropolis”

Big Sleep - Metropolis

Big Sleep - Metropolis

Big Sleep - Metropolis

Big Sleep - Metropolis

“surrealism”

Big Sleep - Surrealism

Big Sleep - Surrealism

Big Sleep - Surrealism

Big Sleep - Surrealism

“colorful surrealism”

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Big Sleep - Colorful Surrealism

Availability

I have now added a simple GUI front end for Big Sleep into Visions of Chaos, so once you have installed all the prerequisites you can run these models on any prompt phrase you feed into them. The following image shows Big Sleep in the process of generating an image for the prompt text “cyberpunk aesthetic”.

Text-to-Image GUI

After spending a lot of time experimenting with Big Sleep over the last few days, I highly encourage anyone with a decent GPU to try these. The results are truly fascinating. This page says at least a 2070 8GB or greater is required, but Martin in the comments managed to generate a 128×128 image on a 1060 6GB GPU after 26 (!!) minutes.

Jason.

Adding PyTorch support to Visions of Chaos

TensorFlow 2

Recently after getting a new 3090 GPU I needed to update TensorFlow to version 2. Going from TensorFlow version 1 to TensorFlow version 2 had way too many code breaking changes for me. Looking at other GitHub examples of TensorFlow 2 code (e.g. an updated Style Transfer script) gave me all sorts of errors. Not just one git repo either, lots of supposed TensorFlow 2 code would not work for me. If it is a pain for me it is going to be a bigger annoyance for my users. I already get enough emails saying “I followed your TensorFlow instructions exactly, but it doesn’t work”. I am in no way an expert in Python, TensorFlow or PyTorch, so I need something that “just works” most of the time.

I did manage to get the current TensorFlow 1 scripts in Visions of Chaos running under TensorFlow 2, so at least the existing TensorFlow functionality will still work.

PyTorch

After having a look around and watching some YouTube videos I wanted to give PyTorch a go.

The install is one pip command they build for you on their home page after you select OS, CUDA, etc. So for my current TensorFlow tutorial (maybe I now need to change that to “Machine Learning Tutorial”) all I needed to do was add 1 more line to the pip install section.


pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
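A quick way to confirm the install worked is to ask PyTorch for its version and whether it can see the GPU. A small sketch; `describe_torch_install` is a hypothetical helper name.

```python
import importlib


def describe_torch_install():
    """Report whether PyTorch imports and whether it can see a CUDA GPU."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` comes back False on a machine with an NVIDIA GPU, the CPU-only build or a mismatched CUDA version is the likely culprit.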

PyTorch Style Transfer

The first Google hit is the PyTorch tutorial here. After spending most of a day banging my head against the wall with TensorFlow 2 errors, that single self-contained Python script using PyTorch “just worked”! The settings do seem harder to tweak to get a good looking output compared to the previous TensorFlow Style Transfer script I use. After making the following examples I may need to look for another PyTorch Style Transfer script.

Here are some example results using Biscuit as the source image.

Biscuit

Biscuit Style Transfer

Biscuit Style Transfer

Biscuit Style Transfer

Biscuit Style Transfer

PyTorch DeepDream

Next up was ProGamerGov’s PyTorch DeepDream implementation. Again, worked fine. I have used ProGamerGov‘s TensorFlow DeepDream code in the past and it worked just as well this time. It gives a bunch of other models to use too, so more different DeepDream outputs for Visions of Chaos are now available.

Biscuit DeepDream

Biscuit DeepDream

Biscuit DeepDream

Biscuit DeepDream

PyTorch StyleGAN2 ADA

Using NVIDIA’s official PyTorch implementation from here. Also easy to get working. You can quickly generate images from existing models.

StyleGAN2 ADA

Metropolitan Museum of Art Faces – NVIDIA – metfaces.pkl

StyleGAN2 ADA

Trypophobia – Sid Black – trypophobia.pkl

StyleGAN2 ADA

Alfred E Neuman – Unknown – AlfredENeuman24_ADA.pkl

StyleGAN2 ADA

Trypophobia – Sid Black – trypophobia.pkl

I include the option to train your own models from a bunch of images. Pro tip: if you do not want to have nightmares do not experiment with training a model based on a bunch of naked women photos.

Going Forward

After these early experiments with PyTorch, I am going to use PyTorch from now on wherever possible.

Jason.