A quick post showing some steps to get NeRF going in Visions of Chaos to help first-time users.
Step 1 – Training
1. Create a new empty directory for your trained data, e.g. D:\Nerf Test\
2. Create a directory under that called images, e.g. D:\Nerf Test\images\
3. If you have a series of images you know will work for training, put them in the images directory. Otherwise, you can copy the images from C:\Users\YourUserName\AppData\Roaming\Visions of Chaos\Examples\MachineLearning\Instant Neural Graphics Primitives\data\nerf\fox\images\.
4. Start Visions of Chaos and select Mode->Machine Learning->3D->Instant Neural Graphics Primitives
5. Set the source to be D:\Nerf Test and click Train.
6. Wait for the training to finish. For the fox images on a 3090 it took around 3 minutes.
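Steps 1 to 3 only create folders and copy files, so they can also be scripted. The following is a minimal Python sketch that is not part of Visions of Chaos: `prepare_nerf_folder` is a hypothetical helper, and the set of image extensions it copies is an assumption you may need to adjust.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Extensions treated as training images (an assumption; adjust for your data).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def prepare_nerf_folder(project_dir: Path, source_images: Path) -> list[Path]:
    """Hypothetical helper: create <project_dir>/images and copy images into it.

    It only reproduces the manual folder setup; it does not start training.
    """
    images_dir = project_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)  # also creates project_dir

    copied = []
    for src in sorted(source_images.iterdir()):
        # Skip subfolders and anything that does not look like an image file.
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            dst = images_dir / src.name
            shutil.copy2(src, dst)  # copy2 keeps file timestamps
            copied.append(dst)
    return copied
```

For example, `prepare_nerf_folder(Path(r"D:\Nerf Test"), fox_dir)`, where `fox_dir` is a `Path` to the fox images folder from step 3. You still select the mode and click Train in the GUI afterwards.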
Step 2 – Viewing
With the Source location still pointing to D:\Nerf Test you can now click View to start the viewer GUI.
If you used the fox images you will see the point cloud of the trained data like the following. Middle mouse button click and drag to slide the model around. Left click and drag to rotate.
Step 3 – Creating a Movie
Lastly you can now create a movie of a virtual camera moving around the 3D point object.
1. Let the points accumulate enough to see a reasonable image that is not too noisy.
2. Scroll down in the settings dialog and expand Snapshot.
3. Click Save.
Now to make the camera path. By default the path dialog is hidden behind the main dialog, so click and drag the main dialog out of the way.
When you have the Camera Path dialog showing, move the camera (middle click and drag, left click and drag) to the position you want your movie to start at.
1. Click Add from cam to add that point.
2. Rotate and zoom to another location and once again click Add from cam.
3. Do this another few times to create the camera key frames.
4. Once you have added all the points, click Save to save the path.
5. You can now close the GUI.
6. With the Source directory still set to D:\Nerf Test click Movie.
By default it creates a 15-second movie at 30 fps and 1280×720 resolution. You can change these settings if you wish.
The movie frames will be created …
…and the movie will play when finished.
The movie is saved under your specified Scene directory.
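The default movie works out to 15 s × 30 fps = 450 frames at 1280×720. The sketch below shows that arithmetic and one simple way frames could be spread across the camera key frames you added. It is an illustration under stated assumptions: `total_frames` and `lerp_path` are hypothetical helpers, they interpolate camera positions only (no rotation), and they use straight-line interpolation, whereas the actual viewer may use a smoother curve.

```python
from __future__ import annotations


def total_frames(seconds: float, fps: float) -> int:
    """Frames to render: movie duration multiplied by frame rate."""
    return round(seconds * fps)


def lerp_path(keyframes: list[tuple[float, float, float]],
              n_frames: int) -> list[tuple[float, ...]]:
    """Hypothetical helper: place n_frames camera positions evenly along
    straight segments joining consecutive keyframes (positions only)."""
    if len(keyframes) < 2 or n_frames < 2:
        raise ValueError("need at least two keyframes and two frames")
    segments = len(keyframes) - 1
    path = []
    for i in range(n_frames):
        t = i / (n_frames - 1) * segments  # global position along the path, 0..segments
        k = min(int(t), segments - 1)      # index of the segment this frame falls in
        u = t - k                          # fraction of the way through that segment
        start, end = keyframes[k], keyframes[k + 1]
        path.append(tuple(s + (e - s) * u for s, e in zip(start, end)))
    return path


# Default movie settings from the post: 15 seconds at 30 fps.
frames = total_frames(15, 30)
print(frames)  # 450
```

Linear interpolation keeps the sketch short; with only a few key frames it produces visible direction changes at each key frame, which is why real camera path tools usually smooth the curve.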
Train your own images
Use the fox images as a guide to the kind of images that work. You want a series of images that circle the subject, showing it from every side you want to see in the final movie.
You can also train from a movie of your subject rotating. The movie frames will be extracted for you and then trained as normal.
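If you would rather pick training frames from a clip yourself (for example, to avoid hundreds of near-identical frames from a long movie), one simple approach is to keep evenly spaced frames. This is a hypothetical helper; the post does not say how many frames Visions of Chaos extracts or how it chooses them.

```python
from __future__ import annotations


def evenly_spaced_frames(total: int, keep: int) -> list[int]:
    """Hypothetical helper: return `keep` frame indices spread evenly
    from the first frame (0) to the last frame (total - 1)."""
    if total <= 0 or keep <= 0:
        return []
    if keep >= total:
        return list(range(total))  # fewer frames than requested: keep them all
    if keep == 1:
        return [0]
    step = (total - 1) / (keep - 1)  # step > 1 here, so indices never repeat
    return [round(i * step) for i in range(keep)]


print(evenly_spaced_frames(10, 4))  # [0, 3, 6, 9]
```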
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: Deforum Stable Diffusion v0.4
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.4 is the latest version.
a canal
a forest path
a loft
a matte painting of a river hyperdetailed and CryEngine
a painting of the tropics
a pastel of a nightmare 4K HD realism and trending on Flickr
a photorealistic painting of Cookie Monster rendered in unreal engine and CGSociety
a tropical beach by Karl Hagedorn and Michalis Oikonomou
an etching of King Kong
concept art of Gandalf CGSociety and 4K HD realism
lovecraftian cthulhu tentacle horrors by giger and beksinski, highly textured, 8K 4K HD
roses in the rain, rosebuds, rain drops, 8K 4K HD
Name: Deforum Stable Diffusion v0.5
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.5 is the latest version.
a castle
a cute monster
a fine art painting of humans rendered in unreal engine and trending on pixiv
a pop art painting of Frankenstein by Kim Hwan-gi and Zha Shibiao
a sorceress by Adolf Fényes and Rodolfo Morales for sale on Facebook Marketplace and trending on ArtStation
a watercolor painting of a farm by József Breznay and John Zephaniah Bell
an eagle made of feathers and silver
an ugly face
puppies
street art of Jason Vorhees
colorful surrealism by dali, giger, beksinski and haeckel
nebula galaxy planets hubble
Name: Deforum Stable Diffusion v0.6
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum-art/deforum-stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support. v0.6 is the latest version.
a bedroom
a bronze sculpture of Robert DeNiro rendered in unreal engine and trending on Flickr
a chinese painting of a peacock by Agnes Lawrence Pelton and Bob Thompson
a cute girl 4K HD realism and 8K 3D
a fine art painting of a palace made of mist
a green tree frog
a lion
a storybook illustration of the Australian outback
ballpoint pen art of Frankenstein
Brad Pitt by Rhea Carmi and Robert Bechtle
beauty, 4K, 8K, HD, hyper detailed, high detail, surrealism
an oil painting by Picasso and van Gogh, 4K, 8K, HD, hyper detailed, high detail, surrealism
Name: Stable Diffusion v2
Author: Original script by Robin Rombach et al.
Original script: https://github.com/Stability-AI/stablediffusion
Time for 768×768 on a 3090: 42 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: Uses a newly trained version of the Stable Diffusion model that renders natively at 768×768. The following examples show 768×768 output.
a cave
a detailed painting of fear IMAX and Flickr
a digital rendering of a human made of chrome and gold
a mansion
a portrait of a sad clown
a spooky forest
a storybook illustration of a lush rainforest for sale on Facebook Marketplace and #film
an etching of a babbling brook
an oil painting of a castle in the mountains
Yoda
Name: Stable Diffusion v2.1
Author: Original script by Robin Rombach et al.
Original script: https://github.com/Stability-AI/stablediffusion
Time for 768×768 on a 3090: 42 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: Updated Stable Diffusion model. The following examples show 768×768 sized output.
a detailed drawing of Frankenstein
a forest clearing
a frog hyperrealistic and photorealistic
a mountain cabin
a sad clown
a surrealist sculpture of eyeballs
a swamp hyperdetailed and rendered in unreal engine
a townhouse photorealistic and lens flare
an ink drawing of Al Pacino
an ugly creature
Name: Deforum Stable Diffusion v0.7
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum-art/deforum-stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 768×768 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 640×576
Description: Now supports Stable Diffusion v2.1 model for 768×768 resolution.
a cave
a forest clearing
a lion
a pastel of Big Bird by John Blair and Christoph Ludwig Agricola CryEngine and 4K HD realism
a tributary
a watercolor painting of a western town trending on ArtStation and Tri-X 400 TX
a werewolf
an abstract painting of Gandalf
an alien forest IMAX and vivid colors
an engraving of a cute girl
a hyperrealistic matte painting of melting color, 4K, 8K, HD, high detail, hyper detailed
a hyperrealistic matte painting of a lush rainforest, 4K, 8K, HD, high detail, hyper detailed
a hyperrealistic matte painting of a magical glowing mushroom forest at night, 4K, 8K, HD, high detail, hyper detailed
Name: Kandinsky v2.1
Author: Original script by AI Forever
Original script: https://github.com/ai-forever/Kandinsky-2
Time for 768×768 on a 3090: 1 minute 14 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. Definitely worth a try.
a cabin
a fireman
a hyperrealistic painting of an ocean by Ella Guru and Walter Emerson Baum rendered in unreal engine and photorealistic
a lineart illustration of goldfish 4K photo and vivid colors
a portrait of a beautiful young girl in a garden at dusk
a robot
an impressionist painting of a happy family
an ugly monster
Harry Potter
Spiderman
Name: DeepFloyd IF
Author: Original script by DeepFloyd AI Research Band
Original script: https://github.com/deep-floyd/IF
Time for 1024×1024 on a 3090: 1 minute 17 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 only
Maximum resolution on an 8GB 2080: Unable to run on an 8GB GPU.
Description: A new alternative script to Stable Diffusion and other models. 1024×1024 native resolution is nice.
a babbling brook
a cathedral
a collage painting of a vast city lens flare and 8K 3D
a cove
a mountain cabin
a still life of a mountain path
a teddy bear
a worried man made of bones and wire
an allegory of Charmander
gorillas
Any Others I Missed?
Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.
Name: Multi-Perceptor VQGAN+CLIP v4
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 36 seconds
Maximum resolution on a 24 GB 3090: 1120×480
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 4 of Remi’s Multi-Perceptor VQGAN+CLIP script.
a bronze sculpture of a garden
a church by Tadeusz Kantor
a color pencil sketch of a monkey hyperdetailed
a comic book panel of a lush rainforest
a matte painting of a witch by William Geissler
a peninsula by Ei-Q CGSociety
a surrealist sculpture of hell
an eyeball made of flowers
cyberpunk art of a canyon
lineart of dense woodland
Name: V-Majesty Diffusion v1.2
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/v.ipynb
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new diffusion-based script.
a black and white photo of war
a brownstone by Oliver Sin super detailed
a cute creature
a doctor
a drawing of a babbling brook photorealistic
a hill
a ninja by Michael Ford psychedelic
a photo of a beautiful young girl in a summer garden at dusk
a storybook illustration of a cozy den
New York City
Name: Latent Majesty Diffusion v1.3
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb
Time for 512×512 on a 3090: 2 minutes 24 seconds
Maximum resolution on a 24 GB 3090: 512×512 (when using GFPGAN upscaling)
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Starts with a smaller resolution image (usually 256×256 pixels), upscales it with GFPGAN, and then does a few more diffusion passes. GFPGAN can really help get better coherency in faces.
a hyperrealistic painting of a cute creature
a hyperrealistic painting of an evil clown
a picture of a tree
a surrealist painting of kittens
an engraving of an angry woman made of voxels
an oil painting of an attractive woman by Eileen Aldridge
an ultrafine detailed painting of Bruce Willis 4K HD realism
Robert DeNiro ZBrush
Tweety Pie
Yoda
Name: Huemin JAX Diffusion v2.7
Author: Huemin
Original script: https://colab.research.google.com/github/huemin-art/jax-guided-diffusion/blob/v2.7/Huemin_Jax_Diffusion_2_7.ipynb
Time for 512×512 on a 3090: 3 minutes 55 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Starts with a smaller resolution image (usually 256×256 pixels), upscales it with GFPGAN, and then does a few more diffusion passes. GFPGAN can really help get better coherency in faces.
a babbling brook
a mountain path Flickr
a river CGSociety
a spooky forest by John F. Peto
a storybook illustration of a mansion by Donald Roller Wilson
a surrealist sculpture of an alien forest
a watercolor painting of fear made of bones ZBrush
Name: DALL-E Mini
Author: Original script by Boris Dayma
Original script: https://colab.research.google.com/github/borisdayma/dalle-mini/blob/main/tools/inference/inference_pipeline.ipynb
Time for 512×512 on a 3090: Locked to 256×256 – 1 minute 13 seconds
Maximum resolution on a 24 GB 3090: 256×256
Maximum resolution on an 8GB 2080: 256×256
Description: Capable of rendering multiple images in one pass. Very nice results. Limited to 256×256 at this time. These examples show a 4×4 grid of 16 images for each prompt.
a fine art painting of a fire breathing dragon
a hyperrealistic painting of an ugly monster
a planet
a river
a rose
a surrealist painting of a king
an airbrush painting of satan
an engraving of a frog
an ultrafine detailed painting of fear
Darth Vader trending on pixiv
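The DALL-E Mini examples above are 4×4 grids of 16 images. Below is a minimal sketch of the row-major layout behind such a grid; `grid_layout` is a hypothetical helper that only computes the sheet size and each tile's top-left offset, leaving the pixel copying to whatever imaging library you use. Calling it with `cols=2` and four 256×256 tiles gives a 2×2 sheet of 512×512.

```python
from __future__ import annotations


def grid_layout(count: int, cols: int, tile_w: int, tile_h: int
                ) -> tuple[int, int, list[tuple[int, int]]]:
    """Hypothetical helper: sheet size and top-left (x, y) of each tile,
    filling the grid left to right, then top to bottom."""
    rows = -(-count // cols)  # ceiling division: a partial last row still counts
    offsets = [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(count)]
    return cols * tile_w, rows * tile_h, offsets


width, height, offsets = grid_layout(16, 4, 256, 256)
print(width, height)  # 1024 1024
```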
Name: Latent Majesty Diffusion v1.6
Authors: Original script by Dango233 and multimodalart
Original script: https://colab.research.google.com/github/multimodalart/MajestyDiffusion/blob/main/latent.ipynb
Time for 512×512 on a 3090: 2 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 512×512 (when using GFPGAN upscaling)
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The latest amazing update to Latent Diffusion. Awesome colors, textures, lighting, details, coherency. Highly recommended.
a colorful parrot
a detailed matte painting of puppies
a gallery
a mountain cabin by Tom Palin 4K HD realism
a renaissance painting of a spooky forest
a school of tropical fish by Jane Carpanini
an ugly creature
the Amazon Rainforest photorealistic
The Grinch
Yoda trending on Flickr
Name: Disco Diffusion v5.4
Authors: Original script by @somnai, @gandamu, @zippy731 and @devdef
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 20 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 768×768
Description: Latest version of Disco Diffusion.
a 3D render of the Grand Canyon by Dóra Keresztes
a hyperrealistic painting of a zombie made of cheese and feathers CryEngine and rendered in Cinema4D
a macro photograph of an ugly creature
a matte painting of a monument
a picture of a vast city
a skyscraper
a thunder storm
Jason Vorhees by Chen Chi
reflective spheres hyperrealistic
the country by Robert Thomas and Chen Jiru rendered in unreal engine and 4K photo
a collage painting of a lush rainforest by Doc Hammer and Alexander Ivanov hyperrealistic and CryEngine
a cubist painting of a lion and a sunset CryEngine and trending on pixiv
a fine art painting of a zombie
a gulf by I Ketut Soki and Alfons von Czibulka
a monastery trending on Flickr and #film
a morning landscape
a prairie CGSociety and CryEngine
a werewolf
ballpoint pen art of a monument
cyberpunk art of heaven filmic and CryEngine
Name: CLIP Guided k-diffusion
Author: Original script by Katherine Crowson
Original script: https://colab.research.google.com/drive/1w0HQqxOKCk37orHATPxV8qb0wb4v-qa0
Time for 512×512 on a 3090: 6 minutes 56 seconds
Maximum resolution on a 24 GB 3090: Fixed to 512×512 resolution.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new script by Katherine. It seems to generate more abstract results, and these example images had to be selected from a long run of random prompts.
a jigsaw puzzle of paranoia by Petr Brandl and Sasha Putrya
a landscape vivid colors
a pastel of Cookie Monster by Ren Bonian and Ángel Botello for sale on Facebook Marketplace and CryEngine
a reef
a renaissance painting of Al Pacino
a statue of a submarine made of metal and crystals by James Sessions American painter and Elfriede Lohse-Wächtler
an airbrush painting of a nightmare creature vivid colors and rendered in Cinema4D
an oil painting of a cephalopod made of paper and mist
an ugly person and an area 4K HD realism and trending on pixiv
conceptual art of an ugly monster
Name: CLIP Prior + VQGAN (MSE method)
Author: Original script by Katherine Crowson
Original script: https://colab.research.google.com/drive/1yOpCY9eXvzELHppvh-o0DevhxVYOGr5i
Time for 512×512 on a 3090: 3 minutes 31 seconds
Maximum resolution on a 24 GB 3090: 832×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new script by Katherine. Can give some interesting details but coherence may suffer at larger resolutions.
a collage painting of a tiger vivid colors and photorealistic
a cove 4K photo and CryEngine
a cute creature
a glacier
a space nebula
a townhouse
a valley
an oil painting of a peacock by Wu Hong and Eve Ryder
Cthulhu
digital art of a wetland made of cheese and timber by Jacob Duck and Jacob Gerritsz Cuyp
Name: Latent Diffusion LAION_400M v2
Author: Original script by pesser
Original script: https://github.com/pesser/stable-diffusion
Time for 16 256×256 images on a 3090: 49 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Renders multiple images quickly. Coherency is best at 256×256, so these example images are 2×2 tiled results. Each took 35 seconds on a 3090.
a babbling brook
a colorful parrot
a fine art painting of a castle
a matte painting of a rose
a pencil sketch of a cave 4K photo and hyperrealistic
a photorealistic painting of Cthulhu for sale on Facebook Marketplace and Flickr
a surrealist painting of a cloudy sunset
a surrealist painting of a monkey
an illustration of a tiger by Stanley Twardowicz and Antoni Pitxot
an impressionist painting of a cottage
Name: Stable Diffusion
Author: Original script by pesser
Original script: https://github.com/CompVis/stable-diffusion
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one.
a black and white photo of puppies
a cathedral rendered in unreal engine and super detailed
a city made of mist trending on ArtStation and trending on Flickr
a detailed matte painting of a lush rainforest made of crystals and feathers
a king
a polaroid photo of a clown vivid colors and 8K 3D
an airbrush painting of the Terminator CryEngine and for sale on Facebook Marketplace
an ambient occlusion render of a wetland by William Forsyth and Victorine Foot trending on pixiv and CryEngine
poster art of a farm by Frederic Leighton and Yang Borun rendered in unreal engine and 8K 3D
the Australian outback
Name: Deforum Stable Diffusion v0.3
Author: Original script by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Original script: https://colab.research.google.com/github/deforum/stable-diffusion/blob/main/Deforum_Stable_Diffusion.ipynb
Time for 512×512 on a 3090: 34 seconds
Maximum resolution on a 24 GB 3090: 1280×640
Maximum resolution on an 8GB 2080: 640×576
Description: Incredible. Latest and greatest. Beats all previous Text-to-Image systems. If you only use one, use this one. Deforum builds upon Stable Diffusion with animation support.
a babbling brook
a forest path
a photo of a lake
a ranch by Nikolai Alekseyevich Kasatkin and Harriet Zeitlin
a watercolor painting of a rectory hyperdetailed and trending on ArtStation
Al Pacino by Lam Qua and George Frederick Harris
an angry person by Kazys Varnelis and Dóra Keresztes
Name: Augmented CLIP Guided Diffusion
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 1 minute 16 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 256×256, 57 seconds
Description: Another CLIP Guided Diffusion script. Fast. Gives unique textured results.
a detailed painting of people by Nicolette Macnamara
a diagram of a nightmare creature made of gold
a nightmare creature
a painting of a cabin next to a stream in a secluded forest
a storybook illustration of Jabba The Hutt by Carle Hessay
a werewolf by A R Middleton Todd
an oil painting of Big Bird
Gandalf trending on pixiv
Lovecraftian horror
poster art of the Las Vegas strip by George Passantino
Name: Princess Generator
Author: Dango233
Original script: https://colab.research.google.com/drive/1QgH9TvQMXR3PpEGBcHnghtEcwFDXLaYE
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: The latest update to “CLIP Guided Diffusion v6” from Dango233. Can give some superb results. Worth exploring and experimenting with further.
a cloudy sunset
a fireplace by Jacob More
a happy alien by James Jarvaise
a mountain path by Stephen Pace
a raytraced image of a western town
a teddy bear
Charmander made of wood by Hua Yan
dense woodland by Marie Angel
paranoia by Floris van Dyck
portrait of Princess Victoria trending on artstation
Name: Disco Diffusion v4.1
Author: @Somnai
Original script: https://colab.research.google.com/drive/1sHfRn5Y0YKYKi1k-ifUSBFRNJ8_1sa39
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 39 seconds.
Description: The latest update to Disco Diffusion. Really nice detailed outputs. Low VRAM requirements allow very large images. I didn’t realise I had 3 zombie-themed results in this random batch.
a bronze sculpture of a zombie
a fantasy land
a pencil sketch of Cthulhu by Rudolf Koller
a pop art painting of zombies
a portrait of a young boy by Hendrick Cornelisz. van Vliet
a tree by Philips Wouwerman
a western town
a zombie
Han Solo psychedelic
vector art of the Amazon Rainforest
Name: Hypertron v2
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256, 2 minutes 18 seconds
Description: Version 2 of Hypertron. More models, more flavors. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Can give good results if you let it run a large random batch overnight.
a bronze sculpture of a spooky forest by Herb Aach
a diamond made of flowers
a gouache of an android by Wu Bin
a photo of a kitchen
a photorealistic painting of a cemetery
a sketch of a haunted house
a tattoo of Squirtle made of clay
an art deco painting of a human by Nicolas Lancret 8K 3D
goldfish by Elfriede Lohse-Wächtler
Lovecraftian horror by Aileen Eagleton
Name: CC12M Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1TBo4saFn1BCSfgXsmREFrUl3zSQFg6CC
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: 832×512, 2 minutes 59 seconds
Description: Can support higher resolutions, but the coherence really falls apart with anything over 256×256. It handles multiple images at once, so each of these examples shows four 256×256 results.
a black and white photo of a lush rainforest trending on Flickr
a detailed matte painting of a factory
a hacker by Mykola Burachek
a sea monster CGSociety
a surrealist painting of a happy person
a tardigrade by Cosmo Alexander
an anime drawing of an evening landscape by Daphne Fedarb photorealistic
an art deco painting of a happy person by John Uzzell Edwards
chalk art of a bouquet of flowers
the human condition
Name: Augmented CLIP Guided Diffusion v2
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 2 minutes 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 512×512, 4 minutes 56 seconds
Description: Updated version of the Augmented CLIP Guided Diffusion script.
a bungalow 4K HD realism
a forest fire
a lush rainforest CryEngine
a painting of a kitchen by Betye Saar
a portrait of a princess trending on artstation
a spooky forest
a tattoo of a zombie
a werewolf by David Cooke Gibson
an oil painting of a lake
an ugly man
Name: v-diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128, 1 minute 19 seconds
Description: Updated version of Velocity-Diffusion. Tends to make incoherent collage images over 256×256.
a black and white photo of a portrait of a young girl
a cityscape by Lujo Bezeredi
a cloudy sunset
a hologram of a sad face by Josef Šíma
a lounge room by Riad Beyrouti IMAX
a mountain path
a portrait of a young boy made of metal
a portrait of a young girl
a space nebula
an acrylic painting of a mountain range
Name: GLID-3
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3
Time for 512×512 on a 3090: 35 seconds
Maximum resolution on a 24 GB 3090: 768×768.
Maximum resolution on an 8GB 2080: 512×512, 50 seconds
Description: Great textures and lighting. Poor image coherency.
a cemetery
a drawing of a cloudy sunset
a drawing of a human lens flare
a lake
a large waterfall made of silver
a marina
a minimalist painting of a teddy bear by Johann Ludwig Bleuler
a hyperrealistic painting of a queen made of flowers
a painting of a happy clown
a skeleton
a stained glass window 4K HD realism
a watercolor painting of a lounge room
an eagle
an ultrafine detailed painting of Harry Potter
vector art of a zombie by Oskar Kokoschka
Name: JAX CLIP Guided Diffusion v2.7
Author: nshepperd
Original script: https://colab.research.google.com/drive/1nmtcbQsE8sTjfLJ1u3Y4d6vi9ZTAvQph
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 512×512. 3 minutes 59 seconds.
Description: Another diffusion-based script. Can give very nice high detail results.
a Dalek made of feathers
a haunted house
a picture of a chateau by Odhise Paskali
a refinery
a studio by Allan Ramsay trending on ArtStation
a sunset
a thunder storm
a watercolor painting of a fire breathing dragon
a witch made of mist
the tropics by Thomas de Keyser
Name: GLID-3-XL
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3-xl
Time for 512×512 on a 3090: 1 minute 04 seconds
Maximum resolution on a 24 GB 3090: 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Improved/updated version of GLID-3. Uses CLIP for better accuracy. Great textures and lighting. Poor image coherency when over 256×256.
a demon
a detailed matte painting of a bouquet of flowers
a kitchen
a photorealistic painting of a movie monster hyperrealistic
a picture of The Incredible Hulk by Kazimir Malevich
a pop art painting of an angry woman
a spooky forest
an abbey
New York City by Marie Courtois
poster art of Gandalf vivid colors
Name: ruDALL-E Aspect Ratio
Author: Alex Shonenkov
Original script: https://github.com/shonenkov-AI/rudalle-aspect-ratio
Time for 512×512 on a 3090: N/A
Maximum resolution on a 24 GB 3090: N/A
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Version of ruDALL-E that generates wide and/or tall aspect ratio images. The shorter side is limited to 256 pixels. Results can be very nice. Will generate multiple images at once, so these sample images have 4 results per prompt.
a black and white photo of a werewolf
a cartoon of a swamp
a large waterfall made of metal
a lounge room
a matte painting of a townhouse
a palace made of mist
a photo of an ugly woman
a tropical beach
an evil clown
dense woodland
Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The winner for the longest name so far. Needs tweaking as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion. Still shows a lot of potential.
a 3D render of Robocop
a futuristic city IMAX
a matte painting of trypophobia
a renaissance painting of a cloudy sunset trending on ArtStation
a woman 4K photo
an evil clown Flickr
an oil painting of a nightmare creature by Louis Janmot
Indiana Jones
reflective spheres
zombies filmic
Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.
a babbling brook by Zhou Wenjing
a bedroom by Francesco Furini
a computer by Édouard Detaille
a cross stitch of a landscape vivid colors
a kitchen filmic
a matte painting of halloween
a pastel of a peacock
a storybook illustration of a kitchen by Lena Alexander
an oil on canvas painting of a zombie made of voxels
a bronze sculpture of a crying person by Auguste BaudBovy
a flemish baroque of a bouquet of flowers
a haunted house trending on ArtStation
a hyperrealistic painting of trypophobia by Xia Gui
a nightmare creature
a space nebula rendered in Cinema4D
a tentacle monster 4K HD realism
an oil on canvas painting of Danny Trejo by Pablo Rey
Frankenstein
heaven 8K 3D
Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.
a bronze sculpture of Gandalf
a clown made of clay
a detailed painting of a desert oasis
a house by Kathleen Guthrie
a peacock made of metal
a tilt shift photo of the Las Vegas strip
a watercolor painting of reflective spheres 8K 3D
an art deco painting of an amusement park
lineart of Big Bird by Alesso Baldovinetti
vector art of a forest fire
Name: FuseDream
Author: Xingchao Liu et al
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Gives some unique outputs compared to all the previous scripts.
a clown
a king
a matte painting of New York City by Robin Guthrie
a portrait of a young girl
a rough seascape
a sea monster
a teddy bear
a werewolf
an airbrush painting of an angry woman
an attractive woman
Name: Looking Glass
Author: bearsharktopus
Original script: https://colab.research.google.com/drive/11vdS9dpcZz2Q2efkOjcwyax4oob6N40G
Time for 256×256 on a 3090: 1 minute 19 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 03 seconds
Description: A variation on ruDALL-E that added support for training the output with a single image or directory of images. It does seem to create better results than the raw ruDALL-E scripts (starting from a single image of random Perlin noise).
a cemetery trending on pixiv
a colorful parrot
a photo of a house
a rough seascape
an alien city
an angry person by Eric Auld
an angry woman
an ugly woman
monkeys
Yoda
Name: Velocity Diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: The latest script from Katherine Crowson. Unique results compared to her previous diffusion based scripts. Worth experimenting with further.
a detailed matte painting of traffic
a detailed painting of Jason Vorhees
a Ghostbuster
a manga drawing of a lounge room by Yayoi Kusama
a mountain range CryEngine
a portrait of a young girl made of feathers rendered in unreal engine
a zombie
lineart of a Rubiks cube
The Grinch
vector art of Emporer Palpatine
Name: ruDALL-E Arbitrary Resolution v1
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 16 minutes 34 seconds
Description: Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.
a color pencil sketch of a werewolf
a colorful parrot
a gorilla
a painting of a cabin next to a stream in a secluded forest
a portrait of a girl with a dragon tattoo
a rose vivid colors
a sketch of an ugly man
a surrealist sculpture of a submarine
dense woodland
medusa
Name: ruDALL-E Arbitrary Resolution v2
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 15 minutes 48 seconds
Description: v2 of the ruDALL-E Arbitrary Resolution script. Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.
a bouquet of flowers
a cross stitch of a well kept garden
a futuristic city
a large waterfall
a minimalist painting of a castle in the mountains
a photocopy of a monkey vivid colors
a spooky forest by Laura Muntz Lyall
a teddy bear made of wrought iron
dense woodland
God
Name: GLIDE
Author: Unknown
Original script: https://colab.research.google.com/github/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
Time for 256×256 on a 3090: 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Locked to 256×256
Description: Images are rendered tiny at 64×64 and then upscaled internally within the script to 256×256 for output. The model has been “trimmed” so it cannot do anything human related and only does well for subjects it knows about. Hopefully they release the full model and/or train a larger resolution model in the future. Nothing to get excited about yet.
a cathedral
a color pencil sketch of a fire breathing dragon by Erwin Bowien
a gorilla
a library
a mosaic of monkeys
a painting of a cabin next to a stream in a secluded forest
an elephant
dinosaurs
goldfish
the Sydney Harbour Bridge lens flare
Name: Disco Diffusion
Author: @Somnai
Original script: https://colab.research.google.com/drive/1bItz4NdhAPHg5-u87KcH-MmJZjK-XqHN
Time for 512×512 on a 3090: 3 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 2496×1088 11 minutes 50 seconds
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Diffusion script that includes all the latest features. Capable of rendering some very nice large resolution images (it may even do better at larger sized images than smaller resolutions like these samples).
a cute creature
a detailed matte painting of a morning landscape
a peacock made of mist by Reinier Nooms
a Pokemon character by William Etty
a polaroid photo of an angry woman
a rough seascape
a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D
an attractive woman
computer rendering of a desert oasis rendered in unreal engine
a renaissance painting of a farm by Bernardo Strozzi
a silk screen of God
a storybook illustration of a cute monster trending on pixiv
a surrealist painting of Frankenstein
a watercolor painting of Yoda
a worried woman made of clay lens flare
an art deco painting of Luke Skywalker
an oil painting of Buzz Lightyear
Chewbacca
Name: minDALL-E
Author: Kakao Brain Corp
Original script: https://github.com/kakaobrain/minDALL-E
Time for 256×256 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: 256×256 1 minute 59 seconds
Description: Another DALL-E variation script. Locked to 256×256 but can generate multiple images each run.
a cozy den
a digital painting of Chewbacca by Willem van de Velde the Elder
a sad person
a skull
a storybook illustration of a happy clown by Gwen Barnard
a tree by Colin Gill
Bugs Bunny
fireworks by Károly Lotz
The Grand Canyon
Yoda
Name: ruDOLPH
Author: SBER AI
Original script: https://github.com/sberbank-ai/ru-dolph
Time for 128×128 on a 3090: 1 minute 15 seconds
Maximum resolution on a 24 GB 3090: Locked to 128×128
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another ruDALL-E variation script. Locked to a tiny 128×128 resolution for now until they train the larger models. These examples were 4x upscaled with Real ESRGAN.
a castle
a colorful parrot
a fine art painting of an ugly woman
a kitchen
a pastel of spirals made of plastic
a photorealistic painting of a cityscape
a portrait of a woman
a sad person by Ramon Casas i Carbó
kittens
vector art of a woman
Name: CLIP Guided Deep Image Prior
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1_oqIK8A67EgtJDdfsuJojc5ukNzirdle
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 or 1680×720
Maximum resolution on an 8GB 2080: 512×512 (5 minutes 7 seconds) or 640×360
Description: Interesting script that has decent coherency. If only the output was slightly sharper and the colors slightly richer it would be a winner. Still good for unique outputs that the other methods cannot achieve.
a flemish baroque of a shrine
a statue of a tardigrade made of clay
a surrealist painting of a Pixar character
a surrealist painting of an evening landscape 4K photo
an abstract sculpture of an evil clown by Han Gan
an ambient occlusion render of Bugs Bunny made of wood
Cookie Monster
Jabba The Hutt by Shūbun Tenshō
tentacles by Johanna Marie Fosie
vector art of heaven
Any Others I Missed?
Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: PixelDraw
Author: dribnet
Original script: https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/PixelDrawer.ipynb
Time for 512×512 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Generates “pixel art” images. I had a lot of requests to add support for this one.
a cartoon of a peacock
a cloudy sunset
a gorilla
a morning landscape
a watercolor painting of a castle
an art deco painting of Al Pacino
Hell
Shrek
Name: DirectVisions
Author: Jens Goldberg
Original script: https://colab.research.google.com/drive/127lKSsQjx-UDDUSvIkLL6mREfZ0KQu5D
Time for 512×512 on a 3090: 2 minutes 39 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 4096×4096
Description: Interesting detailed images. Can create huge resolution results.
a color pencil sketch of a western town
a detailed painting of a cephalopod
a digital rendering of an ugly face
a pencil sketch of Buzz Lightyear
a rough seascape by Pinchus Kremegne
a stock photo of a president
a sunset
an alien city
an alien forest by Helen Berman
an evening landscape
Name: Pixel Direct
Author: Unknown
Original script: https://colab.research.google.com/drive/1F9ZOZnpV3uBPRDSESaAXYwzNZJQRJT75
Time for 512×512 on a 3090: 1 minute 03 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 2048×2048 1 minute 51 seconds
Description: Another “Pixel Art” script. More abstract results than the PixelDraw script above.
a bronze sculpture of a nightmare creature
a cartoon of Al Pacino
a nightclub
a silk screen of a bouquet of flowers
an etching of a worried woman
an illustration of of a thunder storm
Name: FourierVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1nGNBjhbYnDHSumGPjpFHjDOsaZFAqGgF
Time for 512×512 on a 3090: 1 minute 40 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 4 minutes 07 seconds
Description: Detailed images. The default script generates washed out pastel images, but with some gamma and brightness tweaks they can be improved (still not ideal, but better). Allows very large resolution images.
a cathedral
a charcoal drawing of zombies
a detailed painting of a sunset by Thomas Cantrell Dugdale
a ghost made of mist
a kitchen
a movie monster
a pencil sketch of a sad clown
a werewolf
an evil clown by Viktor Oliva
an ink drawing of an ugly monster
Name: PyramidVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1dpAS_wK34y7c6s-CatAFmBtbkjGT_erM
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 10 minutes 48 seconds
Description: Very detailed images. Not the fastest script, but gives some very nice results. Its lower VRAM requirements make it a good choice for lower spec GPUs. Definitely one of the better scripts worth exploring.
a desert oasis
a lush rainforest
a marble sculpture of an angry person
a minimalist painting of the Amazon Rainforest
a nightmare creature
a pastel of a computer made of paper
an abstract sculpture of a sad clown
an acrylic painting of an alien forest | vivid colors
Medusa
vector art of an ugly woman
Name: Visions of AI v1
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 1 minute 32 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 33 seconds
Description: My first attempt at actually creating a Text-to-Image script. Based on the excellent example from Jonathan Whitaker’s AIAIArt Lesson 3 tutorial. Gives some very nice fine detail in some areas, but suffers from the same lack of coherence as other scripts, in that it creates multiple copies of the subject throughout the image. After actually trying to write my own script I only have more respect for those who can do this. Hopefully I can improve these results for a version 2. In the meantime, here are some samples from the current Visions of AI script.
a cartoon of the human condition by Judy Takács
a cubist painting of an evening landscape
a digital rendering of frogs
a fire breathing dragon
a hyperrealistic painting of a movie monster
a morning landscape
a shark
a woodcut of an ugly man
an airbrush painting of C-3PO
Frankenstein
Name: Visions of AI v2
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 2 minutes 35 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 36 seconds
Description: An attempt to improve the coherency of the previous script. The first 30 iterations zoom into the image every 10 frames. This results in larger shapes/blobs for the rest of the script to work from. The idea is that it will give larger subjects compared to the v1 script. Kind of works. Gives blurrier results. To be fixed in the next version?
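A minimal sketch of that zoom schedule, assuming 1-based iteration counting (so zooms land on iterations 10, 20 and 30) and a zoom done as a centre crop that is then resized back up to full size. The function names and the zoom factor are hypothetical; the actual zoom amount used by Visions of AI v2 isn't stated here. Enlarging the centre of a still-rough image is what turns small details into the larger shapes/blobs the later iterations build on.

```python
from typing import List, Tuple


def zoom_iterations(total_iters: int, zoom_until: int = 30, every: int = 10) -> List[int]:
    """Return the 1-based iterations at which the image gets zoomed."""
    return [i for i in range(1, total_iters + 1) if i <= zoom_until and i % every == 0]


def center_crop_box(width: int, height: int, zoom: float) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the centre region to crop.

    Resizing this region back up to width x height zooms in by `zoom`.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1 to zoom in")
    crop_w = round(width / zoom)
    crop_h = round(height / zoom)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(zoom_iterations(100))             # [10, 20, 30]
print(center_crop_box(512, 512, 2.0))   # (128, 128, 384, 384)
```

With an image library the returned box would be passed to a crop call and the result resized back to the original dimensions before optimization continues.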
a morning landscape by William Gear
a raytraced image of a nightclub lens flare
a tentacle monster by Carlo Crivelli
a woodcut of a worried woman by Li Keran
an illustration of of a cave made of cheese
Cthulhu
cyberpunk art of a futuristic city
goldfish
reflective spheres
the Australian outback
Name: Multi-Perceptor CLIP Guided Diffusion
Author: Varkarrus
Original script: https://colab.research.google.com/drive/1y3Vt39A5KSNFRa6Z2bCqDHxteZSVH9NC
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 1152×384 (dimensions must be divisible by 128).
Maximum resolution on an 8GB 2080: 128×128 1 minute 56 seconds
Description: Builds upon previous CLIP Guided Diffusion scripts. Like the previous script by Dango233, it uses three CLIP models simultaneously to “rate” the generated images, and I have added options to use up to six different CLIP models. Both how closely the images match the prompt and how coherent they are seem much better than with previous CLIP Guided Diffusion scripts, whose outputs could sometimes seem almost random. This script is superb and highly recommended. Great lighting, textures and brushstrokes. Normally with these blog posts I do a batch run of random prompts overnight and then pick the best 10 images. In this case I had nearly 50 images in my “good” folder after going through the batch results, so for this script I am showing 20 sample images.
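The “divisible by 128” rule in the resolution line is easy to trip over when picking custom sizes. A minimal sketch of a hypothetical helper (`snap_dimensions` is not part of Visions of Chaos or the original colab) that rounds a requested width and height down to the nearest multiple of 128, so the result never exceeds the size you asked for:

```python
from typing import Tuple


def snap_dimensions(width: int, height: int, multiple: int = 128) -> Tuple[int, int]:
    """Round width and height down to the nearest multiple (never below one multiple)."""
    def snap(value: int) -> int:
        return max(multiple, (value // multiple) * multiple)
    return snap(width), snap(height)


print(snap_dimensions(900, 520))   # (896, 512)
print(snap_dimensions(1200, 400))  # (1152, 384)
```

Rounding down rather than to the nearest multiple is a deliberate choice here: it keeps the snapped size at or below the requested size, which matters when the requested size is already close to the VRAM limit.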
a cute creature | TriX 400 TX
a digital painting of Frankenstein by Kanzan Shimomura
a morning landscape by János SaxonSzász
a nightmare creature
a photorealistic painting of a teddy bear
a portrait of a young girl
a space nebula | IMAX
a worried man
a zombie by Nathaniel Hone
an acrylic painting of a spider by Abram Arkhipov
an airbrush painting of a monkey by Jeremy Henderson
an alien landscape
an ugly creature made of insects
an ultrafine detailed painting of a sad person | ZBrush
Arnold Schwarzenegger | trending on ArtStation
concept art of Robocop
dinosaurs
Dracula | CGSociety
flesh made of insects
God by William Simpson
Name: Pixel MultiColors
Author: Remi Durant
Original script: https://colab.research.google.com/drive/17c-13cl_VQKpHq2rDrnFVi6ZT-CHeZNn
Time for 512×512 on a 3090: 0 minutes 44 seconds
Maximum resolution on a 24 GB 3090: 4096×4096.
Maximum resolution on an 8GB 2080: 2048×2048 7 minutes 45 seconds
Description: Very noisy/pixelated/abstract results. The default script gives dark images, which some tweaks to brightness and contrast can help with. Maybe a little bit of blur could help too in a future revision. It is fast though, and can support huge image sizes.
a charcoal drawing of a cute creature made of metal
a hyperrealistic painting of Chewbacca by Edith Grace Wheatley
a low poly render of Pikachu
a man
a rose
a stock photo of puppies
egyptian art of a portrait of a woman
Harry Potter
Indiana Jones
Robocop made of gold
Yoda
Name: ruVQGAN+CLIP
Author: nev
Original script: https://colab.research.google.com/drive/1wAnIHocDYFAbWtA7rk8C7cFEUdRyLzwZ
Time for 512×512 on a 3090: 1 minute 28 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 27 seconds
Description: Creates fairly blurry results, even with post process sharpening. If anyone could get these results crisper it would really improve the output.
a 3D render of a wizard by Gertrude Greene
a cubist painting of a Pokemon character
a cute creature
a matte painting of halloween by Carlos Trillo Name
a photorealistic painting of an alien landscape by Jacob Ochtervelt
a rough seascape filmic
a sea monster
a woodcut of a skull by Gu Hongzhong trending on ArtStation
Cthulhu
trypophobia
Name: Multi-Perceptor VQGAN+CLIP
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 30 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: As with the previous Multi-Perceptor CLIP Guided Diffusion scripts, this one allows two different CLIP models to be used to rate the VQGAN output images. VQGAN is not going to beat diffusion for image coherence, but this script can give some very nice lighting and fine details in images.
a bronze sculpture of an evil clown made of clay by Dionisio Baixeras Verdaguer
a fantasy land by Shigeru Aoki
a hyperrealistic painting of puppies
a midnineteenth century engraving of the Sydney Opera House
a statue of reflective spheres
a surrealist painting of a tropical beach
an alien city CGSociety
an oil painting of a fire breathing dragon
computer rendering of a well kept garden by Norman Garstin ZBrush
war CryEngine
Name: Hypertron
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 2 minutes 00 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 35 seconds
Description: Another VQGAN based script. Has various “flavors” to give different results. Works OK. Can give the “image in a sea of purple/grey” that previous MSE based scripts suffered from. Still worth a try.
a black and white photo of a fireman
a cute monster by Józef Mehoffer
a matte painting of a forest clearing
a pop art painting of a human
a renaissance painting of a ghost by Jan van de Cappelle film
a sea monster made of metal
a tattoo of a zombie
a watercolor painting of a dragon Flickr
an art deco painting of a haunted house by Mary Cameron
concept art of a mountainscape by Maximilian Cercha
Name: CLIP Guided Diffusion Secondary Model Method
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1mpkrhOjoyzPeSWy2r7T8EYRaU7amYOOi
Time for 512×512 on a 3090: 2 minutes 28 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new diffusion based script from Katherine Crowson including a new “secondary model” she trained. Capable of some unique results with good textures and lighting.
a detailed painting of Fozzy Bear by LeConte Stewart
a flemish baroque of a happy person trending on pixiv
a flock of birds
a Ghostbuster CGSociety
a kitchen made of cheese
a nightmare creature
a photorealistic painting of The Grinch
a portrait of a woman
an art deco painting of a sad clown
an oil painting of a nightmare
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: CLIP Guided Diffusion v4
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 3 minutes 05 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another CLIP Guided Diffusion script. Locked to 512×512 resolution. Like the other CLIP Diffusion scripts, some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that reliably matches the input phrase. When it gets a “hit” it can create very detailed impressive results, but the number of “misses” stops it from getting a great rating. Still worth a try if you have the patience to run a large batch of images waiting for the best results. The following samples were hand picked from a large batch run of random prompt phrases.
a forest clearing
a storybook illustration of a nightmare
an impressionist painting of a cemetery
Harry Potter in the style of Rembrandt
a detailed painting of a witch
a babbling brook
a desert oasis
a hyperrealistic painting of an android
eyeballs
a cross stitch of Buzz Lightyear
Name: CLIP Guided Decision Transformer
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1V66mUeJbXrTuQITvJunvnWVn96FEbSI3
Time for 512×512 on a 3090: 1 minute 13 seconds
Maximum resolution on a 24 GB 3090: Locked to 384×384
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another one from Katherine Crowson. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that reliably matches the input phrase. When it gets a “hit” it can create very detailed impressive results, but the number of “misses” stops it from getting a great rating. The following samples were hand picked from a large batch run of random prompt phrases.
Another good point for CLIP Decision Transformer is that it generates a batch of images from each run. So rather than a single image for the prompt text, you can specify (for example) 8 images to be generated from the prompt. This allows a much larger set of images to be generated quickly to find those great outputs in.
For these images I have enhanced the resolution 4x using Real-ESRGAN (the thumbnails are the original output images and the full size images are the 4x upscales).
a detailed painting of a palace by Thomas Kinkade
a drawing of Chewbacca
a forest path
a renaissance painting of a mountain range
a rough seascape
a rough seascape
a spooky forest
an oil on canvas painting of a western town
Frankenstein
The Grand Canyon
Name: CLIPIT
Author: dribnet
Original script: https://github.com/dribnet/clipit
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another GAN+CLIP script. Gives nice results that tend to match the prompt text more closely. This one is heavy on VRAM usage.
a silk screen of a tropical beach in the style of Kandinsky
a woodcut of a nightmare creature
an illustration of of a mountainscape
an ultrafine detailed painting of a green tree frog as created by Craig Mullins
Dracula
Planets
Name: VQGAN+CLIP v5
Author: Max Woolf
Original script: https://colab.research.google.com/drive/1wkF67ThUz37T2_oPIuSwuO4e_-0vjaLs
Time for 512×512 on a 3090: 2 minutes 13 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 2 minutes 02 seconds
Description: Another VQGAN+CLIP script. More abstract results from this one.
a desert oasis in the style of Salvador Dali
a hyperrealistic painting of a dragon
Big Bird
Cthulhu
Robert DeNiro
Yoda “hmmm, abstract I am”
Name: Zoetrope 5.5
Author: Bearsharktopusdev
Original script: https://colab.research.google.com/drive/1LpEbICv1mmta7Qqic1IcRTsRsq7UKRHM
Time for 512×512 on a 3090: 3 minutes 27 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Maximum resolution on an 8GB 2080: 256×256 3 minutes 23 seconds
Description: Updated version of Zoetrope 5. Supports more VQGAN models, CLIP models and optimizers compared to Zoetrope 5.
a cephalopod
a flemish baroque of a demon
a photo of a submarine in the style of Vincent van Gogh
a sketch of a Pokemon character in the style of Odilon Redon
a watercolor painting of dense woodland
Name: Experimental VQGAN
Author: Various
Original script: https://colab.research.google.com/drive/1jx3klUxlGbYUwvtqzC9SYl4XZKHL3R81
Time for 512×512 on a 3090: 1 minute 12 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Maximum resolution on an 8GB 2080: 256×256 0 minutes 52 seconds
Description: Very nice smooth results from this one.
a desert oasis in the style of Craig Mullins
a dragon
a manga drawing of a happy alien
a nightmare
a surrealist painting of love
a watercolor painting of a lighthouse
an airbrush painting of a well kept garden by Piet Mondiran
Cookie Monster
Name: SlideShowVisions
Author: Active Galaxy
Original script: https://colab.research.google.com/drive/1IihC4ZJvCh_tOgBVd900BzHX-ulPEFsa
Time for 512×512 on a 3090: 2 minutes 25 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Maximum resolution on an 8GB 2080: 128×128 1 minute 56 seconds
Description: Tends to give more abstract paper cutout looks.
a happy child
a house vivid colors
a sea monster
a thunder storm
a tree
a woodcut of war
an engraving of zombies
Han Solo
Name: Quick CLIP Guided Diffusion
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1FuOobQOmDJuG7rGsMWfQa883A9r4HxEO
Time for 512×512 on a 3090: 43 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Modified version of CLIP Guided Diffusion that gets results quicker. Option for 256×256 or 512×512 sized images. Still very hit and miss when getting images that resemble the input prompt. The following samples came from a large overnight batch run of random prompts.
a cathedral
a digital painting of a space nebula
a lounge room
a monkey | lens flare
a nightmare creature
a rough seascape
a landscape
an android
an attractive woman
an oil on canvas painting of a cloudy sunset
Name: CLIP Guided Diffusion v5
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1QBsaDAZv8np29FPbvjffbE1eytoJcsgA
Time for 512×512 on a 3090: 3 minutes 48 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another CLIP Guided Diffusion script. Locked to 512×512 resolution. Needs less VRAM than the previous versions. The following samples came hand picked from a large batch run of random prompt phrases.
a cityscape
a gorilla
Cthulhu by Craig Mullins
computer rendering of Emporer Palpatine made of cheese by Evan Charlton
digital art of a mountainscape as created by Persis Goodale Thurston Taylor
a digital rendering of Chewbacca
an ugly person
See this tweet for an example of using CLIP Guided Diffusion to stylize a portrait.
Name: MSE Regulized Modified
Author: jbusted
Original script: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj
Time for 512×512 on a 3090: 3 minutes 02 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×720
Maximum resolution on an 8GB 2080: 256×256 2 minutes 45 seconds
Description: Modified and updated version of the previous “MSE Regulized VQGAN+CLIP” script. Less likely to suffer the previous script’s issue of subjects floating in a purple void.
a bronze sculpture of a planet
a cave by Asher Brown Durand
a charcoal drawing of Emporer Palpatine
a cozy den
a detailed drawing of a heart made of string by William MacTaggart
a digital rendering of Arnold Schwarzenegger made of metal by Muriel Brandt
an oil painting of a worried woman | Rendered in Cinema4D
an ugly creature
Dracula
Frankenstein
vector art of a forest clearing
Name: CLIP Guided Diffusion v6
Author: Dango233
Original script: https://colab.research.google.com/drive/14xBm1aSxQLbq26-jmDJi8I1HJ4ti5ybt
Time for 512×512 on a 3090: 3 minutes 10 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Latest CLIP Guided Diffusion script. The best one yet. Capable of some very nice results.
a hyperrealistic painting of a human
a sketch of planets
a storybook illustration of a cloudy sunset
a wizard | vivid colors
an art deco sculpture of a planet
an attractive man by John Linnell
an oil on canvas painting of satan
an oil painting of a clown
digital art of an ugly person by Avigdor Arikha
princess in sanctuary trending on artstation photorealistic portrait of a young princess
Name: CLIPDraw
Author: Kevin Frans
Original script: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
Time for 512×512 on a 3090: 7 minutes 10 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024
Description: Generates images from a series of lines. Very abstract results.
a cloudy sunset
a digital painting of a rose
a sad clown
an abstract painting of Yoda
an etching of a library
The Sydney Harbour Bridge
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: VQGAN Gumbel
Author: Eleiber
Original script: https://colab.research.google.com/drive/1tim3xTsZXafK-A2rOUsevckdl4OitIiw
Time for 512×512 on a 3090: 3 minutes 27 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 4 minutes 05 seconds
Description: Variation using the gumbel-8192 model. Results are a bit rougher than others.
a childs drawing of a space nebula
a movie monster in the style of Edvard Munch
a raytraced image of the Amazon Rainforest
a tropical beach in the style of Polock
digital art of a rose
Name: OpenAI DVAE+CLIP
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/10DzGECHlEnL4oeqsN-FWCkIe_sq3wVqt
Time for 512×512 on a 3090: 3 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 2 minutes 20 seconds
Description: Results are very colorful and more abstract. By default it gives more noisy output images but this can be disabled if you prefer.
a dragon
a hyperrealistic painting of planets
a mountain cabin
a woodcut of a mountain range in the style of Marvel Comics
an angry person
Name: Aphantasia
Author: Vadim Epstein
Original script: https://github.com/eps696/aphantasia
Time for 512×512 on a 3090: 1 minute 5 seconds
Maximum resolution on a 24 GB 3090: 4096×4096 or 2520×1080
Maximum resolution on an 8GB 2080: 4096×4096 7 minutes 48 seconds
Description: Different, messier, pastel, abstract, Turneresque output. I spent a few hours trying many different combinations of settings to get more coherent output with deeper colors. The following samples are as good as I could push it. I give up for now. If you can do better let me know. It does support creating larger 1280×720 resolution images on a 3090 GPU.
a midnineteenth century engraving of a cute monster
a skeleton
an ultrafine detailed painting of a crying person
puppies
Name: MSE VQGAN+CLIP z+quantize
Author: jbusted
Original script: https://colab.research.google.com/drive/1gFn9u3oPOgsNzJWEFmdK-N9h_y65b8fj
Time for 512×512 on a 3090: 6 minutes 19 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (3 minutes 36 seconds)
Description: Awesome crisp results. Allows larger 480p images (854×480) on a 3090 GPU. One of the best scripts in this list and well worth exploring.
a charcoal drawing of a country town
a hyperrealistic painting of an ugly creature
a landscape made of mist
a mosaic of christmas
an octopus in the style of Vincent van Gogh
MSE VQGAN+CLIP z+quantize allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame, you get a movie zooming into the Text-to-Image output. There is no frame blending or optical flow in this one, just the 854×480 resolution frames combined straight into a movie. The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”. The prompts for this movie were “hyperrealistic homer simpson”, “hyperrealistic marge simpson”, “hyperrealistic bart simpson”, “hyperrealistic lisa simpson” and “hyperrealistic maggie simpson”. The original 480p upload looked terrible after YouTube compressed it, so I upscaled the 480p frames to 2160p (4K) in DaVinci Resolve and reuploaded it. YouTube does a better encoding job on the higher resolution upload, so the movie is now watchable.
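The zoom feedback loop described above can be sketched as follows. This is a minimal illustration, not the code Visions of Chaos uses: `zoom_crop_box` computes the centred region to crop so that resizing it back to full size magnifies the frame slightly, and `feedback_zoom` shows the "stretch, then feed back in" structure. The `stretch` and `generate` callables are hypothetical stand-ins for the image resize and for running the Text-to-Image script with an image as its starting point.

```python
from typing import Callable, List, Tuple, TypeVar

Frame = TypeVar("Frame")


def zoom_crop_box(width: int, height: int, zoom: float) -> Tuple[float, float, float, float]:
    """Return the centred source region (left, top, right, bottom) that,
    when resized back to width x height, magnifies the frame by `zoom`.

    The coordinates are floats, so small zoom factors such as 1.002 are
    not rounded to whole pixels."""
    if zoom <= 0:
        raise ValueError("zoom must be positive")
    crop_w = width / zoom
    crop_h = height / zoom
    left = (width - crop_w) / 2
    top = (height - crop_h) / 2
    return (left, top, left + crop_w, top + crop_h)


def feedback_zoom(first: Frame, num_frames: int,
                  stretch: Callable[[Frame], Frame],
                  generate: Callable[[Frame], Frame]) -> List[Frame]:
    """Each new frame is generate(stretch(previous frame)).

    `stretch` would crop with zoom_crop_box and resize back up; `generate`
    stands in for a Text-to-Image run seeded with the stretched image.
    Both are hypothetical callables supplied by the caller."""
    frames = [first]
    for _ in range(num_frames - 1):
        frames.append(generate(stretch(frames[-1])))
    return frames
```

With Pillow, one stretch step could be `image.resize((854, 480), box=zoom_crop_box(854, 480, 1.002))`, since Pillow's `resize` accepts a float `box` for the source region.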
This next example shows how MSE VQGAN+CLIP z+quantize interprets various common human phobias. Text prompts were “a hyperrealistic painting depicting acrophobia” etc. To smooth out the flickering when zooming I switched to ImageMagick for the zoom step, because ImageMagick supports sub-pixel image resizing. This movie was also originally 480p and upscaled to 4K in DaVinci Resolve before uploading.
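The post does not say which ImageMagick options are used for the zoom, so the following is only one possible approach, assuming ImageMagick 7 (whose binary is `magick`; ImageMagick 6 uses `convert`). It builds a command that uses the `-distort SRT "Scale,Angle"` operator, which scales about the image centre with sub-pixel resampling while keeping the original canvas size. The function name and file names are hypothetical.

```python
import shlex
from typing import List


def magick_zoom_command(src: str, dst: str, zoom: float, angle: float = 0.0) -> List[str]:
    """Build an ImageMagick 7 command that scales `src` by `zoom` about its
    centre, with an optional rotation in degrees.

    -distort SRT "Scale,Angle" resamples at sub-pixel precision and keeps
    the original canvas size, so a zoom of 1.002 is not rounded to a
    whole-pixel crop the way an integer crop-and-resize would be."""
    return ["magick", src, "-distort", "SRT", f"{zoom:.6f},{angle:.6f}", dst]


if __name__ == "__main__":
    cmd = magick_zoom_command("frame_0001.png", "frame_0002_start.png", 1.002)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True) would execute it if ImageMagick 7 is installed
```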
I have also added basic scripting support to Visions of Chaos (scripting as in automating a series of steps, not writing Python scripts). Scripting allows the prompt, zoom speed, rotation and panning to be changed during the movie, with smooth interpolation between them each frame.
The following video is a test of the scripting. It is a Powers of Ten homage, zooming in from the largest scales to the smallest.
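The per-frame interpolation between scripted settings can be sketched with keyframes. The post does not describe the Visions of Chaos script format or the easing it uses, so the keyframe layout, the parameter names and the smoothstep easing below are all assumptions. Prompt changes are left out; only numeric settings such as zoom, rotation and panning are interpolated.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical keyframe layout: (frame number, {parameter name: value}).
Keyframe = Tuple[int, Dict[str, float]]


def smoothstep(t: float) -> float:
    """Ease in and out so settings do not change abruptly at keyframes."""
    return t * t * (3.0 - 2.0 * t)


def params_at(frame: int, keyframes: List[Keyframe]) -> Dict[str, float]:
    """Interpolate numeric settings for `frame`.

    `keyframes` must be sorted by frame number and every keyframe must
    contain the same parameter names."""
    frames = [f for f, _ in keyframes]
    if frame <= frames[0]:
        return dict(keyframes[0][1])
    if frame >= frames[-1]:
        return dict(keyframes[-1][1])
    i = bisect_right(frames, frame) - 1
    (f0, p0), (f1, p1) = keyframes[i], keyframes[i + 1]
    t = smoothstep((frame - f0) / (f1 - f0))
    return {name: p0[name] + (p1[name] - p0[name]) * t for name in p0}
```

Easing with smoothstep rather than a straight linear blend means the zoom speed or rotation starts and stops gently around each keyframe instead of changing rate suddenly.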
Another recent addition is the ability to use a series of images as “seed images” that are processed one at a time and then combined into a movie. The following GIF of the Alien chestburster scene is an example of this. The Text-to-Image prompt was “impasto oil painting”.
This next example movie shows a “Self-Driven” zoom movie. As in a regular zoom movie, the output frames are slightly stretched and fed back into the system each frame. The self-driven difference is that the Text-to-Image prompt text is automatically changed every 2 seconds to whatever CLIP detects it “sees” in the current frame. This way the movie subjects are changed and steered in new directions in a totally automated way. The only human input was me setting the initial “Rainbow colored blobs” prompt. After that it was fully automated.
By default the CLIP Image Captioning script is very good at detecting what is in an image. Using the default accuracy resulted in zoom movies that got stuck on a single topic or subject. One got stuck on slight variations of a prompt about kites, so as the zoom movie went deeper it only showed kites. Luckily, after decreasing the accuracy of the CLIP captioning, the predictions let the subjects drift to new topics during the movie.
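The prompt schedule for a self-driven movie can be sketched as below. The frame rate is not stated in the post and is a parameter here, and `caption` is a hypothetical stand-in for running CLIP captioning on the most recent output frame (it receives the frame index purely for illustration).

```python
from typing import Callable, List


def self_driven_prompts(num_frames: int, fps: float, initial_prompt: str,
                        caption: Callable[[int], str],
                        interval_seconds: float = 2.0) -> List[str]:
    """Return the prompt used for each frame.

    Every `interval_seconds` the prompt is replaced by a caption of the
    previous frame; `caption` is a hypothetical captioning callable."""
    frames_per_change = max(1, round(fps * interval_seconds))
    prompt = initial_prompt
    prompts = []
    for frame in range(num_frames):
        if frame > 0 and frame % frames_per_change == 0:
            # describe the most recent output to pick the next subject
            prompt = caption(frame - 1)
        prompts.append(prompt)
    return prompts
```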
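The post does not say how the captioning accuracy was decreased. One common way to loosen a "pick the best match" step is temperature sampling, sketched here as an assumption: instead of always taking the highest-scoring caption, a caption is sampled with probability proportional to `exp(score / temperature)`. The `scores` dictionary is a hypothetical stand-in for per-caption CLIP similarity scores.

```python
import math
import random
from typing import Dict, Optional


def pick_caption(scores: Dict[str, float], temperature: float,
                 rng: Optional[random.Random] = None) -> str:
    """Sample a caption with probability softmax(score / temperature).

    A temperature of 0 always takes the best match (the behaviour that
    got stuck on kites); larger temperatures let lower-ranked captions
    through so the movie can drift to new subjects."""
    if temperature <= 0:
        return max(scores, key=scores.get)
    if rng is None:
        rng = random.Random()
    captions = list(scores)
    top = max(scores.values())
    # subtract the top score before exponentiating to avoid overflow
    weights = [math.exp((scores[c] - top) / temperature) for c in captions]
    return rng.choices(captions, weights=weights, k=1)[0]
```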
Name: Monster Maker
Author: P_Hoep
Original script: https://colab.research.google.com/drive/1ZbLnt5fLS_BDfpQY-9Dh_T40pLjfqSAC
Time for 512×512 on a 3090: 2 minutes 01 seconds
Description: No longer available. I was contacted by the author who does not want it shared publicly. The colab link no longer works.
a black and white photo of a library in the style of Rembrandt
a forest fire
a forest path
a heart made of feathers
a surrealist painting of the Las Vegas strip
Name: CLIP Guided Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj
Time for 256×256 on a 3090: 1 minute 35 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: This one gives results unlike any of the other scripts. Locked to 256×256 resolution. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that matches the input phrase. The following samples were hand picked from a large batch run of random phrases.
a clown
a hyperrealistic painting of a witch
a sea monster
a surrealist sculpture of an android
Brad Pitt
New York City
Name: CLIP Guided Diffusion v2
Author: afiaka87
Original script: https://colab.research.google.com/github/afiaka87/clip-guided-diffusion/blob/main/colab_clip_guided_diff_hq.ipynb
Time for 256×256 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Modified CLIP Guided Diffusion with more options. Like the original, it gives results unlike any of the other scripts. Locked to 256×256 resolution. Hopefully larger resolution versions of this script will appear in the future. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that matches the input phrase. The following samples were hand picked from a large batch run of random phrases.
a digital painting of a crying person
a fine art painting of heaven in the style of Edvard Munch
a flemish baroque of an angry person
a flemish baroque of hell
a surrealist painting of a witch
the australian outback
Name: CLIPRGB
Author: Jonathan Whitaker
Original script: https://colab.research.google.com/drive/1MiKaFFgau6V5QhIed5tpNdLUiSbof4nI
Time for 512×512 on a 3090: 4 minutes 51 seconds
Maximum resolution on a 24 GB 3090: 4096×4096
Maximum resolution on an 8GB 2080: 4096×4096
Description: Very early 0.1 version shows a lot of potential. Can render huge resolution images up to 4096×4096 on a 3090 so I am really looking forward to future versions of this code with sharper details.
a digital painting of a wizard
a forest path
a tattoo of planets
a vampire
Name: CLIP Guided Diffusion v3
Author: Michael Friesen
Original script: https://colab.research.google.com/drive/1Fl2SZvLv23MVSAHxkoiNdxPeAZwibvu1
Time for 512×512 on a 3090: 2 minutes 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Modified CLIP Guided Diffusion that generates larger 512×512 images. Some of the results can be very detailed and interesting, but a lot of the time it is hit and miss to get a result that matches the input phrase. The following samples were hand picked from a large batch run of random phrases.
a cubist painting of a castle
a human made of vines
a rough seascape
frogs
h r giger
a matte painting of a landscape
Name: Zoetrope 5
Author: Bearsharktopusdev
Original script: https://colab.research.google.com/drive/1LpEbICv1mmta7Qqic1IcRTsRsq7UKRHM
Time for 512×512 on a 3090: 2 minutes 36 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1280×720
Maximum resolution on an 8GB 2080: 256×256 (2 minutes 09 seconds)
Description: Nice crisp results. Can generate up to 720p (1280×720) resolution images on a 3090. Includes a lot of new ideas from multiple people to help improve the outputs.
a detailed painting of a Pixar character
a futuristic city
a planet
a surrealist sculpture of a sea monster
an art deco sculpture of a policeman
cyberpunk art of a forest fire in the style of Edvard Munch
Name: CLIP RGB Optimization
Author: hotgrits
Original script: https://cdn.discordapp.com/attachments/730484623028519072/871624258260987934/CLIP__RGB_Optimization_v0_3.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 4096×4096
Maximum resolution on an 8GB 2080: 4096×4096
Description: Another CLIP RGB based script without the pixelated artefacts of the CLIPRGB script. Can render huge resolution images up to 4096×4096 on a 3090. This script gives more impressionistic textures. By default the output was a bit too dark for my liking so I have added options to tweak the gamma and contrast of the output images in the script. The gamma and contrast tweaks are only at the display stage and do not change the internal image being generated.
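Keeping the gamma and contrast tweaks at the display stage means adjusting a copy of the pixel values while leaving the internal image untouched. The post does not give the formulas used, so this sketch assumes two common ones: `v ** (1 / gamma)` for gamma and a linear stretch around mid-grey for contrast. A flat list of values in 0..1 stands in for an image.

```python
from typing import List


def display_adjust(pixels: List[float], gamma: float = 1.0,
                   contrast: float = 1.0) -> List[float]:
    """Return a new, adjusted list for display; `pixels` is not modified."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    out = []
    for v in pixels:
        v = min(max(v, 0.0), 1.0) ** (1.0 / gamma)  # gamma > 1 brightens midtones
        v = (v - 0.5) * contrast + 0.5              # contrast > 1 spreads values away from mid-grey
        out.append(min(max(v, 0.0), 1.0))
    return out
```

Because a new list is returned, the generator keeps optimizing the unadjusted values, and the display settings can be changed at any time without affecting the result.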
a babbling brook
a movie monster
an amusement park
Chewbacca
Freddy Krueger in the style of Rembrandt
Name: MSE Regulized VQGAN+CLIP
Author: jbusted
Original script: https://colab.research.google.com/drive/1hf1seGOZctOJUznkhJNblLluXHbWLKZh
Time for 512×512 on a 3090: 3 minutes 16 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 128×128 (2 minutes 30 seconds)
Description: Generates good images but they tend to be inside a grey/purple border void.
a bronze sculpture of a heart
a cubist painting of Buzz Lightyear
a house made of string
an art deco sculpture of a vampire
chalk art of C-3PO
Name: Sequential VQGAN+CLIP
Author: Jakeukalane and Avengium
Original script: https://colab.research.google.com/drive/1CcibxlLDng2yzcjLwwwSADRcisc1qVCs
Time for 512×512 on a 3090: 1 minute 41 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (2 minutes 11 seconds)
Description: Really nice results and fast.
a campfire in the style of Vincent van Gogh
a colorful parrot
a hyperrealistic painting of C-3PO
an impressionist painting of Buzz Lightyear made of paper
New York City
Name: CLIPRGB ImStack
Author: Jonathan Whitaker
Original script: https://colab.research.google.com/drive/1MCC2IwAaRNCTBUzghuG41ypAkxjJvGtq
Time for 512×512 on a 3090: 2 minutes 07 seconds
Maximum resolution on a 24 GB 3090: 2048×2048
Maximum resolution on an 8GB 2080: 512×512 (6 minutes 21 seconds)
Description: Another CLIP RGB variation. Nice results after some brightness, contrast and sharpness tweaks to the generated images. Could still be a bit sharper.
a fine art painting of an angry person
a fireplace in the style of Claude Monet
a frog in the style of Beksinski
a nightmare creature in the style of H R Giger
a pointillism painting of a vampire made of copper
Text-to-Image systems/models/scripts/networks (what is the official correct term for these?) are machine learning based models that take a descriptive phrase as input and attempt to generate images that match the input phrase.
Requirements
You do need a decent NVIDIA GPU. A 3090 is recommended for 768×768 resolution, a 2080 for smaller 256×256 images, and a 10xx possibly for tiny images or if you want to try reduced settings and wait ages for results. If you have a commercial grade GPU with more memory you will be able to push these resolutions higher. VRAM matters more than the GPU model, as the same model name can come with different amounts of VRAM. For example, you may see a laptop advertised with a 3080 GPU, but its VRAM may be smaller than a desktop 3080's. I have now updated these posts with the maximum resolution and times for a 2080 SUPER with 8GB VRAM to give people an idea of what an 8GB VRAM GPU will do.
To run these scripts from Visions of Chaos you need to have installed these prerequisites. Once you have all the prerequisites set up, it really is as simple as typing your prompt text and clicking a button. I also include a lot of other settings so you can tweak the script parameters as you experiment more.
Visions of Chaos Text-to-Image Tutorial
You can watch the following tutorial video to get an idea of how the Text-to-Image mode works in Visions of Chaos.
Text-to-Image Scripts Included With Visions of Chaos
The rest of this blog post (and other parts) lists the 106 (so far) Text-to-Image scripts that I have been able to get working with Visions of Chaos.
If you are the author of one of these scripts then many thanks to you for sharing the code publicly. If you are a creator of a script I do not include here, please leave a comment with a link or send me an email so I can try it out. If you are a better coder than I am and improve any of these also let me know and I will share your fixes with the world.
I have included sample image outputs from each script. Most of the text prompts for these samples come from a prompt builder I include with Visions of Chaos that randomly combines subjects, adjectives, styles and artists.
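A prompt builder that randomly combines subjects, adjectives, styles and artists can be sketched as below. The word lists, probabilities and sentence template are hypothetical examples modelled on the sample prompts in this post; the real lists in Visions of Chaos are not reproduced here.

```python
import random

# Hypothetical word lists; the real Visions of Chaos lists are much larger.
STYLES = ["a watercolor painting of", "a charcoal drawing of",
          "a surrealist painting of", "digital art of"]
ADJECTIVES = ["colorful", "angry", "spooky", "happy"]
SUBJECTS = ["parrot", "forest path", "sea monster", "castle"]
ARTISTS = ["Kandinsky", "Rembrandt", "Edvard Munch", "Vincent van Gogh"]


def build_prompt(rng: random.Random, style_chance: float = 0.5,
                 adjective_chance: float = 0.5, artist_chance: float = 0.3) -> str:
    """Randomly combine the optional parts around a required subject."""
    subject = rng.choice(SUBJECTS)
    if rng.random() < adjective_chance:
        subject = f"{rng.choice(ADJECTIVES)} {subject}"
    article = "an" if subject[0] in "aeiou" else "a"
    prompt = f"{article} {subject}"
    if rng.random() < style_chance:
        prompt = f"{rng.choice(STYLES)} {prompt}"
    if rng.random() < artist_chance:
        prompt += f" in the style of {rng.choice(ARTISTS)}"
    return prompt


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(build_prompt(rng))
```

A possible output is “a watercolor painting of a colorful parrot in the style of Kandinsky”, the same shape as the sample prompts in this post.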
Note also that these samples all use the default settings for GAN and CLIP models. Most of the included scripts allow tweaking of settings and different models to alter the outputs. There is a much wider range of output images possible. Download Visions of Chaos to experiment with all the combinations of scripts, models, prompts and settings.
Name: Deep Daze
Author: Phil Wang
Original script: https://github.com/lucidrains/deep-daze
Time for 512×512 on a 3090: 1 minute 53 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 256×256 (1 minute 9 seconds)
Description: This was the first Text-to-Image script I ever found and tested. The output images from the original script are very washed out and pastel shaded, but after adding some torchvision transforms for brightness, contrast and sharpness tweaks they are a little better. Very abstract output compared to the other scripts.
a bronze sculpture of a colorful parrot in the style of Kandinsky
a crying person
a desert oasis
a surrealist painting of the Terminator made of silver
a zombie in the style of Turner
Name: Big Sleep
Author: Phil Wang
Original script: https://github.com/lucidrains/big-sleep
Time for 512×512 on a 3090: 4 minutes 0 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: 512×512 (6 minutes 39 seconds)
Description: Can give a good variety of images for any prompt text and does not suffer from the coloring or tiled image issues some of the other methods do. See here for my older post with a lot of Big Sleep examples. If you give it a chance and run repeated batches of the same prompt you can get some very nice results.
H R Giger
surrealism
colorful surrealism
a charcoal drawing of a landscape
Name: VQGAN+CLIP z-quantize
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1L8oL-vLJXVcRzCFbPwOoMkPKJ8-aYdPN
Time for 512×512 on a 3090: 3 minutes 11 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (6 minutes 39 seconds)
Description: The outputs tend to be divided up into rectangular regions, but the resulting imagery can be interesting.
a drawing of a bouquet of flowers made of cardboard
a rose made of silver
a tilt shift photo of traffic
an abstract painting of a house made of crystals
an abstract painting of a skull
VQGAN+CLIP z-quantize allows specifying an image as the input starting point. If you take the output, stretch it very slightly, and then feed it back into the system each frame you get a movie zooming in. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”. The prompts were the seven deadly sins, ie “a watercolor painting depicting pride”, “a watercolor painting depicting greed” etc.
The more astute viewers among you will notice there are only 6 of the sins in the previous video. What happened to “lust”? A while back one of my uploads was flagged as porn by the YouTube robots. Their (what I assume is) machine learning based system detected my upload as porn when there was no porn in it. An appeal was met with instant denial and so I now have a permanent “warning” on my channel with no way to talk to a person who could spend 1 minute looking at the video to tell it isn’t porn. Another warning would lead to a strike, so I am being overly cautious and omitting the lust part from the YouTube video. Those who want to see the full 7 part movie can click the following link to watch it on my LBRY channel.
Name: VQGAN+CLIP codebook
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/15UwYDsnNeldJFHJ9NdgYBYeo6xPmSelP
Time for 512×512 on a 3090: 3 minutes 19 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (3 minutes 46 seconds)
Description: VQGAN+CLIP codebook seems to give very similar images for the same prompt phrase, so repeatedly running the script (with different seed values) does not give a wide variety of resulting images. It still gives interesting results.
a happy alien
a library
a teddy bear
digital art of a colorful parrot
digital art of an amusement park
Name: Aleph2Image Gamma
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1VAO22MNQekkrVq8ey2pCRznz4A0_jY29
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: This one seems to evolve white blotches that grow and take over the entire image. Before the white out stage the images tend to have too much contrast.
H R Giger
surrealism
seascape painting
Name: Aleph2Image Delta
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1oA1fZP7N1uPBxwbGIvOEXbTsq2ORa9vb
Time for 512×512 on a 3090: 2 minutes 1 second
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A newer revision of Aleph2Image that doesn’t have the white out issues. The resulting images have much more vibrant colors and that may be a good or bad point depending on your preferences.
a sketch of an angry person
a spooky forest
a sunset in the style of Rembrandt
a surrealist painting of a forest path
a tropical beach
Name: Aleph2Image Delta v2
Author: Ryan Murdock
Original script: https://colab.research.google.com/drive/1NGM9L8qP0gwl5z5GAuB_bd0wTNsxqclG
Time for 512×512 on a 3090: 3 minutes 42 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 (7 minutes 05 seconds)
Description: A newer revision of Aleph2Image Delta that gives much sharper results. The resulting images tend to be similar to each other for each prompt text so not a lot of variety.
Name: Text2Image v2
Author: Denis Malimonov
Original script: https://colab.research.google.com/github/tg-bomze/collection-of-notebooks/blob/master/Text2Image_v2.ipynb
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 (3 minutes 12 seconds)
Description: Can give more abstract results of the input phrase. Colors and details can be sharp, but not always. Good variety of output for each input phrase. Definitely worth a try.
a fireplace made of voxels
a green tree frog in the style of M C Escher
a pencil sketch of an evil alien
a sea monster
The Incredible Hulk made of silver
Name: The Big Sleep Customized
Author: NMKD
Original script: https://colab.research.google.com/drive/1Q2DIeMqYm_Sc5mlurnnurMMVqlgXpZNO
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 (3 minutes 09 seconds)
Description: Another good one. Worth exploring further.
a forest path
a watercolor painting of a colorful parrot in the style of Kandinsky
a western town
Christmas
medusa made of vines
Name: Big Sleep Minmax
Author: @!goose
Original script: https://colab.research.google.com/drive/12CnlS6lRGtieWujXs3GQ_OlghmFyl8ch
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: 512×512 (3 minutes 10 seconds)
Description: Another interesting Big Sleep variation. Allows a second phrase to be specified that is minimized in the output. For example, if your prompt for a landscape painting gives too many clouds, you could specify clouds as the minimize prompt so the system outputs fewer clouds in the resulting image.
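The idea of a minimize prompt can be illustrated with a loss that rewards similarity to the main prompt and penalizes similarity to the second prompt. This is a sketch of the general idea only, not the Big Sleep Minmax code: in a real script the vectors would be CLIP image and text embeddings and the loss would be minimized by gradient descent, while here short plain lists and the `min_weight` value are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def minmax_loss(image_emb: Sequence[float], max_emb: Sequence[float],
                min_emb: Sequence[float], min_weight: float = 0.5) -> float:
    """Lower is better: close to the main prompt, far from the minimize prompt."""
    return (-cosine_similarity(image_emb, max_emb)
            + min_weight * cosine_similarity(image_emb, min_emb))
```

With “a landscape painting” as the main prompt and “clouds” as the minimize prompt, an image whose embedding drifts toward “clouds” gets a higher loss, so the optimizer is pushed away from adding clouds.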
a charcoal drawing of an eyeball
an ultrafine detailed painting of a crying person made of voxels
dense woodland
King Kong made of wrought iron in the style of Frida Kahlo
Michael Myers
Name: CLIP Pseudo Slime Mold
Author: hotgrits
Original script: https://discord.com/channels/729741769192767510/730484623028519072/850857930881892372
Time for 512×512 on a 3090: 2 minutes 57 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: This one gives unique output compared to the others. Really nicely defined sharp details. The colors come from any color palette you select (currently all the 3,479 palettes within Visions of Chaos can be used) so you can “tint” the resulting images with color shades you prefer.
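Tinting with a color palette can be sketched as mapping each pixel's intensity onto a list of color stops. The post does not explain how the script applies the palette, so this assumes a palette is a list of RGB stops and that the script produces an intensity in 0..1 per pixel; the example palette is hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def tint(value: float, palette: Sequence[RGB]) -> RGB:
    """Map an intensity in 0..1 to a color by linearly interpolating
    between neighbouring palette stops."""
    if len(palette) < 2:
        raise ValueError("palette needs at least two colors")
    v = min(max(value, 0.0), 1.0) * (len(palette) - 1)
    i = min(int(v), len(palette) - 2)  # index of the lower stop
    t = v - i                          # position between the two stops
    c0, c1 = palette[i], palette[i + 1]
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))
```

Swapping in a different palette changes the colors of the output while the underlying intensities stay the same, which is what allows the same result to be “tinted” with different color shades.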
a color pencil sketch of Jason Voorhees made of plastic
a cubist painting of a science laboratory
a green tree frog in the style of Kandinsky
a watercolor painting of Godzilla
an octopus
Name: VQGAN+CLIP v3
Author: Eleiber
Original script: https://colab.research.google.com/drive/1go6YwMFe5MX6XM9tv-cnQiSTU50N9EeT
Time for 512×512 on a 3090: 2 minutes 52 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (3 minutes 53 seconds)
Description: “v3” because it is the third VQGAN system I have tried and it didn’t have a unique specific name. Gives clear sharp images. Can give very painterly results with visible brush strokes if you use “a painting of” before the prompt subject.
a pencil sketch of a campfire in the style of Da Vinci
a pop art painting of a lush rainforest
a storybook illustration of a cityscape
an airbrush painting of frogs
the Amazon Rainforest
VQGAN+CLIP v3 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.
This next example movie shows a “Self-Driven” zoom movie. As in a regular zoom movie, the output frames are slightly stretched and fed back into the system each frame. The self-driven difference is that the Text-to-Image prompt text is automatically changed every 2 seconds to whatever CLIP detects it “sees” in the current frame. This way the movie subjects are changed and steered in new directions in a totally automated way. The only human input was me setting the initial “A landscape” prompt. After that it was fully automated.
By default the CLIP Image Captioning script is very good at detecting what is in an image. Using the default accuracy resulted in zoom movies that got stuck on a single topic or subject. One got stuck on slight variations of a prompt about kites, so as the zoom movie went deeper it only showed kites. Luckily, after decreasing the accuracy of the CLIP captioning, the predictions let the subjects drift to new topics during the movie.
Name: VQGAN+CLIP v4
Author: crimeacs
Original script: https://colab.research.google.com/drive/1ZAus_gn2RhTZWzOWUpPERNC0Q8OhZRTZ
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480
Maximum resolution on an 8GB 2080: 256×256 (3 minutes 05 seconds)
Description: Another improved VQGAN system that uses pooling. “v4” because it is the fourth VQGAN system I have tried and it didn’t have a unique specific name.
a fine art painting of a cozy den
a king in the style of Kandinsky
a nurse in the style of Edward Hopper
a pastel of a demon
a watercolor painting of a mountain path
VQGAN+CLIP v4 allows specifying an image as the input starting point. If you take the output and repeatedly use it as the input with some minor image stretching each frame you can get a movie zooming into the Text-to-Image output. For this movie I used SRCNN Super Resolution to double the resolution of the frames and then Super Slo-Mo for optical flow frame interpolation (both SRCNN and Super Slo-Mo are included with Visions of Chaos). The VQGAN model was “vqgan_imagenet_f16_16384” and the CLIP model was “ViT-B/32”.
The text prompts for each part came from an idea in a YouTube comment to try more non-specific terms to see what happens, so here are the results of “an image of fear”, “an image of humanity”, “an image of knowledge”, “an image of love”, “an image of morality” and “an image of serenity”.
Here is another example, this time using prompts naming various directors, ie “Stanley Kubrick imagery”, “David Lynch imagery” etc. No super resolution this time. Super Slo-Mo was used for optical flow. I wasn’t sure if YouTube would accept the potentially unsettling horror visuals and I do not want to risk the hassle of a strike, so to be on the safe side I am hosting this one on my LBRY channel only. Click the following image to open the movie in a new window. Note that LBRY can be a lot slower to buffer, so you may need to pause it for a while to let the movie load.
If you find that too slow to buffer/load I also have a copy on my BitChute channel here.
Compared to the previous Deep Daze, which generated washed out and pastel shaded results, this Deep Daze creates images with sharp, crisp, bright colors.
Sample results
“Shrek eating pizza”
“H R Giger”
“Freddy Krueger”
“Surrealist Homer Simpson”
“rose bush”
Availability
This and the previous Text-to-Image systems I have experimented with (here, here and here) are now supported by a GUI front end in Visions of Chaos. As long as you install these prerequisites and have a decent GPU you will be able to run these systems yourself.
For those who love to tinker I have now added a bunch more of the script parameters so you no longer have to edit the Python source code outside Visions of Chaos.
Other Text-to-Image
If you know of any other Text-to-Image systems (with sharable open-source code) then please let me know. All of the Text-to-Image systems I have tested so far have their own unique behaviors and outputs, so I will always be on the lookout for new variations.