Text-to-Image Summary – Part 6

This is Part 6. There are also Part 1, Part 2, Part 3, Part 4, Part 5, Part 7 and Part 8.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Augmented CLIP Guided Diffusion
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 1 minute 16 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 256×256. 57 seconds.
Description: Another CLIP Guided Diffusion script. Fast. Gives unique textured results.

'a detailed painting of people by Nicolette Macnamara' Augmented CLIP Guided Diffusion
a detailed painting of people by Nicolette Macnamara

'a diagram of a nightmare creature made of gold' Augmented CLIP Guided Diffusion
a diagram of a nightmare creature made of gold

'a nightmare creature' Augmented CLIP Guided Diffusion
a nightmare creature

'a painting of a cabin next to a stream in a secluded forest' Augmented CLIP Guided Diffusion
a painting of a cabin next to a stream in a secluded forest

'a storybook illustration of Jabba The Hutt by Carle Hessay' Augmented CLIP Guided Diffusion
a storybook illustration of Jabba The Hutt by Carle Hessay

'a werewolf by A R Middleton Todd' Augmented CLIP Guided Diffusion
a werewolf by A R Middleton Todd

'an oil painting of Big Bird' Augmented CLIP Guided Diffusion
an oil painting of Big Bird

'Gandalf trending on pixiv' Augmented CLIP Guided Diffusion
Gandalf trending on pixiv

'Lovecraftian horror' Augmented CLIP Guided Diffusion
Lovecraftian horror

'Lovecraftian horror' Augmented CLIP Guided Diffusion
poster art of the Las Vegas strip by George Passantino


Name: Princess Generator
Author: Dango233
Original script: https://colab.research.google.com/drive/1QgH9TvQMXR3PpEGBcHnghtEcwFDXLaYE
Time for 512×512 on a 3090: 2 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: The latest update to “CLIP Guided Diffusion v6” from Dango233. Can give some superb results. Worth exploring and experimenting with further.

'a cloudy sunset' Princess Generator
a cloudy sunset

'a fireplace by Jacob More' Princess Generator
a fireplace by Jacob More

'a happy alien by James Jarvaise' Princess Generator
a happy alien by James Jarvaise

'a mountain path by Stephen Pace' Princess Generator
a mountain path by Stephen Pace

'a raytraced image of a western town' Princess Generator
a raytraced image of a western town

'a teddy bear' Princess Generator
a teddy bear

'Charmander made of wood by Hua Yan' Princess Generator
Charmander made of wood by Hua Yan

'dense woodland by Marie Angel' Princess Generator
dense woodland by Marie Angel

'paranoia by Floris van Dyck' Princess Generator
paranoia by Floris van Dyck

'portrait of Princess Victoria trending on artstation' Princess Generator
portrait of Princess Victoria trending on artstation


Name: Disco Diffusion v4.1
Author: @Somnai
Original script: https://colab.research.google.com/drive/1sHfRn5Y0YKYKi1k-ifUSBFRNJ8_1sa39
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 39 seconds.
Description: The latest update to Disco Diffusion. Really nice detailed outputs. Low VRAM requirements allow very large images. I didn’t realise I had 3 zombie themed results in this random batch.

'a bronze sculpture of a zombie' Disco Diffusion v4.1
a bronze sculpture of a zombie

'a fantasy land' Disco Diffusion v4.1
a fantasy land

'a pencil sketch of Cthulhu by Rudolf Koller' Disco Diffusion v4.1
a pencil sketch of Cthulhu by Rudolf Koller

'a pop art painting of zombies' Disco Diffusion v4.1
a pop art painting of zombies

'a portrait of a young boy by Hendrick Cornelisz. van Vliet' Disco Diffusion v4.1
a portrait of a young boy by Hendrick Cornelisz. van Vliet

'a tree by Philips Wouwerman' Disco Diffusion v4.1
a tree by Philips Wouwerman

'a western town' Disco Diffusion v4.1
a western town

'a zombie' Disco Diffusion v4.1
a zombie

'Han Solo psychedelic' Disco Diffusion v4.1
Han Solo psychedelic

'vector art of the Amazon Rainforest' Disco Diffusion v4.1
vector art of the Amazon Rainforest


Name: Hypertron v2
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256. 2 minutes 18 seconds.
Description: Version 2 of Hypertron. More models, more flavors. Works OK. Can give the “image in a sea of purple/grey” problem that previous MSE based scripts suffered from. Can give good results if you let it run a large random batch overnight.

'a bronze sculpture of a spooky forest by Herb Aach' Hypertron v2
a bronze sculpture of a spooky forest by Herb Aach

'a diamond made of flowers' Hypertron v2
a diamond made of flowers

'a gouache of an android by Wu Bin' Hypertron v2
a gouache of an android by Wu Bin

'a photo of a kitchen' Hypertron v2
a photo of a kitchen

'a photorealistic painting of a cemetery' Hypertron v2
a photorealistic painting of a cemetery

'a sketch of a haunted house' Hypertron v2
a sketch of a haunted house

'a tattoo of Squirtle made of clay' Hypertron v2
a tattoo of Squirtle made of clay

'an art deco painting of a human by Nicolas Lancret 8K 3D' Hypertron v2
an art deco painting of a human by Nicolas Lancret 8K 3D

'goldfish by Elfriede Lohse-Wächtler' Hypertron v2
goldfish by Elfriede Lohse-Wächtler

'Lovecraftian horror by Aileen Eagleton' Hypertron v2
Lovecraftian horror by Aileen Eagleton


Name: CC12M Diffusion
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1TBo4saFn1BCSfgXsmREFrUl3zSQFg6CC
Time for 512×512 on a 3090: 1 minute 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704.
Maximum resolution on an 8GB 2080: 832×512. 2 minutes 59 seconds.
Description: Can support higher resolutions, but the coherence really falls apart with anything over 256×256. It handles multiple images at once, so each of these examples shows four 256×256 results.

'a beachside resort' CC12M Diffusion
a beachside resort

'a bouquet of flowers' CC12M Diffusion
a bouquet of flowers

'a castle' CC12M Diffusion
a castle

'a cemetery' CC12M Diffusion
a cemetery

'a cephalopod by Walter Stuempfig super detailed' CC12M Diffusion
a cephalopod by Walter Stuempfig super detailed

'a color pencil sketch of a bedroom super detailed' CC12M Diffusion
a color pencil sketch of a bedroom super detailed

'a kitchen' CC12M Diffusion
a kitchen

'a mountainscape' CC12M Diffusion
a mountainscape

'a nightclub' CC12M Diffusion
a nightclub

'a vast city' CC12M Diffusion
a vast city


Name: Disco Diffusion v5
Authors: @Somnai and @Gandamu
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 02 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 43 seconds.
Description: The latest update to Disco Diffusion.

'a cloudy sunset' Disco Diffusion v5
a cloudy sunset

'a crying person made of wrought iron by František Jakub Prokyš psychedelic' Disco Diffusion v5
a crying person made of wrought iron by František Jakub Prokyš psychedelic

'a flemish baroque of a school of tropical fish' Disco Diffusion v5
a flemish baroque of a school of tropical fish

'a low poly render of puppies' Disco Diffusion v5
a low poly render of puppies

'a morning landscape' Disco Diffusion v5
a morning landscape

'a mosaic of a worried man by Paul Lohse' Disco Diffusion v5
a mosaic of a worried man by Paul Lohse

'a thunder storm by Cornelis Claesz van Wieringen' Disco Diffusion v5
a thunder storm by Cornelis Claesz van Wieringen

'a tropical beach' Disco Diffusion v5
a tropical beach

'computer rendering of an evil alien 4K HD realism' Disco Diffusion v5
computer rendering of an evil alien 4K HD realism

'the human condition Flickr' Disco Diffusion v5
the human condition Flickr


Name: Disco Diffusion v5 Turbo Smooth
Author: Chris Allen
Original script: https://colab.research.google.com/github/zippy731/disco-diffusion-turbo/blob/main/Disco_Diffusion_v5_Turbo_%5Bw_3D_animation%5D.ipynb
Time for 512×512 on a 3090: 1 minute 14 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 832×384. 2 minutes 21 seconds.
Description: An updated version of Disco Diffusion v5 that gives fast and smooth movie outputs.

'a black and white photo of a lush rainforest trending on Flickr' Disco Diffusion v5 Turbo Smooth
a black and white photo of a lush rainforest trending on Flickr

'a detailed matte painting of a factory' Disco Diffusion v5 Turbo Smooth
a detailed matte painting of a factory

'a hacker by Mykola Burachek' Disco Diffusion v5 Turbo Smooth
a hacker by Mykola Burachek

'a sea monster CGSociety' Disco Diffusion v5 Turbo Smooth
a sea monster CGSociety

'a surrealist painting of a happy person' Disco Diffusion v5 Turbo Smooth
a surrealist painting of a happy person

'a tardigrade by Cosmo Alexander' Disco Diffusion v5 Turbo Smooth
a tardigrade by Cosmo Alexander

'an anime drawing of an evening landscape by Daphne Fedarb photorealistic' Disco Diffusion v5 Turbo Smooth
an anime drawing of an evening landscape by Daphne Fedarb photorealistic

'an art deco painting of a happy person by John Uzzell Edwards' Disco Diffusion v5 Turbo Smooth
an art deco painting of a happy person by John Uzzell Edwards

'chalk art of a bouquet of flowers' Disco Diffusion v5 Turbo Smooth
chalk art of a bouquet of flowers

'the human condition' Disco Diffusion v5 Turbo Smooth
the human condition


Name: Augmented CLIP Guided Diffusion v2
Author: Peter Baylies
Original script: https://github.com/pbaylies/Augmented_CLIP
Time for 512×512 on a 3090: 2 minutes 48 seconds
Maximum resolution on a 24 GB 3090: 1664×704
Maximum resolution on an 8GB 2080: 512×512. 4 minutes 56 seconds.
Description: Updated version of the Augmented CLIP Guided Diffusion script.

'a bungalow 4K HD realism' Augmented CLIP Guided Diffusion v2
a bungalow 4K HD realism

'a forest fire' Augmented CLIP Guided Diffusion v2
a forest fire

'a lush rainforest CryEngine' Augmented CLIP Guided Diffusion v2
a lush rainforest CryEngine

'a painting of a kitchen by Betye Saar' Augmented CLIP Guided Diffusion v2
a painting of a kitchen by Betye Saar

'a portrait of a princess trending on artstation' Augmented CLIP Guided Diffusion v2
a portrait of a princess trending on artstation

'a spooky forest' Augmented CLIP Guided Diffusion v2
a spooky forest

'a tattoo of a zombie' Augmented CLIP Guided Diffusion v2
a tattoo of a zombie

'a werewolf by David Cooke Gibson' Augmented CLIP Guided Diffusion v2
a werewolf by David Cooke Gibson

'an oil painting of a lake' Augmented CLIP Guided Diffusion v2
an oil painting of a lake

'an ugly man' Augmented CLIP Guided Diffusion v2
an ugly man


Name: v-diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128. 1 minute 19 seconds.
Description: Updated version of Velocity-Diffusion. Tends to make incoherent collage images over 256×256.

'a black and white photo of a portrait of a young girl' v-diffusion Text-to-Image
a black and white photo of a portrait of a young girl

'a cityscape by Lujo Bezeredi' v-diffusion Text-to-Image
a cityscape by Lujo Bezeredi

'a cloudy sunset' v-diffusion Text-to-Image
a cloudy sunset

'a hologram of a sad face by Josef Šíma' v-diffusion Text-to-Image
a hologram of a sad face by Josef Šíma

'a lounge room by Riad Beyrouti IMAX' v-diffusion Text-to-Image
a lounge room by Riad Beyrouti IMAX

'a mountain path' v-diffusion Text-to-Image
a mountain path

'a portrait of a young boy made of metal' v-diffusion Text-to-Image
a portrait of a young boy made of metal

'a portrait of a young girl' v-diffusion Text-to-Image
a portrait of a young girl

'a space nebula' v-diffusion Text-to-Image
a space nebula

'an acrylic painting of a mountain range' v-diffusion Text-to-Image
an acrylic painting of a mountain range


Name: GLID-3
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3
Time for 512×512 on a 3090: 35 seconds
Maximum resolution on a 24 GB 3090: 768×768.
Maximum resolution on an 8GB 2080: 512×512. 50 seconds.
Description: Great textures and lighting. Poor image coherency.

'a cemetery' GLID-3 Text-to-Image
a cemetery

'a drawing of a cloudy sunset' GLID-3 Text-to-Image
a drawing of a cloudy sunset

'a drawing of a human lens flare' GLID-3 Text-to-Image
a drawing of a human lens flare

'a lake' GLID-3 Text-to-Image
a lake

'a large waterfall made of silver' GLID-3 Text-to-Image
a large waterfall made of silver

'a marina' GLID-3 Text-to-Image
a marina

'a minimalist painting of a teddy bear by Johann Ludwig Bleuler' GLID-3 Text-to-Image
a minimalist painting of a teddy bear by Johann Ludwig Bleuler

'a renaissance painting of paranoia made of vines' GLID-3 Text-to-Image
a renaissance painting of paranoia made of vines

'an abbey by Cornelis Pietersz' GLID-3 Text-to-Image
an abbey by Cornelis Pietersz

'an art deco painting of a rose' GLID-3 Text-to-Image
an art deco painting of a rose


Name: Disco Diffusion v5.1
Authors: @Somnai, @Gandamu and Chris Allen
Original script: https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
Time for 512×512 on a 3090: 2 minutes 05 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 1152×512. 4 minutes 40 seconds.
Description: Latest version of Disco Diffusion, incorporating the “Turbo” features of v5 that give fast and smooth movie outputs.

'a flemish baroque of a sunset' Disco Diffusion v5 Turbo Smooth
a flemish baroque of a sunset

'a marsh' Disco Diffusion v5 Turbo Smooth
a marsh

'a mid-nineteenth century engraving of New York City' Disco Diffusion v5 Turbo Smooth
a mid-nineteenth century engraving of New York City

'a minimalist painting of a cephalopod' Disco Diffusion v5 Turbo Smooth
a minimalist painting of a cephalopod

'a photo of Dracula' Disco Diffusion v5 Turbo Smooth
a photo of Dracula

'a watercolor painting of a knight' Disco Diffusion v5 Turbo Smooth
a watercolor painting of a knight

'an ugly person by Samuel Colman trending on ArtStation' Disco Diffusion v5 Turbo Smooth
an ugly person by Samuel Colman trending on ArtStation

'chalk art of Gandalf' Disco Diffusion v5 Turbo Smooth
chalk art of Gandalf

'lineart of a zombie' Disco Diffusion v5 Turbo Smooth
lineart of a zombie

'the Amazon Rainforest 4K HD realism' Disco Diffusion v5 Turbo Smooth
the Amazon Rainforest 4K HD realism


Name: Latent Diffusion LAION_400M
Author: @multimodalart
Original script: https://colab.research.google.com/github/multimodalart/latent-diffusion-notebook/blob/main/Latent_Diffusion_LAION_400M_model_text_to_image.ipynb
Time for 512×512 on a 3090: 57 seconds
Maximum resolution on a 24 GB 3090: 1152×512 or 768×768
Maximum resolution on an 8GB 2080: 256×256. 1 minute 12 seconds.
Description: A new script based on the newly trained LAION_400M model. Impressive results at 256×256. Loses coherency at larger sizes. Each of these examples shows four 256×256 images for the prompt.

'a black and white photo of a nightmare creature' Latent Diffusion LAION_400M
a black and white photo of a nightmare creature

'a futuristic city' Latent Diffusion LAION_400M
a futuristic city

'a hyperrealistic painting of a queen made of flowers' Latent Diffusion LAION_400M
a hyperrealistic painting of a queen made of flowers

'a painting of a happy clown' Latent Diffusion LAION_400M
a painting of a happy clown

'a skeleton' Latent Diffusion LAION_400M
a skeleton

'a stained glass window 4K HD realism' Latent Diffusion LAION_400M
a stained glass window 4K HD realism

'a watercolor painting of a lounge room' Latent Diffusion LAION_400M
a watercolor painting of a lounge room

'an eagle' Latent Diffusion LAION_400M
an eagle

'an ultrafine detailed painting of Harry Potter' Latent Diffusion LAION_400M
an ultrafine detailed painting of Harry Potter

'vector art of a zombie by Oskar Kokoschka' Latent Diffusion LAION_400M
vector art of a zombie by Oskar Kokoschka


Name: JAX CLIP Guided Diffusion v2.7
Author: nshepperd
Original script: https://colab.research.google.com/drive/1nmtcbQsE8sTjfLJ1u3Y4d6vi9ZTAvQph
Time for 512×512 on a 3090: 2 minutes 37 seconds
Maximum resolution on a 24 GB 3090: 2496×1088
Maximum resolution on an 8GB 2080: 512×512. 3 minutes 59 seconds.
Description: Another diffusion based script. Can give very nice high detail results.

'a Dalek made of feathers' JAX CLIP Guided Diffusion v2.7
a Dalek made of feathers

'a haunted house' JAX CLIP Guided Diffusion v2.7
a haunted house

'a picture of a chateau by Odhise Paskali' JAX CLIP Guided Diffusion v2.7
a picture of a chateau by Odhise Paskali

'a refinery' JAX CLIP Guided Diffusion v2.7
a refinery

'a studio by Allan Ramsay trending on ArtStation' JAX CLIP Guided Diffusion v2.7
a studio by Allan Ramsay trending on ArtStation

'a sunset' JAX CLIP Guided Diffusion v2.7
a sunset

'a thunder storm' JAX CLIP Guided Diffusion v2.7
a thunder storm

'a watercolor painting of a fire breathing dragon' JAX CLIP Guided Diffusion v2.7
a watercolor painting of a fire breathing dragon

'a witch made of mist' JAX CLIP Guided Diffusion v2.7
a witch made of mist

'the tropics by Thomas de Keyser' JAX CLIP Guided Diffusion v2.7
the tropics by Thomas de Keyser


Name: GLID-3-XL
Author: Jack Qiao
Original script: https://github.com/Jack000/glid-3-xl
Time for 512×512 on a 3090: 1 minute 04 seconds
Maximum resolution on a 24 GB 3090: 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Improved/updated version of GLID-3. Uses CLIP for better accuracy. Great textures and lighting. Poor image coherency when over 256×256.

'a demon' GLID-3-XL Text-to-Image
a demon

'a detailed matte painting of a bouquet of flowers' GLID-3-XL Text-to-Image
a detailed matte painting of a bouquet of flowers

'a kitchen' GLID-3-XL Text-to-Image
a kitchen

'a photorealistic painting of a movie monster hyperrealistic' GLID-3-XL Text-to-Image
a photorealistic painting of a movie monster hyperrealistic

'a picture of The Incredible Hulk by Kazimir Malevich' GLID-3-XL Text-to-Image
a picture of The Incredible Hulk by Kazimir Malevich

'a pop art painting of an angry woman' GLID-3-XL Text-to-Image
a pop art painting of an angry woman

'a spooky forest' GLID-3-XL Text-to-Image
a spooky forest

'an abbey' GLID-3-XL Text-to-Image
an abbey

'New York City by Marie Courtois' GLID-3-XL Text-to-Image
New York City by Marie Courtois

'poster art of Gandalf vivid colors' GLID-3-XL Text-to-Image
poster art of Gandalf vivid colors


Name: ruDALL-E Aspect Ratio
Author: Alex Shonenkov
Original script: https://github.com/shonenkov-AI/rudalle-aspect-ratio
Time for 512×512 on a 3090: N/A
Maximum resolution on a 24 GB 3090: N/A
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM.
Description: Version of ruDALL-E that generates wide and/or tall aspect ratio images. The shorter side is limited to 256 pixels. Results can be very nice. Will generate multiple images at once, so these sample images have 4 results per prompt.

'a black and white photo of a werewolf' ruDALL-E Aspect Ratio Text-to-Image
a black and white photo of a werewolf

'a cartoon of a swamp' ruDALL-E Aspect Ratio Text-to-Image
a cartoon of a swamp

'a large waterfall made of metal' ruDALL-E Aspect Ratio Text-to-Image
a large waterfall made of metal

'a lounge room' ruDALL-E Aspect Ratio Text-to-Image
a lounge room

'a matte painting of a townhouse' ruDALL-E Aspect Ratio Text-to-Image
a matte painting of a townhouse

'a palace made of mist' ruDALL-E Aspect Ratio Text-to-Image
a palace made of mist

'a photo of an ugly woman' ruDALL-E Aspect Ratio Text-to-Image
a photo of an ugly woman

'a tropical beach' ruDALL-E Aspect Ratio Text-to-Image
a tropical beach

'an evil clown' ruDALL-E Aspect Ratio Text-to-Image
an evil clown

'dense woodland' ruDALL-E Aspect Ratio Text-to-Image
dense woodland

Any Others I Missed?

Do you know of any other Colab and/or GitHub Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords where other Colabs are being shared, let me know too.

Jason.

9 responses to “Text-to-Image Summary – Part 6”

  1. Love the work you’ve been doing compiling these! I was trying to play with the MIDI composer features and was getting an instant failure on each one… the console displays

    ModuleNotFoundError: No module named 'music21'

    Any idea what might be the cause?

  2. Not sure if you’re aware, but your main VoC website seems to be down and has been for several days now.

    • Domain has changed from softology.com.au to softology.pro. All links in these blogs and Visions of Chaos are updated.
