Text-to-Image Summary – Part 5

This is Part 5. There are also Part 1, Part 2, Part 3 and Part 4.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Description: The winner for the longest name so far. It needs tweaking, as adding the secondary model reduces the usual excellent quality of Multi-Perceptor CLIP Guided Diffusion, but it still shows a lot of potential.

'a 3D render of Robocop' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a 3D render of Robocop

'a futuristic city IMAX' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a futuristic city IMAX

'a matte painting of trypophobia' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a matte painting of trypophobia

'a renaissance painting of a cloudy sunset trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a renaissance painting of a cloudy sunset trending on ArtStation

'a woman 4K photo' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a woman 4K photo

'an evil clown Flickr' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an evil clown Flickr

'an oil painting of a nightmare creature by Louis Janmot' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare creature by Louis Janmot

'Indiana Jones' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
Indiana Jones

'reflective spheres' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
reflective spheres

'zombies filmic' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
zombies filmic


Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a babbling brook by Zhou Wenjing' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a babbling brook by Zhou Wenjing

'a bedroom by Francesco Furini' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a bedroom by Francesco Furini

'a computer by Édouard Detaille' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a computer by Édouard Detaille

'a cross stitch of a landscape vivid colors' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a cross stitch of a landscape vivid colors

'a kitchen filmic' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a kitchen filmic

'a matte painting of halloween' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a matte painting of halloween

'a pastel of a peacock' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a pastel of a peacock

'a storybook illustration of a kitchen by Lena Alexander' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a storybook illustration of a kitchen by Lena Alexander

'an oil on canvas painting of a zombie made of voxels' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
an oil on canvas painting of a zombie made of voxels

'vector art of Darth Vader' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
vector art of Darth Vader


Name: 360Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: A new diffusion-based script, capable of some interesting results.

'a bronze sculpture of a crying person by Auguste BaudBovy' 360Diffusion Text-to-Image
a bronze sculpture of a crying person by Auguste BaudBovy

'a flemish baroque of a bouquet of flowers' 360Diffusion Text-to-Image
a flemish baroque of a bouquet of flowers

'a haunted house trending on ArtStation' 360Diffusion Text-to-Image
a haunted house trending on ArtStation

'a hyperrealistic painting of trypophobia by Xia Gui' 360Diffusion Text-to-Image
a hyperrealistic painting of trypophobia by Xia Gui

'a nightmare creature' 360Diffusion Text-to-Image
a nightmare creature

'a space nebula rendered in Cinema4D' 360Diffusion Text-to-Image
a space nebula rendered in Cinema4D

'a tentacle monster 4K HD realism' 360Diffusion Text-to-Image
a tentacle monster 4K HD realism

'an oil on canvas painting of Danny Trejo by Pablo Rey' 360Diffusion Text-to-Image
an oil on canvas painting of Danny Trejo by Pablo Rey

'Frankenstein' 360Diffusion Text-to-Image
Frankenstein

'heaven 8K 3D' 360Diffusion Text-to-Image
heaven 8K 3D


Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of Gandalf' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a bronze sculpture of Gandalf

'a clown made of clay' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a clown made of clay

'a detailed painting of a desert oasis' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a detailed painting of a desert oasis

'a house by Kathleen Guthrie' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a house by Kathleen Guthrie

'a peacock made of metal' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a peacock made of metal

'a tilt shift photo of the Las Vegas strip' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a tilt shift photo of the Las Vegas strip

'a watercolor painting of reflective spheres 8K 3D' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a watercolor painting of reflective spheres 8K 3D

'an art deco painting of an amusement park' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
an art deco painting of an amusement park

'lineart of Big Bird by Alesso Baldovinetti' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
lineart of Big Bird by Alesso Baldovinetti

'vector art of a forest fire' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
vector art of a forest fire


Name: FuseDream
Author: Xingchao Liu et al.
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Description: Gives some unique outputs compared to all the previous scripts.

'a clown' FuseDream Text-to-Image
a clown

'a king' FuseDream Text-to-Image
a king

'a matte painting of New York City by Robin Guthrie' FuseDream Text-to-Image
a matte painting of New York City by Robin Guthrie

'a portrait of a young girl' FuseDream Text-to-Image
a portrait of a young girl

'a rough seascape' FuseDream Text-to-Image
a rough seascape

'a sea monster' FuseDream Text-to-Image
a sea monster

'a teddy bear' FuseDream Text-to-Image
a teddy bear

'a werewolf' FuseDream Text-to-Image
a werewolf

'an airbrush painting of an angry woman' FuseDream Text-to-Image
an airbrush painting of an angry woman

'an attractive woman' FuseDream Text-to-Image
an attractive woman


Any Others I Missed?

Do you know of any other Colab or GitHub Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords where other Colabs are being shared, let me know too.

Jason.