Text-to-Image Summary – Part 5

This is Part 5. There is also Part 1, Part 2, Part 3, Part 4, Part 6, Part 7 and Part 8.

This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.


Name: Multi-Perceptor CLIP Guided Diffusion Secondary Model Method
Author: SOMNAI
Original script: https://colab.research.google.com/drive/1Pf5F84FzWe9iAKNbiPaEo_v4hvQZ9SqS
Time for 512×512 on a 3090: 7 minutes 23 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: The winner for the longest name so far. Needs tweaking as the addition of the secondary model here reduces the usual excellent quality of the Multi-Perceptor CLIP Guided Diffusion. Still shows a lot of potential.

'a 3D render of Robocop' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a 3D render of Robocop

'a futuristic city IMAX' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a futuristic city IMAX

'a matte painting of trypophobia' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a matte painting of trypophobia

'a renaissance painting of a cloudy sunset trending on ArtStation' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a renaissance painting of a cloudy sunset trending on ArtStation

'a woman 4K photo' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
a woman 4K photo

'an evil clown Flickr' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an evil clown Flickr

'an oil painting of a nightmare creature by Louis Janmot' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
an oil painting of a nightmare creature by Louis Janmot

'Indiana Jones' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
Indiana Jones

'reflective spheres' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
reflective spheres

'zombies filmic' Multi-Perceptor CLIP Guided Diffusion Secondary Model Method Text-to-Image
zombies filmic


Name: Multi-Perceptor VQGAN+CLIP v2
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 45 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 2 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a babbling brook by Zhou Wenjing' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a babbling brook by Zhou Wenjing

'a bedroom by Francesco Furini' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a bedroom by Francesco Furini

'a computer by Édouard Detaille' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a computer by Édouard Detaille

'a cross stitch of a landscape vivid colors' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a cross stitch of a landscape vivid colors

'a kitchen filmic' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a kitchen filmic

'a matte painting of halloween' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a matte painting of halloween

'a pastel of a peacock' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a pastel of a peacock

'a storybook illustration of a kitchen by Lena Alexander' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
a storybook illustration of a kitchen by Lena Alexander

'an oil on canvas painting of a zombie made of voxels' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
an oil on canvas painting of a zombie made of voxels

'vector art of Darth Vader' Multi-Perceptor VQGAN+CLIP v2 Text-to-Image
vector art of Darth Vader


Name: 360Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_Public.ipynb
Time for 512×512 on a 3090: 2 minutes 50 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 28 seconds
Description: A new diffusion-based script. Capable of some interesting results.

'a bronze sculpture of a crying person by Auguste BaudBovy' 360Diffusion Text-to-Image
a bronze sculpture of a crying person by Auguste BaudBovy

'a flemish baroque of a bouquet of flowers' 360Diffusion Text-to-Image
a flemish baroque of a bouquet of flowers

'a haunted house trending on ArtStation' 360Diffusion Text-to-Image
a haunted house trending on ArtStation

'a hyperrealistic painting of trypophobia by Xia Gui' 360Diffusion Text-to-Image
a hyperrealistic painting of trypophobia by Xia Gui

'a nightmare creature' 360Diffusion Text-to-Image
a nightmare creature

'a space nebula rendered in Cinema4D' 360Diffusion Text-to-Image
a space nebula rendered in Cinema4D

'a tentacle monster 4K HD realism' 360Diffusion Text-to-Image
a tentacle monster 4K HD realism

'an oil on canvas painting of Danny Trejo by Pablo Rey' 360Diffusion Text-to-Image
an oil on canvas painting of Danny Trejo by Pablo Rey

'Frankenstein' 360Diffusion Text-to-Image
Frankenstein

'heaven 8K 3D' 360Diffusion Text-to-Image
heaven 8K 3D


Name: Multi-Perceptor VQGAN+CLIP v3
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Version 3 of Remi’s Multi-Perceptor VQGAN+CLIP script.

'a bronze sculpture of Gandalf' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a bronze sculpture of Gandalf

'a clown made of clay' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a clown made of clay

'a detailed painting of a desert oasis' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a detailed painting of a desert oasis

'a house by Kathleen Guthrie' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a house by Kathleen Guthrie

'a peacock made of metal' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a peacock made of metal

'a tilt shift photo of the Las Vegas strip' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a tilt shift photo of the Las Vegas strip

'a watercolor painting of reflective spheres 8K 3D' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
a watercolor painting of reflective spheres 8K 3D

'an art deco painting of an amusement park' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
an art deco painting of an amusement park

'lineart of Big Bird by Alesso Baldovinetti' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
lineart of Big Bird by Alesso Baldovinetti

'vector art of a forest fire' Multi-Perceptor VQGAN+CLIP v3 Text-to-Image
vector art of a forest fire


Name: FuseDream
Author: Xingchao Liu et al
Original script: https://github.com/gnobitab/FuseDream
Time for 512×512 on a 3090: 3 minutes 38 seconds
Maximum resolution on a 24 GB 3090: Locked to 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Gives some unique outputs compared to all the previous scripts.

'a clown' FuseDream Text-to-Image
a clown

'a king' FuseDream Text-to-Image
a king

'a matte painting of New York City by Robin Guthrie' FuseDream Text-to-Image
a matte painting of New York City by Robin Guthrie

'a portrait of a young girl' FuseDream Text-to-Image
a portrait of a young girl

'a rough seascape' FuseDream Text-to-Image
a rough seascape

'a sea monster' FuseDream Text-to-Image
a sea monster

'a teddy bear' FuseDream Text-to-Image
a teddy bear

'a werewolf' FuseDream Text-to-Image
a werewolf

'an airbrush painting of an angry woman' FuseDream Text-to-Image
an airbrush painting of an angry woman

'an attractive woman' FuseDream Text-to-Image
an attractive woman


Name: Looking Glass
Author: bearsharktopus
Original script: https://colab.research.google.com/drive/11vdS9dpcZz2Q2efkOjcwyax4oob6N40G
Time for 256×256 on a 3090: 1 minute 19 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 03 seconds
Description: A variation on ruDALL-E that added support for training the output with a single image or directory of images. It does seem to create better results than the raw ruDALL-E scripts (starting from a single image of random Perlin noise).

'a cemetery trending on pixiv' Looking Glass Text-to-Image
a cemetery trending on pixiv

'a colorful parrot' Looking Glass Text-to-Image
a colorful parrot

'a photo of a house' Looking Glass Text-to-Image
a photo of a house

'a rough seascape' Looking Glass Text-to-Image
a rough seascape

'an alien city' Looking Glass Text-to-Image
an alien city

'an angry person by Eric Auld' Looking Glass Text-to-Image
an angry person by Eric Auld

'an angry woman' Looking Glass Text-to-Image
an angry woman

'an ugly woman' Looking Glass Text-to-Image
an ugly woman

'monkeys' Looking Glass Text-to-Image
monkeys

'Yoda' Looking Glass Text-to-Image
Yoda


Name: Velocity Diffusion
Author: Katherine Crowson
Original script: https://github.com/crowsonkb/v-diffusion-pytorch
Time for 512×512 on a 3090: 3 minutes 57 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 640×640.
Maximum resolution on an 8GB 2080: 128×128 1 minute 19 seconds
Description: The latest script from Katherine Crowson. Unique results compared to her previous diffusion based scripts. Worth experimenting with further.

'a detailed matte painting of traffic' Velocity Diffusion Text-to-Image
a detailed matte painting of traffic

'a detailed painting of Jason Vorhees' Velocity Diffusion Text-to-Image
a detailed painting of Jason Vorhees

'a Ghostbuster' Velocity Diffusion Text-to-Image
a Ghostbuster

'a manga drawing of a lounge room by Yayoi Kusama' Velocity Diffusion Text-to-Image
a manga drawing of a lounge room by Yayoi Kusama

'a mountain range CryEngine' Velocity Diffusion Text-to-Image
a mountain range CryEngine

'a portrait of a young girl made of feathers rendered in unreal engine' Velocity Diffusion Text-to-Image
a portrait of a young girl made of feathers rendered in unreal engine

'a zombie' Velocity Diffusion Text-to-Image
a zombie

'lineart of a Rubiks cube' Velocity Diffusion Text-to-Image
lineart of a Rubiks cube

'The Grinch' Velocity Diffusion Text-to-Image
The Grinch

'vector art of Emporer Palpatine' Velocity Diffusion Text-to-Image
vector art of Emporer Palpatine


Name: ruDALL-E Arbitrary Resolution v1
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 16 minutes 34 seconds
Description: Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a color pencil sketch of a werewolf' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a color pencil sketch of a werewolf

'a colorful parrot' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a colorful parrot

'a gorilla' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a gorilla

'a painting of a cabin next to a stream in a secluded forest' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'a portrait of a girl with a dragon tattoo' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a portrait of a girl with a dragon tattoo

'a rose vivid colors' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a rose vivid colors

'a sketch of an ugly man' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a sketch of an ugly man

'a surrealist sculpture of a submarine' ruDALL-E Arbitrary Resolution v1 Text-to-Image
a surrealist sculpture of a submarine

'dense woodland' ruDALL-E Arbitrary Resolution v1 Text-to-Image
dense woodland

'medusa' ruDALL-E Arbitrary Resolution v1 Text-to-Image
medusa


Name: ruDALL-E Arbitrary Resolution v2
Author: @nev
Original script: https://colab.research.google.com/drive/1DbqOIUIVBPOrJ4MeaV4YkAlb7ilWQjKZ
Time for 512×512 on a 3090: 4 minutes 40 seconds
Maximum resolution on a 24 GB 3090: 1024×1024
Maximum resolution on an 8GB 2080: 768×768 15 minutes 48 seconds
Description: v2 of the ruDALL-E Arbitrary Resolution script. Allows larger resolution images using the ruDALL-E model. Very nice results and supports larger resolutions on GPUs with less VRAM.

'a bouquet of flowers' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a bouquet of flowers

'a cross stitch of a well kept garden' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a cross stitch of a well kept garden

'a futuristic city' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a futuristic city

'a large waterfall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a large waterfall

'a minimalist painting of a castle in the mountains' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a minimalist painting of a castle in the mountains

'a photocopy of a monkey vivid colors' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a photocopy of a monkey vivid colors

'a spooky forest by Laura Muntz Lyall' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a spooky forest by Laura Muntz Lyall

'a teddy bear made of wrought iron' ruDALL-E Arbitrary Resolution v2 Text-to-Image
a teddy bear made of wrought iron

'dense woodland' ruDALL-E Arbitrary Resolution v2 Text-to-Image
dense woodland

'God' ruDALL-E Arbitrary Resolution v2 Text-to-Image
God


Name: GLIDE
Author: OpenAI
Original script: https://colab.research.google.com/github/openai/glide-text2im/blob/main/notebooks/text2im.ipynb
Time for 256×256 on a 3090: 23 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: Locked to 256×256
Description: Images are rendered tiny at 64×64 and then upscaled internally within the script to 256×256 for output. The model has been “trimmed” so it cannot do anything human related and only does well for subjects it knows about. Hopefully they release the full model and/or train a larger resolution model in the future. Nothing to get excited about yet.

'a cathedral' GLIDE Text-to-Image
a cathedral

'a color pencil sketch of a fire breathing dragon by Erwin Bowien' GLIDE Text-to-Image
a color pencil sketch of a fire breathing dragon by Erwin Bowien

'a gorilla' GLIDE Text-to-Image
a gorilla

'a library' GLIDE Text-to-Image
a library

'a mosaic of monkeys' GLIDE Text-to-Image
a mosaic of monkeys

'a painting of a cabin next to a stream in a secluded forest' GLIDE Text-to-Image
a painting of a cabin next to a stream in a secluded forest

'an elephant' GLIDE Text-to-Image
an elephant

'dinosaurs' GLIDE Text-to-Image
dinosaurs

'goldfish' GLIDE Text-to-Image
goldfish

'the Sydney Harbour Bridge lens flare' GLIDE Text-to-Image
the Sydney Harbour Bridge lens flare


Name: Disco Diffusion
Author: @Somnai
Original script: https://colab.research.google.com/drive/1bItz4NdhAPHg5-u87KcH-MmJZjK-XqHN
Time for 512×512 on a 3090: 3 minutes 18 seconds
Maximum resolution on a 24 GB 3090: 2496×1088 11 minutes 50 seconds
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Diffusion script that includes all the latest features. Capable of rendering some very nice large resolution images (it may even do better at larger sized images than smaller resolutions like these samples).

'a cute creature' Disco Diffusion Text-to-Image
a cute creature

'a detailed matte painting of a morning landscape' Disco Diffusion Text-to-Image
a detailed matte painting of a morning landscape

'a peacock made of mist by Reinier Nooms' Disco Diffusion Text-to-Image
a peacock made of mist by Reinier Nooms

'a Pokemon character by William Etty' Disco Diffusion Text-to-Image
a Pokemon character by William Etty

'a polaroid photo of an angry woman' Disco Diffusion Text-to-Image
a polaroid photo of an angry woman

'a rough seascape' Disco Diffusion Text-to-Image
a rough seascape

'a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D' Disco Diffusion Text-to-Image
a watercolor painting of a mountain path by Mark A Brennan rendered in Cinema4D

'an attractive woman' Disco Diffusion Text-to-Image
an attractive woman

'computer rendering of a desert oasis rendered in unreal engine' Disco Diffusion Text-to-Image
computer rendering of a desert oasis rendered in unreal engine

'the Amazon Rainforest by Qian Du' Disco Diffusion Text-to-Image
the Amazon Rainforest by Qian Du


Name: Infinite Diffusion
Author: https://github.com/crowsonkb/v-diffusion-pytorch
Original script: https://colab.research.google.com/drive/1VJrfInU5RbciXXD_8jzY-FntFqiyj6au
Time for 512×512 on a 3090: 3 minutes 32 seconds
Maximum resolution on a 24 GB 3090: 512×512
Maximum resolution on an 8GB 2080: 256×256 3 minutes 15 seconds
Description: Diffusion-based script. Very VRAM hungry. Renders some unique images compared to the other methods.

'cookie monster eating a cookie' Infinite Diffusion Text-to-Image
cookie monster eating a cookie

'a renaissance painting of a farm by Bernardo Strozzi' Infinite Diffusion Text-to-Image
a renaissance painting of a farm by Bernardo Strozzi

'a silk screen of God' Infinite Diffusion Text-to-Image
a silk screen of God

'a storybook illustration of a cute monster trending on pixiv' Infinite Diffusion Text-to-Image
a storybook illustration of a cute monster trending on pixiv

'a surrealist painting of Frankenstein' Infinite Diffusion Text-to-Image
a surrealist painting of Frankenstein

'a watercolor painting of Yoda' Infinite Diffusion Text-to-Image
a watercolor painting of Yoda

'a worried woman made of clay lens flare' Infinite Diffusion Text-to-Image
a worried woman made of clay lens flare

'an art deco painting of Luke Skywalker' Infinite Diffusion Text-to-Image
an art deco painting of Luke Skywalker

'an oil painting of Buzz Lightyear' Infinite Diffusion Text-to-Image
an oil painting of Buzz Lightyear

'Chewbacca' Infinite Diffusion Text-to-Image
Chewbacca


Name: minDALL-E
Author: Kakao Brain Corp
Original script: https://github.com/kakaobrain/minDALL-E
Time for 256×256 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256
Maximum resolution on an 8GB 2080: 256×256 1 minute 59 seconds
Description: Another DALL-E variation script. Locked to 256×256 but can generate multiple images each run.

'a cozy den' minDALL-E Text-to-Image
a cozy den

'a digital painting of Chewbacca by Willem van de Velde the Elder' minDALL-E Text-to-Image
a digital painting of Chewbacca by Willem van de Velde the Elder

'a sad person' minDALL-E Text-to-Image
a sad person

'a skull' minDALL-E Text-to-Image
a skull

'a storybook illustration of a happy clown by Gwen Barnard' minDALL-E Text-to-Image
a storybook illustration of a happy clown by Gwen Barnard

'a tree by Colin Gill' minDALL-E Text-to-Image
a tree by Colin Gill

'Bugs Bunny' minDALL-E Text-to-Image
Bugs Bunny

'fireworks by Károly Lotz' minDALL-E Text-to-Image
fireworks by Károly Lotz

'The Grand Canyon' minDALL-E Text-to-Image
The Grand Canyon

'Yoda' minDALL-E Text-to-Image
Yoda


Name: ruDOLPH
Author: SBER AI
Original script: https://github.com/sberbank-ai/ru-dolph
Time for 128×128 on a 3090: 1 minute 15 seconds
Maximum resolution on a 24 GB 3090: Locked to 128×128
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another ruDALL-E variation script. Locked to a tiny 128×128 resolution for now until they train the larger models. These examples were 4x upscaled with Real ESRGAN.

'a castle' ruDOLPH Text-to-Image
a castle

'a colorful parrot' ruDOLPH Text-to-Image
a colorful parrot

'a fine art painting of an ugly woman' ruDOLPH Text-to-Image
a fine art painting of an ugly woman

'a kitchen' ruDOLPH Text-to-Image
a kitchen

'a pastel of spirals made of plastic' ruDOLPH Text-to-Image
a pastel of spirals made of plastic

'a photorealistic painting of a cityscape' ruDOLPH Text-to-Image
a photorealistic painting of a cityscape

'a portrait of a woman' ruDOLPH Text-to-Image
a portrait of a woman

'a sad person by Ramon Casas i Carbó' ruDOLPH Text-to-Image
a sad person by Ramon Casas i Carbó

'kittens' ruDOLPH Text-to-Image
kittens

'vector art of a woman' ruDOLPH Text-to-Image
vector art of a woman


Name: CLIP Guided Deep Image Prior
Author: Daniel Russell
Original script: https://colab.research.google.com/drive/1_oqIK8A67EgtJDdfsuJojc5ukNzirdle
Time for 512×512 on a 3090: 1 minute 45 seconds
Maximum resolution on a 24 GB 3090: 1024×1024 or 1680×720
Maximum resolution on an 8GB 2080: 512×512 (5 minutes 7 seconds) or 640×360
Description: Interesting script that has decent coherency. If only the output was slightly sharper and the colors slightly richer it would be a winner. Still good for unique outputs that the other methods cannot achieve.

'a flemish baroque of a shrine' CLIP Guided Deep Image Prior
a flemish baroque of a shrine

'a statue of a tardigrade made of clay' CLIP Guided Deep Image Prior
a statue of a tardigrade made of clay

'a surrealist painting of a Pixar character' CLIP Guided Deep Image Prior
a surrealist painting of a Pixar character

'a surrealist painting of an evening landscape 4K photo' CLIP Guided Deep Image Prior
a surrealist painting of an evening landscape 4K photo

'an abstract sculpture of an evil clown by Han Gan' CLIP Guided Deep Image Prior
an abstract sculpture of an evil clown by Han Gan

'an ambient occlusion render of Bugs Bunny made of wood' CLIP Guided Deep Image Prior
an ambient occlusion render of Bugs Bunny made of wood

'Cookie Monster' CLIP Guided Deep Image Prior
Cookie Monster

'Jabba The Hutt by Shūbun Tenshō' CLIP Guided Deep Image Prior
Jabba The Hutt by Shūbun Tenshō

'tentacles by Johanna Marie Fosie' CLIP Guided Deep Image Prior
tentacles by Johanna Marie Fosie

'vector art of heaven' CLIP Guided Deep Image Prior
vector art of heaven
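
For readers with less VRAM, the "Maximum resolution on an 8GB 2080" figures listed for each script above can be collected into one lookup. The following is only a rough Python sketch: the dictionary and function names are made up for illustration, the values are copied from this post, and where a script lists two 8GB options only the first is recorded. Results on other 8GB cards may differ.

```python
# Largest resolution reported in this post for an 8GB RTX 2080.
# None means the post lists the script as "Unable to run on 8GB VRAM".
# MAX_RES_8GB and scripts_for_8gb are illustrative names, not part of any script.
MAX_RES_8GB = {
    "Multi-Perceptor CLIP Guided Diffusion Secondary Model Method": None,
    "Multi-Perceptor VQGAN+CLIP v2": None,
    "360Diffusion": (256, 256),
    "Multi-Perceptor VQGAN+CLIP v3": None,
    "FuseDream": None,
    "Looking Glass": (256, 256),
    "Velocity Diffusion": (128, 128),
    "ruDALL-E Arbitrary Resolution v1": (768, 768),
    "ruDALL-E Arbitrary Resolution v2": (768, 768),
    "GLIDE": (256, 256),  # locked to 256x256
    "Disco Diffusion": None,
    "Infinite Diffusion": (256, 256),
    "minDALL-E": (256, 256),
    "ruDOLPH": None,
    "CLIP Guided Deep Image Prior": (512, 512),  # 640x360 is also listed
}


def scripts_for_8gb(table):
    """Return (name, (width, height)) pairs that run on 8GB, largest image first."""
    usable = [(name, res) for name, res in table.items() if res is not None]
    # Sort by total pixel count; ties keep the order used in the post.
    return sorted(usable, key=lambda item: item[1][0] * item[1][1], reverse=True)


for name, (width, height) in scripts_for_8gb(MAX_RES_8GB):
    print(f"{name}: {width}x{height}")
```

Sorting by pixel count puts the two ruDALL-E Arbitrary Resolution scripts at the top, which matches the note in their descriptions about supporting larger resolutions on GPUs with less VRAM.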


Any Others I Missed?

Do you know of any other colabs and/or github Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords with other colabs being shared let me know too.

Jason.

6 responses to “Text-to-Image Summary – Part 5”

  1. Attractive women seems to be the last barrier holding AI back and preventing the unavoidable takeover of entirety of an art industry 😀
    Are there any AIs apart from Dalee Ru that somebody without 3090 can play with?
    Thanks for your explorations (and Visions of Chaos)

    • Depends what GPU you have and how much VRAM it has.

      I have added a stat for “Maximum resolution on an 8GB 2080” on each script so you can get an idea of which ones work with a lower VRAM GPU.

      Jason.

  2. Hey, love the software, I figured I’d comment to mention that disco has been updated to v4.1 and includes a bunch of new additions and such. I’ve been using the version in VoC but it would be neat to see it updated.
    As for GPUs, I’ve had no issues running the disco 3.0 version in VoC on an RTX 3070 which has 8GB of VRAM. Atm, I’ve been running 4 models at 512×512, no crashes, and I’ve run over 90 prompts.

    Thank you.
