This is Part 4. There are also Parts 1, 2, 3, 5, 6, 7 and 8.
This post continues listing the Text-to-Image scripts included with Visions of Chaos and some example outputs from each script.
Name: PixelDraw
Author: dribnet
Original script: https://colab.research.google.com/github/dribnet/clipit/blob/master/demos/PixelDrawer.ipynb
Time for 512×512 on a 3090: 1 minute 59 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Generates “pixel art” images. I had a lot of requests to add support for this one.
a cartoon of a peacock
a cloudy sunset
a gorilla
a morning landscape
a watercolor painting of a castle
an art deco painting of Al Pacino
Hell
Shrek
Name: DirectVisions
Author: Jens Goldberg
Original script: https://colab.research.google.com/drive/127lKSsQjx-UDDUSvIkLL6mREfZ0KQu5D
Time for 512×512 on a 3090: 2 minutes 39 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 4096×4096
Description: Interesting detailed images. Can create huge resolution results.
a color pencil sketch of a western town
a detailed painting of a cephalopod
a digital rendering of an ugly face
a pencil sketch of Buzz Lightyear
a rough seascape by Pinchus Kremegne
a stock photo of a president
a sunset
an alien city
an alien forest by Helen Berman
an evening landscape
Name: Pixel Direct
Author: Unknown
Original script: https://colab.research.google.com/drive/1F9ZOZnpV3uBPRDSESaAXYwzNZJQRJT75
Time for 512×512 on a 3090: 1 minute 03 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 2048×2048 1 minute 51 seconds
Description: Another “Pixel Art” script. More abstract results than the PixelDraw script above.
a bronze sculpture of a nightmare creature
a cartoon of Al Pacino
a nightclub
a silk screen of a bouquet of flowers
an etching of a worried woman
an illustration of a thunder storm
Name: FourierVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1nGNBjhbYnDHSumGPjpFHjDOsaZFAqGgF
Time for 512×512 on a 3090: 1 minute 40 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 4 minutes 07 seconds
Description: Detailed images. The default script generates washed out pastel images, but with some gamma and brightness tweaks they can be improved (still not ideal, but better). Allows very large resolution images.
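To show the kind of tweak I mean, here is a minimal post-processing sketch using Pillow and NumPy. The gamma and brightness values (and file names) are arbitrary starting points to experiment with, not the exact values used inside Visions of Chaos.

# Darken washed out pastel output with a gamma curve, then a slight
# brightness pull-down. The values here are guesses to tune by eye.
import numpy as np
from PIL import Image, ImageEnhance

def punch_up(path_in, path_out, gamma=1.4, brightness=0.9):
    img = Image.open(path_in).convert("RGB")
    # Gamma correction: normalise to 0..1, raise to the power gamma, rescale.
    # A gamma above 1 darkens the pale midtones.
    arr = np.asarray(img).astype(np.float32) / 255.0
    arr = np.power(arr, gamma)
    img = Image.fromarray((arr * 255.0).clip(0, 255).astype(np.uint8))
    # A brightness factor below 1.0 darkens the image slightly overall.
    img = ImageEnhance.Brightness(img).enhance(brightness)
    img.save(path_out)

punch_up("fouriervisions_raw.png", "fouriervisions_tweaked.png")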
a cathedral
a charcoal drawing of zombies
a detailed painting of a sunset by Thomas Cantrell Dugdale
a ghost made of mist
a kitchen
a movie monster
a pencil sketch of a sad clown
a werewolf
an evil clown by Viktor Oliva
an ink drawing of an ugly monster
Name: PyramidVisions
Author: Unknown
Original script: https://colab.research.google.com/drive/1dpAS_wK34y7c6s-CatAFmBtbkjGT_erM
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: Huge. 4096×4096 and beyond.
Maximum resolution on an 8GB 2080: 1024×1024 10 minutes 48 seconds
Description: Very detailed images. Not the fastest script, but gives some very nice results. Lower VRAM requirements, so good for lower spec GPUs. Definitely one of the better scripts, worth exploring.
a desert oasis
a lush rainforest
a marble sculpture of an angry person
a minimalist painting of the Amazon Rainforest
a nightmare creature
a pastel of a computer made of paper
an abstract sculpture of a sad clown
an acrylic painting of an alien forest | vivid colors
Medusa
vector art of an ugly woman
Name: Visions of AI v1
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 1 minute 32 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 33 seconds
Description: My first attempt at actually creating a Text-to-Image script. Based on the excellent example from Jonathan Whitaker's AIAIArt Lesson 3 tutorial. Gives some very nice fine detail in some areas, but suffers from the same lack of coherence as other scripts in that it creates multiple copies of the subject throughout the image. After actually trying to write my own script I only have more respect for those who can do this. Hopefully I can improve these results for a version 2. In the meantime, here are some samples from the current Visions of AI script.
a cartoon of the human condition by Judy Takács
a cubist painting of an evening landscape
a digital rendering of frogs
a fire breathing dragon
a hyperrealistic painting of a movie monster
a morning landscape
a shark
a woodcut of an ugly man
an airbrush painting of C-3PO
Frankenstein
Name: Visions of AI v2
Author: Jason Rampe
Original script: Included with Visions of Chaos. No colab.
Time for 512×512 on a 3090: 2 minutes 35 seconds
Maximum resolution on a 24 GB 3090: 768×768 or 1120×480.
Maximum resolution on an 8GB 2080: 256×256 2 minutes 36 seconds
Description: An attempt to improve the coherency of the previous script. The first 30 iterations zoom into the image every 10 frames. This results in larger shapes/blobs for the rest of the script to work from. The idea is that it will give larger subjects compared to the v1 script. Kind of works. Gives blurrier results. To be fixed in the next version?
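For anyone curious how the zoom step can work, here is a minimal sketch of the idea (not the exact code from my script), assuming the image being optimised is a (1, 3, H, W) PyTorch tensor. The centre of the image is cropped and scaled back to full size, magnifying whatever blobs have formed so far.

import torch
import torch.nn.functional as F

def centre_zoom(img: torch.Tensor, zoom: float = 1.1) -> torch.Tensor:
    # Crop the central 1/zoom region and resize it back to the full size.
    _, _, h, w = img.shape
    ch, cw = int(h / zoom), int(w / zoom)
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[:, :, top:top + ch, left:left + cw]
    return F.interpolate(crop, size=(h, w), mode="bilinear", align_corners=False)

# Hypothetical placement inside the optimisation loop:
# for step in range(total_steps):
#     ... gradient step against the CLIP loss ...
#     if step < 30 and step % 10 == 0:
#         with torch.no_grad():
#             image.copy_(centre_zoom(image))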
a morning landscape by William Gear
a raytraced image of a nightclub lens flare
a tentacle monster by Carlo Crivelli
a woodcut of a worried woman by Li Keran
an illustration of a cave made of cheese
Cthulhu
cyberpunk art of a futuristic city
goldfish
reflective spheres
the Australian outback
Name: Multi-Perceptor CLIP Guided Diffusion
Author: Varkarrus
Original script: https://colab.research.google.com/drive/1y3Vt39A5KSNFRa6Z2bCqDHxteZSVH9NC
Time for 512×512 on a 3090: 3 minutes 08 seconds
Maximum resolution on a 24 GB 3090: 896×512 or 1152×384 (dimensions must be divisible by 128).
Maximum resolution on an 8GB 2080: 128×128 1 minute 56 seconds
Description: Builds upon previous CLIP Guided Diffusion scripts. Like the previous script by Dango233 it uses three CLIP models simultaneously to “rate” the generated images, and I have added options to use up to six different CLIP models. Both the accuracy of the image against the prompt and the overall image coherence seem much better than previous CLIP Guided Diffusion scripts, which could sometimes produce almost random outputs. This script is superb and highly recommended. Great lighting, textures and brushstrokes. Normally with these blog posts I do a batch run of random prompts overnight and then pick the best 10 images. In this case I had nearly 50 images in my “good” folder after going through the batch results. So, for this script I am showing 20 sample images.
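For those wondering what using multiple CLIP models to “rate” an image looks like in code, here is a rough sketch of the idea. The model list and helper names are illustrative rather than lifted from the actual script, and real scripts also apply CLIP's own input normalisation before encoding.

import clip  # OpenAI CLIP package
import torch
import torch.nn.functional as F

device = "cuda"
model_names = ["ViT-B/32", "ViT-B/16", "RN50"]  # up to six can be loaded
perceptors = [clip.load(name, device=device)[0].eval().float() for name in model_names]

def multi_clip_loss(cutouts: torch.Tensor, prompt: str) -> torch.Tensor:
    # Average the prompt-vs-image distance over every loaded CLIP model.
    tokens = clip.tokenize(prompt).to(device)
    total = torch.zeros([], device=device)
    for perceptor in perceptors:
        size = perceptor.visual.input_resolution  # each model wants its own size
        batch = F.interpolate(cutouts, size=(size, size), mode="bilinear")
        image_feats = F.normalize(perceptor.encode_image(batch).float(), dim=-1)
        text_feats = F.normalize(perceptor.encode_text(tokens).float(), dim=-1)
        # Squared spherical distance between the image and text embeddings.
        dists = (image_feats - text_feats).norm(dim=-1).div(2).arcsin().pow(2).mul(2)
        total = total + dists.mean()
    return total / len(perceptors)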
a cute creature | TriX 400 TX
a digital painting of Frankenstein by Kanzan Shimomura
a morning landscape by János SaxonSzász
a nightmare creature
a photorealistic painting of a teddy bear
a portrait of a young girl
a space nebula | IMAX
a worried man
a zombie by Nathaniel Hone
an acrylic painting of a spider by Abram Arkhipov
an airbrush painting of a monkey by Jeremy Henderson
an alien landscape
an ugly creature made of insects
an ultrafine detailed painting of a sad person | ZBrush
Arnold Schwarzenegger | trending on ArtStation
concept art of Robocop
dinosaurs
Dracula | CGSociety
flesh made of insects
God by William Simpson
Name: Pixel MultiColors
Author: Remi Durant
Original script: https://colab.research.google.com/drive/17c-13cl_VQKpHq2rDrnFVi6ZT-CHeZNn
Time for 512×512 on a 3090: 0 minutes 44 seconds
Maximum resolution on a 24 GB 3090: 4096×4096.
Maximum resolution on an 8GB 2080: 2048×2048 7 minutes 45 seconds
Description: Very noisy/pixelated/abstract results. The default script gives dark images which some tweaks to brightness and contrast can help. Maybe a little bit of blur could help too in a future revision. It is fast though, and can support huge image sizes.
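If you want to try fixing up the dark results yourself, something along these lines is a reasonable starting point (the values are guesses to experiment with, not what Visions of Chaos actually applies):

from PIL import Image, ImageEnhance, ImageFilter

img = Image.open("pixel_multicolors_raw.png")
img = ImageEnhance.Brightness(img).enhance(1.3)   # above 1.0 brightens
img = ImageEnhance.Contrast(img).enhance(1.2)     # above 1.0 adds contrast
img = img.filter(ImageFilter.GaussianBlur(radius=0.8))  # soften pixel noise
img.save("pixel_multicolors_tweaked.png")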
a charcoal drawing of a cute creature made of metal
a farm
a forest path by Walter Leighton Clark
a lighthouse
a surrealist painting of a beachside resort
a well kept garden
an abstract sculpture of Pikachu
an art deco painting of a volcano
an ink drawing of tentacles
an octopus Rendered in Cinema4D
Name: Ultraquick CLIP Guided Diffusion
Author: @sadly_existent
Original script: https://colab.research.google.com/github/sadnow/360Diffusion/blob/main/360Diffusion_AlphaTesting.ipynb
Time for 512×512 on a 3090: 1 minute 57 seconds
Maximum resolution on a 24 GB 3090: Locked to either 256×256 or 512×512.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Another CLIP Guided Diffusion script. Can give some interesting results.
a cave
a color pencil sketch of Cthulhu
a detailed painting of Shrek
a flemish baroque of the human condition by George Barret Jr
a low poly render of halloween
a photorealistic painting of a worried woman made of paper by Ann Thetis Blacker
a surrealist painting of a worried man
a surrealist sculpture of an angry man 8K 3D
Robocop
zombies
Name: ruDALL-E
Author: @sadly_existent
Original script: https://colab.research.google.com/drive/1wGE-046et27oHvNlBNPH07qrEQNE04PQ
Optimized script: https://colab.research.google.com/drive/1euIMG8E6kSFA2nU58LqrVsq6nbXjqELY
Time for 256×256 on a 3090: 1 minute 05 seconds
Maximum resolution on a 24 GB 3090: Locked to 256×256.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: Russian version of DALL-E. Only takes text prompts in Russian, so I do some auto English to Russian translations. Locked to small 256×256 images at this stage, but can create some interesting results.
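The translation step itself is simple. Here is a minimal sketch using the deep-translator package (the function name is mine, and the exact call in my script may differ).

from deep_translator import GoogleTranslator

def to_russian(prompt: str) -> str:
    # deep-translator wraps the Google Translate web service.
    return GoogleTranslator(source="en", target="ru").translate(prompt)

russian_prompt = to_russian("a stock photo of puppies")
# ruDALL-E is then fed the Russian text rather than the original English.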
a hyperrealistic painting of Chewbacca by Edith Grace Wheatley
a low poly render of Pikachu
a man
a rose
a stock photo of puppies
egyptian art of a portrait of a woman
Harry Potter
Indiana Jones
Robocop made of gold
Yoda
Name: ruVQGAN+CLIP
Author: nev
Original script: https://colab.research.google.com/drive/1wAnIHocDYFAbWtA7rk8C7cFEUdRyLzwZ
Time for 512×512 on a 3090: 1 minute 28 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 27 seconds
Description: Creates fairly blurry results, even with post-process sharpening. If anyone could get these results crisper it would really improve the output.
a 3D render of a wizard by Gertrude Greene
a cubist painting of a Pokemon character
a cute creature
a matte painting of halloween by Carlos Trillo Name
a photorealistic painting of an alien landscape by Jacob Ochtervelt
a rough seascape filmic
a sea monster
a woodcut of a skull by Gu Hongzhong trending on ArtStation
Cthulhu
trypophobia
Name: Multi-Perceptor VQGAN+CLIP
Author: Remi Durant
Original script: https://colab.research.google.com/drive/1peZ98vBihDD9A1v7JdH5VvHDUuW5tcRK
Time for 512×512 on a 3090: 2 minutes 30 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: As with the previous Multi-Perceptor CLIP Guided Diffusion script, this one allows two different CLIP models to be used to rate the VQGAN output images. VQGAN is not going to beat diffusion for image coherence, but this script can give some very nice lighting and fine details in images.
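To make the contrast with guided diffusion clearer, here is a conceptual sketch of the core VQGAN+CLIP loop: the latent image code itself is optimised by gradient descent, with the CLIP rating as the loss. The decode and clip_loss functions below are placeholders standing in for the real VQGAN decoder and CLIP scoring (see the multi-CLIP sketch earlier in this post).

import torch

def decode(z):                     # placeholder for the VQGAN decoder
    return torch.sigmoid(z)

def clip_loss(image, prompt):      # placeholder for the CLIP rating step
    return image.mean()            # a real loss compares CLIP embeddings

z = torch.randn(1, 3, 256, 256, requires_grad=True)  # the code being optimised
optimizer = torch.optim.Adam([z], lr=0.1)

for step in range(300):
    loss = clip_loss(decode(z), "a fantasy land by Shigeru Aoki")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()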
a bronze sculpture of an evil clown made of clay by Dionisio Baixeras Verdaguer
a fantasy land by Shigeru Aoki
a hyperrealistic painting of puppies
a midnineteenth century engraving of the Sydney Opera House
a statue of reflective spheres
a surrealist painting of a tropical beach
an alien city CGSociety
an oil painting of a fire breathing dragon
computer rendering of a well kept garden by Norman Garstin ZBrush
war CryEngine
Name: Hypertron
Author: Philipuss
Original script: https://colab.research.google.com/drive/10fa8X6EsfZfda1dfhJ_BtfPZ7Te1WGoX
Time for 512×512 on a 3090: 2 minutes 00 seconds
Maximum resolution on a 24 GB 3090: 1120×480.
Maximum resolution on an 8GB 2080: 256×256 1 minute 35 seconds
Description: Another VQGAN based script. Has various “flavors” to give different results. Works OK. Can give the “image in a sea of purple/grey” look that previous MSE based scripts suffered from. Still worth a try.
a black and white photo of a fireman
a cute monster by Józef Mehoffer
a matte painting of a forest clearing
a pop art painting of a human
a renaissance painting of a ghost by Jan van de Cappelle film
a sea monster made of metal
a tattoo of a zombie
a watercolor painting of a dragon Flickr
an art deco painting of a haunted house by Mary Cameron
concept art of a mountainscape by Maximilian Cercha
Name: CLIP Guided Diffusion Secondary Model Method
Author: Katherine Crowson
Original script: https://colab.research.google.com/drive/1mpkrhOjoyzPeSWy2r7T8EYRaU7amYOOi
Time for 512×512 on a 3090: 2 minutes 28 seconds
Maximum resolution on a 24 GB 3090: 1792×768 or 2048×640.
Maximum resolution on an 8GB 2080: Unable to run on 8GB VRAM
Description: A new diffusion based script from Katherine Crowson including a new “secondary model” she trained. Capable of some unique results with good textures and lighting.
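As I understand the secondary model trick (this is a conceptual sketch, not Katherine Crowson's actual code), a small trained network maps a noisy image plus a timestep straight to a clean-image estimate, so the CLIP guidance gradient can be backpropagated through it far more cheaply than through the full diffusion model.

import torch
import torch.nn as nn

class SmallDenoiser(nn.Module):     # stand-in for the trained secondary model
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x, t):
        return self.net(x)          # the real model also conditions on t

secondary = SmallDenoiser()
x_noisy = torch.randn(1, 3, 256, 256, requires_grad=True)
pred_clean = secondary(x_noisy, t=torch.tensor([100]))
loss = pred_clean.mean()            # stand-in for the CLIP rating loss
grad = torch.autograd.grad(loss, x_noisy)[0]  # used to steer each sampling step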
a detailed painting of Fozzy Bear by LeConte Stewart
a flemish baroque of a happy person trending on pixiv
a flock of birds
a Ghostbuster CGSociety
a kitchen made of cheese
a nightmare creature
a photorealistic painting of The Grinch
a portrait of a woman
an art deco painting of a sad clown
an oil painting of a nightmare
Any Others I Missed?
Do you know of any other Colab and/or GitHub Text-to-Image systems I have missed? Let me know and I will see if I can convert them to work with Visions of Chaos for a future release. If you know of any public Discords where other colabs are being shared, let me know too.
Jason.
StyleCariGAN would be fun: https://github.com/PeterZhouSZ/StyleCariGAN
Looks interesting but I cannot automate a download of the model from Dropbox and it seems to need Linux-only components. I get other strange errors I cannot find a fix for at the moment.
If they release a simpler to use version with an easily downloadable model I may be able to add a front end to it in Visions of Chaos, but for now, no go.
Thank you for trying, I know that some can be a git to install!
I’m not sure if this one, pytti, is any easier but it has no docs:
https://github.com/sportsracer48/pytti
Then this is another interesting implementation:
https://github.com/mehdidc/feed_forward_vqgan_clip
Although it tends to centralise detail, it should speed up video creation.
From what I know Pytti is behind a Patreon members-only paywall at the moment. If I cannot share it with others then I am not going to spend time trying to get it working. If the author ever decides to freely share the script I will look at implementing a front end for it in Visions of Chaos, but in the meantime I am concentrating on the many other freely available and sharable scripts out there.
Feed Forward VQGAN is on my to-look-at list. I briefly tried it before but got stuck with dependency problems and circular references.
Cataloging these scripts is incredibly valuable, thank you! I’m curious as to your methodology for finding all these scripts?
Scripts found by web searching, links online, people emailing me, reddit, etc.
If there was a good way to reliably search all public colabs for certain text then I could easily find many more.
I just updated to be able to run Multi-perceptor and the first results look promising. I’m running an automated batch now to see what comes up.
I would really like to see an implementation of StyleCLIP (https://github.com/orpatashnik/StyleCLIP) but I don’t know if this version is Linux-only.
Many thanks for your great work!
I had a quick look and I got the basic StyleCLIP “playground” colab notebook working locally. One issue is the model files. Because they are hosted on Google Drive I cannot easily automate the download. So I would need to redirect users to a web page that had instructions for “click here, then save it here” etc so they could manually update the required model files on their local PC.
When I tried to get the “StyleCLIP_global” notebook running locally it needed the Linux ninja build tool. StyleCLIP_global is the notebook needed to modify any image/photo, so at this time it won’t work. Maybe if someone can get all this working locally on Windows then I can add it to Visions of Chaos, but not yet.
Thanks for your effort. I’ll keep my eyes open for a windows-implementation.
I’ve been having a hard time getting the Multi-Perceptor Clip-Guided Diffusion script to generate outputs anywhere near as good as yours… my own system hit a VRAM bottleneck, so I’m running it directly from the linked Google Colab. What settings did you use to create those outputs? And did you tweak anything while implementing it into VoC?
Not every image is going to be a keeper. For my images I tend to run a batch process overnight with random prompts. Then the next day I pick the best 10 from the hundreds of results.
But, having a quick look at the colab, the main changes I made are:
timestep_respacing = '200'
cutn_batches = 1
Not using ddim seems to improve the results the most.
You can always see my version of the script as I include it with Visions of Chaos under
C:\Users\YourUserName\AppData\Roaming\Visions of Chaos\Examples\MachineLearning\Text To Image\CLIP Guided Diffusion\multi_perceptor_clip_guided_diffusion.py
The new ruDALL-E script looks amazing, but when I try to run it, I get a missing module error for ‘youtokentome’. When I execute the required pip instruction I get an error saying this module requires Python 3.5 whereas I use 3.9.5 as per your installation instructions. Any idea how I can get this interesting script to run?
Try running these commands. That force-reinstalls the required libraries, reinstalls PyTorch just in case, and then lists the installed packages so you can check the versions.
pip install --no-cache-dir --ignore-installed --upgrade --force-reinstall deep-translator==1.5.4
pip install --no-cache-dir --ignore-installed --upgrade --force-reinstall cython==0.29.24
pip install --no-cache-dir --ignore-installed --upgrade --force-reinstall youtokentome==1.0.6
pip install --no-cache-dir --ignore-installed --upgrade --force-reinstall more-itertools==8.10.0
pip uninstall -y torch
pip uninstall -y torch
pip install --no-cache-dir --ignore-installed --upgrade --force-reinstall torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio===0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip list
Thanks for your swift reply! Using these commands, I got an incomprehensible error message about a non-existent option (-n or -m). I therefore did a complete reinstall of all packages and that did the trick. It is now running my standard list of test prompts, but from what I’ve seen thus far, the results are disappointing. Even on simple instructions like ‘a dog’, I get imagery that’s nowhere near.
I’m doing most of my testing on Multi-perceptor, which works great at just 100 iterations. This reduces runtime to 1m43s on my system. I’ve also reworked and added to your prompt files and I now get interesting imagery most of the time. If you like, I can mail the altered txt-files for you to try.
Two final suggestions: the random seed option for running batches is useful, but would it be a lot of work to add a sequential option starting at seed X with increments of Y? Another useful feature would be the ability to use only a subset of styles, artists, etcetera using checkboxes before starting a random batch.
Have a nice day!
I’ve noticed an odd bug in Multi-perceptor that – in retrospect – must have been there for a while: it renders 6 iterations less than what’s asked for. I haven’t seen this behavior with other AI.
That was due to a hard coded skip iteration value. I have fixed it now for the next release so you get exactly the right number of iterations shown rather than 6 less than intended.
This bug was gone but in 88.1 it appears to be back. It’s not a big deal because I simply add 6. Just wanted to let you know.
I also noticed I now have more control over random batches, which makes me a happy camper!
When I experimented with this feature, I noticed I can’t seem to lock more than one line, though. After locking the subject, I also tried locking the medium, but that kept being randomized.
Yes, it is back to what it was.
I had to revert to the old code, because automatically adding 6 to the iterations caused problems when the ddim checkbox was checked. It triggered an error deep within some other Python code that I couldn’t fix.
Hi – I wonder if you have any suggestions on using the VQGAN+CLIP Animations script? It looks interesting but I can’t seem to generate a movie except via the create movie from updates checkbox.
Hi Mark,
The “VQGAN+CLIP Animations” script only generates movies that way, as you said (via checking the create movie from updates checkbox). This is because the zooming, panning and rotation are done within the script code rather than by Visions of Chaos between frames.
For more complete control over frames etc you want to use the script functionality. See here for how the other movie options work.
Regards,
Jason.
Hi Jason!
Is there any way to improve image coherence in VQGAN + CLIP scripts?
Not really. That is just the way they are. I haven’t seen any VQGAN based script that does not suffer from coherence issues. You are better off using one of the newer Multi-Perceptor CLIP Guided Diffusion scripts (if you have a GPU with lots of VRAM) as they give superior results compared to VQGAN.
Jason.
All your work here is so fascinating Jason, thank you for Visions of Chaos.
I was wondering if you would ever consider making your enhanced version of the Multi-Perceptor CLIP Guided Diffusion available on Colab for those of us without a high-end GPU.
While I have had lots of enjoyment and artistic fulfillment with other Colab notebooks, the examples you posted show that your enhancements to that script somehow encourage CLIP to wrangle faces into the proper number of eyes, nose and mouth, all located in the right place! That’s something I haven’t been able to do with any other method. Faces are apparently very difficult for AIs!
I never use colab beyond downloading scripts to edit locally. Anyone can merge my edits with a new colab though. My versions of all the Text-to-Image scripts ship with Visions of Chaos.
Note that the example images in these blog posts are cherry picked best results from a lot of random prompts. No Text-to-Image system is going to give good images every time. If you want really good results you need to run multiple prompts overnight and then pick the best ones. Faces especially can take hundreds of outputs before you get that reasonable looking result. The basic rule for all Text-to-Image scripts is to have a lot of patience if you want the best results. My edits do not make the script suddenly able to make perfect faces every time.
Jason.