NOTE: Make sure you also see this post that has a summary of all the Text-to-Image scripts supported by Visions of Chaos with example images.
More Fascinating Text-to-Image
This time “Deep Daze Fourier” from Vadim Epstein. Code available in this notebook.
Compared to the last Deep Daze, which generated washed-out, pastel-shaded results, this Deep Daze creates images with sharp, crisp, bright colors.
“Shrek eating pizza”
“H R Giger”
“Surrealist Homer Simpson”
This and the previous Text-to-Image systems I have experimented with (here, here and here) are now supported by a GUI front end in Visions of Chaos. As long as you install these prerequisites and have a decent GPU you will be able to run these systems yourself.
For those who love to tinker I have now added a bunch more of the script parameters so you no longer have to edit the Python source code outside Visions of Chaos.
If you know of any other Text-to-Image systems (with sharable open-source code) then please let me know. The Text-to-Image systems I have tested so far all have their own unique behaviors and outputs, so I will always be on the lookout for new variations.
Started using Visions of Chaos about a week ago and I’m having heaps of fun with all the machine learning options available from within the program. So let me preface my comment by saying thanks for all the work you’re putting into making things like this available to a wider audience!
As a heads-up for any other folks trying out Deep Daze Fourier: I’m getting out-of-memory errors on my RTX 3080 Laptop with 16 GB of VRAM when using the model with an output resolution of 512 x 512, while the 256 x 256 setting works fine. So better bring a beast of a GPU with heaps of dedicated VRAM if you prefer to use this one with all of its possibilities 🙂
I would say hang in there if you have a smaller GPU. With only my 6 GB 1060 (stop laughing), I have produced full 512×512 images. I’ve only started playing with the settings, but reducing the Siren settings seems to help with memory usage. I’m not completely sure what I am losing yet, but the images are taking about 10-12 minutes each.
So back again with some results. Varying the Siren layers, Siren hidden features and Fourier maps settings while keeping the other settings at their defaults, the following combinations work to create 512×512 images with 6 GB of VRAM.
Each combination produces a different style of image and I can’t say what would be your perfect combination as different text input might work with more color, more detail, more focus etc.
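For anyone wondering why the resolution and the Siren settings matter so much for VRAM, here is a rough back-of-the-envelope sketch in Python. It is not taken from the Deep Daze Fourier script: the function name, the formula (one stored activation per pixel, per hidden feature, per layer) and the example values of 256 hidden features and 16 layers are all assumptions for illustration. Real usage also includes CLIP, gradients and optimizer state, so treat the numbers as relative, not absolute.

```python
# Rough illustration only: a hypothetical estimate of activation memory for a
# per-pixel MLP such as a SIREN. The formula and the example values are
# assumptions, not figures taken from the Deep Daze Fourier script.

def siren_activation_bytes(width, height, hidden_features, layers, bytes_per_value=4):
    """Estimate bytes needed to store one activation per pixel, feature and layer."""
    pixels = width * height
    return pixels * hidden_features * layers * bytes_per_value


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    # Hypothetical settings: 256 hidden features, 16 layers.
    small = siren_activation_bytes(256, 256, hidden_features=256, layers=16)
    large = siren_activation_bytes(512, 512, hidden_features=256, layers=16)
    fewer_layers = siren_activation_bytes(512, 512, hidden_features=256, layers=8)

    print(f"256x256, 16 layers: {to_mib(small):.0f} MiB")
    print(f"512x512, 16 layers: {to_mib(large):.0f} MiB")
    print(f"512x512,  8 layers: {to_mib(fewer_layers):.0f} MiB")
```

Under these assumptions, doubling the width and height quadruples the estimate, while halving either the layers or the hidden features halves it. That would be consistent with 256×256 fitting where 512×512 does not, and with smaller Siren settings freeing up memory.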
All good stuff! Thanks for another learning method and a better way to play with the settings.
thanks again for including another awesome update – off to try it out right now!
Hi, this notebook is an improvement but I have not been able to spread it because I don’t know how to place the text.
Thanks. I have already seen that one and include it in my summary post.
It is definitely one of the better scripts.
This is wonderful, what great work you’re doing for us all here.
I was wondering if you tried implementing this yet? https://github.com/dribnet/clipit
Thanks for the link. I had a quick look and I will be able to include CLIPIT with the next release of Visions of Chaos.
CLIPIT is now supported in the latest version of Visions of Chaos I just released.