Adding PyTorch support to Visions of Chaos

TensorFlow 2

Recently, after getting a new 3090 GPU, I needed to update TensorFlow to version 2. Going from TensorFlow 1 to TensorFlow 2 involved way too many code breaking changes for me. Looking at other GitHub examples of TensorFlow 2 code (eg an updated Style Transfer script) gave me all sorts of errors, and not just from one repo; lots of supposed TensorFlow 2 code would not run for me. If it is a pain for me it is going to be a bigger annoyance for my users. I already get enough emails saying “I followed your TensorFlow instructions exactly, but it doesn’t work”. I am in no way an expert in Python, TensorFlow or PyTorch, so I need something that most of the time “just works”.

I did manage to get the current TensorFlow 1 scripts in Visions of Chaos running under TensorFlow 2, so at least the existing TensorFlow functionality will still work.

PyTorch

After having a look around and watching some YouTube videos I wanted to give PyTorch a go.

The install is one pip command that their home page builds for you after you select OS, CUDA version, etc. So for my current TensorFlow tutorial (maybe I now need to rename that to a “Machine Learning Tutorial”) all I needed to do was add one more line to the pip install section.


pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

PyTorch Style Transfer

The first Google hit is the PyTorch tutorial here. After spending most of a day banging my head against the wall with TensorFlow 2 errors, that single self-contained Python script using PyTorch “just worked”! The settings do seem harder to tweak for good looking output compared to the TensorFlow Style Transfer script I used previously, so after making the following examples I may look for another PyTorch Style Transfer script.
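For reference, the core of the approach in that tutorial is a Gram matrix based style loss. A minimal sketch of just that piece, assuming VGG style feature map tensors (this is an illustration of the technique, not the tutorial’s exact code):

import torch
import torch.nn.functional as F

def gram_matrix(features):
    # features is a (batch, channels, height, width) activation tensor
    b, c, h, w = features.size()
    flat = features.view(b * c, h * w)
    # channel to channel correlations capture the "style" statistics
    gram = torch.mm(flat, flat.t())
    # normalize so losses are comparable across layers of different sizes
    return gram.div(b * c * h * w)

def style_loss(generated_features, style_features):
    # mean squared error between the Gram matrices of the two images
    return F.mse_loss(gram_matrix(generated_features), gram_matrix(style_features))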

Here are some example results using Biscuit as the source image.

Biscuit

Biscuit Style Transfer

Biscuit Style Transfer

Biscuit Style Transfer

Biscuit Style Transfer

PyTorch DeepDream

Next up was ProGamerGov’s PyTorch DeepDream implementation. Again, it worked fine. I have used ProGamerGov‘s TensorFlow DeepDream code in the past and the PyTorch version worked just as well this time. It supports a bunch of other models too, so more varied DeepDream outputs are now available in Visions of Chaos.

Biscuit DeepDream

Biscuit DeepDream

Biscuit DeepDream

Biscuit DeepDream

PyTorch StyleGAN2 ADA

Using NVIDIA’s official PyTorch implementation from here. It was also easy to get working, and you can quickly generate images from existing models.
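For example, after cloning the repository and downloading a .pkl model file, generating a batch of images is a single command along these lines (the directory and file names here are placeholders):

python generate.py --outdir=out --trunc=0.7 --seeds=0-35 --network=metfaces.pkl

Lower --trunc values give safer, more average results; higher values give more variety at the risk of weirder outputs.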

StyleGAN2 ADA

Metropolitan Museum of Art Faces – NVIDIA – metfaces.pkl

StyleGAN2 ADA

Trypophobia – Sid Black – trypophobia.pkl

StyleGAN2 ADA

Alfred E Neuman – Unknown – AlfredENeuman24_ADA.pkl

StyleGAN2 ADA

Trypophobia – Sid Black – trypophobia.pkl

I include the option to train your own models from a bunch of images. Pro tip: if you do not want to have nightmares do not experiment with training a model based on a bunch of naked women photos.
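If you want to try training outside Visions of Chaos, the NVIDIA repository documents roughly this two step process: convert a folder of same sized images into a dataset, then train on it (the paths here are placeholders, and training can take days):

python dataset_tool.py --source=my_images_folder --dest=my_dataset.zip
python train.py --outdir=training-runs --data=my_dataset.zip --gpus=1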

Going Forward

After these early experiments with PyTorch, I am going to use PyTorch from now on wherever possible.

Jason.

TensorFlow 2 and RTX 3090 Performance

A Pleasant Surprise

Just in time for when 3090 GPUs started to become available again in Sydney I was very generously gifted the funds to finally purchase a new GeForce RTX 3090 for my main development PC. After including a 1000 Watt power supply the total cost came to around $4000 AUD ($3000 USD). Such a rip off.

GeForce RTX™ 3090 GAMING X TRIO 24G

The card itself is really heavy and solid. They include a bracket to add support and help the card not sag over time, which is a nice touch. Like all recent hardware parts it lights up in various RGB colors and shades. These RGB rigs are going to look so out of date once the fad goes away. After upgrading my PC parts over the last few years I now have PCs that flash and blink more than my Christmas tree does when fully set up and lit.

Who needs a Christmas tree?

Not So Fast

I naively assumed that a quick GPU swap would give me the boost in performance that previous GPU upgrades did (like when I upgraded to the 1080 and then to the 2080 Super). Not this time. I ran a few quick machine learning TensorFlow (version 1) tests from Visions of Chaos and the Python scripts either ran extremely slowly (around 10x to 30x SLOWER) or just crashed. So much for a simple upgrade for more power.

Turns out the Ampere architecture the 3090 GPUs use is only supported by CUDA 11.0 or higher. After updating CUDA, cuDNN, the various Python libraries and the Python scripts, I was back to where I was before the upgrade. If you have been through the tedious process of installing TensorFlow for Visions of Chaos before, you will need to follow my new instructions to get TensorFlow version 2 support. Updating TensorFlow v1 code to TensorFlow v2 code is a pain. From now on I am going to use PyTorch scripts for all my machine learning related needs.

High Temperatures

These newer GPUs can run hot. Under 100% load (when I was rendering a Style Transfer movie with frames being calculated one after the other) the 3090 peaks around 80 degrees C (176 F). I do have reasonable cooling in the case, but the air being blown out is noticeably hot. The 2080 running the same test peaks around 75 degrees.

The 2080 and my older 1080 push all their hot exhaust air out the rear vents of the card, but the 3090 has no rear exhaust, so all the hot air vents directly into the case. I can only assume the cooler cannot move all that heat “through” the card and out the back, so it dumps the heat wherever it can. This means that when the card runs hot a lot of hot air goes straight into the case. When I touched the side of the case next to the GPU it was very warm.

Apparently 80 degrees and under is perfectly fine and safe for a GPU, but they would say that, wouldn’t they? They would be bragging about low temps if they could manufacture cooler running cards.

After some experimenting with Afterburner I lowered the temp limit from the GPU default of 83 degrees down to 75 degrees. This resulted in more throttling but only a slight performance hit (style transfer took 1 minute 21 seconds rather than 1 minute 14 seconds). The case was noticeably cooler and the average temp was now down to a much more chilly 65 degrees. Afterburner allows tweaking (overclocking/underclocking) of your GPU, but the most useful feature is its graphing capabilities to see what is really going on. You can monitor temperatures and throttling as you run complex GPU operations.

Extra Cooling

I wanted to see if more case fans would help, so I removed the 3 stock case fans and installed 6 of these fans (2 sucking in at the front, 3 blowing out at the top, and 1 blowing out at the rear of the case). My silent PC is no longer silent. I set the GPU back to its default profile with a temp limit of 83 degrees and started another Style Transfer movie to keep the GPU pegged as close to 100% usage as possible for an extended period. The temp graph in Afterburner still shows peaks up to 76 degrees, but with much less throttling: the core clock stays at 95% to 100% of its maximum possible MHz, which results in better overall performance.

After a week the extra noise annoyed me too much, so I replaced the Gamdias fans with Corsair fans: 6 of these fans and one of these controllers. Setting the fans to the default “quiet” profile gets the noise back down to near silent levels. When I start a machine learning batch run, the temp sensors detect the increased heat in the case and ramp up the fans to compensate. The Afterburner graphs show they may even be slightly better at cooling than the Gamdias fans. The problem with the auto-adjusting speed control is the noticeable ramping of the fans up and down as they compensate for temp changes all the time (not just when the GPU is at 100%), which was more annoying than fans always running at full speed. After some adjustments and tests with the excellent Corsair software I settled on a custom temp curve that only cranks up when I start full 100% GPU load processing. Once the GPU usage drops back to normal the fans ramp down and are silent again.

Power Usage

Using one of those cheap inline watt meters shows the PC pulls 480 watts when the GPU is at 100% usage. Afterburner reports the card using around 290 watts under full load.

I have basically been using the 3090 24 hours a day for training and testing machine learning setups since I bought it. Three weeks of that usage made my latest quarterly electricity bill go up from $284 to $313, which works out to roughly $1.40 a day to power the GPU full time. If you can afford the GPU you should be able to afford the cost of powering it.

Final Advice

When experimenting with one of these newer GPUs, use all the monitoring you need to make sure the card is not overheating and throttling performance. Afterburner is great for setting up a graph showing GPU temp, usage, MHz and power usage. Monitor those levels while the GPU is under 100% load over an extended period of time.

Temperature controlled fans like the Corsair Commander Pro setup can work as a set and forget cooling solution once you tweak a custom temp curve that suits your usage and hardware.

Final Opinion

Was it worth spending roughly three times the cost of the 2080 on the 3090? No, definitely not. At their current over-inflated prices these GPUs are not worth the money, but if you need or want one you have to pay it. If prices were not so artificially inflated and cards sold at the initial recommended retail prices, it would be more reasonable (still expensive, but not ridiculously so).

After testing the various GPU related modes in Visions of Chaos, the 3090 is only between 10% and 70% faster than the 2080 Super depending on what GPU calculations are being made, and more often at the slower end of that scale. OpenGLSL shader performance shows a fairly consistent speed boost of between 10% and 15%.

The main reason I wanted the 3090 was for the big jump in VRAM from 8GB to 24GB so I am now able to train and run larger machine learning models without the dreaded out of memory errors. StyleGAN2 ADA models are the first things I have now successfully been able to train.

StyleGAN2 ADA - Alfred E Neuman

Moving the 2080 Super into my older PC to replace its 1080 gives that machine a big jump in performance and lets me run less VRAM intensive sessions on it. Can you tell I am trying to convince myself this was a good purchase? I just expected more. Cue the “Ha ha, you idiot! Serves you right for not researching first.” comments.

Jason.

DeepDream – Part 3

DeepDream

This is the third part in a series of posts. See Part 1 and Part 2.

DeepDream

ProGamerGov Code

The script from Part 2 supports rendering 59 layers of the Inception model. Each of the 59 DeepDream layers has multiple channels that allow many more unique patterns and outputs.

I found this out thanks to ProGamerGov‘s script here.

DeepDream

There are 7,548 channels in total. A huge number of patterns to explore and create movies from. If I followed the same principle as in Part 2 and created a movie changing the channel every 10 seconds, the result would be a movie almost 21 hours long. If each frame took around 25 seconds to render it would take 1310 DAYS to render all the frames. Not even I am that patient.
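For reference, selecting a single channel in the TensorFlow 1 style DeepDream scripts is typically just a slice of the layer tensor’s final dimension. A sketch, assuming the model object from the Part 1 code (layernumber and channelnumber are placeholder variables):

# optimize everything a layer detects (59 layers to choose from)
layer_tensor = model.layer_tensors[layernumber]
# or restrict the optimization to one channel within that layer
layer_tensor = model.layer_tensors[layernumber][:, :, :, channelnumber]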

DeepDream

Channel Previews

The following links are previews of each layer and available channels within them. The layer, channel and other settings are included so you can reproduce them in Visions of Chaos if required.

DeepDream

As the layers get deeper the images get more complex. If a layer name is shown twice it is because that layer had too many channels to render into a single valid image file, so it had to be split into two separate images.

DeepDream

conv2d0_pre_relu
conv2d1_pre_relu
conv2d2_pre_relu

DeepDream

head0_bottleneck_pre_relu
head1_bottleneck_pre_relu

DeepDream

mixed3a_1x1_pre_relu
mixed3a_3x3_bottleneck_pre_relu
mixed3a_3x3_pre_relu
mixed3a_5x5_bottleneck_pre_relu
mixed3a_5x5_pre_relu
mixed3a_pool_reduce_pre_relu

DeepDream

mixed3b_1x1_pre_relu
mixed3b_3x3_bottleneck_pre_relu
mixed3b_3x3_pre_relu
mixed3b_5x5_bottleneck_pre_relu
mixed3b_5x5_pre_relu
mixed3b_pool_reduce_pre_relu

DeepDream

mixed4a_1x1_pre_relu
mixed4a_3x3_bottleneck_pre_relu
mixed4a_3x3_pre_relu
mixed4a_5x5_bottleneck_pre_relu
mixed4a_5x5_pre_relu
mixed4a_pool_reduce_pre_relu

DeepDream

mixed4b_1x1_pre_relu
mixed4b_3x3_bottleneck_pre_relu
mixed4b_3x3_pre_relu
mixed4b_5x5_bottleneck_pre_relu
mixed4b_5x5_pre_relu
mixed4b_pool_reduce_pre_relu

DeepDream

mixed4c_1x1_pre_relu
mixed4c_3x3_bottleneck_pre_relu
mixed4c_3x3_pre_relu
mixed4c_5x5_bottleneck_pre_relu
mixed4c_5x5_pre_relu
mixed4c_pool_reduce_pre_relu

DeepDream

mixed4d_1x1_pre_relu
mixed4d_3x3_bottleneck_pre_relu
mixed4d_3x3_pre_relu
mixed4d_3x3_pre_relu
mixed4d_5x5_bottleneck_pre_relu
mixed4d_5x5_pre_relu
mixed4d_pool_reduce_pre_relu

DeepDream

mixed4e_1x1_pre_relu
mixed4e_3x3_bottleneck_pre_relu
mixed4e_3x3_pre_relu
mixed4e_3x3_pre_relu
mixed4e_5x5_bottleneck_pre_relu
mixed4e_5x5_pre_relu
mixed4e_pool_reduce_pre_relu

DeepDream

mixed5a_1x1_pre_relu
mixed5a_3x3_bottleneck_pre_relu
mixed5a_3x3_pre_relu
mixed5a_3x3_pre_relu
mixed5a_5x5_bottleneck_pre_relu
mixed5a_5x5_pre_relu
mixed5a_pool_reduce_pre_relu

DeepDream

mixed5b_1x1_pre_relu
mixed5b_1x1_pre_relu
mixed5b_3x3_bottleneck_pre_relu
mixed5b_3x3_pre_relu
mixed5b_3x3_pre_relu
mixed5b_5x5_bottleneck_pre_relu
mixed5b_5x5_pre_relu
mixed5b_pool_reduce_pre_relu

Individual Sample Images

DeepDream

I was going to render each layer/channel combination as a single 4K image to really show the possible results, but after seeing it would take 15 minutes to generate each image I was looking at nearly 79 days to render all the examples. HD 1920×1080 resolution will have to do for now (at least until Nvidia releases the next generation of hopefully much faster consumer GPUs).

DeepDream

Even using two PCs (one with a 1080 GPU and one with a 2080 Super GPU) these images still took nearly 3 weeks to generate. Each image took 6 minutes on the 1080 and 5 minutes on the 2080 Super. Since working with neural networks and GPU computations (especially these week long all day sessions) I can see a noticeable impact on my power bill. These GPUs are not electricity friendly.

DeepDream

See this gallery for all of the individual 7,548 channel images. Starts at page 4 to skip the more plain images from the initial layers.

DeepDream

Movie Samples

The following movies use a series of channels that follow a basic theme.

Eye imagery.

Architecture imagery.

Furry imagery.

Trypophobia imagery.

Availability

DeepDream Dialog

As long as you set up the TensorFlow prerequisites you can run DeepDream processing from within Visions of Chaos.

Tutorial

The following tutorial goes into much more detail on using the DeepDream functionality within Visions of Chaos.

Jason.

DeepDream – Part 2

Part 2 of my DeepDream series of posts. See here for Part 1.

Darshan Bagul

This time the DeepDream code I experimented with is thanks to Darshan Bagul‘s HalluciNetwork code here.

Darshan’s code allows all 59 of the DeepDream layers to be visualized, and from my tests the resulting images are much richer in color and more “trippy”. It also does not require the auto-brightness tweak needed for the code in Part 1.

Layers Visualized

Here are the results of processing a gray scale noise image with 10 of the neural network layers. Settings for rendering are 6 octaves, 200 iterations per octave, 1.5 step size and 2.0 rescale factor.

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

See this gallery for all of the example layer images.

Image Processing

These examples use the following photo of Miss Marple.

A selection of 10 processed images.

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

See this gallery for all of the example image processed images.

Movie Results

The following movie cycles through the layers of the DeepDream network going to a deeper layer every 10 seconds. The frames for this movie took almost 2 full weeks to render using an Nvidia 1080 GPU.

Availability

DeepDream Dialog

As long as you set up the TensorFlow prerequisites you can run DeepDream processing from within Visions of Chaos.

Tutorial

The following tutorial goes into much more detail on using the DeepDream functionality within Visions of Chaos.

Jason.

DeepDream – Part 1

Another venture into the world of neural networks. This time experimenting with DeepDream. The original blog post describing DeepDream by Alex Mordvintsev is here and if you want to have a look at the original code see this github.

I have split these DeepDream posts into parts based on the source python code I was experimenting with at the time.

My basic explanation of DeepDream is that a convolutional neural network enhances what it thinks it sees within an image. So if the network detects that a part of an image looks like a bear’s head, that area will be enhanced and tweaked towards looking more like a bear’s head. Different layers of the neural network detect different patterns and give you different visual results.

Part 1

For this first part the most awesome sentdex got me going with deep dreaming via the following two of his YouTube videos:

Magnus Pedersen’s DeepDream Code

The sentdex videos use this code from Magnus Erik Hvass Pedersen.

Problem

One problem with the code is that the resulting images tend to be slightly darker than the source. This is not a big issue for single images, but when creating movies it results in the movie getting progressively darker over time.

A fix from the sentdex video is to add an amount to the RGB image values after deep dream processing, ie


img_result[:, :, 0] += brightness  # R
img_result[:, :, 1] += brightness  # G
img_result[:, :, 2] += brightness  # B

The problem with adding a fixed value is that you need to manually adjust it to suit each movie or source image, and the frames still eventually tend towards black or white.

Fix

The first fix I tried was to scale the image array values to fit between 0 and 255 prior to saving the image.


#scale the passed array values to between x_min and x_max
def scale(X, x_min, x_max):
    value_range = x_max - x_min
    res = (X - X.min()) / X.ptp() * value_range + x_min
    return res

That works, but the overall brightness can change noticeably between frames, resulting in an annoying strobing effect as the brightness auto-adjusts every frame.

A tweak to fix the strobing is the following scaling.


#scale the passed array values to between x_min and x_max
def scale(X, x_min, x_max):
    value_range = x_max - x_min
    #res = np.sqrt((X - X.min())) / np.sqrt(X.ptp()) * value_range + x_min
    res = (X - X.min()) / X.ptp() * value_range + x_min
    # go one fifth of the way towards the desired scaled result
    res = X + (res - X) / 5
    return res

Rather than jumping straight to the scaled brightness value, the above code nudges the brightness one fifth of the distance towards the target brightness. This helps avoid the strobing brightness when creating DeepDream movie frames.

Final Code

Here is my final hacked together front end script that passes all the settings to Magnus’ Python script.


#front end to https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb
from deepdreamer import model, load_image, save_image, recursive_optimize
import sys
import numpy as np

#scale the passed array values to between x_min and x_max
def scale(X, x_min, x_max):
    value_range = x_max - x_min
    res = (X - X.min()) / X.ptp() * value_range + x_min
    # go one fifth of the way towards the desired scaled result to help decrease frame brightness "strobing"
    # res = (X + res) / 2
    res = X + (res - X) / 5
    return res

#arguments passed in
sourceimage = str(sys.argv[1])
layernumber = int(sys.argv[2])
iterations = int(sys.argv[3])
stepsize = float(sys.argv[4])
rescalefactor = float(sys.argv[5])
passes = int(sys.argv[6])
blendamount = float(sys.argv[7])
autoscale = int(sys.argv[8])
outputimage = str(sys.argv[9])

layer_tensor = model.layer_tensors[layernumber]
file_name = sourceimage
img_result = load_image(filename='{}'.format(file_name))

img_result = recursive_optimize(layer_tensor=layer_tensor, image=img_result,
                 num_iterations=iterations, step_size=stepsize, rescale_factor=rescalefactor,
                 num_repeats=passes, blend=blendamount)

#auto adjust brightness
if autoscale:
    img_result = scale(img_result, 0, 255)

save_image(img_result,outputimage)

print("DeepDream processing complete")

Layer Images

Here are the results of running each of the DeepDream layers on an image of random gray scale noise.

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

DeepDream

Image Processing Results

DeepDream can be used to process single images. The DeepDream result is not just a texture overlay for the whole image. The processing will detect and follow contours and shapes within the image being processed.

These examples use the following photo of Miss Marple.

Here are the results of processing that photo using each of the available layers in DeepDream model.

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

DeepDream Image Processing

Movie Results

If you repeatedly use the output of DeepDream as the input of another DeepDream pass you can make movies. Stretch each frame slightly and you get a nice zooming while morphing result. For this movie I changed the DeepDream layer every 300 frames (10 seconds), so the movie starts simple and gets more complex as deeper layers of the neural network are used for the visualizations. Rendering time was around 2 and a half days to generate all the frames using an Nvidia RTX 2080 Super.
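The feedback loop itself is simple. A minimal sketch, assuming a hypothetical deep_dream(image) wrapper around the script above and the Pillow library:

from PIL import Image

def zoom(img, factor=1.01):
    # crop a slightly smaller centre region and resize it back up,
    # which produces the zooming effect between frames
    w, h = img.size
    cw, ch = int(w / factor), int(h / factor)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch)).resize((w, h), Image.LANCZOS)

img = Image.open("frame_0000.jpg")
for frame in range(1, 300):
    img = deep_dream(zoom(img))  # deep_dream is a hypothetical wrapper returning a PIL image
    img.save("frame_{:04d}.jpg".format(frame))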

Availability

DeepDream Dialog

As long as you set up the TensorFlow prerequisites you can run DeepDream processing from within Visions of Chaos.

Tutorial

The following tutorial goes into much more detail on using the DeepDream functionality within Visions of Chaos.

Jason.

Long Short-Term Memory Music Composer

Automatic Music Composition

For almost as long as I have been able to program I have wanted to create software that could compose music.

I have made some primitive attempts at automated music composition over the years in Visions of Chaos, mostly based on simple math formulas or genetic mutations of random note sequences. After recently having some success learning and implementing TensorFlow and neural networks for cellular automata searching, I was keen to try using neural networks for music creation.

How It Works

The composer works by training a long short-term memory (LSTM) neural network. LSTM networks are good at predicting “what comes next” in a sequence of data. Another page that goes into more depth about LSTMs is here.

LSTM

The LSTM network is fed a bunch of different note sequences (in this case single channel midi files). Once the network has been trained sufficiently it is then able to create music that is similar to the training material.

LSTM

The above diagrams of LSTM internals may look daunting, but using TensorFlow and/or Keras makes creating and experimenting with LSTMs much simpler.

Source Music to Train Model

For these simpler LSTM composer networks you want source songs with just a single midi channel. Solo piano midi files work well for this. I found single piano midi files at Classical Piano Midi Page and mfiles that I used for training my models.

Music of different composers is put into separate folders. That way the user can select Bach, click the Compose button and have a piece of music generated that (hopefully) sounds a little like Bach.

LSTM Model

The model I based my code on was this example by Sigurður Skúli Sigurgeirsson which he describes in more detail here.

I ran the included lstm.py script and after 15 hours it had finished training. When I used predict.py to generate midi files they all disappointingly contained just a single repeated note. I reran the training twice more and got the same results.

The original model is


model = Sequential()
model.add(CuDNNLSTM(512,input_shape=(network_input.shape[1], network_input.shape[2]),return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=["accuracy"])

Once I had plotting added to the script I saw why the model does not work.

The accuracy never rises as it should over time, and because of this the loss gets stuck at around 3.4. See the good plots further down this post to see what the accuracy plot should look like in a working model.

I have no idea why, but I gave up on that model and started tweaking settings.


model = Sequential()
model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(CuDNNLSTM(256))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=["accuracy"])

Smaller and fewer LSTM layers. I added BatchNormalization too after seeing it in a sentdex video. There are most likely better models, but this one worked OK for the rest of my training sessions.

Note that in both models I have replaced LSTM with CuDNNLSTM. This results in much faster LSTM training by using CUDA. If you do not have a CUDA capable GPU you will need to change these back to LSTM. Thanks to sentdex for this tip. Training new models and composing midi files is approximately 5 times faster using CuDNNLSTM.

How Long Should You Train Your Model For

How long you train the model for (how many epochs) determines how similar to the source music the results will be. Too few epochs and the output will have too many repeated notes. Too many epochs and the model will overfit and just copy the source music.

But how do you know how many epochs to stop at?

A simple method is to add a callback that saves the model and a plot of accuracy and loss every 50 epochs during a 500 epoch training run. That way, once training is done, you have models and graphs at 50 epoch increments that show you exactly how the training went.

Here are the graph results from a sample run saving every 50 epochs merged into an animated GIF file.

That is the sort of graph you want to see. Loss should drop down and stay down. Accuracy should rise and stay up near 100%.

You want to use the model from the epoch at which the graphs first hit their limits. For the above graph that would be 150 epochs. Using any model beyond this point would be using one trained too long, which will most likely just copy the source material.

The model those graphs are showing was trained on the “Anthems” midi files from here.


Sample midi output from the 150 epoch model.


Sample midi output from the 100 epoch model.

Even the 100 epoch model may be copying the source too closely. This could be due to the relatively small sample of midi files to train against. The training works better with more total notes.

When Training Goes Bad

The above is an example of what can and does happen sometimes during training. The loss is decreasing and accuracy increasing as they usually do and then suddenly they both crap out. At that point you may as well stop training. The model will not (at least from my experience) ever start training correctly again. In this case the saved 100 epoch model is too random and the 150 is just past the point the model failed. I am now saving every 25 epochs to be sure I get that sweet spot of the best trained model before it trains too much or fails.

Another example of training failing. This model was trained on the midi files from here. In this case it was going well until just after epoch 200. Using the epoch 200 model gives the following midi output.

Without plotting you never know when or if the training has problems, or whether you might still be able to get a good model without having to retrain from scratch.

More Examples


75 epoch model based on Chopin.


50 epoch model based on Christmas Midi files.


100 epoch model based on Christmas Midi files. Not sure how “Christmasy” they sound?


300 epoch model based on Bach Midi files from here and here.


200 epoch model based on a single Balakirew Midi file from here.


200 epoch model based on Debussy.


175 epoch model based on Mozart.


100 epoch model based on Schubert.


200 epoch model based on Schumann.


200 epoch model based on Tchaikovsky.


175 epoch model based on folk songs.


100 epoch model based on nursery rhymes.


100 epoch model based on wedding music.


200 epoch model based on my own midi files taken from my YouTube movie soundtracks. This may be slightly overtrained as it generates more of a medley of my short 1 or 2 bar midi files.

Sheet Music

Once you have the midi file, you can use online tools like SolMiRe to convert it into sheet music. The following is the 200 epoch Softology midi file above as sheet music.

Availability

The LSTM Composer is now included as part of Visions of Chaos.

LSTM Music Composer

You select a style from a dropdown list and click Compose. As long as you have the Python and TensorFlow prerequisites installed (see here for instructions), within seconds (if you have a fast GPU) you will have a new machine composed midi file to listen to and use for any other purpose. No copyright. No royalties need to be paid. If you don’t like the results, click Compose again and a few seconds later you will have a new composition to listen to.

The results so far could not be considered full songs, but they all contain interesting smaller sequences of notes that I will use when creating music in the future. In this way the LSTM composer can be a good starting point for inspiration for new songs.

Python Source

Here are the LSTM training and prediction Python scripts I am using. You do not need Visions of Chaos installed for these scripts to work; both training and midi generation will run from the command line.

This is the training script “lstm_music_train.py”


# based on code from https://github.com/Skuldur/Classical-Piano-Composer
# to use this script pass in;
# 1. the directory with midi files
# 2. the directory you want your models to be saved to
# 3. the model filename prefix
# 4. how many total epochs you want to train for
# eg python -W ignore "C:\\LSTM Composer\\lstm_music_train.py" "C:\\LSTM Composer\\Bach\\" "C:\\LSTM Composer\\" "Bach" 500

import os
import tensorflow as tf

# ignore all info and warning messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import glob
import pickle
import numpy
import sys
import keras
import matplotlib.pyplot as plt

from music21 import converter, instrument, note, chord
from datetime import datetime
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import CuDNNLSTM
from keras.layers import Activation
from keras.utils import np_utils
from keras.callbacks import TensorBoard
from shutil import copyfile

# name of midi file directory, model directory, model file prefix, and epochs
mididirectory = str(sys.argv[1])
modeldirectory = str(sys.argv[2])
modelfileprefix = str(sys.argv[3])
modelepochs = int(sys.argv[4])

notesfile = modeldirectory + modelfileprefix + '.notes'

# callback to save model and plot stats every 25 epochs
class CustomSaver(keras.callbacks.Callback):
	def __init__(self):
		self.epoch = 0	
	# This function is called when the training begins
	def on_train_begin(self, logs={}):
		# Initialize the lists for holding the logs, losses and accuracies
		self.losses = []
		self.acc = []
		self.logs = []
	def on_epoch_end(self, epoch, logs={}):
		# Append the logs, losses and accuracies to the lists
		self.logs.append(logs)
		self.losses.append(logs.get('loss'))
		self.acc.append(logs.get('acc')*100)
		# save model and plot every 25 epochs
		if (epoch+1) % 25 == 0:
			sys.stdout.write("\nAuto-saving model and plot after {} epochs to ".format(epoch+1)+"\n"+modeldirectory + modelfileprefix + "_" + str(epoch+1).zfill(3) + ".model\n"+modeldirectory + modelfileprefix + "_" + str(epoch+1).zfill(3) + ".png\n\n")
			sys.stdout.flush()
			self.model.save(modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.model')
			copyfile(notesfile,modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.notes');
			N = numpy.arange(0, len(self.losses))
			# Plot train loss, train acc, val loss and val acc against epochs passed
			plt.figure()
			plt.subplots_adjust(hspace=0.7)
			plt.subplot(2, 1, 1)
			# plot loss values
			plt.plot(N, self.losses, label = "train_loss")
			plt.title("Loss [Epoch {}]".format(epoch+1))
			plt.xlabel('Epoch')
			plt.ylabel('Loss')
			plt.subplot(2, 1, 2)
			# plot accuracy values
			plt.plot(N, self.acc, label = "train_acc")
			plt.title("Accuracy % [Epoch {}]".format(epoch+1))
			plt.xlabel("Epoch")
			plt.ylabel("Accuracy %")
			plt.savefig(modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.png')
			plt.close()
			
# train the neural network
def train_network():

	sys.stdout.write("Reading midi files...\n\n")
	sys.stdout.flush()

	notes = get_notes()

	# get amount of pitch names
	n_vocab = len(set(notes))

	sys.stdout.write("\nPreparing note sequences...\n")
	sys.stdout.flush()

	network_input, network_output = prepare_sequences(notes, n_vocab)

	sys.stdout.write("\nCreating CuDNNLSTM neural network model...\n")
	sys.stdout.flush()

	model = create_network(network_input, n_vocab)

	sys.stdout.write("\nTraining CuDNNLSTM neural network model...\n\n")
	sys.stdout.flush()

	train(model, network_input, network_output)

# get all the notes and chords from the midi files
def get_notes():

	# remove existing data file if it exists
	if os.path.isfile(notesfile):
		os.remove(notesfile)
	
	notes = []

	for file in glob.glob("{}/*.mid".format(mididirectory)):
		midi = converter.parse(file)

		sys.stdout.write("Parsing %s ...\n" % file)
		sys.stdout.flush()

		notes_to_parse = None

		try: # file has instrument parts
			s2 = instrument.partitionByInstrument(midi)
			notes_to_parse = s2.parts[0].recurse() 
		except: # file has notes in a flat structure
			notes_to_parse = midi.flat.notes

		for element in notes_to_parse:
			if isinstance(element, note.Note):
				notes.append(str(element.pitch))
			elif isinstance(element, chord.Chord):
				notes.append('.'.join(str(n) for n in element.normalOrder))

	with open(notesfile,'wb') as filepath:
		pickle.dump(notes, filepath)

	return notes

# prepare the sequences used by the neural network
def prepare_sequences(notes, n_vocab):
	sequence_length = 100

	# get all pitch names
	pitchnames = sorted(set(item for item in notes))

	 # create a dictionary to map pitches to integers
	note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

	network_input = []
	network_output = []

	# create input sequences and the corresponding outputs
	for i in range(0, len(notes) - sequence_length, 1):
		sequence_in = notes[i:i + sequence_length] # needs to take into account if notes in midi file are less than required 100 ( mod ? )
		sequence_out = notes[i + sequence_length]  # needs to take into account if notes in midi file are less than required 100 ( mod ? )
		network_input.append([note_to_int[char] for char in sequence_in])
		network_output.append(note_to_int[sequence_out])

	n_patterns = len(network_input)

	# reshape the input into a format compatible with CuDNNLSTM layers
	network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
	# normalize input
	network_input = network_input / float(n_vocab)

	network_output = np_utils.to_categorical(network_output)

	return (network_input, network_output)

# create the structure of the neural network
def create_network(network_input, n_vocab):

	'''
	""" create the structure of the neural network """
	model = Sequential()
	model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
	model.add(Dropout(0.3))
	model.add(CuDNNLSTM(512, return_sequences=True))
	model.add(Dropout(0.3))
	model.add(CuDNNLSTM(512))
	model.add(Dense(256))
	model.add(Dropout(0.3))
	model.add(Dense(n_vocab))
	model.add(Activation('softmax'))
	model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=["accuracy"])
	'''
	
	model = Sequential()
	
	model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(CuDNNLSTM(256))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(Dense(128, activation="relu"))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(Dense(n_vocab))
	model.add(Activation('softmax'))
	model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=["accuracy"])
	
	return model

# train the neural network
def train(model, network_input, network_output):
	
	# saver = CustomSaver()
	# history = model.fit(network_input, network_output, epochs=modelepochs, batch_size=50, callbacks=[tensorboard])
	history = model.fit(network_input, network_output, epochs=modelepochs, batch_size=50, callbacks=[CustomSaver()])

	# evaluate the model
	print("\nModel evaluation at the end of training")
	train_acc = model.evaluate(network_input, network_output, verbose=0)
	print(model.metrics_names)
	print(train_acc)
	
	# save trained model
	model.save(modeldirectory + modelfileprefix + '_' + str(modelepochs) + '.model')

	# delete temp notes file
	os.remove(notesfile)
	
if __name__ == '__main__':
	train_network()

This is the midi generation script “lstm_music_predict.py”


# based on code from https://github.com/Skuldur/Classical-Piano-Composer
# to use this script pass in;
# 1. path to notes file
# 2. path to model
# 3. path to midi output
# eg python -W ignore "C:\\LSTM Composer\\lstm_music_predict.py" "C:\\LSTM Composer\\Bach.notes" "C:\\LSTM Composer\\Bach.model" "C:\\LSTM Composer\\Bach.mid"

# ignore all info and warning messages
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import pickle
import numpy
import sys
import keras.models

from music21 import instrument, note, stream, chord
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Activation

# name of weights filename
notesfile = str(sys.argv[1])
modelfile = str(sys.argv[2])
midifile = str(sys.argv[3])

# generates a piano midi file
def generate():
	sys.stdout.write("Loading notes data file...\n\n")
	sys.stdout.flush()

	#load the notes used to train the model
	with open(notesfile, 'rb') as filepath:
		notes = pickle.load(filepath)

	sys.stdout.write("Getting pitch names...\n\n")
	sys.stdout.flush()

	# Get all pitch names
	pitchnames = sorted(set(item for item in notes))
	# Get amount of pitch names
	n_vocab = len(set(notes))

	sys.stdout.write("Preparing sequences...\n\n")
	sys.stdout.flush()

	network_input, normalized_input = prepare_sequences(notes, pitchnames, n_vocab)

	sys.stdout.write("Loading LSTM neural network model...\n\n")
	sys.stdout.flush()

	model = create_network(normalized_input, n_vocab)

	sys.stdout.write("Generating note sequence...\n\n")
	sys.stdout.flush()

	prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)

	sys.stdout.write("\nCreating MIDI file...\n\n")
	sys.stdout.flush()

	create_midi(prediction_output)

# prepare the sequences used by the neural network
def prepare_sequences(notes, pitchnames, n_vocab):
	# map between notes and integers and back
	note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

	sequence_length = 100
	network_input = []
	output = []
	for i in range(0, len(notes) - sequence_length, 1):
		sequence_in = notes[i:i + sequence_length]
		sequence_out = notes[i + sequence_length]
		network_input.append([note_to_int[char] for char in sequence_in])
		output.append(note_to_int[sequence_out])

	n_patterns = len(network_input)

	# reshape the input into a format compatible with LSTM layers
	normalized_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
	# normalize input
	normalized_input = normalized_input / float(n_vocab)

	return (network_input, normalized_input)

# create the structure of the neural network
def create_network(network_input, n_vocab):
	model = keras.models.load_model(modelfile)
	return model

# generate notes from the neural network based on a sequence of notes
def generate_notes(model, network_input, pitchnames, n_vocab):
	# pick a random sequence from the input as a starting point for the prediction
	start = numpy.random.randint(0, len(network_input)-1)

	int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

	pattern = network_input[start]
	prediction_output = []

	# generate 500 notes
	for note_index in range(500):
		prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
		prediction_input = prediction_input / float(n_vocab)

		prediction = model.predict(prediction_input, verbose=0)

		index = numpy.argmax(prediction)
		result = int_to_note[index]
		prediction_output.append(result)

		pattern.append(index)
		pattern = pattern[1:len(pattern)]

		if (note_index + 1) % 50 == 0:
			sys.stdout.write("{} out of 500 notes generated\n".format(note_index+1))
			sys.stdout.flush()

	return prediction_output

# convert the output from the prediction to notes and create a midi file from the notes
def create_midi(prediction_output):
	offset = 0
	output_notes = []

	# create note and chord objects based on the values generated by the model
	for pattern in prediction_output:
		# pattern is a chord
		if ('.' in pattern) or pattern.isdigit():
			notes_in_chord = pattern.split('.')
			notes = []
			for current_note in notes_in_chord:
				new_note = note.Note(int(current_note))
				new_note.storedInstrument = instrument.Piano()
				notes.append(new_note)
			new_chord = chord.Chord(notes)
			new_chord.offset = offset
			output_notes.append(new_chord)
		# pattern is a note
		else:
			new_note = note.Note(pattern)
			new_note.offset = offset
			new_note.storedInstrument = instrument.Piano()
			output_notes.append(new_note)

		# increase offset each iteration so that notes do not stack
		offset += 0.5

	midi_stream = stream.Stream(output_notes)

	midi_stream.write('midi', fp=midifile)

if __name__ == '__main__':
	generate()

Model File Sizes

One downside to including neural networks with Visions of Chaos is file size. If model generation was quicker I would just include a button so the end user could train the model(s) themselves. But seeing as some of these training sessions can take days to train multiple models that is not really practical. A better solution is for me to do all the training and testing work and only include the best working models. This also means that the end user just has to click a button and the trained models are then used for the music compositions. I now download the 1 GB zip file of models automatically when the user starts the LSTM Composer mode for the first time.

What’s Next?

The LSTM composer as shown in this post is the most basic usage of neural networks to compose music.

I have found other neural network music composers that I will experiment with next so expect more music composition options to be included with Visions of Chaos in the future.

Jason.

Automatic Detection of Interesting Cellular Automata

This post has been in a draft state for at least a couple of years now. I revisit it whenever I get inspiration for a new idea. I wasn’t going to bother posting it until I had a better solution to the problem, but maybe these ideas can trigger a working solution in someone else’s mind.

Compared to my other blog posts this one is more rambling as it follows the paths I have gone down so far when trying to solve this problem.




Objective

Cellular automata tend to have huge parameter search spaces to find interesting results within. The vast majority of rules within this space are junk, with only a small fraction of a percent being interesting. I have spent way too many hours repeatedly trying random rules when looking for new interesting cellular automata. Between the boring rules that die out and the rules that go chaotic there is that sweet spot of interesting rules. Finding those interesting rules is the problem.

Needle in a haystack

My ideal goal has always been to be able to run random rules repeatedly hands free and have software that is “clever” enough to determine the difference between interesting and boring results. If the algorithms are good enough at detecting interesting, then you can come back to the computer hours or days later and have a set of rules in a folder with preview images and/or movies to check out.

I want the detection to be smart enough to work with a variety of CA types beyond the basic 2 state 2D cellular automata. Visions of Chaos contains many varieties of cellular automata with varying maximum cell states, dimensions and neighborhoods, and I would ultimately like a single “Look for interesting rules” button that works for all of them.




Interesting Defined

Interesting is a very loose term. Maybe a few examples will help define what I mean when I say interesting.

Boring results are when a CA stabilizes to a fixed pattern or a pattern with very minimal change between steps.

Cellular Automaton

Cellular Automaton

Cellular Automaton

Chaotic results are when the CA turns into a screen of static with no real discernible patterns or features like gliders or other CA related structures. For a CA classifier these rules are also boring.

Cellular Automaton

Cellular Automaton

Cellular Automaton

Interesting is anything else. Rules like Game of Life, Brian’s Brain and others that create evolvable structures that survive after multiple cycles of the CA. This is what I want the software to be able to detect.

Cellular Automaton

Conway’s Game of Life – 23/3/2



Cellular Automaton

Brian’s Brain – /2/3



Cellular Automaton

Fireballs – 346/2/4




My Previous Search Methods

1. Random rules. Repeatedly generate random rules hoping to see an interesting result. Tedious to say the least, although the majority of the interesting cellular automata rules I have found over the years have come from repeatedly trying random rules. While a boring TV show or movie is on I can repeatedly hit F3, F4 and Enter in Visions of Chaos while looking for interesting results: F3 stops the current CA running, F4 shows the settings dialog, Enter clicks the Random Rule button.

2. Brute force all possible rules. Only applicable when the total number of rules is small (possible for some of the simpler 1D CAs). Most 2D CAs have millions or billions of possible rules, so brute force rendering them all and then checking manually is impossible.

3. Mutating existing interesting rules. If you have an interesting rule, you can try mutating it slightly to find alternatives that behave similarly to, or better than, the original. Slightly usually means toggling one of the survival/birth checkboxes on or off. This has occasionally helped me find an interesting rule or refine a rule into that sweet spot. The problem with CAs is that toggling even one checkbox will usually give a completely different result; the good results do not tend to “clump” together in the parameter space.
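As an illustration, a mutation step can be as simple as toggling one random survival or birth bit (boolean lists here stand in for the dialog checkboxes):

import random

def mutate_rule(survival, birth):
    # survival and birth are lists of booleans, one per neighbor count
    survival, birth = survival[:], birth[:]
    index = random.randrange(len(survival) + len(birth))
    if index < len(survival):
        survival[index] = not survival[index]  # toggle one survival bit
    else:
        birth[index - len(survival)] = not birth[index - len(survival)]  # toggle one birth bit
    return survival, birth

# example: mutate Conway's Game of Life (survive on 2 or 3 neighbors, born on 3)
survival = [n in (2, 3) for n in range(9)]
birth = [n == 3 for n in range(9)]
print(mutate_rule(survival, birth))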

The rest of this blog post contains methods others and myself have tried to classify cellular automata behavior.




Wolfram Classification

Stephen Wolfram

Stephen Wolfram defined a rough set of 4 classifications for CAs.

Class 1: Nearly all initial patterns evolve quickly into a stable, homogeneous state. Any randomness in the initial pattern disappears.

Class 2: Nearly all initial patterns evolve quickly into stable or oscillating structures. Some of the randomness in the initial pattern may filter out, but some remains. Local changes to the initial pattern tend to remain local.

Class 3: Nearly all initial patterns evolve in a pseudo-random or chaotic manner. Any stable structures that appear are quickly destroyed by the surrounding noise. Local changes to the initial pattern tend to spread indefinitely.

Class 4: Nearly all initial patterns evolve into structures that interact in complex and interesting ways, with the formation of local structures that are able to survive for long periods of time.

Classes 1 to 3 would be considered “boring” for anyone trying random rules. Class 4 is that “sweet spot” where something interesting happens between dying out and exploding into chaos.

You can look at a CA after it has been discovered and put it into one of those 4 categories, but that does not help with detecting interesting new rules in Class 4.




Other Methods From Various Papers

Here are some other classification methods from papers I found or saw mentioned elsewhere. The mathematics is beyond me for most of them. I wish papers included a small snippet of source code showing the math; I always find it much easier to understand and implement some source code than formal equations.

Behavioral Metrics

Search Of Complex Binary Cellular Automata Using Behavioral Metrics.

Entropy

Wolfram’s Universality And Complexity In Cellular Automata discusses “entropy” values that I don’t understand.

Wuensche’s Classifying Cellular Automata Automatically

Lyapunov Exponents

Stability Of Cellular Automata Trajectories Revisited: Branching Walks And Lyapunov Profiles.

Towards The Full Lyapunov Spectrum Of Elementary Cellular Automata.

Kolmogorov–Chaitin Complexity

Asymptotic Behaviour And Ratios Of Complexity In Cellular Automata.

Genetic Algorithms

Searching For Complex CA Rules With GAs.

Evolving Continuous Cellular Automata For Aesthetic Objectives.

Extracting Cellular Automaton Rules Directly From Experimental Data.

Other Papers

Pattern Generation Using Likelihood Inference For Cellular Automata. 1D CAs.




MergeLife

Jeff Heaton uses genetic mutations to evolve cellular automata.




Langton’s Lambda

Chris Langton

Chris Langton defined a single number that can help predict if a CA will fall within the ordered realm. See his paper Computation at the edge of chaos for the mathematical definitions etc.

Langton called this number lambda. According to this page, lambda is calculated by counting the number of cells that have just been “born” in that step of the CA and dividing by the total number of CA cells. This gives a value between 0 and 1.

L = newlyborn/totalcellcount
L within 0.01 and 0.15 means a good rule to further investigate.

So if the grid is 20×20 in size and there were 50 newly born cells that CA cycle, lambda would be 50/(20×20) = 0.125.

I skip the first 100 CA cycles to allow the CA to settle down and then average the lambda value for the next 50 steps.
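A minimal sketch of that measurement for a 2 state CA, assuming grids are 2D numpy arrays of 0s and 1s and a step(grid) function that advances the CA one cycle:

import numpy as np

def average_langton_lambda(grid, step, settle=100, samples=50):
    # let the CA settle down before measuring
    for _ in range(settle):
        grid = step(grid)
    total = 0.0
    for _ in range(samples):
        previous = grid
        grid = step(grid)
        # newly born = cells that were dead last cycle and are alive now
        newlyborn = np.count_nonzero((previous == 0) & (grid == 1))
        total += newlyborn / grid.size
    return total / samples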

As stated here, there is no single value of lambda that will always give an interesting result. Langton’s paper and example applet only deal with 1D CA examples; I really want methods to search and classify 2D, 3D (and even 4D) cellular automata.

Rampe’s Lambdas

For lack of a better name, these are the “Rampe’s Lambda” values I experimented with as alternatives to Langton’s Lambda.

R1 = newlyborn/newlydead
R1 within 0.9 and 1.1 means a good rule to further investigate.

R2 = abs(newlyborn-newlydead)/totalcellcount
R2 within 0.001 and 0.005 means a good rule to further investigate.

R3 = (newlyborn+newlydead)/totalcellcount
R3 within 0.01 and 0.8 means a good rule to further investigate.

R4 = ((newlyborn/totalcellcount)+(newlydead/totalcellcount))/2
R4 within 0.01 and 0.23 means a good rule to further investigate.

R5 = % change in Langton’s Lambda between the last and current CA cycle
R5 within 0.01 and 0.1 means a good rule to further investigate.

Again, skip the first 100 cycles of the CA and then use the average lambda from the following 50 cycles.
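The R values fall out of the same per cycle counts. A sketch of one cycle’s worth, under the same 2 state assumptions as above (averaging over the 50 sampled cycles works the same way as for Langton’s lambda):

import numpy as np

def rampe_lambdas(previous, grid, previous_lambda):
    # previous and grid are consecutive CA cycles as 2D numpy arrays
    newlyborn = np.count_nonzero((previous == 0) & (grid == 1))
    newlydead = np.count_nonzero((previous == 1) & (grid == 0))
    cells = grid.size
    langton = newlyborn / cells
    r1 = newlyborn / max(newlydead, 1)
    r2 = abs(newlyborn - newlydead) / cells
    r3 = (newlyborn + newlydead) / cells
    r4 = (newlyborn / cells + newlydead / cells) / 2
    # R5 is the change in Langton's lambda since the previous cycle, as a fraction
    r5 = abs(langton - previous_lambda) / max(previous_lambda, 1e-9)
    return langton, r1, r2, r3, r4, r5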

Lambda Results

All of them (both Langton’s and my “Rampe” variations) are next to useless from my tests. I ran a bunch of known good rules through them and got mixed results. All the lambdas gave enough false positives to be of no use in searching for interesting new rules. You may as well use a random number generator to classify the rules.

Maybe they can be used to weed out the extreme class 1, 2 and 3 uninteresting dead rules, but they are not useful for classifying whether a class 4 like result is interesting or not.




Fractal Dimension

Fractal Dimension CA Search

Another method I tried is finding the fractal dimension of the CA image using box counting. Fractal dimensions are unlike the usual fixed 1D, 2D and 3D dimensions; for a 2D image the fractal dimension is a floating point value between 0 and 2.

The above screenshot shows the fractal dimension tests on existing sample interesting CA files. The results are all over the place, with no “sweet spot” of dimension correlating to interesting. The way it works is that each CA is run for 50 steps, the image is converted to black and white (non black pixels are changed to white), and then the dimension is calculated using the box counting method.
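For reference, a minimal box counting sketch, assuming the thresholded frame is a 2D boolean numpy array with at least one set pixel:

import numpy as np

def box_counting_dimension(image):
    # image is a 2D boolean numpy array (True = non black pixel)
    sizes, counts = [], []
    size = min(image.shape) // 2
    while size >= 1:
        count = 0
        # count the boxes of side length "size" containing at least one set pixel
        for y in range(0, image.shape[0], size):
            for x in range(0, image.shape[1], size):
                if image[y:y + size, x:x + size].any():
                    count += 1
        sizes.append(size)
        counts.append(count)
        size //= 2
    # the dimension is the slope of log(count) versus log(1/size)
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope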

Widening the range of dimensions considered “good” lets the known interesting rules pass the tests, but it then flags a lot of uninteresting rules as interesting, meaning you still need to manually sort good from bad.

A fractal dimension between 1.0 and around 1.4 to 1.5 can help weed out obviously “bad” results, but it is really not enough for hands free searching.




Compression Based Searching

Another interesting new idea on CA searching comes from Hugo Cisneros, Josef Sivic and Tomas Mikolov: using data compression algorithms to rate CAs.

Their paper “Evolving Structures in Complex Systems” available here is an interesting read.

Source code accompanying the paper is provided here.




Neural Networks – Part 1

This was an idea I had for a while. Train a neural network to detect if a CA rule is interesting or not.

I was able to implement a rudimentary neural network system after watching these excellent videos from Dan Shiffman.

I went from almost zero knowledge of the internals of neural networks to much more comfortable and being able to code a working NN system. If you want to learn about the basics of coding a neural network I highly recommend Dan’s playlist.

For a neural network to be able to give you meaningful output (in this case if a CA rule is interesting or not) it needs to be trained with known good and bad data.

I tried creating a neural network with 19 inputs (9 for survival states, 9 for birth states and 1 for number of states) to cover the possible CA settings, ie

2D CA Rules

The neural network has 19 inputs, a number of neurons in the hidden layer and a single output neuron that does the interestingness prediction.
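I rolled my own network code for this, but the equivalent architecture is only a few lines in Keras. A sketch of the same shape of network, not my actual implementation:

from keras.models import Sequential
from keras.layers import Dense

# 19 inputs: 9 survival bits, 9 birth bits and 1 state count
model = Sequential()
model.add(Dense(19, activation="sigmoid", input_shape=(19,)))  # hidden layer
model.add(Dense(1, activation="sigmoid"))  # 0..1 interestingness prediction
model.compile(loss="mean_squared_error", optimizer="adam")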

Neural Network

I mainly kept the hidden neuron count the same as the inputs, but I did experiment with other counts as the next diagram shows.

Neural Network

The known good and bad rules are fed through the neural network in random order ten million or more times. You can see how well the network is “learning” by tracking the mean squared error. As you repeatedly feed the network known data, the error value should drop, meaning the network is becoming more accurate at predicting the results you train it with.

Once the network is trained, you can run random rules and see if the network’s prediction matches your own rating of whether the CA is interesting. You can also repeatedly try random rules until one passes a threshold level of interestingness. Every time a prediction is made, the human can rate whether the detection was correct, and these ratings are added back to the good and bad rule training pool to be used the next time the network is trained.

The end result is “just OK”. I used a well trained network (with a mean squared error of around 0.001) and got it to repeatedly try random rules until it found a rule it predicted would be interesting. The results are not always interesting. More interesting than purely sitting there clicking random repeatedly as I have done in the past, but there are still a lot of not interesting rules spat out. If I let the network run for a few hours and got it to save every rule it predicted to be interesting I would still have a tedious process of weeding out the actual interesting rules.

I don’t think inputs from survival and birth rules are the best way of doing this. This is because toggling a single survival or birth checkbox will usually drastically change the results from interesting to boring or just chaos. Also, changing the maximum states each cell can have by 1 will cause well behaved rules to turn into a chaotic mess.

One idea I need to try is using a basic NN like this that uses the lambda values above for inputs. Maybe then it can work out which combination of lambdas (and maybe fractal dimension) work together to create good rules. This is worth experimenting with when I get some time.




Neural Networks – Part 2

This time I am trying to get the network to detect interesting CAs using an image of a single CA frame as the input. For each of the known good and bad rules I take the 100th frame as an input. I also repeat each of the rules 100 times to get 100 samples of each rule.

If I use a 64×64 sized grayscale image then there will now be 4,096 inputs to the network. Add another 100 hidden nodes and that makes a large and much slower network when training.

Run the CA rules on a 64×64 sized grid, convert the image pixels into the 4,096 inputs and train the network.
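The conversion from frame to inputs is trivial. A sketch, assuming the frame has already been loaded as a 64×64 grayscale NumPy array;

import numpy as np

frame = np.zeros((64, 64), dtype=np.uint8)  # placeholder for a saved 64x64 CA frame
inputs = frame.astype(np.float32).reshape(-1) / 255.0  # 4,096 values scaled to 0..1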

So far, no good results. The mean squared error falls very slowly. Maybe it would get better after days of training, but I am not that patient yet.

This online example and this article show how this method (a fully connected neural network) is never as accurate as a convolutional neural network. So, onto Part 3…




Neural Networks – Part 3

My next idea was to try using Convolutional Neural Networks. See here for a nice explanation of convolutional neural networks.

Convolutional Neural Network

CNNs are made for image processing, feature extraction and detection. If a CNN can be trained to recognize digits and tell if a photo is of a cat or a dog, then I should be able to use a CNN to “look at” a frame of a cellular automaton and tell me if it is interesting or not.

After watching a bunch of YouTube university lectures and tutorials on CNNs I decided not to extend my existing neural network code to handle CNNs. For the network sizes I will be training I need a real world library. I chose Google’s TensorFlow.

TensorFlow Logo

TensorFlow supports GPU acceleration with CUDA and is orders of magnitude faster and more reliable than anything I could code.

Once I managed to get Python, TensorFlow, Keras, CUDA and cuDNN installed correctly I was able to execute Python scripts from within Visions of Chaos and successfully run the example TensorFlow CNN MNIST code. That showed I had all the various components working as expected.
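If you want the same sanity check, something like the standard Keras MNIST example (trimmed down here) is enough to prove the whole GPU pipeline works;

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))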

Creating Training Data for the CNN

Acquiring clean and accurate training data is vital for a good model. The more data the better.

I used the following steps to create a lot of training images;

1. Take a bunch of CA rules that I had previously ranked as either good or bad.

2. Run all of them over a 128×128 sized grid for 100 steps and save the 100th frame as a grayscale jpg file.

3. Step 2 can be repeated multiple times to increase the amount of training data. CAs starting from a random grid will always give you a unique 100th frame so this is an easy way to generate lots more training data.

4. Copy some of the generated images into a test folder. I usually move 1/10th of the total generated images into a test folder. These will be used to evaluate how accurate the model is at predictions once it has been trained. You want test data that is different to the data used to train and validate the model.
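Step 4 is just shuffling files around, for example (the folder names here are hypothetical);

import os
import random
import shutil

os.makedirs("test_good", exist_ok=True)
files = os.listdir("good")
random.shuffle(files)
# move a random tenth of the generated good images into the test folder
for fname in files[: len(files) // 10]:
    shutil.move(os.path.join("good", fname), os.path.join("test_good", fname))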

Good Cellular Automata Training Data

Examples of good CA frames



Bad Cellular Automata Training Data

Examples of bad CA frames


Quantity and Dimensions of Training Data Images

I tried image sizes between 32×32 pixels and 128×128 pixels. I also tried various zoomed in CA images with each cell being 2×2 pixels rather than a single pixel per cell.

For image counts I tried between 10,000 and 300,000.

After days of generating images and training and testing models I found a good balance between image size and model accuracy was images 128×128 pixels in size with a single pixel per CA cell (so a CA grid of 128×128 too).

I also experimented with blurring the images, thinking that may help the search for more general patterns, but it did not seem to make any difference to the number of rules found or the accuracy of results.

One thing working with neural networks teaches you is patience. Generating the images is the slowest part of these experiments. If anyone is willing to gift me some decent high end CPUs and GPUs I would put them to good use.

Custom Input for CNNs

The best videos I found on using CNNs with custom images were these videos on YouTube by sentdex. Parts 1 to 6 of that playlist got me up and running.

Creating the Training Data for TensorFlow

Once you have your training images they need to be converted into a data format that TensorFlow can be trained with.

Again, I recommend the following sentdex video that covers how to create the training data.

The process to convert the training images into training data is fast and should not take longer than a minute or two.
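The gist of it, following the sentdex approach (the folder names here are assumptions), is reading each image, resizing it and stacking everything into NumPy arrays;

import os
import cv2
import numpy as np

IMG_SIZE = 128
X, y = [], []
for label, folder in enumerate(["bad", "good"]):  # 0 = bad, 1 = good
    for fname in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, fname), cv2.IMREAD_GRAYSCALE)
        X.append(cv2.resize(img, (IMG_SIZE, IMG_SIZE)))
        y.append(label)

X = np.array(X).reshape(-1, IMG_SIZE, IMG_SIZE, 1) / 255.0  # scale pixels to 0..1
y = np.array(y)

X.shape[1:] is then (128, 128, 1), which is the input_shape the models below expect.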

Model Variations

Time to actually use this training data to train a convolutional neural network (what TensorFlow calls a model).

There are a wide variety of model and layer types to experiment with. For CNNs you basically start with one or more Conv2D layers followed by one or more Dense layers and finally a single output node to predict a probability of the image being good or bad.

Here are some models I tried during testing, gathered from various sources, videos and pages. Running on an Nvidia 1080 GPU took around 2 hours per model to train (50 epochs each with 100,000 training images), which seemed lightning fast after waiting 30 hours for my training images to generate.


# Version 1
# Original model from sentdex videos
# https://youtu.be/WvoLTXIjBYU

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

model = Sequential()

model.add(Conv2D(64, (3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3)

When the 50 epochs finish, you can plot the accuracy and loss vs the validation accuracy and loss.
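Keras returns the per-epoch numbers from model.fit, so capturing that return value lets you plot the curves (a sketch; note that on older Keras versions the history keys are “acc”/“val_acc” rather than “accuracy”/“val_accuracy”);

import matplotlib.pyplot as plt

history = model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3)

plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["val_loss"], label="val_loss")
plt.legend()
plt.title("Model loss")
plt.show()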

TensorFlow training graph

Version 1 gave these results;
test loss, test acc: [0.13674676911142003, 0.9789666691422463]
98% accuracy with a loss of 13%
When I test a different unique set of images as test data I get;
14500 good images predicted as good – 301 good images predicted as bad – 97.97% predicted correctly
14653 bad images predicted as bad – 184 bad images predicted as good – 98.76% predicted correctly

One thing the above “Model loss” graph shows is overfitting. The val_loss graph should follow the loss graph and continue to go down. Instead of going down the line starts going up around the 5th epoch. This is an obvious sign of overfitting. Overfitting is bad. We don’t want overfitting. See here for more info on overfitting and how to avoid it.

The second suggestion from here mentions dropouts. Dropout layers randomly disable a fraction of the nodes in the network as it trains, which stops the network relying too heavily on any particular path and can help reduce overfitting. So let’s give that a go.


# Version 2
# Original model from sentdex videos
# https://youtu.be/WvoLTXIjBYU
# Adding dropouts to stop overfitting

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

model = Sequential()

model.add(Conv2D(64, (3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.4))

model.add(Flatten())

model.add(Dense(64))
model.add(Activation("relu"))
model.add(Dropout(0.4))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3)

50 epochs finished with this graph.

TensorFlow training graph

Now the validation loss generally continues to go down along with the loss graph. This shows overfitting is no longer occurring.

Version 2 gave these results;
test loss, test acc: [0.037338864829847204, 0.9866000044345856]
98% accuracy with a 4% loss
When I test a different unique set of images as test data I get;
14151 good images predicted as good – 68 good images predicted as bad – 99.52% predicted correctly
14326 bad images predicted as bad – 12 bad images predicted as good – 99.92% predicted correctly


# Version 3
# https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

model = Sequential()

model.add(Conv2D(32, (3,3), input_shape = X.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Conv2D(128, (3,3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())
model.add(Dropout(0.5))

model.add(Dense(512))
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3)

Graphs looked good without any obvious overfitting.

Version 3 gave these results;
test loss, test acc: [0.03628389219510306, 0.9891333370407422]
98% accuracy with 3% loss. Getting better.
When I test a different unique set of images as test data I get;
14669 good images predicted as good – 59 good images predicted as bad – 99.60% predicted correctly
14490 bad images predicted as bad – 62 bad images predicted as good – 99.57% predicted correctly


# Version 4
# http://www.dsimb.inserm.fr/~ghouzam/personal_projects/Simpson_character_recognition.html

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Dense, Dropout, Flatten, MaxPooling2D

model = Sequential()

model.add(Conv2D(32, (3,3), input_shape = X.shape[1:]))
model.add(Conv2D(32, (3,3)))
model.add(Activation("relu"))
# BatchNormalization better than Dropout? https://www.kdnuggets.com/2018/09/dropout-convolutional-networks.html
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3,3)))
model.add(Conv2D(64, (3,3)))
model.add(Activation("relu"))
# BatchNormalization better than Dropout? https://www.kdnuggets.com/2018/09/dropout-convolutional-networks.html
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

model.add(Conv2D(128, (3,3)))
model.add(Conv2D(128, (3,3)))
model.add(Activation("relu"))
# BatchNormalization better than Dropout? https://www.kdnuggets.com/2018/09/dropout-convolutional-networks.html
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.5))

model.add(Flatten())

model.add(Dense(64))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(Dense(32))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(Dense(16))
model.add(BatchNormalization())
model.add(Activation("relu"))

model.add(Dense(1))
model.add(Activation("sigmoid"))

model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["accuracy"])
model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3)

For this model I threw in multiple ideas from all previous models and more.

Version 4 gave these results;
test loss, test acc: [0.031484683321298994, 0.9896000043551128]
99% accuracy with a 3% loss. Best result so far.
When I test a different unique set of images as test data I get;
14383 good images predicted as good – 119 good images predicted as bad – 99.18% predicted correctly
14845 bad images predicted as bad – 4 bad images predicted as good – 99.97% predicted correctly

For the rest of my tests I used Version 4 for all training.

Tweaking your CNN models

See the sentdex videos above for a good example of how to tweak models and see how the variations rate. Use TensorBoard to see how they compare and optimize them.

TensorBoard Graphs

TensorBoard will also generate other interesting histograms from your training, like the following. I have no idea what these are telling me yet, but they look cool. Using histograms did seem to slow down the training with extended pauses between epochs, so unless you need them I recommend disabling them.

TensorBoard Histograms
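Wiring TensorBoard into training is a one line callback (the log directory name here is an assumption; histogram_freq=0 skips the slow histogram generation mentioned above);

import time
from tensorflow.keras.callbacks import TensorBoard

tensorboard = TensorBoard(log_dir=f"logs/ca-cnn-{int(time.time())}", histogram_freq=0)
model.fit(X, y, batch_size=50, epochs=50, validation_split=0.3, callbacks=[tensorboard])
# then from a command prompt: tensorboard --logdir=logs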

Testing the Trained Model

Now it is finally time to put the model to the test.

Randomly set a CA rule, run it for 100 generations and then use model.predict on the 100th frame. This takes around 6 seconds per random rule.

The model.predict function returns a floating point value between 0 and 1.
Between 0 and 0.2 are classified as bad.
0.2 to 0.95 are classified as unsure.
0.95 to 1 are classified as good.
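In code the classification step looks roughly like this (the model and image filenames are placeholders);

import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("ca_cnn.h5")  # hypothetical saved model filename

frame = cv2.imread("frame100.jpg", cv2.IMREAD_GRAYSCALE)  # the rule's 100th frame
frame = frame.reshape(1, 128, 128, 1) / 255.0

score = float(model.predict(frame)[0][0])
if score < 0.2:
    verdict = "bad"
elif score < 0.95:
    verdict = "unsure"
else:
    verdict = "good"
print(score, verdict)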

The prediction accuracy is better than any of the other methods shown previously in this post.

The rules it did detect in the bad category were all bad, so it does a great job there. No interesting rules got incorrectly classified as bad from my tests. I can safely ignore rules classified as bad which speeds up the search time as I don’t have to re-run the rules and create a sample movie.

The detected good rules did have a blend of interesting and boring/chaotic, but there were far fewer of them to check. Roughly 1% of total random rules are classified as good. The rules the model incorrectly predicts as interesting can be moved into the “known bad” folder and added to the next trained model (another 40 hours or so of my PC churning away generating images and training a new model).

The rules it predicted in the unsure 0.2 to 0.95 range did have features in the range between good and bad. Some of them would have made excellent good samples if only they were not so chaotic and “busy”.

Results

Here are some examples found from overnight convolutional neural network searches.

Cellular Automaton

TF247445 – 4567/2358/5 – Brian’s Brain with islands


Cellular Automaton

TF394435 – 34/256/3 – Another Brian like rule


Cellular Automaton

TF263299 – 3/25/3 – Over excited brain rule


Cellular Automaton

TF174642 – 15678/12678/2 – Solid islands grow amongst static


Cellular Automaton

TF1959733 – 1235678/23478/5 – Solidification


Cellular Automaton

TF2254533 – 0478/2356/5 – Waves with stable pixels


Other CNN Problems and Ideas

One problem is that CNNs seem to only detect shapes/gliders/patterns that are similar to the training data. After days of testing self searching with the CNN models there were no brand new different rules discovered, just a bunch of rules very similar to existing ones and maybe a few slight tweaks. For example, if a CNN is trained using only examples of Conway’s Game of Life then it is not going to predict Brian’s Brain is interesting if it randomly tries the rule for Brian’s Brain. The CNN needs to have previously seen the rule(s) it will detect as interesting. I did see slight variations found and scored as interesting, but for a new CA type without a lot of “good” rules to train on, the CNN is going to have problems finding new/different interesting rules. The main reason I want a “search for interesting” function is for when I have a new type of CA without a lot of known good rules. I want the search to be able to work without needing hundreds or thousands of examples already rated good vs bad. Otherwise I need to sit there trying random rules for hours and manually rate them good or bad before training a new model specific to that CA type.

Maybe using single frames is not the best idea. Maybe the difference between the 99th and 100th frame? Maybe a blur or average of 3 frames? This is still to be experimented with when I have another week to spend generating images and training and testing new models.

Then I thought maybe I am overtraining the models. If you train a neural network for too long it will overfit and only be able to recognize the data you trained it with. It effectively memorizes the specific good examples rather than learning what makes them good, so it cannot generalize to detect other different good results as good. This could result in new interesting CAs being classified as bad. I did try lowering the training epochs from 50 to 10 to see if that helped detect more generalized interesting CA rules, but it didn’t seem to make any difference. Even lowering it to 5 epochs trained a model that was still accurate at predictions. Plus the difference between random frames of good CAs shows it can detect gliders at different locations within frames.

Rather than train a model for each type of CA, another idea is to train a model with examples from multiple CA types and try to make the model more capable of general CA detection. Maybe it could then detect newer shapes/gliders in different new CA rules if it has a good general idea of what interesting CA features are from multiple different CAs. This may work? Another one for the to do list.

Convolutional Neural Networks (and neural networks in general) are not an instant win solution. You do need to do a lot of research about the various settings and a lot of testing to get a good model you can then use to predict the “things” you want predicted. But once you get a well trained model, CNNs can be almost magical in how they learn and how useful they are when solving problems.

The more I experiment with and learn about neural networks, the more I want to continue the journey. They really are fascinating. TensorFlow and Keras are a great way to get into the world of neural networks without having to code your own neural network system from scratch. I do recommend at least coding a basic feed forward neural network to get a good grip on the basics. When you jump into Keras the terminology will make more sense. YouTube has lots of good neural network related videos.




Availability to End Users

I have now included the trained TensorFlow CNN model (Version 4 trained for 20 epochs, to hopefully leave a little room for finding more unique results) with Visions of Chaos. That means the end user does not need to do any image generation or training before using the CNN for searching. Python and TensorFlow need to be installed first, but after that the user can start a hands free search for interesting rules. When TensorFlow is installed and detected, a Search button appears on the 2D Cellular Automata dialog. Clicking Search starts a hands free random search and classification.

TensorFlow CNN CA Searching

The other search methods above remain hidden as they do not predict interesting rules with high enough accuracy.




The End (for now)

If you managed to get this far, thanks for reading.

If you have some knowledge about any of the above methods that I missed please leave a reply or get in touch and let me know.

Any other ideas for cellular automata searching and classification are also welcome.

I will continue to update this post with any other methods I find in the future.

Jason.

Style Transfer GANs (Generative Adversarial Networks)

Style Transfer Generative Adversarial Networks take two images and apply the style from one image to the other image. Here are some sample results from here.

Style Transfer GAN examples

For a more technical explanation of how these work, you can refer to the following papers;

Image Style Transfer Using Convolutional Neural Networks
Artistic style transfer for videos
Preserving Color in Neural Artistic Style Transfer

Ever since first seeing this technique I wanted to add it as an image processing option within Visions of Chaos.

If you only want to play around with style transfer or only have a few photos you want to experiment with, then I recommend you use an online service like DeepArt because this can be a tedious process to setup and use on your own PC.

How It Works

Behind the scenes the style transfer processing uses Cameron Smith‘s excellent Python script from here. After trying various Style Transfer related scripts that one gives the sharpest and most interesting results. See that link if you want to run these sort of style transfers yourself from the command line outside Visions of Chaos.

Installing Style Transfer Prerequisites

If you want to use style transfer from within Visions of Chaos you need to follow these steps to get Python, Python Libraries, CUDA and CuDNN installed.

Style Transfer in Visions of Chaos

Generate any image, then select Image->Image Processing->Style Transfer.

Visions of Chaos Style Transfer GAN

The first time you select Style Transfer it will download the 500 MB neural network model that is used for all the style transfer magic.

Start with smaller image sizes to get an idea of how long the process will take on your system before going for larger sized images.

You can also select any external image file to apply the style transfer to. So dig out those cat photos and have fun. Note that if you get tired of the limited style images that come with Visions of Chaos, you can put any image you like under the Style Transfer folder (by default this will be C:\Users\<username>\AppData\Roaming\Visions of Chaos\Examples\TensorFlow\Style Transfer\) and use those. Grab an image of your favorite artist’s works and experiment.

For these next examples I used the following photo of Miss Marple.

Miss Marple

And applied some various transfer style images.

MC Escher Plane Filling II

Miss Marple Style Transfer GAN

A Mandelbrot fractal

Miss Marple Style Transfer GAN

Another Mandelbrot fractal

Miss Marple Style Transfer GAN

HR Giger Biomechanical Landscape

Miss Marple Style Transfer GAN

Kandinsky Composition VII

Miss Marple Style Transfer GAN

Mondrian

Miss Marple Style Transfer GAN

Monet

Miss Marple Style Transfer GAN

Picasso Les Femmes d’Alger

Miss Marple Style Transfer GAN

Picasso Seated Nude

Miss Marple Style Transfer GAN

Hokusai The Great Wave off Kanagawa

Miss Marple Style Transfer GAN

Munch The Scream

Miss Marple Style Transfer GAN

Turner The Wreck of a Transport Ship

Miss Marple Style Transfer GAN

van Gogh Starry Night

Miss Marple Style Transfer GAN

Troubleshooting

If you get a failed style transfer and an error message, here are a few things to try;
1. Smaller image size. Depending on the RAM in your PC and GPU you may have maxed out.
2. Wait 30 seconds and try again. This seems to help sometimes.
3. Reboot. If all else fails, a reboot seems to always fix a stubborn error for me. CUDA and/or cuDNN seem to be the main culprits. They get hung or locked somehow and only a reboot will get them working again.

Style Transfer Movies

I have now added options to create style transfer movies. This works by starting with an image of random RGB noise. Style transfer is applied, then the resulting image is slightly stretched to cause the zooming in. The style transfer and stretch are repeated for each frame of the movie.
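The loop looks roughly like this in Python (apply_style_transfer is a stand-in for invoking the external style transfer script; the frame size and zoom factor are assumptions);

import numpy as np
import cv2

def apply_style_transfer(image, style_path):
    # Stand-in for a call to the external style transfer script; returns the styled image
    return image

HEIGHT, WIDTH = 512, 512
ZOOM = 1.01  # stretch factor per frame

frame = np.random.randint(0, 256, (HEIGHT, WIDTH, 3), dtype=np.uint8)  # start from RGB noise

for i in range(300):
    frame = apply_style_transfer(frame, "style.jpg")
    # stretch slightly larger, then crop back to the original size for the zoom-in effect
    zoomed = cv2.resize(frame, None, fx=ZOOM, fy=ZOOM, interpolation=cv2.INTER_CUBIC)
    y0 = (zoomed.shape[0] - HEIGHT) // 2
    x0 = (zoomed.shape[1] - WIDTH) // 2
    frame = zoomed[y0:y0 + HEIGHT, x0:x0 + WIDTH]
    cv2.imwrite(f"frame_{i:04d}.png", frame)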

I created a movie using this technique that used a tasteful nude image as the style image. After repeatedly iterating the style transfer over the previous output it started to create some disturbing imagery. I thought this was an interesting result (although it did make me go “ewwwww” while looking at it) so I uploaded it to my YouTube channel.

Within a few minutes the video had been removed by one of the YouTube bots and flagged as pornography. “Pornographic or sexually explicit content that is meant to be sexually gratifying is not allowed on YouTube.” Now, I don’t know about you, but the following pictures do not cause any arousal or sexual gratification in me.

Style Transfer GAN

Style Transfer GAN

Style Transfer GAN

Style Transfer GAN

Style Transfer GAN

If you think you see any intimate lady parts in the above images, they were not in the style transfer source image. All the evil looking sores, boils and other nasty looking pareidolia features came from the depths of the neural network. The only thing that seems to be a direct copy from the source style image is the flesh tones.

I decided to lodge an appeal explaining how the imagery was the result of a neural network style transfer and not pornography (all within the limited 300 characters they give you to plead your case). I was hoping a real person would take a look at the movie, decide whether it was porn or not, and respond to my appeal. All that happened was that shortly after the appeal was lodged the movie info showed “Appeal rejected. No further action can be taken on your part.” So, after nearly 13 years on YouTube, I now have my first warning and no way of talking with a real person to discuss what happened.

I understand that from YouTube’s perspective even if they employed a million dedicated staff to watch and manually review videos that get flagged it would still probably not be enough, but relying on a neural network and/or “AI” as the decider without any human intervention is not the answer. Maybe they should at least have manual reviews for abuse claims/detection on channels that have been active for over 5 or 10 years without any prior warnings or strikes. I am not sure there is a workable solution to the problem.

Anyway, I joined BitChute so you can now watch the movie in all its controversial glory here.

Tutorial

I also created the following tutorial that covers style transfer within Visions of Chaos in much more detail.

Jason.