Long Short-Term Memory Music Composer

Automatic Music Composition

For almost as long as I have been able to program I have wanted to create software that could compose music.

I have made some primitive attempts at automated music composition over the years in Visions of Chaos, mostly based on simple math formulas or genetic mutations of random note sequences. After recently having some success learning and implementing TensorFlow and neural networks for cellular automata searching, I was keen to try using neural networks for music creation.

How It Works

The composer works by training a long short-term memory (LSTM) neural network. LSTM networks are good at predicting “what comes next” in a sequence of data. Another page that goes into more depth about LSTMs is here.

LSTM

The LSTM network is fed a bunch of different note sequences (in this case single channel midi files). Once the network has been trained sufficiently it is then able to create music that is similar to the training material.
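
To make "fed note sequences" a little more concrete, here is a minimal pure-Python sketch of the sliding-window preparation that the training script further down performs with a window of 100 notes. The toy melody, the window length of 3 and the function name `make_windows` are illustrative assumptions, not part of the original scripts.

```python
def make_windows(notes, sequence_length):
    """Return (inputs, targets): each input is a window of note indices,
    each target is the index of the note that follows that window."""
    # map every distinct note name to an integer, in sorted order
    pitchnames = sorted(set(notes))
    note_to_int = {name: number for number, name in enumerate(pitchnames)}

    inputs, targets = [], []
    for i in range(len(notes) - sequence_length):
        window = notes[i:i + sequence_length]
        inputs.append([note_to_int[n] for n in window])
        targets.append(note_to_int[notes[i + sequence_length]])
    return inputs, targets


# toy melody (hypothetical data, not taken from any training set)
melody = ["C4", "E4", "G4", "E4", "C4"]
inputs, targets = make_windows(melody, sequence_length=3)
print(inputs)   # [[0, 1, 2], [1, 2, 1]]
print(targets)  # [1, 0]
```

Each window is one training example and the note that follows it is the answer the network learns to predict. A source with fewer notes than the window length produces no examples at all.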

LSTM

The above diagrams of LSTM internals may look daunting, but using TensorFlow and/or Keras makes creating and experimenting with LSTMs much simpler.

Source Music to Train Model

For these simpler LSTM composer networks you want source songs with just a single midi channel. Solo piano midi files work well for this. I found single piano midi files at Classical Piano Midi Page and mfiles that I used for training my models.

Music of different composers is put into separate folders. That way the user can select Bach, click the Compose button and have a piece of music generated that (hopefully) sounds a little like Bach.

LSTM Model

The model I based my code on was this example by Sigurður Skúli Sigurgeirsson which he describes in more detail here.

I ran the included lstm.py script and after 15 hours it had finished training. When I used predict.py to generate midi files, they all disappointingly contained just a single repeated note. I reran it two more times and got the same results.

The original model is:


model = Sequential()
model.add(CuDNNLSTM(512,input_shape=(network_input.shape[1], network_input.shape[2]),return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(CuDNNLSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=["accuracy"])

Once I had added plotting to the script I saw why the model does not work. The accuracy never rises over time as it should. See the good plots further down this post for what the accuracy plot should look like in a working model.

I have no idea why it fails, so I gave up on that model and started tweaking settings.


model = Sequential()
model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(CuDNNLSTM(256))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=["accuracy"])

This version uses fewer and smaller LSTM layers. I also added BatchNormalization after seeing it in a sentdex video. There are most likely better models, but this one worked OK for the rest of my training sessions.

Note that in both models I have replaced LSTM with CuDNNLSTM. This results in much faster LSTM training by using CUDA. If you do not have a CUDA-capable GPU you would need to change these back to LSTM. Thanks to sentdex for this tip. Training new models and composing midi files is approximately 5 times as fast using CuDNNLSTM.

How Long Should You Train Your Model For

How long you train the model (how many epochs) determines how similar the results will be to the source music. Too few epochs and the output will have too many repeated notes. Too many epochs and the model will overfit and just copy the source music.

But how do you know how many epochs to stop at?

A simple method is to add a callback that saves the model and a plot of accuracy and loss every 50 epochs during a 500 epoch training run. That way, once the training is done you have 50 epoch increment models and graphs that show you exactly how the training is going.
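
The saving schedule itself can be sketched without Keras. The helper names below are hypothetical; the `(epoch + 1) % N` test and the zero-padded file names mirror what the training script further down does inside its callback.

```python
def checkpoint_epochs(total_epochs, save_every):
    """Return the 1-based epoch numbers at which a checkpoint is written."""
    # Keras passes 0-based epoch indices to callbacks, hence the +1
    return [epoch + 1 for epoch in range(total_epochs)
            if (epoch + 1) % save_every == 0]


def checkpoint_name(prefix, epoch_number):
    """Zero-padded file stem so that checkpoints sort in training order."""
    return "{}_{}".format(prefix, str(epoch_number).zfill(3))


epochs = checkpoint_epochs(total_epochs=500, save_every=50)
print(epochs)
print([checkpoint_name("Bach", e) for e in epochs[:2]])
```

With these settings a 500 epoch run leaves ten saved models and ten plots to compare.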

Here are the graph results from a sample run saving every 50 epochs merged into an animated GIF file.

That is the sort of graph you want to see. Loss should drop down and stay down. Accuracy should rise and stay up near 100%.

You want to use a model with the epoch count corresponding to when the graphs first hit their limits. For the above graph this would be 150 epochs. Using any of the models beyond this would be using a model trained too long and most likely result in a model that just copies the source material.
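
One rough way to turn "when the graphs first hit their limits" into a rule is to pick the earliest saved model whose accuracy is within a small tolerance of the best accuracy seen. This is only a sketch: the tolerance, the function name and the sample numbers are assumptions for illustration, not measurements from the runs in this post, and listening to the output remains the real test.

```python
def first_plateau_checkpoint(accuracy_by_epoch, tolerance=1.0):
    """accuracy_by_epoch maps checkpoint epoch -> accuracy in percent.
    Returns the earliest epoch within `tolerance` points of the best."""
    best = max(accuracy_by_epoch.values())
    for epoch in sorted(accuracy_by_epoch):
        if accuracy_by_epoch[epoch] >= best - tolerance:
            return epoch


# made-up accuracy readings at 50-epoch checkpoints
history = {50: 41.0, 100: 88.5, 150: 99.2, 200: 99.6, 250: 99.7}
print(first_plateau_checkpoint(history))  # 150
```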

The model those graphs are showing was trained on the “Anthems” midi files from here.


Sample midi output from the 150 epoch model.


Sample midi output from the 100 epoch model.

Even the 100 epoch model may be copying the source too closely. This could be due to the relatively small sample of midi files to train against. The training works better with more total notes.

When Training Goes Bad

The above is an example of what can and does happen sometimes during training. The loss is decreasing and accuracy increasing as they usually do and then suddenly they both crap out. At that point you may as well stop training. The model will not (at least from my experience) ever start training correctly again. In this case the saved 100 epoch model is too random and the 150 is just past the point the model failed. I am now saving every 25 epochs to be sure I get that sweet spot of the best trained model before it trains too much or fails.
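
A collapse like this could also be flagged automatically. The sketch below scans per-epoch accuracy values and reports the first epoch where accuracy falls more than a chosen number of points below its best so far. The 20 point threshold, the function name and the sample curve are assumptions for illustration only.

```python
def find_collapse(accuracies, max_drop=20.0):
    """Return the 1-based epoch where accuracy first falls more than
    `max_drop` percentage points below its best so far, or None."""
    best = float("-inf")
    for epoch, acc in enumerate(accuracies, start=1):
        if acc < best - max_drop:
            return epoch
        best = max(best, acc)
    return None


# made-up accuracy curve: steady improvement, then a sudden failure
curve = [10.0, 35.0, 60.0, 82.0, 90.0, 12.0, 11.0]
print(find_collapse(curve))  # 6
```

The last model saved before the reported epoch is then the candidate worth keeping.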

Here is another example of training failing. This model was trained on the midi files from here. In this case it was going well until just after epoch 200. Using the epoch 200 model gives the following Midi output.

Without plotting you never know when or if the training has run into problems, or whether you can still get a good model from an earlier save without having to retrain from scratch.

More Examples


75 epoch model based on Chopin.


50 epoch model based on Christmas Midi files.


100 epoch model based on Christmas Midi files. Not sure how “Christmasy” they sound?


300 epoch model based on Bach Midi files from here and here.


200 epoch model based on a single Balakirew Midi file from here.


200 epoch model based on Debussy.


175 epoch model based on Mozart.


100 epoch model based on Schubert.


200 epoch model based on Schumann.


200 epoch model based on Tchaikovsky.


175 epoch model based on folk songs.


100 epoch model based on nursery rhymes.


100 epoch model based on wedding music.


200 epoch model based on my own midi files that come from my YouTube movie soundtracks. This may be overtrained slightly as it generates what is more of a medley of my short 1 or 2 bar midi files.

Sheet Music

Once you have the midi file, you can use online tools like SolMiRe to convert it into sheet music. The following is the 200 epoch Softology midi file above.

Availability

The LSTM Composer is now included as part of Visions of Chaos.

LSTM Music Composer

You select a style from a dropdown list and click Compose. As long as you have the Python and TensorFlow pre-reqs installed (see here for instructions), within seconds (if you have a fast GPU) you will have a new machine composed midi file to listen to and use for any other purpose. No copyright. No royalties need to be paid. If you don’t like the results you can click Compose again and a few seconds later you will have a new composition to listen to.

The results so far could not be considered full songs, but they all do contain interesting smaller sequences of notes that I will be using when creating music in the future. In this way the LSTM composer can be a good inspiration starting point for new songs.

Python Source

Here are the LSTM training and prediction Python scripts I am using. You do not need to have Visions of Chaos installed for these scripts to work and the training and midi generation will both work from the command line.

This is the training script “lstm_music_train.py”


# based on code from https://github.com/Skuldur/Classical-Piano-Composer
# to use this script pass in;
# 1. the directory with midi files
# 2. the directory you want your models to be saved to
# 3. the model filename prefix
# 4. how many total epochs you want to train for
# eg python -W ignore "C:\\LSTM Composer\\lstm_music_train.py" "C:\\LSTM Composer\\Bach\\" "C:\\LSTM Composer\\" "Bach" 500

import os

# ignore all info and warning messages
# (set TF_CPP_MIN_LOG_LEVEL before importing tensorflow so it is picked up when the library loads)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import glob
import pickle
import numpy
import sys
import keras
import matplotlib.pyplot as plt

from music21 import converter, instrument, note, chord
from datetime import datetime
from keras.models import Sequential
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import CuDNNLSTM
from keras.layers import Activation
from keras.utils import np_utils
from keras.callbacks import TensorBoard
from shutil import copyfile

# name of midi file directory, model directory, model file prefix, and epochs
mididirectory = str(sys.argv[1])
modeldirectory = str(sys.argv[2])
modelfileprefix = str(sys.argv[3])
modelepochs = int(sys.argv[4])

notesfile = modeldirectory + modelfileprefix + '.notes'

# callback to save model and plot stats every 25 epochs
class CustomSaver(keras.callbacks.Callback):
	def __init__(self):
		self.epoch = 0	
	# This function is called when the training begins
	def on_train_begin(self, logs={}):
		# Initialize the lists for holding the logs, losses and accuracies
		self.losses = []
		self.acc = []
		self.logs = []
	def on_epoch_end(self, epoch, logs={}):
		# Append the logs, losses and accuracies to the lists
		self.logs.append(logs)
		self.losses.append(logs.get('loss'))
		# older Keras versions report 'acc', newer ones report 'accuracy'
		self.acc.append(logs.get('acc', logs.get('accuracy', 0)) * 100)
		# save model and plot every 25 epochs
		if (epoch+1) % 25 == 0:
			sys.stdout.write("\nAuto-saving model and plot after {} epochs to ".format(epoch+1)+"\n"+modeldirectory + modelfileprefix + "_" + str(epoch+1).zfill(3) + ".model\n"+modeldirectory + modelfileprefix + "_" + str(epoch+1).zfill(3) + ".png\n\n")
			sys.stdout.flush()
			self.model.save(modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.model')
			copyfile(notesfile, modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.notes')
			N = numpy.arange(0, len(self.losses))
			# Plot training loss and training accuracy against epochs passed
			plt.figure()
			plt.subplots_adjust(hspace=0.7)
			plt.subplot(2, 1, 1)
			# plot loss values
			plt.plot(N, self.losses, label = "train_loss")
			plt.title("Loss [Epoch {}]".format(epoch+1))
			plt.xlabel('Epoch')
			plt.ylabel('Loss')
			plt.subplot(2, 1, 2)
			# plot accuracy values
			plt.plot(N, self.acc, label = "train_acc")
			plt.title("Accuracy % [Epoch {}]".format(epoch+1))
			plt.xlabel("Epoch")
			plt.ylabel("Accuracy %")
			plt.savefig(modeldirectory + modelfileprefix + '_' + str(epoch+1).zfill(3) + '.png')
			plt.close()
			
# train the neural network
def train_network():

	sys.stdout.write("Reading midi files...\n\n")
	sys.stdout.flush()

	notes = get_notes()

	# get amount of pitch names
	n_vocab = len(set(notes))

	sys.stdout.write("\nPreparing note sequences...\n")
	sys.stdout.flush()

	network_input, network_output = prepare_sequences(notes, n_vocab)

	sys.stdout.write("\nCreating CuDNNLSTM neural network model...\n")
	sys.stdout.flush()

	model = create_network(network_input, n_vocab)

	sys.stdout.write("\nTraining CuDNNLSTM neural network model...\n\n")
	sys.stdout.flush()

	train(model, network_input, network_output)

# get all the notes and chords from the midi files
def get_notes():

	# remove existing data file if it exists
	if os.path.isfile(notesfile):
		os.remove(notesfile)
	
	notes = []

	for file in glob.glob("{}/*.mid".format(mididirectory)):
		midi = converter.parse(file)

		sys.stdout.write("Parsing %s ...\n" % file)
		sys.stdout.flush()

		notes_to_parse = None

		try: # file has instrument parts
			s2 = instrument.partitionByInstrument(midi)
			notes_to_parse = s2.parts[0].recurse() 
		except Exception: # file has notes in a flat structure
			notes_to_parse = midi.flat.notes

		for element in notes_to_parse:
			if isinstance(element, note.Note):
				notes.append(str(element.pitch))
			elif isinstance(element, chord.Chord):
				notes.append('.'.join(str(n) for n in element.normalOrder))

	with open(notesfile,'wb') as filepath:
		pickle.dump(notes, filepath)

	return notes

# prepare the sequences used by the neural network
def prepare_sequences(notes, n_vocab):
	sequence_length = 100

	# get all pitch names
	pitchnames = sorted(set(item for item in notes))

	 # create a dictionary to map pitches to integers
	note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

	network_input = []
	network_output = []

	# create input sequences and the corresponding outputs
	for i in range(0, len(notes) - sequence_length, 1):
		sequence_in = notes[i:i + sequence_length] # needs to take into account if notes in midi file are less than required 100 ( mod ? )
		sequence_out = notes[i + sequence_length]  # needs to take into account if notes in midi file are less than required 100 ( mod ? )
		network_input.append([note_to_int[char] for char in sequence_in])
		network_output.append(note_to_int[sequence_out])

	n_patterns = len(network_input)

	# reshape the input into a format compatible with CuDNNLSTM layers
	network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
	# normalize input
	network_input = network_input / float(n_vocab)

	network_output = np_utils.to_categorical(network_output)

	return (network_input, network_output)

# create the structure of the neural network
def create_network(network_input, n_vocab):

	'''
	""" create the structure of the neural network """
	model = Sequential()
	model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
	model.add(Dropout(0.3))
	model.add(CuDNNLSTM(512, return_sequences=True))
	model.add(Dropout(0.3))
	model.add(CuDNNLSTM(512))
	model.add(Dense(256))
	model.add(Dropout(0.3))
	model.add(Dense(n_vocab))
	model.add(Activation('softmax'))
	model.compile(loss='categorical_crossentropy', optimizer='rmsprop',metrics=["accuracy"])
	'''
	
	model = Sequential()
	
	model.add(CuDNNLSTM(512, input_shape=(network_input.shape[1], network_input.shape[2]), return_sequences=True))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(CuDNNLSTM(256))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(Dense(128, activation="relu"))
	model.add(Dropout(0.2))
	model.add(BatchNormalization())
	
	model.add(Dense(n_vocab))
	model.add(Activation('softmax'))
	model.compile(loss='categorical_crossentropy', optimizer='adam',metrics=["accuracy"])
	
	return model

# train the neural network
def train(model, network_input, network_output):
	
	# saver = CustomSaver()
	# history = model.fit(network_input, network_output, epochs=modelepochs, batch_size=50, callbacks=[tensorboard])
	history = model.fit(network_input, network_output, epochs=modelepochs, batch_size=50, callbacks=[CustomSaver()])

	# evaluate the model
	print("\nModel evaluation at the end of training")
	train_acc = model.evaluate(network_input, network_output, verbose=0)
	print(model.metrics_names)
	print(train_acc)
	
	# save trained model
	model.save(modeldirectory + modelfileprefix + '_' + str(modelepochs) + '.model')

	# delete temp notes file
	os.remove(notesfile)
	
if __name__ == '__main__':
	train_network()
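
The training script above stores each single note as its pitch name and each chord as its normal-order pitch classes joined with dots. The generation script further down tells the two apart with a simple string test. The following sketch isolates that test so it can be tried without music21; the function name `parse_token` and the sample tokens are illustrative assumptions.

```python
def parse_token(token):
    """Classify a token the way the generation script does.
    Returns ("chord", [ints]) or ("note", pitch_name)."""
    # chords look like "4.7.11"; a lone pitch class such as "7" is
    # also treated as a (one-note) chord
    if "." in token or token.isdigit():
        return ("chord", [int(part) for part in token.split(".")])
    return ("note", token)


for token in ["C4", "4.7.11", "7", "F#5"]:
    print(token, "->", parse_token(token))
```

Because the network only ever sees these tokens as vocabulary entries, a chord it has never seen in training can never appear in its output.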

This is the midi generation script “lstm_music_predict.py”


# based on code from https://github.com/Skuldur/Classical-Piano-Composer
# to use this script pass in;
# 1. path to notes file
# 2. path to model
# 3. path to midi output
# eg python -W ignore "C:\\LSTM Composer\\lstm_music_predict.py" "C:\\LSTM Composer\\Bach.notes" "C:\\LSTM Composer\\Bach.model" "C:\\LSTM Composer\\Bach.mid"

# ignore all info and warning messages
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' 
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

import pickle
import numpy
import sys
import keras.models

from music21 import instrument, note, stream, chord
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Activation

# paths to the notes file, the model file and the midi output file
notesfile = str(sys.argv[1])
modelfile = str(sys.argv[2])
midifile = str(sys.argv[3])

# generates a piano midi file
def generate():
	sys.stdout.write("Loading notes data file...\n\n")
	sys.stdout.flush()

	#load the notes used to train the model
	with open(notesfile, 'rb') as filepath:
		notes = pickle.load(filepath)

	sys.stdout.write("Getting pitch names...\n\n")
	sys.stdout.flush()

	# Get all pitch names
	pitchnames = sorted(set(item for item in notes))
	# Get the number of distinct pitch names
	n_vocab = len(set(notes))

	sys.stdout.write("Preparing sequences...\n\n")
	sys.stdout.flush()

	network_input, normalized_input = prepare_sequences(notes, pitchnames, n_vocab)

	sys.stdout.write("Loading LSTM neural network model...\n\n")
	sys.stdout.flush()

	model = create_network(normalized_input, n_vocab)

	sys.stdout.write("Generating note sequence...\n\n")
	sys.stdout.flush()

	prediction_output = generate_notes(model, network_input, pitchnames, n_vocab)

	sys.stdout.write("\nCreating MIDI file...\n\n")
	sys.stdout.flush()

	create_midi(prediction_output)

# prepare the sequences used by the neural network
def prepare_sequences(notes, pitchnames, n_vocab):
	# map between notes and integers and back
	note_to_int = dict((note, number) for number, note in enumerate(pitchnames))

	sequence_length = 100
	network_input = []
	output = []
	for i in range(0, len(notes) - sequence_length, 1):
		sequence_in = notes[i:i + sequence_length]
		sequence_out = notes[i + sequence_length]
		network_input.append([note_to_int[char] for char in sequence_in])
		output.append(note_to_int[sequence_out])

	n_patterns = len(network_input)

	# reshape the input into a format compatible with LSTM layers
	normalized_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
	# normalize input
	normalized_input = normalized_input / float(n_vocab)

	return (network_input, normalized_input)

# create the structure of the neural network
def create_network(network_input, n_vocab):
	model = keras.models.load_model(modelfile)
	return model

# generate notes from the neural network based on a sequence of notes
def generate_notes(model, network_input, pitchnames, n_vocab):
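	# pick a random sequence from the input as a starting point for the prediction
	# (the upper bound of randint is exclusive, so this covers every sequence)
	start = numpy.random.randint(0, len(network_input))

	int_to_note = dict((number, note) for number, note in enumerate(pitchnames))

	# copy the seed so appending predictions does not modify network_input
	pattern = list(network_input[start])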
	prediction_output = []

	# generate 500 notes
	for note_index in range(500):
		prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
		prediction_input = prediction_input / float(n_vocab)

		prediction = model.predict(prediction_input, verbose=0)

		index = numpy.argmax(prediction)
		result = int_to_note[index]
		prediction_output.append(result)

		pattern.append(index)
		pattern = pattern[1:len(pattern)]

		if (note_index + 1) % 50 == 0:
			sys.stdout.write("{} out of 500 notes generated\n".format(note_index+1))
			sys.stdout.flush()

	return prediction_output

# convert the output from the prediction to notes and create a midi file from the notes
def create_midi(prediction_output):
	offset = 0
	output_notes = []

	# create note and chord objects based on the values generated by the model
	for pattern in prediction_output:
		# pattern is a chord
		if ('.' in pattern) or pattern.isdigit():
			notes_in_chord = pattern.split('.')
			notes = []
			for current_note in notes_in_chord:
				new_note = note.Note(int(current_note))
				new_note.storedInstrument = instrument.Piano()
				notes.append(new_note)
			new_chord = chord.Chord(notes)
			new_chord.offset = offset
			output_notes.append(new_chord)
		# pattern is a note
		else:
			new_note = note.Note(pattern)
			new_note.offset = offset
			new_note.storedInstrument = instrument.Piano()
			output_notes.append(new_note)

		# increase offset each iteration so that notes do not stack
		offset += 0.5

	midi_stream = stream.Stream(output_notes)

	midi_stream.write('midi', fp=midifile)

if __name__ == '__main__':
	generate()
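
The generation loop in the script above is autoregressive: it predicts the next note from the current window, appends the prediction, drops the oldest note and repeats. Here is a minimal sketch of that loop with a stand-in predictor so it runs without a trained model. `fake_predict` is a hypothetical placeholder for the `model.predict` call followed by `argmax`, and the function names are assumptions.

```python
def generate_indices(predict_next, seed, count):
    """Greedy autoregressive generation over integer note indices.
    predict_next(window) must return the index of the next note."""
    pattern = list(seed)          # copy so the caller's seed is untouched
    output = []
    for _ in range(count):
        index = predict_next(pattern)
        output.append(index)
        # slide the window: add the new note, drop the oldest one
        pattern = pattern[1:] + [index]
    return output


def fake_predict(window, n_vocab=5):
    """Hypothetical stand-in for the trained model: next = last + 1."""
    return (window[-1] + 1) % n_vocab


print(generate_indices(fake_predict, seed=[0, 1, 2], count=6))
# [3, 4, 0, 1, 2, 3]
```

Since the most likely note is always chosen, the output is fully determined by the seed window, which is why the script picks a random starting sequence each time.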

Model File Sizes

One downside to including neural networks with Visions of Chaos is file size. If model generation was quicker I would just include a button so the end user could train the model(s) themselves. But seeing as some of these training sessions can take days to train multiple models that is not really practical. A better solution is for me to do all the training and testing work and only include the best working models. This also means that the end user just has to click a button and the trained models are then used for the music compositions.

Each model is 22 MB in size. Not a huge amount on today’s Internet, but Visions of Chaos tends to increase in size only very slowly over the years, so having the installer suddenly go from 70 MB to 91 MB (due to the last cellular automata search model) is a big jump. For now I only include 1 model with the main Visions of Chaos installer. I put a link to another 1 GB worth of models for users who want to use more. Alternatively they can use the script above to create their own models based on their own midi files.

What’s Next?

The LSTM composer as shown in this post is the most basic usage of neural networks to compose music.

I have found other neural network music composers that I will experiment with next so expect more music composition options to be included with Visions of Chaos in the future.

Jason.
