Originally Published on Medium.com on July 26th, 2019
Data science and art make a natural pair; projects in both fields need to be approached with an exploratory workflow and an inquisitive mindset. The application of data science is now aiding the music industry at every step of the production journey, from composition to recording to consumption.
In this post I will talk about how deep learning can be used for music generation. We will use the Keras library in Python to develop an RNN (recurrent neural network) which can create techno music.
The Python Music21 toolkit allows us to work with MIDI (Musical Instrument Digital Interface) files. These store information on a note’s start, duration and pitch, but not the sound. A DJ then has the ability to select or change the instrument or synthesizer to generate the sound, using tools such as Logic and GarageBand.
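As a quick illustration of what Music21 exposes when it reads a MIDI file (a minimal sketch; example.mid is a placeholder filename):

from music21 import converter

midi = converter.parse("example.mid")  # placeholder filename
for element in list(midi.flat.notes)[:5]:
    # each note carries its start offset, duration and pitch(es), but no audio
    print(element.offset, element.duration.quarterLength, element.pitches)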
The Data
For this project, I downloaded 38 freely available MIDI files from http://www.partnersinrhyme.com
Data Preparation
1. Convert notes to string format
Using Music21's partitionByInstrument function, we can break the file up into its individual instrument parts when more than one instrument is present. Each part contains a list of notes, which we extract and append to a single list. We can then save this list to a file using pickle.
import glob
import pickle
from music21 import converter, instrument, note, chord

notes = []
for file in glob.glob("midi_songs/*.mid"):
    midi = converter.parse(file)
    print("Parsing %s" % file)
    notes_to_parse = None
    try:
        # the file has instrument parts
        s2 = instrument.partitionByInstrument(midi)
        notes_to_parse = s2.parts[0].recurse()
    except Exception:
        # the file has notes in a flat structure
        notes_to_parse = midi.flat.notes
    for element in notes_to_parse:
        if isinstance(element, note.Note):
            notes.append(str(element.pitch))
        elif isinstance(element, chord.Chord):
            notes.append('.'.join(str(n) for n in element.normalOrder))

with open('data/notes', 'wb') as filepath:
    pickle.dump(notes, filepath)
2. Generate the input and output sequence
Neural networks work with numbers rather than strings, so we create a dictionary that maps each note to an integer, and then generate the sequences that form the inputs and outputs of the model.
pitchnames = sorted(set(notes))
note_to_int = dict((note, number) for number, note in enumerate(pitchnames))
The sequence length is a hyperparameter that needs to be tuned. If it's too short, the model will not learn enough context to produce a song with a varied melody, but increasing the sequence length also increases training time.
I set it to 30, which means that the model predicts the next note based on the previous 30 notes. You can experiment with this number to see what works best with your data; the best value will vary across musical genres.
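As a sketch of the sequence generation, reusing the note_to_int dictionary above (the list names mirror the ones used later in the post):

sequence_length = 30
network_input = []
network_output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]   # 30 notes of context
    sequence_out = notes[i + sequence_length]    # the note the model should predict
    network_input.append([note_to_int[char] for char in sequence_in])
    network_output.append(note_to_int[sequence_out])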
3. Build the LSTM structure
For this model, we use a stacked LSTM. LSTMs are a type of recurrent neural network that use a gating mechanism to encode long-term patterns, making them extremely useful for working with sequential data such as music. 'Stacked' means that the model is made up of multiple hidden LSTM layers, each containing multiple memory cells. This creates a deeper, more robust model.
One important thing to note here is that we set return_sequences to True (it is False by default), as seen in the code below. This makes an LSTM layer return its output at every time step rather than only at the final one, which is what the next LSTM layer in the stack expects as input.
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation

model = Sequential()
model.add(LSTM(
512,
input_shape=(network_input.shape[1], network_input.shape[2]),
return_sequences=True
))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(n_vocab))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
The dropout layers are a form of regularisation: they reduce overfitting by randomly switching off a fraction of the units during training, which is roughly equivalent to averaging over many different thinned networks. The final softmax activation turns the output into a probability distribution over the note vocabulary, and categorical cross-entropy calculates the loss, which measures the difference between the real next note and our predicted next note.
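Before fitting, the input sequences also have to be reshaped into the (samples, time steps, features) format the LSTM layers expect and scaled, and the output has to be one-hot encoded to match the categorical cross-entropy loss. A minimal sketch, assuming the network_input and network_output lists from the data-preparation step (np_utils matches the Keras versions of the time; newer versions expose to_categorical directly from keras.utils):

import numpy
from keras.utils import np_utils

n_vocab = len(set(notes))        # size of the note vocabulary
n_patterns = len(network_input)

# reshape to (samples, time steps, features) and scale to [0, 1]
network_input = numpy.reshape(network_input, (n_patterns, sequence_length, 1))
network_input = network_input / float(n_vocab)

# one-hot encode the output notes
network_output = np_utils.to_categorical(network_output)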
Training the Model
Now it's time to train the model!
model.fit(network_input, network_output, epochs=30, batch_size=64, callbacks=callbacks_list)
Warning: the first time I trained it, it took 18 hours! You can set up checkpoints using the code below, so that the model weights are saved whenever the loss improves; you can then test the different outputs and stop training once you're happy with the result.
from keras.callbacks import ModelCheckpoint

filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(
filepath,
monitor='loss',
verbose=0,
save_best_only=True,
mode='min'
)
callbacks_list = [checkpoint]
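When you are happy with a checkpoint, you can rebuild the same architecture and load its weights back in before generating music (the filename below is a placeholder; use whichever .hdf5 file the callback produced):

# load weights from a saved checkpoint -- placeholder filename
model.load_weights('weights-improvement-XX-X.XXXX-bigger.hdf5')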
Music Generation
First we call the notes that we pickled earlier.
with open('data/notes', 'rb') as filepath:
notes = pickle.load(filepath)
We then prepare the sequences as we did in the data preparation section.
sequence_length = 30  # must match the sequence length used in training
network_input = []
output = []
for i in range(0, len(notes) - sequence_length, 1):
    sequence_in = notes[i:i + sequence_length]
    sequence_out = notes[i + sequence_length]
    network_input.append([note_to_int[char] for char in sequence_in])
    output.append(note_to_int[sequence_out])
Now we can generate new notes. The starting point is a random index into the input sequences; from there, the model repeatedly predicts the next note to generate the music, as illustrated in the diagram below.

The code below shows how the model generates 500 notes. At each step the model outputs a probability for every note in the vocabulary; we take the index with the highest probability, map it back to a note, add it to the output, and append it to the input pattern for the next iteration, as we saw above.
import numpy

# rebuild the lookup tables and pick a random starting sequence
pitchnames = sorted(set(notes))
n_vocab = len(pitchnames)
int_to_note = dict((number, note) for number, note in enumerate(pitchnames))
start = numpy.random.randint(0, len(network_input) - 1)
pattern = network_input[start]

prediction_output = []
for note_index in range(500):
    prediction_input = numpy.reshape(pattern, (1, len(pattern), 1))
    prediction_input = prediction_input / float(n_vocab)
    prediction = model.predict(prediction_input, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_note[index]
    prediction_output.append(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
From this, we convert the output back into notes and chords and create a MIDI file.
from music21 import stream

offset = 0
output_notes = []
# convert each predicted string back into Note/Chord objects
for pattern in prediction_output:
    # pattern is a chord (e.g. '4.7.11')
    if ('.' in pattern) or pattern.isdigit():
        notes_in_chord = pattern.split('.')
        chord_notes = []
        for current_note in notes_in_chord:
            new_note = note.Note(int(current_note))
            new_note.storedInstrument = instrument.SnareDrum()
            chord_notes.append(new_note)
        new_chord = chord.Chord(chord_notes)
        new_chord.offset = offset
        output_notes.append(new_chord)
    # pattern is a single note
    else:
        new_note = note.Note(pattern)
        new_note.offset = offset
        new_note.storedInstrument = instrument.SnareDrum()
        output_notes.append(new_note)
    # move forward in time so the notes don't stack on top of each other
    offset += 0.5

midi_stream = stream.Stream(output_notes)
midi_stream.write('midi', fp='techno_output.mid')  # pick any output filename
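For a quick sanity check without opening a DAW, Music21 can also hand the stream straight to your default MIDI player (using the midi_stream object from the block above):

midi_stream.show('midi')  # opens the generated clip in your configured MIDI player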
Check out this YouTube link to listen to the final output: https://youtu.be/BdB_hrUXrZU
The beats generated by this model, such as the one in the link above, can then be used by DJs as transitions between tracks in their existing set list. I hope you enjoy the track and would love to hear any feedback! If you'd like to get in touch, feel free to reach out on LinkedIn. The full code can be viewed on my GitHub: https://github.com/leesurkis/generating-techno-music.
References:
This code was based on a tutorial for generating piano music: https://towardsdatascience.com/how-to-generate-music-using-a-lstm-neural-network-in-keras-68786834d4c5
The full code for the piano project can be viewed here: https://github.com/Skuldur/Classical-Piano-Composer