ERENARD63

Improvise a Jazz Solo with an LSTM Network

May 21st, 2018
## Improvise a Jazz Solo with an LSTM Network

from __future__ import print_function
import IPython
import sys
from music21 import *
import numpy as np
from grammar import *
from qa import *
from preprocess import *
from music_utils import *
from data_utils import *
from keras.models import load_model, Model
from keras.layers import Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector
from keras.initializers import glorot_uniform
from keras.utils import to_categorical
from keras.optimizers import Adam
from keras import backend as K

# Listen to a snippet of the training audio (only renders inside a notebook).
IPython.display.Audio('./data/30s_seq.mp3')

## INITIALISATION
X, Y, n_values, indices_values = load_music_utils()
print('Shape of X:', X.shape)
print('Number of training examples:', X.shape[0])
print('Tx (length of sequence):', X.shape[1])
print('Total # of unique values:', n_values)
print('Shape of Y:', Y.shape)

# In this part you will build and train a model that learns musical patterns. To do so, you will need to build a model that takes in X of shape (m, Tx, 78)
# and Y of shape (Ty, m, 78). We will use an LSTM with 64-dimensional hidden states, so let's set n_a = 64.
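# The two layouts above can be illustrated with a small NumPy sketch. The data here is a
# random stand-in, not the real corpus, and the names X_demo and Y_demo are hypothetical.
# The targets simply reuse the inputs to show the batch-major versus time-major layout;
# how the real Y is derived from X is decided inside load_music_utils(), which is not shown.

```python
import numpy as np

m, Tx, n_values = 60, 30, 78   # sizes quoted in this script
rng = np.random.default_rng(0)

# Hypothetical stand-in for the corpus: random value indices, one-hot encoded.
idx = rng.integers(0, n_values, size=(m, Tx))
X_demo = np.eye(n_values)[idx]            # batch-major: (m, Tx, n_values)

# Time-major layout used for targets: one (m, n_values) slice per time step.
Y_demo = np.swapaxes(X_demo, 0, 1)        # (Tx, m, n_values)

print(X_demo.shape, Y_demo.shape)
```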

n_a = 64

# Here's how you can create a Keras model with multiple inputs and outputs. If you're building an RNN where, even at test time, the entire input sequence x⟨1⟩, x⟨2⟩, …, x⟨Tx⟩
# is given in advance (for example, if the inputs were words and the output was a label), then Keras has simple built-in functions to build the model. However, for sequence generation, at test time we don't know all the values of x⟨t⟩
# in advance; instead we generate them one at a time using x⟨t⟩ = y⟨t−1⟩.
# So the code will be a bit more complicated, and you'll need to implement your own for-loop to iterate over the time steps.
#
# The function djmodel() will call the LSTM layer Tx times using a for-loop, and it is important that all Tx copies have the same weights. That is, it should not re-initialize the weights every time; the Tx steps should share weights. The key steps for implementing layers with shareable weights in Keras are:
#   1. Define the layer objects (we will use global variables for this).
#   2. Call these objects when propagating the input.
# We have defined the layer objects you need as global variables. Please run the next cell to create them. Please check the Keras documentation to make sure you understand what these layers are: Reshape(), LSTM(), Dense().
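# The weight-sharing idea can be sketched without Keras. TinyDense below is a hypothetical
# NumPy stand-in for a layer object, not a Keras class: its weights are created once in
# __init__, so calling the same object at every time step reuses them, whereas building a
# new object inside the loop would give each step its own weights.

```python
import numpy as np

class TinyDense:
    """Hypothetical stand-in for a layer object: weights are created once, here."""
    def __init__(self, n_in, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def __call__(self, x):
        return x @ self.W + self.b

shared = TinyDense(4, 3)        # step 1: define the layer object once
x_seq = np.ones((5, 2, 4))      # (time steps, batch, features), identical inputs

# Step 2: call the same object at every time step, so every step uses the same W and b.
outs = [shared(x_t) for x_t in x_seq]

# Counter-example: a fresh object per step means different weights per step.
unshared = [TinyDense(4, 3, seed=t)(x_t) for t, x_t in enumerate(x_seq)]
```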

reshapor = Reshape((1, 78))                       # Used in Step 2.B of djmodel(), below
LSTM_cell = LSTM(n_a, return_state=True)          # Used in Step 2.C
densor = Dense(n_values, activation='softmax')    # Used in Step 2.D


# GRADED FUNCTION: djmodel

def djmodel(Tx, n_a, n_values):
    """
    Implement the model

    Arguments:
    Tx -- length of the sequence in a corpus
    n_a -- the number of activations used in our model
    n_values -- number of unique values in the music data

    Returns:
    model -- a keras model with the
    """

    # Define the input of your model with a shape
    X = Input(shape=(Tx, n_values))

    # Define a0 and c0, the initial hidden state and cell state of the LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0

    ### START CODE HERE ###
    # Step 1: Create empty list to append the outputs while you iterate (≈1 line)
    outputs = []

    # Step 2: Loop
    for t in range(Tx):

        # Step 2.A: Select the "t"th time step vector from X.
        # Binding t as a default argument freezes its value for this step.
        x = Lambda(lambda z, t=t: z[:, t, :])(X)
        # Step 2.B: Use reshapor to reshape x to be (1, n_values) (≈1 line)
        x = reshapor(x)
        # Step 2.C: Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.D: Apply densor to the hidden state output of LSTM_cell
        out = densor(a)
        # Step 2.E: Add the output to "outputs"
        outputs.append(out)

    # Step 3: Create model instance
    model = Model([X, a0, c0], outputs)

    ### END CODE HERE ###

    return model

# Run the following cell to define your model. We will use Tx=30, n_a=64 (the dimension of the LSTM activations), and n_values=78.

model = djmodel(Tx=30, n_a=64, n_values=78)

# You now need to compile your model to be trained. We will use Adam and a categorical cross-entropy loss.

opt = Adam(lr=0.01, beta_1=0.9, beta_2=0.999, decay=0.01)  # newer Keras releases name the first argument learning_rate

# The model must be compiled before it can be fit.
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# Finally, let's initialize a0 and c0 for the LSTM's initial state to be zero.
m = 60
a0 = np.zeros((m, n_a))
c0 = np.zeros((m, n_a))

# Let's now fit the model! We will turn Y into a list before doing so, since the cost function expects Y to be provided in this format (one list item per time step). So list(Y) is a list with 30 items, where each of the list items is of shape (60, 78). Let's train for 100 epochs.

model.fit([X, a0, c0], list(Y), epochs=100)
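
# Why list(Y) yields one target per time step can be checked with a zero-filled placeholder
# of the same shape. Y_demo and Y_list are hypothetical names used only for illustration.

```python
import numpy as np

Ty, m, n_values = 30, 60, 78            # sizes quoted in this script
Y_demo = np.zeros((Ty, m, n_values))    # placeholder targets, not the real corpus

# list() iterates over the first axis, so each item is the (m, n_values) target
# for one of the model's Ty output heads.
Y_list = list(Y_demo)

print(len(Y_list), Y_list[0].shape)
```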


# Generating music
# You now have a trained model which has learned the patterns of the jazz soloist. Let's now use this model to synthesize new music.
# At each step of sampling, you will take as input the activation a and cell state c from the previous state of the LSTM, forward propagate by one step, and get a new output activation as well as a new cell state. The new activation a can then be used to generate the output, using densor as before.
# To start off the model, we will initialize x0 as well as the LSTM activation and cell value a0 and c0 to be zeros.
# Exercise: Implement the function below to sample a sequence of musical values. Here are some of the key steps you'll need to implement inside the for-loop that generates the Ty output values:
#   Step 2.A: Use LSTM_cell, which inputs the previous step's c and a to generate the current step's c and a.
#   Step 2.B: Use densor (defined previously) to compute a softmax on a to get the output for the current step.
#   Step 2.C: Save the output you have just generated by appending it to outputs.
#   Step 2.D: Set x to be the one-hot version of "out" (the prediction) so that you can pass it to the next LSTM step. We have already provided this line of code, which uses a Lambda function:
#       x = Lambda(one_hot)(out)
# [Minor technical note: Rather than sampling a value at random according to the probabilities in out, this line of code actually chooses the single most likely note at each step using an argmax.]
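
# The difference between the argmax choice and true sampling can be sketched in NumPy.
# Both helpers below are hypothetical stand-ins; the actual one_hot helper comes from the
# course utility modules and is not shown in this script.

```python
import numpy as np

def one_hot_argmax(probs):
    """Greedy choice: one-hot vector of the most likely value in each row."""
    return np.eye(probs.shape[-1])[np.argmax(probs, axis=-1)]

def one_hot_sample(probs, rng):
    """Stochastic choice: draw one value per row according to its probabilities."""
    picks = [rng.choice(len(p), p=p) for p in probs]
    return np.eye(probs.shape[-1])[picks]

probs = np.array([[0.1, 0.7, 0.2]])    # toy softmax output over 3 values
greedy = one_hot_argmax(probs)          # always selects index 1
sampled = one_hot_sample(probs, np.random.default_rng(0))  # may select any index
```

# The greedy version is deterministic, so the same initial state always yields the same
# sequence; sampling would trade that repeatability for variety.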

# GRADED FUNCTION: music_inference_model

def music_inference_model(LSTM_cell, densor, n_values=78, n_a=64, Ty=100):
    """
    Uses the trained "LSTM_cell" and "densor" from model() to generate a sequence of values.

    Arguments:
    LSTM_cell -- the trained "LSTM_cell" from model(), Keras layer object
    densor -- the trained "densor" from model(), Keras layer object
    n_values -- integer, number of unique values
    n_a -- number of units in the LSTM_cell
    Ty -- integer, number of time steps to generate

    Returns:
    inference_model -- Keras model instance
    """

    # Define the input of your model with a shape
    x0 = Input(shape=(1, n_values))

    # Define a0 and c0, the initial hidden state and cell state of the LSTM
    a0 = Input(shape=(n_a,), name='a0')
    c0 = Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    ### START CODE HERE ###
    # Step 1: Create an empty list of "outputs" to later store your predicted values (≈1 line)
    outputs = []

    # Step 2: Loop over Ty and generate a value at every time step
    for t in range(Ty):

        # Step 2.A: Perform one step of LSTM_cell (≈1 line)
        a, _, c = LSTM_cell(x, initial_state=[a, c])

        # Step 2.B: Apply Dense layer to the hidden state output of the LSTM_cell (≈1 line)
        out = densor(a)

        # Step 2.C: Append the prediction "out" to "outputs". out.shape = (None, 78) (≈1 line)
        outputs.append(out)

        # Step 2.D: Select the next value according to "out", and set "x" to be the one-hot representation of the
        #           selected value, which will be passed as the input to LSTM_cell on the next step. We have provided
        #           the line of code you need to do this.
        x = Lambda(one_hot)(out)

    # Step 3: Create model instance with the correct "inputs" and "outputs" (≈1 line)
    inference_model = Model([x0, a0, c0], outputs)

    ### END CODE HERE ###

    return inference_model

# Run the cell below to define your inference model. This model is hard coded to generate 50 values.

inference_model = music_inference_model(LSTM_cell, densor, n_values=78, n_a=64, Ty=50)

# Finally, this creates the zero-valued vectors you will use to initialize x and the LSTM state variables a and c.
x_initializer = np.zeros((1, 1, 78))
a_initializer = np.zeros((1, n_a))
c_initializer = np.zeros((1, n_a))


# Implement predict_and_sample(). This function takes several arguments, including the inputs [x_initializer, a_initializer, c_initializer]. To predict the output corresponding to this input, you will need to carry out 3 steps:
#   1. Use your inference model to predict an output given your set of inputs. The output pred should be a list of length Ty where each element is a numpy array of shape (1, n_values).
#   2. Convert pred into a numpy array of Ty indices. Each index is computed by taking the argmax of an element of the pred list.
#   3. Convert the indices into their one-hot vector representations.
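
# The three steps can be traced on fake predictions with NumPy alone. The pred list below
# is a hypothetical stand-in for the output of inference_model.predict(...), and np.eye is
# used in place of to_categorical so the sketch runs without Keras.

```python
import numpy as np

Ty, n_values = 50, 78
rng = np.random.default_rng(1)

# Step 1 (stand-in): Ty arrays of shape (1, n_values), each row a probability vector.
pred = [rng.dirichlet(np.ones(n_values))[None, :] for _ in range(Ty)]

# Step 2: stack to (Ty, 1, n_values) and take the argmax over the value axis.
indices = np.argmax(np.array(pred), axis=-1)    # (Ty, 1)

# Step 3: one-hot rows with a fixed width of n_values columns.
results = np.eye(n_values)[indices[:, 0]]       # (Ty, n_values)
```

# Fixing the width to n_values matters: a one-hot encoder that infers the width from the
# largest index seen can return fewer columns when the highest values never occur.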

# GRADED FUNCTION: predict_and_sample

def predict_and_sample(inference_model, x_initializer=x_initializer, a_initializer=a_initializer,
                       c_initializer=c_initializer):
    """
    Predicts a sequence of values using the inference model.

    Arguments:
    inference_model -- Keras model instance for inference time
    x_initializer -- numpy array of shape (1, 1, 78), one-hot vector initializing the values generation
    a_initializer -- numpy array of shape (1, n_a), initializing the hidden state of the LSTM_cell
    c_initializer -- numpy array of shape (1, n_a), initializing the cell state of the LSTM_cell

    Returns:
    results -- numpy-array of shape (Ty, 78), matrix of one-hot vectors representing the values generated
    indices -- numpy-array of shape (Ty, 1), matrix of indices representing the values generated
    """

    ### START CODE HERE ###
    # Step 1: Use your inference model to predict an output sequence given x_initializer, a_initializer and c_initializer.
    pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
    # Step 2: Convert "pred" into an np.array() of indices with the maximum probabilities
    indices = np.argmax(pred, axis=-1)
    # Step 3: Convert indices to one-hot vectors; fixing num_classes keeps the result at shape (Ty, 78)
    #         even when the highest-numbered values are never predicted
    results = to_categorical(indices, num_classes=x_initializer.shape[-1])
    ### END CODE HERE ###

    return results, indices


# 3.3 - Generate music
# Finally, you are ready to generate music. Your RNN generates a sequence of values. The following code generates music by first calling your predict_and_sample() function. These values are then post-processed into musical chords (meaning that multiple values or notes can be played at the same time).
# Most computational music algorithms use some post-processing because it is difficult to generate music that sounds good without it. The post-processing does things such as clean up the generated audio by making sure the same sound is not repeated too many times, that two successive notes are not too far from each other in pitch, and so on. One could argue that a lot of these post-processing steps are hacks; also, a lot of the music generation literature has focused on hand-crafting post-processors, and a lot of the output quality depends on the quality of the post-processing and not just the quality of the RNN. But this post-processing does make a huge difference, so let's use it in our implementation as well.
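
# The two clean-up rules mentioned above can be sketched in plain Python. Both helpers are
# hypothetical illustrations and are not the routines used by generate_music(), whose
# implementation is not shown here. Representing notes as MIDI pitch numbers is also an
# assumption made only for this example.

```python
def limit_repeats(values, max_run=2):
    """Drop values that would extend a run of identical items beyond max_run."""
    cleaned = []
    run = 0
    for v in values:
        run = run + 1 if cleaned and cleaned[-1] == v else 1
        if run <= max_run:
            cleaned.append(v)
    return cleaned

def clamp_leaps(pitches, max_leap=12):
    """Shift each pitch by octaves until it lies within max_leap semitones of the previous one."""
    if max_leap < 6:
        raise ValueError("max_leap must be at least 6 so an octave shift can always land in range")
    out = []
    for p in pitches:
        if out:
            while p - out[-1] > max_leap:
                p -= 12
            while out[-1] - p > max_leap:
                p += 12
        out.append(p)
    return out

deduped = limit_repeats([1, 1, 1, 1, 2, 2, 3])   # [1, 1, 2, 2, 3]
smoothed = clamp_leaps([60, 80, 40])              # [60, 68, 64]
```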

out_stream = generate_music(inference_model)


# References
# The ideas presented in this notebook came primarily from three computational music papers cited below. The implementation here also took significant inspiration and used many components from Ji-Sung Kim's GitHub repository.
#   - Ji-Sung Kim, 2016, deepjazz
#   - Jon Gillick, Kevin Tang and Robert Keller, 2009, Learning Jazz Grammars
#   - Robert Keller and David Morrison, 2007, A Grammatical Approach to Automatic Improvisation
#   - François Pachet, 1999, Surprising Harmonies
# We're also grateful to François Germain for valuable feedback.