Today I went through three papers. I did not read them in great detail or try to implement anything concrete, but I picked up some new ideas. The first gives an overview of the encoder-decoder model, the second extends that model and is the first to propose the attention mechanism, and the last explains how LSTM and GRU work. After reading them I have a deeper understanding of machine translation and of how LSTM is applied.
Let's talk about the principles behind generating a melody.
In the previous music theory section we covered chord progressions and how they move, and saw that many chord combinations work well. Chaining these good chords together gives a progression. Pop progressions tend to start on the tonic (root) chord and eventually return to it, which makes it possible to build a closed loop of four or eight bars. Most tunes rely on this principle of tension and resolution. Experience also shows that commonly used progressions carry an emotional character (http://www.wenkuxiazai.com/doc/614fe01b52d380eb62946d2a.html). Apart from a few rules that must be followed (such as where the tonic chord sits), the way a progression moves can be linked to emotion. For example:
1, C-Am-F-G (Reference track: Tsai Chin, "As you are gentle")
2, C-G-Am-F (Reference track: Beyond, verse)
Both of these progressions sound very bright.
1, Am-F-C-G (Reference track: Huang Yida, "I understand, set me free")
2, Am-C-G-Am (Reference track: Beyond, "Gray track", verse)
Both of these progressions sound softer and gentler. The main reason for the difference between the two groups is major versus minor: the first pair starts on the major chord C, the second on the minor chord Am.
Source: (1) http://www.wenkuxiazai.com/doc/614fe01b52d380eb62946d2a.html
(2) https://www.douban.com/note/345221364/
In other words, we first fix an emotional tone and then choose a chord progression, or even several progressions, all of which are reasonable.
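The idea of fixing a mood first and then picking a progression can be sketched as a simple lookup. This is only a minimal Python sketch, not the project's actual code: the table `PROGRESSIONS_BY_MOOD`, the mood labels "bright" and "gentle", and both helper functions are names I made up for illustration. The four progressions are the ones listed above, and `close_loop` assumes one chord per bar.

```python
import random

# Hypothetical lookup table: the mood labels are assumed names, the
# progressions are the four examples quoted in the text.
PROGRESSIONS_BY_MOOD = {
    "bright": [["C", "Am", "F", "G"], ["C", "G", "Am", "F"]],
    "gentle": [["Am", "F", "C", "G"], ["Am", "C", "G", "Am"]],
}


def pick_progression(mood, rng=None):
    """Return a copy of one progression registered for the given mood."""
    rng = rng or random.Random()
    return list(rng.choice(PROGRESSIONS_BY_MOOD[mood]))


def close_loop(progression, bars=8):
    """Repeat the progression over `bars` bars (one chord per bar, an
    assumption), then append the first chord so the loop resolves."""
    looped = [progression[i % len(progression)] for i in range(bars)]
    return looped + [progression[0]]
```

Calling `close_loop(pick_progression("bright"))` gives an eight-bar loop that starts on C and comes back to C at the end.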
However, this approach has a flaw: within a short clip there are only two options, major and minor, so completing an emotional shift such as moving from major to minor is very hard. There is an example in "Juvenile Brocade" where C becomes a C7 chord, and changes of this kind are probably difficult to generate automatically.
Next topic: how do we fill in a melody over the chords?
In a word: build the melody on the chord tones. You can add notes around the highest note of each chord in the progression to create a melody or bassline. A melody made this way follows the chords closely, so it tends to be catchy and easy to remember. Also, by the law of tension, whenever a chord is played the ear expects the next chord and its highest note. Remember that the brain only feels satisfied, and only judges the tune to be good, once the "tension" chord has been resolved.
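To make "build the melody on the chord tones" concrete, here is a small Python sketch. It is my own simplification, not a finished method: it only handles plain major and minor triads written with sharps (so a chord like C7 is not supported), it starts each bar on the top note of the root-position triad, and the function names and the `notes_per_bar` and `seed` parameters are assumptions.

```python
import random

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_tones(chord):
    """Return the notes of a major or minor triad, e.g. 'Am' -> A, C, E."""
    is_minor = chord.endswith("m")
    root = chord[:-1] if is_minor else chord
    start = NOTE_NAMES.index(root)
    third = 3 if is_minor else 4  # semitones between root and third
    return [NOTE_NAMES[(start + step) % 12] for step in (0, third, 7)]


def fill_melody(progression, notes_per_bar=4, seed=0):
    """Build one bar of melody per chord, using only that chord's tones."""
    rng = random.Random(seed)
    melody = []
    for chord in progression:
        tones = chord_tones(chord)
        # Start the bar on the highest note of the triad, then fill the
        # rest of the bar with randomly chosen chord tones.
        bar = [tones[-1]]
        bar += [rng.choice(tones) for _ in range(notes_per_bar - 1)]
        melody.append(bar)
    return melody
```

For C-Am-F-G the four bars start on G, E, C and D, and every other note stays inside the current chord, so the melody "follows the footsteps" of the progression.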
=========================================================
The next question is how to capture these chord progressions with deep learning.
First, we extract chord progressions from a large number of songs in groups of 8 or 16 chords, each group being one data sample, so that every sample forms a closed passage and still shows a sense of change. Then we manually label the emotion that each passage expresses.
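The grouping step could look roughly like the Python sketch below. It assumes each song is already available as a list of chord symbols (how to get that list is not covered here), and the helper names `build_vocab` and `make_samples` are made up. Cutting into non-overlapping groups and dropping a short trailing group are my own choices, and the emotion labels would still be added by hand afterwards.

```python
def build_vocab(songs):
    """Map every chord symbol that occurs in the songs to an integer id."""
    chords = sorted({chord for song in songs for chord in song})
    return {chord: idx for idx, chord in enumerate(chords)}


def make_samples(songs, vocab, group_size=8):
    """Cut each song into non-overlapping groups of `group_size` chord ids.

    A trailing group shorter than `group_size` is dropped so that every
    sample has the same length.
    """
    samples = []
    for song in songs:
        for start in range(0, len(song) - group_size + 1, group_size):
            group = song[start:start + group_size]
            samples.append([vocab[chord] for chord in group])
    return samples
```

Each resulting sample is a fixed-length list of integers, which is the form a recurrent network usually expects as input.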
After that, we use a GRU (a variant of LSTM) to predict the following chords from the chords given so far, which yields a complete chord progression. With enough training data, the generated progression is likely to satisfy the requirements of a good progression.
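To show what "predict the next chord with a GRU" means mechanically, here is a tiny pure-Python sketch of a GRU cell with a softmax layer over chord ids. Please read it as an illustration only: the class `TinyGRU` is a made-up name, the weights are random and untrained, bias terms are left out, and decoding is greedy. A real model would be built and trained with a deep learning framework. The state update uses the convention h = (1 - z) * h_prev + z * candidate; some libraries swap the roles of z and 1 - z.

```python
import math
import random


def _matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Untrained GRU cell plus a softmax output layer over chord ids."""

    def __init__(self, vocab_size, hidden_size=8, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.vocab_size, self.hidden_size = vocab_size, hidden_size
        # W acts on the one-hot input chord, U on the previous hidden state;
        # keys: z = update gate, r = reset gate, h = candidate state.
        self.W = {g: rand(hidden_size, vocab_size) for g in "zrh"}
        self.U = {g: rand(hidden_size, hidden_size) for g in "zrh"}
        self.V = rand(vocab_size, hidden_size)  # hidden state -> chord scores

    def _pre(self, gate, x, state):
        return [a + b for a, b in zip(_matvec(self.W[gate], x),
                                      _matvec(self.U[gate], state))]

    def step(self, chord_id, h):
        """Consume one chord id and return the new hidden state."""
        x = [1.0 if i == chord_id else 0.0 for i in range(self.vocab_size)]
        z = [_sigmoid(v) for v in self._pre("z", x, h)]
        r = [_sigmoid(v) for v in self._pre("r", x, h)]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(v) for v in self._pre("h", x, gated)]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def next_chord_probs(self, h):
        """Softmax over chord ids, computed from the hidden state."""
        scores = _matvec(self.V, h)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def generate(self, seed_ids, length):
        """Greedily extend `seed_ids` until the sequence has `length` chords."""
        h = [0.0] * self.hidden_size
        for chord_id in seed_ids:
            h = self.step(chord_id, h)
        out = list(seed_ids)
        while len(out) < length:
            probs = self.next_chord_probs(h)
            nxt = max(range(self.vocab_size), key=probs.__getitem__)
            out.append(nxt)
            h = self.step(nxt, h)
        return out
```

With a vocabulary of four chords, `TinyGRU(vocab_size=4).generate([1, 0], length=8)` extends the two given chord ids to a group of eight. Since nothing has been trained, the output is musically meaningless; the point is only the data flow from the previous chords to a probability for each next chord.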
If we want to improve accuracy, we could manually add some classic chord progressions as attention, but at my current level building such a model will not be easy.
=========================================================
The overall process of the project:
1, Input a picture.
2, Use a cognitive service to recognize the color tone and the entities in the picture.
3, Convert the result into a vector and classify the emotion with a convolutional neural network.
4, Feed the emotion vector into an LSTM to generate the chords.
5, Based on the chords, generate the main melody with a different network (I have not yet worked out exactly how to do this part).
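The five steps above can be sketched as one function that chains the stages. This is only a skeleton under my own assumptions: `run_pipeline` and its parameter names are invented, and each stage is passed in as a callable because none of the real components (the cognitive service, the CNN, the LSTM, the melody network) exist yet.

```python
def run_pipeline(image, describe_image, classify_emotion,
                 generate_chords, generate_melody):
    """Chain the project steps; each stage is supplied as a callable."""
    features = describe_image(image)        # step 2: tone and entity recognition
    emotion = classify_emotion(features)    # step 3: feature vector -> emotion
    chords = generate_chords(emotion)       # step 4: emotion -> chord progression
    melody = generate_melody(chords)        # step 5: chords -> main melody
    return {"emotion": emotion, "chords": chords, "melody": melody}
```

Keeping the stages as separate callables means each network can be swapped or tested on its own, for example by plugging in a fixed progression while the melody step is still undecided.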
For the data, two datasets need to be prepared (and possibly a third one later): one labeling picture-emotion pairs and one labeling emotion-chord progression pairs, so that the networks can be trained well.
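As a rough idea of what the two datasets might contain, here is a Python sketch of the record types. The class and field names are assumptions, not a fixed format. Since the emotion label is what links the two datasets, the sketch also includes a small check that every emotion used on a picture has at least one chord progression.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PictureEmotion:
    """One record of the picture-emotion dataset (field names assumed)."""
    image_path: str
    emotion: str


@dataclass(frozen=True)
class EmotionChords:
    """One record of the emotion-chord progression dataset (assumed)."""
    emotion: str
    chords: List[str]


def uncovered_emotions(pictures, progressions):
    """Return emotion labels used on pictures that have no chord example."""
    covered = {item.emotion for item in progressions}
    return sorted({pic.emotion for pic in pictures} - covered)
```

If `uncovered_emotions` returns a non-empty list, the chord network would receive emotion labels at generation time that it never saw during training.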
The next blog post will cover the preparation of the datasets and the setup of the LSTM network.
"Music sequence generation for Python image features" How to generate melodies (outlines) and the entire process of the project