NMF (non-negative matrix factorization) produces factors that are themselves non-negative, which gives them a natural interpretation in many practical problems, so it is very widely used. Here I would like to introduce an application of NMF to polyphonic music. The paper translated below is the pioneering work on transcribing polyphonic music with NMF. It explains simply and clearly how to use NMF to transcribe piano music, and it is well worth reading.
Summary
In this paper, we propose a new method for analyzing polyphonic music passages composed of notes with a fixed harmonic profile (such as piano notes). Because the structure of each note is fixed, we can model the music with a linear basis transform and use non-negative matrix factorization to estimate the harmonic structure of the notes and where each note occurs. The method is very simple and requires no prior information: estimation and identification are done from the observed data alone.
1. Introduction
Polyphonic music transcription has long been a difficult problem. Although many systems based on prior knowledge achieve good recognition results, they are very complex. In this paper, we propose a lightweight approach that is similar to scene analysis and is data-driven, but does not use any prior knowledge of the structure of music. The starting point of our approach is redundancy reduction [1]. The idea of redundancy reduction has developed rapidly in the past few years and has achieved very good results in many applications. It has recently been applied to polyphonic music transcription [2][3] with very encouraging results. In this article, we approach the same problem from a different angle. In fact, polyphonic music can be transcribed using nothing more than a non-negative matrix factorization of the music's magnitude spectrum.
2. Non-negative matrix factorization
2.1 Definition and cost function
Inspired by Paatero's positive matrix factorization [5], Lee and Seung first proposed non-negative matrix factorization (NMF) [4]. Given a non-negative matrix X of size M×N, the goal of NMF is to find two non-negative matrices W (M×R) and H (R×N) whose product W·H approximates X (where R is smaller than M and N) with minimal error. The cost function is defined as follows:

$$ C = \lVert X - W \cdot H \rVert_F \qquad (1) $$
where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. We can also define an alternative cost function:

$$ D = \left\lVert X \otimes \ln\!\left(\frac{X}{W \cdot H}\right) - X + W \cdot H \right\rVert_F \qquad (2) $$

where $\otimes$ denotes the Hadamard product (element-wise multiplication of matrices), and the division is likewise element-wise. Both cost functions equal 0 if and only if X = W·H. The function in Equation 2 is somewhat similar to the Kullback-Leibler divergence. The update formulas for W and H are given in Appendix A.
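The update formulas themselves are not reproduced in this text. As a minimal sketch, assuming the widely used Lee and Seung multiplicative updates for the Frobenius cost (which may differ from the paper's Appendix A), the factorization can be computed as follows. The function names, the iteration count, and the small constant `eps` are illustrative choices, not part of the paper.

```python
import numpy as np

def frobenius_cost(X, W, H):
    """Cost in the style of Equation 1: Frobenius norm of the reconstruction error."""
    return np.linalg.norm(X - W @ H, ord="fro")

def nmf(X, r, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative X (M x N) by W (M x r) @ H (r x N).

    Uses multiplicative updates, which keep W and H non-negative and do not
    increase the squared Frobenius error (an assumed, standard algorithm).
    """
    rng = np.random.default_rng(seed)
    M, N = X.shape
    W = rng.random((M, r)) + eps   # random positive initialization
    H = rng.random((r, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update activations
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis vectors
    return W, H

# Usage on a synthetic non-negative matrix of rank 3 (hypothetical data).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf(X, r=3)
print(frobenius_cost(X, W, H))
```

Because the initialization is random, different seeds can give different (permuted or rescaled) factors with similar reconstruction error.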
NMF can also be interpreted as a rank-reducing decomposition, in which H = W^+·X and therefore X ≈ W·W^+·X, where W^+ is a matrix of size R×M and + denotes the Moore-Penrose pseudoinverse. This formulation lets us relate NMF to PCA, ICA, and similar methods. In fact, when the cost function is Equation 1, NMF is equivalent to a rotation of PCA (PCA uses the same cost function, but under orthogonality constraints). Based on this fact, we suspect that NMF is equivalent to a non-negative ICA decomposition satisfying Plumbley's conditions [6].
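The pseudoinverse relation can be checked numerically. The sketch below assumes an exact factorization X = W·H with a W of full column rank (hypothetical random data), in which case applying the Moore-Penrose pseudoinverse of W to X recovers H.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 2))        # M x R basis, full column rank for random data
H = rng.random((2, 8))        # R x N activations
X = W @ H                     # exact non-negative matrix of rank 2

W_pinv = np.linalg.pinv(W)    # R x M Moore-Penrose pseudoinverse
H_rec = W_pinv @ X            # recover the activations from X

print(np.allclose(H_rec, H))  # compare recovered and true activations
print(W_pinv.min())           # the pseudoinverse itself may contain negative entries
```

Note that only W and H are constrained to be non-negative. The pseudoinverse of W is not subject to that constraint.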
Put more simply, NMF uses the rows of H to describe the row structure of X and the columns of W to describe the column structure of X. The parameter R controls how much detail the factorization can represent. If R = M, X can be factorized exactly, but W and H then provide no additional insight. As we lower R, the elements of W and H begin to describe the main components of X. With a reasonable choice of R, we can extract the principal components of X. A simple example is presented in the next section.
2.2 NMF of magnitude spectra
Consider an audio signal:

$$ s(t) = g(\alpha t)\,\sin(\gamma t) + g(\beta t)\,\sin(\delta t) \qquad (3) $$

where g(·) is a periodic gate function, α, β, γ and δ are arbitrary scalars, and α and β are much smaller than γ and δ. We then compute the length-L magnitude spectrum of this signal, x(t) = |DFT([s(t) ... s(t + L)])|. The vectors x(t) can be stacked as columns to form a non-negative matrix X (M×N), where N is the total number of frames and M = L/2 + 1 is the number of frequency bins produced by transforming each frame. Figure 1 shows the magnitude spectrum matrix X.
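The construction of X can be sketched as follows. This is a minimal illustration, not the paper's code: the hop size, the Hanning window, and the function name `magnitude_spectrogram` are assumptions made here, and the input is assumed to contain at least L samples.

```python
import numpy as np

def magnitude_spectrogram(s, L=4096, hop=None):
    """Stack magnitude spectra of successive frames as the columns of X.

    Returns a non-negative array of shape (L // 2 + 1, n_frames).
    """
    hop = L // 2 if hop is None else hop          # assumed 50% overlap
    window = np.hanning(L)                        # assumed analysis window
    n_frames = 1 + (len(s) - L) // hop
    frames = np.stack(
        [s[i * hop:i * hop + L] * window for i in range(n_frames)], axis=1
    )
    # rfft keeps the L/2 + 1 non-redundant bins of a real-valued frame.
    return np.abs(np.fft.rfft(frames, axis=0))

# Usage: one second of a 1000 Hz sinusoid at 44100 Hz (hypothetical input).
fs = 44100
t = np.arange(fs) / fs
X = magnitude_spectrogram(np.sin(2 * np.pi * 1000 * t), L=1024, hop=512)
print(X.shape)
```

Each column of X is one frame and each row is one frequency bin, matching the M = L/2 + 1 by N layout described above.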
We now apply NMF to X. Before doing so, we note some of its features. Apart from a few obvious lines of energy, the energy elsewhere is very low, so a fixed pattern is clearly visible. In other words, the magnitude spectrum above is highly redundant (the situation a compression engineer most hopes to see). Applying NMF to this matrix shows how the redundancy can be removed using compact matrices. We use Equation 1 as the cost function with R = 2. The result of the decomposition is shown in Figure 2.
Inspecting the results of the decomposition gives us some important information. The two rows of H are two time series that describe the horizontal structure of X (each row tracks how the strength of one frequency varies over time, corresponding to one of the bright lines in X). Each column of W describes the vertical structure of X (each column is a spectrum with a peak at one of the two frequencies in X). X is reconstructed by pairing the n-th row of H with the n-th column of W (W is drawn horizontally in Figure 2).
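A toy experiment in this spirit can be sketched as follows. Every numeric choice here is an assumption made for illustration rather than a value from the paper: the sampling rate, the two sinusoid frequencies, the gating rates, the frame and hop sizes, and the multiplicative update rule.

```python
import numpy as np

fs = 8000                       # assumed sampling rate (Hz)
L, hop = 256, 128               # assumed frame length and hop size
t = np.arange(2 * fs) / fs      # two seconds of signal

def gate(x):
    """Periodic on/off gate: 1 during the positive half of each cycle, else 0."""
    return (np.sin(x) > 0).astype(float)

# Two sinusoids (500 Hz and 1000 Hz) switched on and off at different slow rates.
s = (gate(2 * np.pi * 2.0 * t) * np.sin(2 * np.pi * 500 * t)
     + gate(2 * np.pi * 1.25 * t) * np.sin(2 * np.pi * 1000 * t))

# Magnitude spectrogram: frames as columns, frequency bins as rows.
window = np.hanning(L)
n_frames = 1 + (len(s) - L) // hop
frames = np.stack([s[i * hop:i * hop + L] * window for i in range(n_frames)], axis=1)
X = np.abs(np.fft.rfft(frames, axis=0))

# NMF with two components, using multiplicative updates for the Frobenius cost.
rng = np.random.default_rng(0)
r, eps = 2, 1e-9
W = rng.random((X.shape[0], r)) + eps
H = rng.random((r, X.shape[1])) + eps
for _ in range(300):
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)

peak_hz = W.argmax(axis=0) * fs / L            # frequency of the largest bin per column
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(peak_hz, rel_err)
```

If the decomposition separates the two sources, each column of W peaks at one of the two sinusoid frequencies and the matching row of H follows that sinusoid's gate. The order of the components depends on the random initialization.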
In the next section, we test this idea further by taking X to be the magnitude spectrum of a real music signal. As we will see, when X is the magnitude spectrum of actual music, the W and H matrices obtained contain the spectra of the notes that appear in the music and the timing of those notes. The example above also shows that NMF can handle overlapping notes (the upper and lower components in Figure 1 overlap). Examples with actual music are described below.
3. Decomposition results of the actual music amplitude spectrum
In this section, we show the results of applying NMF to piano music and the problems that may be encountered. We first study non-overlapping notes, then overlapping notes, and finally the decomposition of a whole polyphonic piano passage. All excerpts come from Keith Jarrett's recording of Bach's Fugue XVI in G minor [7], sampled at 44100 Hz, mono.
3.1 Non-overlapping notes
The first example is a fugue fragment in which the notes barely overlap. There is no significant overlap between the notes of the following fragment (only the sustain of each note overlaps weakly with the next).
This passage contains five events made up of four different notes. We first compute the magnitude spectrum of the signal and then apply NMF with R = 4, using Equation 1 as the cost function. The DFT length is 4096, with a Hanning window. The result of the decomposition is shown in Figure 3.
It can be seen clearly that each row of H carries the timing of one of the four notes (when it sounds and how its strength changes), and each column of W is the corresponding spectrum. The lowest significant frequency peaks in the four columns are at 193.7 Hz, 301.4 Hz, 204.5 Hz and 322.9 Hz respectively, corresponding to the four notes of the passage. The deviation from the actual fundamental frequencies of the notes is due to the limited frequency resolution of the DFT. In general, a single peak is not enough to identify a specific note, but the combination of the fundamental and its harmonics determines which note each column of W represents.
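The resolution limit can be made concrete with a short calculation. With a sampling rate of 44100 Hz and a DFT length of 4096, adjacent bins are about 10.77 Hz apart, and each of the four reported peak frequencies lies within about 0.1 Hz of a bin center. The variable names below are illustrative.

```python
fs = 44100                      # sampling rate stated in the text (Hz)
L = 4096                        # DFT length stated in the text
bin_spacing = fs / L            # distance between adjacent DFT bins, about 10.77 Hz

reported = [193.7, 301.4, 204.5, 322.9]            # peak frequencies from the text
bins = [round(f / bin_spacing) for f in reported]  # nearest DFT bin for each peak
centers = [k * bin_spacing for k in bins]          # center frequency of those bins

for f, k, c in zip(reported, bins, centers):
    print(f"{f} Hz -> bin {k}, center {c:.2f} Hz")
```

Since a peak can only be located at a bin center, the measured frequency can differ from the true fundamental by up to half a bin, roughly 5.4 Hz, which is a large fraction of a semitone in this register.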
After this experiment, it is natural to ask what happens when R is not equal to the actual number of notes. When R is smaller than the number of notes, we cannot resolve all of them and can only partially analyze the spectrum above. On the other hand, we can also choose a very large R, in which case the result depends on the choice of algorithm. NMF using Equation 1 tends to spread the energy of the dominant notes across several components of W and H (making H more uniform). To avoid this effect, we can modify the cost function to:

$$ C = \lVert X - W \cdot H \rVert_F + \lambda \sum_{i,j} H_{i,j} \qquad (4) $$

This cost function ensures not only that the decomposition approximates the original signal well, but also that the H matrix has low energy. Equation 4 is similar to the cost function described by Hoyer [8]. The parameter λ adjusts the balance between good signal reconstruction and low energy. Equation 4 forces NMF to avoid spreading the energy of a note across multiple rows of H, so the energy is concentrated in a single row and the other, lower-energy rows become noise. However, using Equation 4 requires choosing a good value of λ.
If we are not sure of the value of R, the easiest approach is to use Equation 2 as the cost function. In that case, the extra component appears in H as a series of small peaks, and the corresponding column of W is a very low-energy spectrum (which can be regarded as silence or ambient noise). Figure 4 shows the result of decomposing the four-note passage above with R = 5.
In general, the identification of non-note components is very easy and is not a problem in our analysis.
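The text does not say how non-note components are identified. One simple heuristic, offered here only as an assumption, is to compare the total energy each component contributes to the reconstruction and flag components that fall far below the strongest one. The function names and the 5% threshold are hypothetical.

```python
import numpy as np

def component_energy(W, H):
    """Total magnitude contributed by each component.

    Component k contributes the rank-one matrix outer(W[:, k], H[k, :]).
    Because all entries are non-negative, the sum of its entries equals
    sum(W[:, k]) * sum(H[k, :]).
    """
    return W.sum(axis=0) * H.sum(axis=1)

def flag_weak_components(W, H, rel_threshold=0.05):
    """Mark components whose energy is below a fraction of the strongest one."""
    energy = component_energy(W, H)
    return energy < rel_threshold * energy.max()

# Usage with a small hand-made factorization (hypothetical values):
# two strong components and one nearly silent one.
W = np.array([[1.0, 0.0, 0.01],
              [0.0, 1.0, 0.01]])
H = np.array([[1.0, 1.0],
              [1.0, 1.0],
              [0.1, 0.1]])
print(component_energy(W, H))
print(flag_weak_components(W, H))
```

Flagged components can then be inspected or discarded before reading notes off the remaining rows of H.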
3.2 Overlapping notes
We now analyze the second bar of the fugue, which contains overlapping notes:
The score above has 10 events made up of seven different notes. In the second-to-last event, two notes overlap. Figure 5 shows the result of decomposing this passage with R = 7, using Equation 2.
Disappointingly, although the decomposition succeeds, we obtain one non-note component, and the two overlapping notes are extracted together as a single component. This is easy to explain. As stated earlier, NMF decomposes only the signal it is given and does not rely on prior knowledge, so each independent event is treated as a new component. We therefore emphasize that the purpose of this approach is not to extract notes but to extract independent events. Since we did not provide enough information to extract the individual notes, the algorithm can only treat independent events as notes.
All notes can be identified as independent events if we provide more data. Taking the first two bars of the piece together gives us an isolated occurrence of one of the notes, which is enough to separate the two overlapping notes, as Figure 6 shows. As you can see, with the isolated note added, the transcription is perfect.
The process above also extends well to longer piano passages. Figure 7 shows the result of decomposing the first seven bars of the fugue with R = 27. After the decomposition converged, we removed two components that did not contain notes. Comparing the result with the score of the fugue, we find that it reproduces the entire score well. The few errors are that one combination of notes is treated as a single component and one note is not tracked (possibly because it appears only once).
4. Conclusion
In this paper, we propose a polyphonic music transcription method based on NMF. The approach achieves very good results without requiring great computational power or complex algorithm design. However, one drawback is that the notes in the music must have a fixed harmonic profile. Future work will try to address this limitation with more expressive decompositions.
5. References
[1] Barlow, H.B. "Sensory mechanisms, the reduction of redundancy, and intelligence". In Symposium on the Mechanization of Thought Processes. National Physical Laboratory Symposium No. 10. (1959)
[2] Smaragdis, P. "Redundancy Reduction for Computational Audition, a Unifying Approach". Ph.D. dissertation, MAS Department, Massachusetts Institute of Technology, (2001).
[3] Plumbley, M.D., S.A. Abdallah, J.P. Bello, M.E. Davies, G. Monti and M.B. "Automatic music transcription and audio source separation". In Cybernetics and Systems, (6), pp 603-627, (2002).
[4] Lee, D.D. and H.S. Seung. "Learning the parts of objects by non-negative matrix factorization". In Nature 401, pp 788-791, (1999).
[5] Paatero, P. "Least squares formulation of robust non-negative factor analysis". In Chemometrics and Intelligent Laboratory Systems 37, pp 23-35, (1997).
[6] Plumbley, M.D. "Conditions for non-negative independent component analysis". In IEEE Signal Processing Letters, 9 (6), pp 177-180, (2002).
[7] Jarrett, K. J.S. Bach, Das wohltemperierte Klavier, Buch I, ECM Records, CD 2, Track 8 (1988).
[8] Hoyer, P. "Non-negative sparse coding". In Neural Networks for Signal Processing XII, Martigny, Switzerland, (2002).
"Non-negative Matrix factorization for polyphonic Music transcription" translations