Kirell Benzi, Vassilis Kalofolias, Xavier Bresson and Pierre Vandergheynst, Signal Processing Laboratory 2 (LTS2), Swiss Federal Institute of Technology (EPFL)
Code: see HTTPS://GITHUB.COM/HXSYLZPF/RECOG
Abstract
This paper formalizes a new song recommendation algorithm. It casts playlist recommendation as a matrix completion problem and solves it with non-negative matrix factorization (NMF) combined with content-based filtering through total variation (TV) on graphs. The graphs encode playlist proximity and song similarity using a rich combination of audio, metadata and social features. We show that this hybrid recommender system is versatile: it incorporates several well-known methods and outperforms them. Experiments on real data confirm that our model gives better recommendations than models based only on low-rank information, only on graph information, or on a combination of both.
Keywords: recommender systems, graphs, non-negative matrix factorization, total variation, audio features
Introduction
Tasks such as recommending movies on Netflix, friends on Facebook, or jobs on LinkedIn have attracted more and more attention over the past few years. The well-known low-rank matrix factorization algorithms favored by most Netflix Prize winners require explicit user ratings as input. Other, similar approaches address implicit feedback by inferring user preferences for items from user actions. For song and playlist recommendation specifically, a variety of methods exist, ranging from purely content-based approaches to hybrid models. Recently, graph regularization has been proposed to improve matrix completion algorithms.
The main contributions of this paper are as follows:
- A mathematically sound hybrid system is designed and implemented, combining collaborative filtering and content-based filtering.
- A new graph regularization term (TV) is introduced in the context of recommender systems, where it performs better than the widely used Tikhonov regularization.
- A well-defined iterative optimization scheme based on proximal splitting methods is proposed.
Extensive experiments show that the proposed recommender system performs well.
The song recommendation algorithm in this paper
1. Song recommendation algorithm
Suppose we have $n$ playlists, each containing some of $m$ songs. We define the matrix $C \in \{0,1\}^{n \times m}$, whose element $C_{ij}$ is 1 if song $j$ is in playlist $i$ and 0 otherwise. We then define a weight matrix $\Omega \in \{\varepsilon, 1\}^{n \times m}$ with $\Omega_{ij} = 1$ when $C_{ij} = 1$, and a small value $\varepsilon$ otherwise (we use $\varepsilon = 0.1$). This applies the idea of implicit feedback: a zero in $C$ does not mean that the song is irrelevant to the playlist, only that it is more likely to be irrelevant.
The goal of the training phase is to find an approximate low-rank representation $AB \approx C$, where $A \in \mathbb{R}_+^{n \times r}$ and $B \in \mathbb{R}_+^{r \times m}$ are non-negative and $r$ is small. This problem is called non-negative matrix factorization (NMF) and has received wide attention. In contrast to other matrix factorization methods, NMF learns the parts of an object (here, of a playlist) because it only uses additive factors. The drawback of NMF is that it is NP-hard, so regularization is important for finding a good local minimum. In our problem, we use the graphs of playlists and of songs to regularize the factors $A$ and $B$. Our model is formulated as follows:
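$$\min_{A \ge 0,\, B \ge 0} \; \mathrm{KL}\big(\Omega \circ (C \,\|\, AB)\big) + \theta_A \|A\|_{TV_A} + \theta_B \|B\|_{TV_B} \qquad (1)$$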
where $(\circ)$ is the pointwise multiplication operator and $\theta_A, \theta_B \in \mathbb{R}_+$. We use a weighted KL divergence as the distance measure between $C$ and $AB$; it has been shown to be more accurate than the Frobenius norm in various NMF settings. The second term is the total variation of the rows of $A$ on the playlist graph, so penalizing it promotes piecewise constant signals. The third term is analogous: it is the total variation of the columns of $B$ on the song graph. Finally, the proposed model builds on the works [9, 16] and extends them to graphs using the TV semi-norm.
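The data term of the model can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `weighted_kl`, the `tiny` guard against `log(0)` and the toy matrices are assumptions made for the example, and the generalized KL form (with the convention that zero entries of $C$ contribute no log term) is the one that matches the expansion of $F$ given later in the text.

```python
import numpy as np

def weighted_kl(C, X, Omega, tiny=1e-12):
    """Weighted generalized KL divergence KL(Omega o (C || X))."""
    C = np.asarray(C, dtype=float)
    X = np.asarray(X, dtype=float)
    # Convention: entries with C_ij = 0 contribute no log term.
    ratio = np.maximum(C, tiny) / np.maximum(X, tiny)
    log_term = np.where(C > 0, C * np.log(ratio), 0.0)
    return float(np.sum(Omega * (log_term - C + X)))

# Toy data (hypothetical): 3 playlists, 4 songs, rank r = 2.
C = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)
eps = 0.1                               # value used in the text
Omega = np.where(C == 1, 1.0, eps)      # observed entries are trusted more
rng = np.random.default_rng(0)
A = rng.random((3, 2))
B = rng.random((2, 4))

kl_random = weighted_kl(C, A @ B, Omega)   # data term for random factors
kl_perfect = weighted_kl(C, C, Omega)      # vanishes for a perfect fit
print(kl_random, kl_perfect)
```

Because zeros of $C$ are weighted by $\varepsilon$ rather than ignored, predicting a positive score for an unobserved song is penalized only mildly, which is the implicit-feedback idea described above.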
1.1 Graph regularization with total variation
In our NMF-based recommender system, each playlist $i$ is projected into a low-dimensional space by row $A_i$ of matrix $A$. To learn the low-rank representation $A_i$ of each playlist, we use the pairwise similarity $\omega^A_{ii'}$ between playlists defined by the playlist graph. The TV regularization term is then
$$\|A\|_{TV_A} = \sum_i \sum_{i' \sim i} \omega^A_{ii'} \, \|A_i - A_{i'}\|_1$$
When two playlists $i$ and $i'$ are similar, they are connected in the graph and the weight $\omega^A_{ii'}$ of the edge connecting them is large (here $\omega^A_{ii'} \approx 1$). A large distance between the corresponding low-dimensional representations $(A_i, A_{i'})$ is then penalized, which keeps $(A_i, A_{i'})$ close in the low-dimensional space. Similarly, each song $j$ is represented in the low-dimensional space by column $B_j$ of matrix $B$. If two songs $(j, j')$ are close ($\omega^B_{jj'} \approx 1$), then so are $(B_j, B_{j'})$, following the same rule with the regularization $\|B\|_{TV_B}$ on the song graph.
The idea of reference [10] is similar: it introduces graph information into the model through Tikhonov regularization, that is, through the Dirichlet energy $\frac{1}{2} \sum_i \sum_{i' \sim i} \omega^A_{ii'} \|A_i - A_{i'}\|_2^2$. However, that regularization promotes smooth changes between the rows of $A$, whereas the graph TV penalty used in this paper promotes piecewise constant signals with potentially abrupt edges between rows $A_i$ and $A_{i'}$. This is useful for tasks that require finding multiple classes, such as clustering, or, in our recommender system, for similar playlists that belong to different categories.
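The difference between the two penalties can be seen on a toy example. The sketch below is an illustration under stated assumptions: a hypothetical chain graph of six playlists with unit weights, rank $r = 1$, and each undirected edge listed once (the double sum in the text counts it twice, which only changes a constant factor). The helper names `graph_tv` and `dirichlet_energy` are made up for the example.

```python
import numpy as np

def graph_tv(A, edges, weights):
    """Graph total variation: sum over edges of w * ||A_i - A_i'||_1."""
    return sum(w * np.abs(A[i] - A[j]).sum()
               for (i, j), w in zip(edges, weights))

def dirichlet_energy(A, edges, weights):
    """Tikhonov term: half the sum over edges of w * ||A_i - A_i'||_2^2."""
    return 0.5 * sum(w * np.square(A[i] - A[j]).sum()
                     for (i, j), w in zip(edges, weights))

# Hypothetical chain graph 0-1-2-3-4-5 with unit weights.
edges = [(i, i + 1) for i in range(5)]
weights = [1.0] * 5

# Two candidate representations going from 0 to 1 along the chain:
step = np.array([0, 0, 0, 1, 1, 1], dtype=float).reshape(-1, 1)  # two clusters
ramp = np.linspace(0.0, 1.0, 6).reshape(-1, 1)                   # smooth drift

tv_step, tv_ramp = graph_tv(step, edges, weights), graph_tv(ramp, edges, weights)
de_step = dirichlet_energy(step, edges, weights)
de_ramp = dirichlet_energy(ramp, edges, weights)
print(f"TV:        step={tv_step:.2f}  ramp={tv_ramp:.2f}")
print(f"Dirichlet: step={de_step:.2f}  ramp={de_ramp:.2f}")
```

Both signals have the same total variation (1.0), so TV does not discourage the sharp boundary between the two clusters. The Dirichlet energy of the step (0.5) is five times that of the ramp (0.1), so Tikhonov regularization pushes the solution toward the smooth drift.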
As analyzed in detail in Section 3, using the playlist and song graphs significantly improves the recommendations, and the TV term performs better than Tikhonov regularization.
1.2 Primal-dual optimization
The optimization problem is not jointly convex in $A$ and $B$, but it is convex in each of them separately. A common approach is to fix $A$ and optimize $B$, then fix $B$ and optimize $A$, and repeat until convergence. Here we describe the optimization with $A$ fixed and $B$ optimized; the same method applies to $A$ when $B$ is fixed. We rewrite the problem in the following form:
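$$F(AB) + G(K_B B) \qquad (2)$$
where
$$F(AB) = \mathrm{KL}\big(\Omega \circ (C \,\|\, AB)\big) = \sum_{ij} \Big( -\Omega_{ij} C_{ij} \Big( \log \frac{(AB)_{ij}}{C_{ij}} + 1 \Big) + \Omega_{ij} (AB)_{ij} \Big) \qquad (3)$$
$$G(K_B B) = \theta_B \|B\|_{TV_B} = \theta_B \|K_B B\|_1 \qquad (4)$$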
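where $K_B \in \mathbb{R}^{n_e \times m}$ is the gradient operator of the song graph and $n_e$ is the number of edges of that graph. ($K_B$ acts along the song dimension of $B$, so $K_B B$ has one row per edge and one column per latent factor.) Using the conjugate functions $F^*$ and $G^*$ of $F$ and $G$, the problem is equivalent to the saddle-point problem:
$$\min_{B \ge 0} \max_{Y_1, Y_2} \; \langle AB, Y_1 \rangle - F^*(Y_1) + \langle K_B B, Y_2 \rangle - G^*(Y_2) \qquad (5)$$
where $Y_1 \in \mathbb{R}^{n \times m}$ and $Y_2 \in \mathbb{R}^{n_e \times r}$. We define the proximal operator and the time steps $\sigma_1, \sigma_2, \tau_1, \tau_2$:
$$\mathrm{prox}_{\lambda h}(y) = \arg\min_x \; \tfrac{1}{2} \|x - y\|_2^2 + \lambda h(x) \qquad (6)$$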
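The iterations, for $k \ge 0$, are:
$$Y_1^{k+1} = \mathrm{prox}_{\sigma_1 F^*}\big(Y_1^k + \sigma_1 A B^k\big) \qquad (7)$$
$$Y_2^{k+1} = \mathrm{prox}_{\sigma_2 G^*}\big(Y_2^k + \sigma_2 K_B B^k\big) \qquad (8)$$
$$B^{k+1} = \big(B^k - \tau_1 A^T Y_1^{k+1} - \tau_2 (K_B^T Y_2^{k+1})^T\big)_+ \qquad (9)$$
where prox is the proximal operator and $(\cdot)_+ = \max(\cdot, 0)$. In our problem we choose the standard Arrow-Hurwicz time steps $\sigma_1 = \tau_1 = 1/\|A\|$ and $\sigma_2 = \tau_2 = 1/\|K_B\|$, where $\|\cdot\|$ is the operator norm.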
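The proximal solutions are:
$$\mathrm{prox}_{\sigma_1 F^*}(Y) = \tfrac{1}{2}\Big(Y + \Omega - \sqrt{(Y - \Omega)^2 + 4 \sigma_1 \, \Omega \circ C}\Big), \qquad \mathrm{prox}_{\sigma_2 G^*}(Y) = Y - \sigma_2 \, \mathrm{shrink}\Big(\frac{Y}{\sigma_2}, \frac{\theta_B}{\sigma_2}\Big) \qquad (10)$$
where shrink is the soft shrinkage operator and the square and square root are taken elementwise. Note that the same algorithm can also be applied to Tikhonov regularization, $G(K_B B) = \|K_B B\|_2^2$, by replacing the proximal operator of $G^*$ above with the one for this $G$, which reduces to the rescaling $\mathrm{prox}_{\sigma_2 G^*}(Y) = Y / (1 + \sigma_2 / 2)$. The regularization in [10] uses a symmetrized version of the KL divergence which, unlike the one we use, has no analytic proximal solution, so its objective function does not fit an efficient primal-dual optimization scheme like ours. We therefore keep the non-symmetric KL model with Tikhonov regularization, which we call GNMF, to compare TV with Tikhonov regularization.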
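The update of $B$ with $A$ fixed can be sketched as follows. This is a minimal NumPy illustration of iterations (7) to (9) with the closed-form proximal operators, not the authors' implementation: all function names, the toy graph, the zero initialization of the dual variables and the fixed iteration count are assumptions. The gradient operator is applied to $B^T$ so that the shapes match $Y_2 \in \mathbb{R}^{n_e \times r}$.

```python
import numpy as np

def shrink(X, t):
    """Soft shrinkage: sign(x) * max(|x| - t, 0), elementwise."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def prox_F_conj(Y, Omega, C, sigma):
    """Proximal operator of sigma * F*, with F the weighted KL data term."""
    return 0.5 * (Y + Omega - np.sqrt((Y - Omega) ** 2 + 4.0 * sigma * Omega * C))

def prox_G_conj(Y, theta, sigma):
    """Proximal operator of sigma * G*, with G = theta * ||.||_1."""
    return Y - sigma * shrink(Y / sigma, theta / sigma)

def gradient_operator(edges, weights, m):
    """Weighted graph gradient K (n_e x m): one row per edge (j, j')."""
    K = np.zeros((len(edges), m))
    for e, ((j, jp), w) in enumerate(zip(edges, weights)):
        K[e, j], K[e, jp] = w, -w
    return K

def update_B(A, B, C, Omega, K, theta_B, n_iter=200):
    """Run the primal-dual iterations for B while A stays fixed."""
    sigma1 = tau1 = 1.0 / np.linalg.norm(A, 2)   # spectral (operator) norm
    sigma2 = tau2 = 1.0 / np.linalg.norm(K, 2)
    Y1 = np.zeros_like(C, dtype=float)           # dual variable, n x m
    Y2 = np.zeros((K.shape[0], B.shape[0]))      # dual variable, n_e x r
    for _ in range(n_iter):
        Y1 = prox_F_conj(Y1 + sigma1 * (A @ B), Omega, C, sigma1)
        Y2 = prox_G_conj(Y2 + sigma2 * (K @ B.T), theta_B, sigma2)
        B = np.maximum(B - tau1 * (A.T @ Y1) - tau2 * (K.T @ Y2).T, 0.0)
    return B

# Toy problem (hypothetical): 3 playlists, 4 songs, rank 2, two song edges.
rng = np.random.default_rng(0)
C = np.array([[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]], dtype=float)
Omega = np.where(C == 1, 1.0, 0.1)
A = rng.random((3, 2))
B0 = rng.random((2, 4))
K = gradient_operator([(0, 1), (2, 3)], [1.0, 1.0], m=4)
B = update_B(A, B0, C, Omega, K, theta_B=1.0)
print(B.round(3))
```

Every step is a matrix product or an elementwise operation, which is what makes the scheme easy to parallelize. A full solver would alternate this update with the analogous one for $A$ and add a stopping criterion.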
1.3 Recommending songs
Once the matrices $A$ and $B$ have been learned by solving problem (1), we want to recommend a new playlist $c_{rec}$ given a few input songs $c_{in}$. We also want recommendations in real time, so we define a fast recommendation scheme.
Given some songs $c_{in}$, we first learn a good representation in the low-rank space of playlists by solving a regularized least-squares problem: $a_{in} = \arg\min_{a \in \mathbb{R}^{1 \times r}} \|\Omega_{in} \circ (c_{in} - aB)\|_2^2 + \varepsilon \|a\|_2^2$. Its analytic solution $a_{in}^T = (B \, \Omega_{in} \, B^T + \varepsilon I)^{-1} B \, \Omega_{in} \, c_{in}^T$, with $\Omega_{in}$ arranged as a diagonal matrix, is cheap to compute when $r$ is small (we use $\varepsilon = 0.01$).
Playlists whose representation is similar to that of the query are also useful for the recommendation. So, in the low-dimensional space, we represent the recommended playlist by the weighted sum $a_{rec} = \sum_{i=1}^{n} \omega_i A_i / \sum_{i=1}^{n} \omega_i$. The weights $\omega_i = e^{-\|a_{in} - A_i\|_2^2 / \sigma^2}$ depend on the distance between $a_{in}$ and the representations of the other playlists, with $\sigma = \mathrm{mean}(\{\|a_{in} - A_i\|_2\}_{i=1}^{n}) / 4$. The final recommended playlist is obtained from this low-rank representation:
$$c_{rec} = a_{rec} B \qquad (11)$$
Here $c_{rec}$ is not binary but continuous-valued; its values give the ranking of the songs.
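The recommendation step can be sketched in NumPy as below. It is an illustration under assumptions rather than the authors' code: the function name `recommend` and the toy factors are made up, the query weights $\Omega_{in}$ are assumed to mirror the training weights (1 on query songs, 0.1 elsewhere), and the weights enter the normal equations linearly, as in the closed form above.

```python
import numpy as np

def recommend(c_in, A, B, omega_in, eps=0.01):
    """Return continuous song scores c_rec for a query vector c_in."""
    r = B.shape[0]
    # Regularized least squares in the r-dimensional playlist space:
    # (B diag(omega) B^T + eps I) a = B diag(omega) c_in.
    BW = B * omega_in                    # scales column j of B by omega_in[j]
    a_in = np.linalg.solve(BW @ B.T + eps * np.eye(r), BW @ c_in)
    # Gaussian weights on the distance to every training playlist.
    dist = np.linalg.norm(A - a_in, axis=1)
    sigma = dist.mean() / 4.0
    w = np.exp(-dist ** 2 / sigma ** 2)
    a_rec = (w @ A) / w.sum()            # weighted average of the rows of A
    return a_rec @ B                     # Equation (11)

# Toy factors (hypothetical): 8 playlists, 6 songs, rank 2.
rng = np.random.default_rng(0)
n, m, r = 8, 6, 2
A = rng.random((n, r))
B = rng.random((r, m))

c_in = np.zeros(m)
c_in[[0, 3, 4]] = 1.0                           # a query of s = 3 songs
omega_in = np.where(c_in == 1, 1.0, 0.1)        # assumed weighting
c_rec = recommend(c_in, A, B, omega_in)
top = np.argsort(-c_rec)[:3]                    # indices of the 3 best songs
print(c_rec.round(3), top)
```

Only an $r \times r$ linear system is solved per query, so the cost is dominated by the distance computation over the $n$ training playlists, which is what makes real-time use plausible.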
2. Graphs of playlists and songs
2.1 Graph of playlists
The playlist graph encodes pairwise similarities between playlists. Its nodes are the playlists, and the edge weights represent the proximity of two playlists: when the weight is large ($\omega^A_{ii'} \approx 1$), the two playlists are highly similar. In our model, the edge weights of the playlist graph depend not only on external information such as metadata, but also on internal information, such as the songs contained in the playlists. We use the predefined Art of the Mix playlist categories, with which users labeled their playlists. The edge weights of the playlist graph are defined as
$$\omega^A_{ii'} = \gamma_1 \, \delta_{\mathrm{cat}\{i\} = \mathrm{cat}\{i'\}} + \gamma_2 \, \mathrm{sim}_{\cos}(C_i, C_{i'})$$
where cat is the category label of a playlist, $C_i$ is row $i$ of matrix $C$, and $\mathrm{sim}_{\cos}(p, q) = p^T q / (\|p\| \, \|q\|)$ is the cosine similarity between the song vectors of two playlists. Since the rows of $C$ are binary, the cosine similarity of two playlists is the number of songs they share divided by the geometric mean of the two playlist lengths. The two positive parameters $\gamma_1$ and $\gamma_2$ satisfy $\gamma_1 + \gamma_2 = 1$ and set the relative importance of playlist-label similarity versus song-level similarity. To control the edge density within each category and to make the model more flexible, we keep only a 20% subset of the edges between nodes of the same category. Experimentally, $\gamma_2 = 0.3$ gives good results. The quality of the playlist graph is measured by partitioning it with the standard Louvain method. The number of partitions is given automatically by cutting the dendrogram at the level of maximal modularity. The modularity of the graph used in Section 3 is 0.63 when only the cosine similarity is used ($\gamma_1 = 0$). If we add the metadata information by connecting 20% of all playlist pairs within each category and setting $\gamma_2 = 0.3$, the modularity increases to 0.82.
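The edge-weight formula can be sketched as follows. This is an illustration, not the authors' pipeline: the function name `playlist_graph`, the parameters `keep` and `seed`, and in particular the way the 20% subset of same-category edges is chosen (uniformly at random here) are assumptions, since the text does not say how the subset is sampled.

```python
import numpy as np

def playlist_graph(C, cat, gamma2=0.3, keep=0.2, seed=0):
    """Weights gamma1 * [same category] + gamma2 * cosine similarity."""
    gamma1 = 1.0 - gamma2
    C = np.asarray(C, dtype=float)
    cat = np.asarray(cat)
    norms = np.linalg.norm(C, axis=1)
    norms[norms == 0] = 1.0                    # guard against empty playlists
    cos = (C @ C.T) / np.outer(norms, norms)   # shared songs / geometric mean
    same = cat[:, None] == cat[None, :]
    # Keep a random subset of the same-category pairs (assumed uniform).
    rng = np.random.default_rng(seed)
    mask = np.triu(rng.random(same.shape) < keep, k=1)
    mask = mask | mask.T                       # keep the graph undirected
    W = gamma1 * (same & mask) + gamma2 * cos
    np.fill_diagonal(W, 0.0)                   # no self-loops
    return W

# Toy data (hypothetical): 4 playlists over 5 songs, two categories.
C = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 1, 0, 1, 1]])
cat = np.array([0, 0, 1, 1])
W = playlist_graph(C, cat)
print(W.round(2))
```

Playlists 0 and 1 share one song out of two each, so their cosine similarity is 0.5. Whether they also receive the category term depends on the random subset.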
2.2 Graph of songs
The second graph used in our model is the song similarity graph. It is built from Echonest features extracted from the audio signal, combined with metadata and social information about the tracks. Table 2-1 shows the features used to build the song graph.
To improve the quality of our audio features, we trained a Large Margin Nearest Neighbors (LMNN) model using the song genres extracted from LastFM tags. To extract the actual music genre, we used the Levenshtein distance between these tags, weighted by their popularity (according to LastFM), and the music genres defined in the ID3 tags. Finally, we build the song graph with $k$ nearest neighbors ($k = 5$): for a song $j$ and one of its $k$ nearest neighbors $j'$, the weight of the edge between the two songs is $\omega^B_{jj'} = \exp(-\|x_j - x_{j'}\|_1 / \sigma)$, where the scale parameter $\sigma$ is the average distance between $k$ neighbors. The resulting graph has a high modularity (0.64) and an unsupervised accuracy of about 65% using k-NN.
3. Experimental results
In this section, we compare our model with 3 other recommender systems through experiments on a real dataset. Our test dataset is extracted from the Art of the Mix corpus built by McFee, for which we previously extracted the features described above. Evaluating a music recommender system is a well-known problem. In this article, we use a classical evaluation metric for recommender systems with implicit feedback, the Mean Percentage Ranking (MPR), together with the playlist category accuracy, that is, the percentage of songs in a recommended playlist that have already appeared in the category of the query in the past.
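The text does not spell out how MPR is computed. The sketch below follows a definition that is common in the implicit-feedback literature: all songs are ranked by predicted score, positions are converted to percentiles (0 for the top, 1 for the bottom), and the percentiles of the held-out songs are averaged. Whether the paper uses exactly this normalization is an assumption, and the function name is made up for the example.

```python
import numpy as np

def mean_percentage_ranking(scores, relevant):
    """Average percentile rank of the relevant songs for one query.

    scores:   (m,) predicted scores, higher means more recommended
    relevant: indices of the held-out songs for this query
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")          # best song first
    rank = np.empty(len(scores))
    rank[order] = np.arange(len(scores)) / (len(scores) - 1)
    return float(rank[relevant].mean())

scores = np.array([0.9, 0.1, 0.5, 0.3])
mpr_best = mean_percentage_ranking(scores, [0])      # held-out song ranked first
mpr_worst = mean_percentage_ranking(scores, [1])     # held-out song ranked last
mpr_mixed = mean_percentage_ranking(scores, [0, 1])  # one of each
print(mpr_best, mpr_worst, mpr_mixed)
```

Under this definition lower values are better, and a recommender that ranks songs at random scores about 0.5 on average.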
3.1 Models
We first compare our model to a graph-only approach (called Cosine only). For a given input, this model uses the cosine similarity to find the $t$ nearest playlists (here $t = 50$) and computes a histogram of songs by weighting all the songs of these playlists by their cosine similarity, as in Equation (11). The second model is NMF with the KL divergence, which we call NMF. The last model, GNMF, is based on the KL divergence with Tikhonov regularization and uses the same graphs as our model.
3.2 Queries
We tested our model with 3 different kinds of queries. In all 3, a query $c_{test}$ contains $s = 3$ input songs and the system returns the top $k = 30$ songs as a playlist. The first kind, the Random query, picks songs at random from all categories; its results serve only as a baseline for comparison. The second, the Test query, picks 3 songs at random from a playlist of the test set. The third, the Sampled query, picks 3 songs at random within one category. This query simulates a user asking the recommender system for a playlist through a category.
3.3 Training
We use a randomly selected subset of 70% of the playlists as the training set. Because our model is not jointly convex, the initialization may affect the performance of the system, so we use the common NNDSVD technique to obtain a good approximate initial solution. In all our experiments $r = 15$ gives good results, with each row having 5-20 non-zero elements. The best parameters, $\theta_A = 18$ and $\theta_B = 1$, were found by grid search. To prevent overfitting, we use early stopping as soon as the MPR on the validation set stops improving.
3.4 Validation set
We build the validation queries artificially from the categories of the unused playlists. For each category, we randomly select $s = 3$ songs among the songs previously used in user-created playlists labeled with that category.
3.5 Results
The results, that is, the playlist category accuracy and the MPR of the different models, are listed in Table 3-1 and Table 3-2. As expected, for Random queries none of the models can return a playlist that matches the input songs, and NMF, which relies on collaborative filtering alone without any graph information, performs poorly. This can be explained by the sparsity of the dataset: it contains only 5-20 non-zero elements per row, a sparsity of just 0.11-0.46%. Collaborative filtering models perform better when more ratings are observed. The Cosine model performs better on category accuracy because it directly uses the cosine distance between the input songs and the playlists. However, its MPR shows that our model performs better at recommending songs, even in this more difficult setting.
4. Conclusion
In this paper, we introduce a new, flexible song recommendation system that combines the collaborative filtering information of playlists with the song similarity information contained in graphs. We use a primal-dual optimization scheme to obtain a highly parallelizable algorithm that can be used to process large datasets. We chose graph TV instead of Tikhonov regularization and, by comparing our system with 3 other algorithms on a real dataset of music playlists, demonstrated the good experimental results of our model.
A song recommendation algorithm combining non-negative matrix factorization and graph total variation