Social Search and Recommendation: Comparison and Analysis of Common Recommendation Algorithms

Source: Internet
Author: User

Collaborative filtering
Item-based: applicable when there are many users and many items; widely used in e-commerce.
Advantage: compared with user-based filtering, the recommended items are highly similar to the items the user already likes.
Disadvantage: for the same reason, because recommendations are so similar to past items, it is hard to discover new things the user might like.
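A minimal sketch of item-based prediction, assuming ratings are stored as a dict of `user -> {item: rating}` with positive ratings (e.g. 1 to 5). The helper names `item_similarity` and `predict_item_based` are hypothetical, and cosine similarity over co-rating users is an assumed choice (a common one for item-based filtering); the prediction is a similarity-weighted average of the user's own ratings.

```python
from math import sqrt


def item_similarity(ratings, a, b):
    """Cosine similarity of items a and b over the users who rated both."""
    common = [u for u, r in ratings.items() if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in common)
    norm_a = sqrt(sum(ratings[u][a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][b] ** 2 for u in common))
    return dot / (norm_a * norm_b)  # ratings assumed positive, so norms are non-zero


def predict_item_based(ratings, user, target):
    """Predict user's rating of target as a similarity-weighted average of their own ratings."""
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = item_similarity(ratings, item, target)
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None


ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 4, "B": 2, "C": 4},
    "carol": {"A": 1, "C": 5},
}
print(predict_item_based(ratings, "alice", "C"))  # roughly 3.89
```

Because only the user's own history feeds the prediction, the result stays close to items they already rated, which is exactly the "high similarity" trade-off described above.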
User-based: applicable when the set of items is constantly updated, as in link recommendation. Social websites often use this approach, finding similar users and recommending what they liked.
Advantage: it can surface items the user may like that differ from what they have seen before, drawing recommendations from a wider range of items.
Disadvantage: the computation is heavy. Because each user's group of similar users changes frequently, the user-similarity matrix must be recomputed regularly.
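A minimal sketch of user-based recommendation, assuming the same hypothetical `user -> {item: rating}` dict. It builds the full user-similarity matrix (the structure that has to be rebuilt periodically), picks the `k` most similar users, and ranks items the target user has not rated. The similarity `1 / (1 + Euclidean distance)` over co-rated items and all function names are illustrative assumptions.

```python
from math import sqrt


def euclidean_sim(r1, r2):
    """Similarity in (0, 1]: 1 / (1 + Euclidean distance over co-rated items)."""
    common = r1.keys() & r2.keys()
    if not common:
        return 0.0
    d = sqrt(sum((r1[i] - r2[i]) ** 2 for i in common))
    return 1.0 / (1.0 + d)


def user_similarity_matrix(ratings):
    """All pairwise user similarities: O(users^2) work on every rebuild."""
    users = list(ratings)
    return {u: {v: euclidean_sim(ratings[u], ratings[v]) for v in users if v != u}
            for u in users}


def recommend_user_based(ratings, sims, user, k=2):
    """Score unseen items by the similarity-weighted ratings of the k nearest users."""
    neighbors = sorted(sims[user].items(), key=lambda kv: kv[1], reverse=True)[:k]
    scores, weights = {}, {}
    for v, s in neighbors:
        for item, r in ratings[v].items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + s * r
            weights[item] = weights.get(item, 0.0) + s
    ranked = [(item, scores[item] / weights[item]) for item in scores if weights[item] > 0]
    return sorted(ranked, key=lambda kv: kv[1], reverse=True)


ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "D": 5},
    "dave": {"A": 2, "B": 1, "E": 2},
}
sims = user_similarity_matrix(ratings)
print(recommend_user_based(ratings, sims, "alice"))
```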
Both variants work in basically the same way: the similarity between users (or between items) is computed with measures such as Euclidean distance or the Pearson correlation coefficient.
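To illustrate how the two measures named above can disagree, here is a small sketch over two rating vectors, assumed to already be restricted to co-rated items. The `1 / (1 + d)` conversion from distance to similarity is an assumed convention. Pearson ignores a constant offset between users, while Euclidean similarity penalizes it.

```python
from math import sqrt


def euclidean_similarity(x, y):
    """Map Euclidean distance to a similarity in (0, 1]."""
    d = sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Two users who order items identically, but one rates everything 2 points higher.
u = [1, 2, 3, 4]
v = [3, 4, 5, 6]
print(pearson(u, v), euclidean_similarity(u, v))
```

Here Pearson is (up to rounding) 1.0 and the Euclidean similarity is 0.2, which is why Pearson is often preferred when users use the rating scale differently.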

Content-based recommendation (content-based classification)
Text mining is used to extract keywords from the content that the application provides or crawls. The more keywords two items share, the more similar they are. This makes recommendations possible even when there is no rating data.
Applicable to: cold-start problems, for example newly registered users (using the interests they provide at registration) or newly added items that have no user ratings yet.
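A minimal sketch of keyword-based matching for a cold-start user, assuming item descriptions are plain English text. The tiny stop-word list, the Jaccard overlap score, and the helper names are all hypothetical simplifications; real systems typically use proper tokenization and TF-IDF weighting.

```python
import re

# Hypothetical tiny stop-word list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def keywords(text):
    """Extract a set of lower-cased keywords, dropping stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def content_similarity(text_a, text_b):
    """Jaccard overlap of keyword sets: more shared keywords -> higher similarity."""
    ka, kb = keywords(text_a), keywords(text_b)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def recommend_for_new_user(interests, items, top_n=2):
    """Rank items against interests declared at registration; no ratings are needed."""
    scored = [(name, content_similarity(interests, desc)) for name, desc in items.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top_n]


items = {
    "hiking-boots": "waterproof hiking boots for mountain trails",
    "tent": "lightweight camping tent for mountain trips",
    "blender": "kitchen blender for smoothies",
}
print(recommend_for_new_user("mountain hiking boots", items))
```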

Slope One
Predicts a user's rating for item B from that user's rating for item A plus the average difference between the ratings of B and A among users who rated both.
Advantage: the algorithm is fast and can still produce recommendations when there is little data.
Disadvantage: accuracy is poor, and most recommended items are popular ones.
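A sketch of the weighted Slope One variant, assuming ratings in a hypothetical `user -> {item: rating}` dict. It precomputes the average rating difference `dev[j][i]` between every item pair and how many users rated both, then predicts `target` as the count-weighted average of `r_i + dev[target][i]` over the user's rated items.

```python
from collections import defaultdict


def slope_one_deviations(ratings):
    """dev[j][i] = mean of (r_j - r_i) over users who rated both; freq[j][i] = that count."""
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, rj in user_ratings.items():
            for i, ri in user_ratings.items():
                if i != j:
                    dev[j][i] += rj - ri
                    freq[j][i] += 1
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= freq[j][i]
    return dev, freq


def predict_slope_one(user_ratings, dev, freq, target):
    """Weighted Slope One: average of (r_i + dev[target][i]) weighted by co-rating counts."""
    num = den = 0.0
    for i, ri in user_ratings.items():
        c = freq[target].get(i, 0)
        if c:
            num += (ri + dev[target][i]) * c
            den += c
    return num / den if den else None


ratings = {"A": {"I": 1.0, "J": 1.5}, "B": {"I": 2.0}}
dev, freq = slope_one_deviations(ratings)
print(predict_slope_one(ratings["B"], dev, freq, "J"))  # 2.0 + (1.5 - 1.0) = 2.5
```

The only per-pair state is one average and one count, which is why the method is fast and still works with very little data.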

Clustering: requires no prior knowledge (unsupervised learning)
Applicable to multi-dimensional, continuous variables. The basic ideas are conceptual modeling and iterative optimization.

Agglomerative hierarchical clustering algorithm:
1: Build the initial tree, similar to a B+ tree, with all data stored in the leaf nodes.
2: Gradually raise the level and decide which elements can be merged into a new cluster.
E.g., ROCK (link-based clustering): handles non-numeric data such as keywords, Boolean values, and enumerations, and measures distance using Jaccard similarity.
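The bottom-up idea can be sketched as a naive agglomerative loop: every point starts as its own leaf cluster, and the two closest clusters are merged level by level. This assumes numeric points and uses single linkage (closest pair of members) as the cluster distance; ROCK itself instead merges by link counts derived from Jaccard similarity, which this sketch does not implement. It is O(n^3) and meant only for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(c1, c2):
    """Distance between clusters = distance of their closest pair of members."""
    return min(dist(p, q) for p in c1 for q in c2)


def agglomerative(points, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # each point starts as its own leaf
    merges = []  # merge distances, i.e. the height of each new level
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append(single_linkage(clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations guarantees i < j
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, 3)
print(clusters, merges)
```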
Divisive hierarchical clustering algorithm:
Splits the data top-down into progressively smaller clusters.
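A top-down sketch, assuming numeric points and a simple split heuristic chosen for illustration (not a specific published algorithm): repeatedly take the cluster with the largest diameter, use its two farthest-apart points as seeds, and assign every member to the nearer seed.

```python
from itertools import combinations
from math import dist


def diameter(cluster):
    """Largest pairwise distance inside a cluster (0.0 for singletons)."""
    return max((dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def split(cluster):
    """Split around the two farthest-apart points."""
    a, b = max(combinations(cluster, 2), key=lambda pq: dist(*pq))
    left = [p for p in cluster if dist(p, a) <= dist(p, b)]
    right = [p for p in cluster if dist(p, a) > dist(p, b)]
    return left, right


def divisive(points, n_clusters):
    """Start with one cluster holding everything and split until n_clusters exist."""
    clusters = [list(points)]
    while len(clusters) < n_clusters:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0:  # nothing left that can be split
            break
        clusters.remove(widest)
        clusters.extend(split(widest))
    return clusters


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(divisive(points, 3))
```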

K-means algorithm: the most efficient in both time and space complexity, but it cannot handle categorical data or outliers (points that lie far from every cluster center).
The number of centers is fixed by choosing the value of K.
The center positions are adjusted iteratively, based on the distance from each data point to the centers, until the centers no longer change significantly.
(Euclidean distance or Kullback-Leibler divergence can be used as the distance measure.)
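A minimal sketch of the iteration described above (Lloyd's algorithm), assuming numeric points and Euclidean distance only; the random choice of initial centers, the `tol` threshold, and the function name are illustrative assumptions.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Alternate assignment and update steps until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # K is fixed up front; initial centers are random points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest center
            idx = min(range(k), key=lambda c: dist(p, centers[c]))
            clusters[idx].append(p)
        new_centers = []
        for c, members in enumerate(clusters):  # update step: move each center to its mean
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center in place
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers no longer change significantly
            break
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, 2)
print(sorted(centers))
```

Every point is always assigned to some center, which is why a single far-away outlier can drag a center toward it.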

DBSCAN: a density-based spatial clustering algorithm (more complex than K-means, but it effectively filters out outliers)
It finds dense regions and noise in the dataset using two parameters: Eps (the neighborhood radius) and MinPts (the minimum number of points that must fall within that radius for a region to count as dense).
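A straightforward sketch of DBSCAN with the two parameters above, assuming numeric points and a brute-force O(n^2) neighborhood search (real implementations use a spatial index). The neighborhood of a point includes the point itself; label `-1` marks noise.

```python
from math import dist

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None = unvisited, -1 = noise, >= 0 = cluster id
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:  # not a core point; may later become a border point
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:  # previously noise, now a border point of this cluster
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is a core point too: keep expanding
                seeds.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike K-means, the isolated point `(50, 50)` is labeled noise instead of being forced into a cluster.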

BIRCH: a clustering algorithm for large-scale data that works by compressing the data.
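BIRCH's compression rests on the clustering feature (CF): a subcluster is summarized as `(N, LS, SS)`, the point count, per-dimension linear sum, and sum of squared norms. CFs add component-wise, so raw points can be discarded once absorbed. This sketch shows only the CF and a flat, simplified leaf insertion with an assumed radius `threshold`; the full algorithm organizes CFs in a height-balanced CF tree, which is not shown.

```python
from dataclasses import dataclass
from math import dist, sqrt


@dataclass
class ClusteringFeature:
    n: int     # number of points
    ls: list   # linear sum per dimension
    ss: float  # sum of squared norms

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def __add__(self, other):
        # CF additivity: merging two subclusters just adds their summaries
        return ClusteringFeature(self.n + other.n,
                                 [a + b for a, b in zip(self.ls, other.ls)],
                                 self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius(self):
        # root-mean-square distance of members to the centroid: sqrt(SS/N - |LS/N|^2)
        c2 = sum(x * x for x in self.centroid())
        return sqrt(max(self.ss / self.n - c2, 0.0))


def insert_point(leaves, p, threshold):
    """Simplified leaf step: absorb p into the nearest CF if its radius stays <= threshold."""
    point_cf = ClusteringFeature.from_point(p)
    if leaves:
        nearest = min(range(len(leaves)), key=lambda i: dist(leaves[i].centroid(), p))
        merged = leaves[nearest] + point_cf
        if merged.radius() <= threshold:
            leaves[nearest] = merged
            return
    leaves.append(point_cf)


leaves = []
for p in [(0, 0), (2, 0), (100, 100)]:
    insert_point(leaves, p, threshold=2.0)
print([(cf.n, cf.centroid()) for cf in leaves])
```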

Clustering algorithms for big data should:
1: Reduce data access through sampling and indexing.
2: Be able to return the best result so far at any time.
3: Support pausing and resuming.
4: Take memory limits fully into account.
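One way to meet several of these requirements at once is sequential (MacQueen-style) k-means, sketched below under stated assumptions: the data arrives in chunks (which could themselves be samples), only `k` centers and `k` counts are kept in memory, the current centers are a usable answer after every chunk, and the tiny state can be checkpointed and resumed. The class name and the checkpoint-via-`asdict` approach are hypothetical illustrations.

```python
from dataclasses import asdict, dataclass, field
from math import dist


@dataclass
class OnlineKMeans:
    """Sequential k-means: memory holds only k centers and k counts."""
    k: int
    centers: list = field(default_factory=list)
    counts: list = field(default_factory=list)

    def partial_fit(self, chunk):
        """Consume one chunk of points (e.g. a sample or one block read from disk)."""
        for p in chunk:
            if len(self.centers) < self.k:  # the first k points seed the centers
                self.centers.append(list(p))
                self.counts.append(1)
                continue
            i = min(range(self.k), key=lambda c: dist(self.centers[c], p))
            self.counts[i] += 1
            eta = 1.0 / self.counts[i]  # running-mean step size
            self.centers[i] = [c + eta * (x - c) for c, x in zip(self.centers[i], p)]
        return self


model = OnlineKMeans(k=2)
model.partial_fit([(0, 0), (10, 10)])
checkpoint = asdict(model)            # "pause": persist the small state
resumed = OnlineKMeans(**checkpoint)  # "resume" later from the checkpoint
resumed.partial_fit([(0, 2), (10, 12)])
print(resumed.centers)  # best answer so far is available after every chunk
```

The trade-off is that results depend on the order in which data arrives, in exchange for bounded memory and an always-available answer.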

 

UGC (user-generated content)

Graph-based

Not complete...
