There are many similarity implementations in the Mahout recommendation system that compute the similarity between users or between items. Data sources with different data volumes and data types require different similarity calculation methods to achieve good recommendation performance, and Mahout provides a large number of components for computing similarity, each of which
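For illustration, here is a minimal plain-Java sketch of one such measure, the uncentered cosine similarity between two users' preference vectors. Mahout's own implementation lives in classes such as UncenteredCosineSimilarity; this sketch is not Mahout code, and the vectors are made up.

```java
// Sketch of uncentered cosine similarity between two preference vectors.
// Not Mahout's implementation; for illustration only.
public class CosineSimilarity {
    // Returns the cosine of the angle between two equal-length vectors.
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Identical vectors score close to 1.0; orthogonal vectors score 0.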
Last time we talked about Mahout's computational module, Mahout Math. It contains a lot of commonly used mathematical and statistical calculations, many of which may come in handy, so a good understanding of these foundations is worthwhile. Mahout also provides a number of command-line tools; all the commands are listed below. Of course, this list will change over time, and each has a differe
algorithm during text preprocessing, and adjusting the training parameters when training the classifier. Repeat the process until a satisfactory model is obtained.
Categorizing text: once the text classifier is created, you can input a piece of text and it will output a category.
Step 1: Upload the required raw data, the sport and user-sport folders, to HDFS.
Sport folder: used to train the text classifier; it contains multiple subfolders, each holding the articles of one category. In a real-world project the raw data need
compelling vision for big data and Hadoop, and the ultimate expectation many companies have for big data platforms. As more data becomes available, the value of future big data platforms depends more on how much intelligence can be computed from that data. Machine learning is now slowly stepping out of the ivory tower, turning from a scientific problem studied by a small number of academics into a data analysis tool that many enterprises are validating and using, and has become
Recommendation algorithms in Mahout include User-based Recommender, Item-based Recommender, and Slope-One Recommender.
1. User-based Recommender
The main idea of this algorithm is: products preferred by the users most similar to user u are probably products that user u will also prefer.
1. For each item i that user u has no preference for yet
2.   For each other user v that has a preference for item i
3.     Compute the similarity s between user u and user v  // in fact, online computing is n
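The loop above can be sketched in plain Java as follows. The data structures (nested maps of ratings and a precomputed user-similarity map) are made up for illustration; in Mahout this idea is implemented by GenericUserBasedRecommender.

```java
import java.util.Map;

// Sketch of user-based scoring: to score an item that user u has not
// rated, average the ratings other users gave it, weighted by each
// user's similarity to u. Illustration only, not Mahout code.
public class UserBasedScore {
    // ratings.get(user).get(item) -> preference value
    // simToU.get(v) -> precomputed similarity between u and v
    public static double score(Map<String, Map<String, Double>> ratings,
                               Map<String, Double> simToU,
                               String u, String item) {
        double weighted = 0.0, simSum = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            String v = e.getKey();
            if (v.equals(u) || !e.getValue().containsKey(item)) continue;
            double s = simToU.getOrDefault(v, 0.0);
            weighted += s * e.getValue().get(item);  // weight v's rating by s
            simSum += s;
        }
        return simSum == 0.0 ? 0.0 : weighted / simSum;
    }
}
```

With two equally similar neighbors rating the item 4.0 and 2.0, the estimated score is their weighted mean, 3.0.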
/*
 * This program is written to test custom GroupLens evaluation
 */
package byuser;

import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.Gen
Canopy
Canopy clustering is often used as an initial step for stricter clustering techniques, such as k-means clustering. Through this initial clustering, you can significantly reduce the number of expensive distance measurements by ignoring pairs of points that do not fall into a common canopy.
The canopy clustering algorithm is therefore often used to preprocess for the k-means algorithm, to find an appropriate K value and initial cluster centers.
The biggest problem with k-means is that the user must specify the number of clusters K in advance.
Version: hadoop2.4+mahout0.9
When you invoke a Mahout algorithm of the cloud platform in a web program, you sometimes encounter problems where a class cannot be found on the path, such as for the class org.apache.mahout.clustering.classify.ClusterClassifier
public void readFromSeqFiles(Configuration conf, Path path) throws IOException {
  Configuration config = new Configuration();
  List
Hadoop Foundation----Hadoop in Action (VI)-----Hadoop Management Tools---Cloudera Manager---CDH Introduction
We already learned about CDH in the last article; now we will install CDH 5.8 for the following study. CDH 5.8 is a relatively new version of Hadoop, based on Hadoop 2.x and above, and it already contains a number of
1. Construct the affinity matrix W
2. Construct the degree matrix D
3. Compute the Laplacian matrix L
4. Compute the Fiedler vector, the eigenvector corresponding to the second-smallest eigenvalue (spectrum) of the matrix L
5. Use the Fiedler vector as the initial centers for k-means and run the k-means clustering
Affinity matrix: W_ij = exp(-d(s_i, s_j)^2 / (2σ^2)), where d(s_i, s_j) = ||s_i - s_j|| and σ is a preset parameter.
Degree matrix: D_ii = Σ_j W_ij
Canonical (normalized) similarity matrix: D^(-1/2) · W · D^(-1/2), i.e., W(i,j) / (sqrt(D_ii) · sqrt(D_jj))
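A plain-Java sketch of constructing these three matrices (the points and σ are arbitrary; this is not Mahout's spectral clustering code):

```java
// Sketch of the spectral-clustering matrices: Gaussian-kernel affinity W,
// degree D from row sums, and the normalized matrix D^(-1/2) W D^(-1/2).
public class SpectralMatrices {
    // W_ij = exp(-||s_i - s_j||^2 / (2 sigma^2))
    public static double[][] affinity(double[][] pts, double sigma) {
        int n = pts.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double d2 = 0.0;
                for (int k = 0; k < pts[i].length; k++) {
                    double diff = pts[i][k] - pts[j][k];
                    d2 += diff * diff;
                }
                w[i][j] = Math.exp(-d2 / (2 * sigma * sigma));
            }
        return w;
    }

    // D_ii = sum over j of W_ij (stored as a vector of diagonal entries)
    public static double[] degree(double[][] w) {
        double[] d = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (double x : w[i]) d[i] += x;
        return d;
    }

    // Entry (i, j) of D^(-1/2) W D^(-1/2) is w[i][j] / sqrt(d[i] * d[j]).
    public static double[][] normalize(double[][] w, double[] d) {
        int n = w.length;
        double[][] out = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                out[i][j] = w[i][j] / Math.sqrt(d[i] * d[j]);
        return out;
    }
}
```

For two coincident points every affinity is 1, each degree is 2, and each normalized entry is 1/2, which is a quick sanity check on the formulas.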
If we need to add some filtering rules (such as only recommending items created within the last year) when building a recommendation system with Mahout, we need to use the IDRescorer interface, whose source code is as follows:
package org.apache.mahout.cf.taste.recommender;
/** *
The interface sets out two methods that must be implemented:
1. rescore method
Function: defines the logic for computing a new score. According to the new rule, the item for the specified
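Since the full source is elided above, here is a self-contained sketch of a rescorer that drops items older than one year. The nested interface only mirrors the two-method shape of Mahout's IDRescorer (rescore and isFiltered) so the sketch runs without Mahout on the classpath, and the creation-time map is a hypothetical lookup, not part of Mahout.

```java
import java.util.Map;

// Sketch of a rescorer that filters out items created more than one
// year ago. The interface mirrors Mahout's IDRescorer shape; the
// creationTimes map is a made-up data source for illustration.
public class RecentItemRescorer {
    public interface IDRescorer {
        double rescore(long id, double originalScore);
        boolean isFiltered(long id);
    }

    public static final long ONE_YEAR_MS = 365L * 24 * 60 * 60 * 1000;

    // creationTimes maps item id -> creation timestamp in millis.
    public static IDRescorer forCreationTimes(Map<Long, Long> creationTimes, long now) {
        return new IDRescorer() {
            @Override public double rescore(long id, double originalScore) {
                // Keep the original score; filtering happens in isFiltered.
                return originalScore;
            }
            @Override public boolean isFiltered(long id) {
                Long created = creationTimes.get(id);
                // Unknown or too-old items are filtered out of the results.
                return created == null || now - created > ONE_YEAR_MS;
            }
        };
    }
}
```

In Mahout, a rescorer like this is passed to the recommender's recommend call so that filtered items never appear in the result list.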
Objective: to introduce the use of a common recommendation algorithm (user-based collaborative filtering).
Application scenario: after the XXX project has run for a period of time, there will be a lot of video information in the system, and the app usually sends users push messages (1-3 per day), so we need to make those pushes more effective based on users' behavioral characteristics.
Tool introduction: Mahout's collaborative filtering algorithm. Test
1. After configuring Maven under Eclipse JEE (the version I chose), add mahout-core to the dependencies in pom.xml; the version is optional.
The dependency is then downloaded automatically, by default into a folder under /home/cc/.m2/; note that the folder is hidden.
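For reference, the dependency entry in pom.xml looks like this (mahout-core 0.9 is used as an example version; adjust to match your setup):

```xml
<dependency>
  <groupId>org.apache.mahout</groupId>
  <artifactId>mahout-core</artifactId>
  <version>0.9</version>
</dependency>
```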
Http://blog.sina.com.cn/s/blog_6dc9c7cb0101bmch.html
See this blog post for detailed procedures.
2. Add all the jar files under the
Chapter 2: MapReduce introduction. An ideal split size is usually the size of one HDFS block. Hadoop performance is optimal when the node executing a map task is the same node that stores its input data (the data locality optimization, which avoids transferring data over the network).
MapReduce process summary: read a row of data from the file, process it with the map function, and return key-value pairs; the system then sorts the map results. If there are multi
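The summarized flow can be sketched in memory, with word count standing in for a real job. This is only an illustration of the map, sort/group, and reduce phases, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the MapReduce flow: map each line to key-value
// pairs, group/sort by key, then reduce by summing per key.
public class MiniMapReduce {
    public static Map<String, Integer> wordCount(List<String> lines) {
        // Map phase: emit one (word, 1) pair per token.
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines)
            for (String word : line.split("\\s+"))
                if (!word.isEmpty())
                    pairs.add(Map.entry(word, 1));
        // Shuffle/sort phase: TreeMap keeps keys in sorted order.
        // Reduce phase: sum the values for each key.
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }
}
```

In a real Hadoop job, the map and reduce steps run on different nodes and the framework performs the shuffle and sort between them.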
The canopy algorithm has a simple process and is easy to implement; it is the algorithm
(1) Let the sample set be S, and determine two distance thresholds T1 and T2 with T1 > T2.
(2) Take a sample point p ∈ S as a new canopy, denoted C, and remove p from S.
(3) For every point remaining in S, compute its distance dist to p.
(4) If dist < T1, add the point to canopy C.
(5) If dist < T2, remove the point from S.
(6) Repeat (2)-(5) until S is empty.
The above procedure shows that points within dist < T2 of a center are removed from S and can no longer seed a new canopy, while points with T2 ≤ dist < T1 may end up belonging to more than one canopy.
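Steps (1)-(6) can be sketched in plain Java as follows; Euclidean distance on 2-D points is an arbitrary choice, and Mahout's own implementation is the real thing.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Sketch of canopy clustering for 2-D points with thresholds t1 > t2.
// Points within t2 of a center leave S; points with t2 <= dist < t1
// join the canopy but stay in S and may join other canopies too.
public class CanopySketch {
    public static List<List<double[]>> canopies(List<double[]> s, double t1, double t2) {
        LinkedList<double[]> rest = new LinkedList<>(s);
        List<List<double[]>> result = new ArrayList<>();
        while (!rest.isEmpty()) {
            double[] p = rest.removeFirst();        // step (2): pick a center
            List<double[]> canopy = new ArrayList<>();
            canopy.add(p);
            rest.removeIf(q -> {
                double dist = Math.hypot(p[0] - q[0], p[1] - q[1]); // step (3)
                if (dist < t1) canopy.add(q);       // step (4): loose assignment
                return dist < t2;                   // step (5): drop from S
            });
            result.add(canopy);
        }
        return result;
    }
}
```

With T1 = 1.0 and T2 = 0.5, the points (0,0), (0.1,0), (10,0) yield two canopies: one containing the first two points, one containing only (10,0).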
Canopy also has the effect of eliminating isolated points (outliers), something k-means alone is powerless to do
Versions: Hadoop 1.0.4, Mahout 0.5.
Mahout ships an implementation for reading out clustering results, called ClusterDumper; this class's output format is generally as follows:
VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}
Weight: Point :
1.0: [1.000, 3.000]
...
1.0: [3.000, 2.500]
VL-11{n=7 c=[2.857, 4.714] r=[0.990, 0.364]}
Weight: Point :
1.0: [1.000, 5.000]
...
1.0: [4.000, 4.500]
VL-14{n=8 c=[4.750, 3.438] r=[0.433
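A small sketch that pulls the cluster center out of a header line of the form shown above, e.g. "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}". The regex is our own guess at the layout, not part of Mahout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses the center vector c=[...] from a ClusterDumper-style header
// line such as "VL-2{n=6 c=[1.833, 2.417] r=[0.687, 0.344]}".
public class ClusterLineParser {
    private static final Pattern HEADER =
        Pattern.compile("VL-(\\d+)\\{n=(\\d+) c=\\[([^\\]]*)\\]");

    public static double[] parseCenter(String line) {
        Matcher m = HEADER.matcher(line);
        if (!m.find())
            throw new IllegalArgumentException("not a cluster header: " + line);
        String[] parts = m.group(3).split(",\\s*");
        double[] center = new double[parts.length];
        for (int i = 0; i < parts.length; i++)
            center[i] = Double.parseDouble(parts[i]);
        return center;
    }
}
```

The indented "weight : point" lines underneath each header are the cluster members and would need a separate pattern.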
From: http://blog.csdn.net/heyutao007/article/details/8612906
Mahout supports two kinds of M/R jobs to implement item-based collaborative filtering:
I. ItemSimilarityJob
II. RecommenderJob
Below we analyze RecommenderJob; the version is mahout-distribution-0.7.
Source package location: org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
The first few stages of RecommenderJob are the same as ItemSimilarityJob, but ItemSimilarityJob calculates the similarity mat
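As a rough sketch, a typical invocation of RecommenderJob looks like the following; the HDFS paths are made-up examples and the option values should be checked against your Mahout version's usage output:

```
hadoop jar mahout-core-0.7-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  --input /user/hadoop/prefs.csv \
  --output /user/hadoop/recommendations \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --numRecommendations 10
```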