When reprinting, please credit the source: http://blog.csdn.net/u010670689/article/details/41895105
1. Principle:
The four formulas for the Pearson correlation coefficient are equivalent, where E is the mathematical expectation, cov is the covariance, and N is the number of paired observations of the two variables.
For an explanation of mathematical expectation and covariance, see: http://blog.csdn.net/u010670689/article/details/41896399
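For reference, two of the equivalent forms can be written out as follows. The first is the standard definition in terms of covariance and standard deviations. The second is reconstructed from the `num` and `den` expressions in the Java code of section 2, which the article calls Formula Two. The notation is an assumption and may differ from the original formulas.

$$
r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}
  = \frac{E\big[(X - E(X))(Y - E(Y))\big]}{\sigma_X \sigma_Y}
$$

$$
r = \frac{N \sum XY - \sum X \sum Y}
         {\sqrt{N \sum X^2 - \left(\sum X\right)^2}\,\sqrt{N \sum Y^2 - \left(\sum Y\right)^2}}
$$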
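To make the roles of expectation and covariance concrete, here is a minimal sketch that computes r directly from the definition r = cov(X, Y) / (σX σY). The class `PearsonByDefinition` and its method names are hypothetical and not part of the article's code. The mean is used as the sample estimate of the expectation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the article's code.
class PearsonByDefinition {

    // Arithmetic mean, used as the sample estimate of the expectation E(X).
    static double mean(List<Double> values) {
        double sum = 0d;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Population covariance: E[(X - E(X)) * (Y - E(Y))].
    static double covariance(List<Double> x, List<Double> y) {
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0d;
        for (int i = 0; i < x.size(); i++) {
            sum += (x.get(i) - meanX) * (y.get(i) - meanY);
        }
        return sum / x.size();
    }

    // r = cov(X, Y) / (sigma_X * sigma_Y); the variance of X is cov(X, X).
    // The result is NaN when either variable has zero variance.
    static double pearson(List<Double> x, List<Double> y) {
        if (x.size() != y.size() || x.isEmpty()) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        double sigmaX = Math.sqrt(covariance(x, x));
        double sigmaY = Math.sqrt(covariance(y, y));
        return covariance(x, y) / (sigmaX * sigmaY);
    }

    public static void main(String[] args) {
        // Same data as the second example in section 2: x is exactly 4 times y.
        List<Double> x = Arrays.asList(12d, 4d, 8d);
        List<Double> y = Arrays.asList(3d, 1d, 2d);
        System.out.println(pearson(x, y)); // approximately 1.0, up to floating-point rounding
    }
}
```

This form is easier to read against the definition, while the sum-based form used in section 2 needs only one pass over the data.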
The value of the correlation coefficient lies between -1 and +1, that is, -1≤r≤+1. Its properties are as follows:
- When r>0, the two variables are positively correlated, and when r<0, they are negatively correlated.
- When |r|=1, the two variables are perfectly linearly related, that is, one is an exact linear function of the other.
- When r=0, there is no linear correlation between the two variables.
- When 0<|r|<1, there is some degree of linear correlation between the two variables. The closer |r| is to 1, the stronger the linear relationship between the two variables, and the closer |r| is to 0, the weaker the linear correlation.
- The strength is commonly divided into three levels: |r|<0.4 is a low linear correlation, 0.4≤|r|<0.7 is a significant correlation, and 0.7≤|r|<1 is a high linear correlation.
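The properties above can be sketched as a small lookup that turns a coefficient into a description. The class `CorrelationLevel` and the method `describe` are hypothetical names introduced for illustration. The thresholds are the ones listed in the text.

```java
// Hypothetical helper, not part of the article's code.
class CorrelationLevel {

    // Maps a Pearson coefficient r to the levels described in the text.
    static String describe(double r) {
        if (Double.isNaN(r) || r < -1d || r > 1d) {
            throw new IllegalArgumentException("r must lie in [-1, 1]");
        }
        double abs = Math.abs(r);
        if (abs == 0d) {
            return "no linear correlation";
        }
        String direction = r > 0 ? "positive" : "negative";
        if (abs == 1d) {
            return "perfect " + direction + " linear correlation";
        }
        if (abs < 0.4) {
            return "low " + direction + " linear correlation";
        }
        if (abs < 0.7) {
            return "significant " + direction + " linear correlation";
        }
        return "high " + direction + " linear correlation";
    }

    public static void main(String[] args) {
        System.out.println(describe(0.849)); // high positive linear correlation
        System.out.println(describe(-0.5));  // significant negative linear correlation
    }
}
```

A coefficient computed in floating point can land a hair outside [-1, 1], so a caller may want to clamp the value before passing it in.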
2. Java implementation: using Formula Two
    package youling.studio.pearson;

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.log4j.Logger;

    public class Similarity {
        static Logger logger = Logger.getLogger(Similarity.class.getName());
        Map<String, Double> rating_map = new HashMap<String, Double>();
        List<Double> rating_map_list = new ArrayList<Double>();

        /**
         * @param args
         */
        public static void main(String[] args) {
            Similarity similarity1 = new Similarity();
            similarity1.rating_map_list.add(20d);
            similarity1.rating_map_list.add(7d);
            similarity1.rating_map_list.add(26d);
            Similarity similarity2 = new Similarity();
            similarity2.rating_map_list.add(7d);
            similarity2.rating_map_list.add(3d);
            similarity2.rating_map_list.add(6d);
            // Result is greater than 0.8, so the two are highly correlated.
            logger.info("" + similarity1.getsimilarity_bydim(similarity2));

            Similarity similarity3 = new Similarity();
            similarity3.rating_map_list.add(12d);
            similarity3.rating_map_list.add(4d);
            similarity3.rating_map_list.add(8d);
            Similarity similarity4 = new Similarity();
            similarity4.rating_map_list.add(3d);
            similarity4.rating_map_list.add(1d);
            similarity4.rating_map_list.add(2d);
            // Result is 1.0, because the first list is an exact multiple of the second.
            logger.info("" + similarity3.getsimilarity_bydim(similarity4));
        }

        public Double getsimilarity_bydim(Similarity u) {
            // The two lists must hold the same number of paired values.
            if (this.rating_map_list.size() != u.rating_map_list.size()) {
                return null;
            }
            double sim = 0d;                                       // final Pearson correlation coefficient
            double common_items_len = this.rating_map_list.size(); // number of paired values (N)
            double this_sum = 0d;     // sum of the first variable's values
            double u_sum = 0d;        // sum of the second variable's values
            double this_sum_sq = 0d;  // sum of squares of the first variable's values
            double u_sum_sq = 0d;     // sum of squares of the second variable's values
            double p_sum = 0d;        // sum of the pairwise products

            for (int i = 0; i < this.rating_map_list.size(); i++) {
                double this_grade = this.rating_map_list.get(i);
                double u_grade = u.rating_map_list.get(i);
                // Accumulate the sums, the sums of squares, and the sum of products.
                this_sum += this_grade;
                u_sum += u_grade;
                this_sum_sq += Math.pow(this_grade, 2);
                u_sum_sq += Math.pow(u_grade, 2);
                p_sum += this_grade * u_grade;
            }
            logger.info("common_items_len:" + common_items_len);
            logger.info("p_sum:" + p_sum);
            logger.info("this_sum:" + this_sum);
            logger.info("u_sum:" + u_sum);

            double num = common_items_len * p_sum - this_sum * u_sum;
            double den = Math.sqrt((common_items_len * this_sum_sq - Math.pow(this_sum, 2))
                    * (common_items_len * u_sum_sq - Math.pow(u_sum, 2)));
            logger.info(num + ":" + den);
            // den == 0 means one variable has zero variance, where the coefficient
            // is undefined; this code returns 1 in that case.
            sim = (den == 0) ? 1 : num / den;
            return sim;
        }
    }
3. Scope of application:
The correlation coefficient is defined only when the standard deviations of both variables are non-zero. The Pearson correlation coefficient applies when:
(1) The relationship between the two variables is linear, and both are continuous data.
(2) The populations of the two variables are normally distributed, or have a nearly normal unimodal distribution.
(3) The observations of the two variables are paired, and each pair of observations is independent of the others.
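The non-zero standard deviation requirement can be checked in code before dividing. The sketch below is one possible way to do that, using the same sum-based formula as section 2. The class `GuardedPearson` is a hypothetical name, and returning an empty `OptionalDouble` for undefined cases is a design assumption, not the article's approach (the implementation in section 2 returns 1 when the denominator is zero).

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Hypothetical helper, not part of the article's code.
class GuardedPearson {

    // Returns the Pearson coefficient, or empty when it is undefined:
    // mismatched or empty inputs, or a variable with zero variance.
    static OptionalDouble pearson(List<Double> x, List<Double> y) {
        if (x.size() != y.size() || x.isEmpty()) {
            return OptionalDouble.empty();
        }
        double n = x.size();
        double sumX = 0d, sumY = 0d, sumXX = 0d, sumYY = 0d, sumXY = 0d;
        for (int i = 0; i < x.size(); i++) {
            double xi = x.get(i);
            double yi = y.get(i);
            sumX += xi;
            sumY += yi;
            sumXX += xi * xi;
            sumYY += yi * yi;
            sumXY += xi * yi;
        }
        // Each term equals N^2 times the population variance of that variable.
        double varTermX = n * sumXX - sumX * sumX;
        double varTermY = n * sumYY - sumY * sumY;
        if (varTermX <= 0d || varTermY <= 0d) {
            return OptionalDouble.empty(); // zero standard deviation: r is undefined
        }
        return OptionalDouble.of((n * sumXY - sumX * sumY) / Math.sqrt(varTermX * varTermY));
    }

    public static void main(String[] args) {
        List<Double> constant = Arrays.asList(5d, 5d, 5d);
        List<Double> varying = Arrays.asList(1d, 2d, 3d);
        System.out.println(pearson(constant, varying).isPresent()); // false
    }
}
```

With non-integer data, rounding can leave a tiny non-zero variance term for a constant series, so a small tolerance may be a safer test than an exact comparison with zero.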
---
Pearson correlation coefficient principle, and Java implementation