Pearson correlation coefficient principle, and Java implementation

Source: Internet
Author: User


When reprinting, please cite the source: http://blog.csdn.net/u010670689/article/details/41895105

1. Principle:
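
The formula images from the original post are missing from this copy. Assuming they are the four standard equivalent forms of the Pearson coefficient (only Formula Two can be confirmed from the Java code in section 2; the numbering of the other three is an assumption), they can be written as:

$$
\begin{aligned}
\text{(1)}\quad \rho_{X,Y} &= \frac{\operatorname{cov}(X,Y)}{\sigma_X\,\sigma_Y}
  = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]}{\sigma_X\,\sigma_Y}
  = \frac{E(XY)-E(X)\,E(Y)}{\sqrt{E(X^2)-E^2(X)}\;\sqrt{E(Y^2)-E^2(Y)}} \\[6pt]
\text{(2)}\quad \rho_{X,Y} &= \frac{N\sum XY-\sum X\sum Y}{\sqrt{N\sum X^2-\big(\sum X\big)^2}\;\sqrt{N\sum Y^2-\big(\sum Y\big)^2}} \\[6pt]
\text{(3)}\quad \rho_{X,Y} &= \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2\;\sum (Y-\bar{Y})^2}} \\[6pt]
\text{(4)}\quad \rho_{X,Y} &= \frac{\sum XY-\dfrac{\sum X\sum Y}{N}}{\sqrt{\Big(\sum X^2-\dfrac{(\sum X)^2}{N}\Big)\Big(\sum Y^2-\dfrac{(\sum Y)^2}{N}\Big)}}
\end{aligned}
$$
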



The four formulas listed above are equivalent, where E is the mathematical expectation, cov is the covariance, and N is the number of paired observations of the two variables.

For an explanation of mathematical expectation and covariance, see: http://blog.csdn.net/u010670689/article/details/41896399
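
The equivalence can be checked numerically. The sketch below compares the raw-sum form (the one the implementation in section 2 calls Formula Two) with the mean-centered form on one of the article's sample data sets. The class and method names are hypothetical, and the code assumes equal-length inputs with nonzero variance.

```java
// Illustrative sketch; PearsonEquivalence and its methods are hypothetical names.
class PearsonEquivalence {

    // Raw-sum form: (N*Sxy - Sx*Sy) / sqrt((N*Sxx - Sx^2) * (N*Syy - Sy^2))
    static double rawSums(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy)
                / Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    // Mean-centered form: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))
    static double meanCentered(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {20, 7, 26};
        double[] y = {7, 3, 6};
        // Both lines should print approximately 0.849.
        System.out.println(rawSums(x, y));
        System.out.println(meanCentered(x, y));
    }
}
```

The two results may differ in the last few digits because of floating-point rounding. The mean-centered form is usually the more numerically stable of the two when the values are large relative to their spread.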


The value of the correlation coefficient lies between −1 and +1, that is, −1 ≤ r ≤ +1. Its properties are as follows:

    • When r>0, the two variables are positively correlated; when r<0, they are negatively correlated.
    • When |r|=1, the two variables are perfectly linearly related, that is, one is an exact linear function of the other.
    • When r=0, there is no linear correlation between the two variables.
    • When 0<|r|<1, there is some degree of linear correlation between the two variables. The closer |r| is to 1, the stronger the linear relationship; the closer |r| is to 0, the weaker it is.
    • The strength is commonly divided into three levels: |r|<0.4 is low linear correlation, 0.4≤|r|<0.7 is significant correlation, and 0.7≤|r|<1 is high linear correlation.
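
The thresholds above can be turned into a small helper. This is a minimal sketch: the class name, method name, and label strings are hypothetical, and the handling of the endpoints |r|=0 and |r|=1 follows the other bullets in the list.

```java
// Illustrative sketch; CorrelationLevel and describe() are hypothetical names.
class CorrelationLevel {

    // Maps a correlation coefficient to the strength levels listed in the article.
    static String describe(double r) {
        if (Double.isNaN(r) || r < -1.0 || r > 1.0) {
            throw new IllegalArgumentException("r must lie in [-1, 1]: " + r);
        }
        double a = Math.abs(r);
        if (a == 0.0) {
            return "no linear correlation";
        } else if (a < 0.4) {
            return "low linear correlation";
        } else if (a < 0.7) {
            return "significant correlation";
        } else if (a < 1.0) {
            return "high linear correlation";
        } else {
            return "perfect linear correlation";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(0.849)); // prints "high linear correlation"
        System.out.println(describe(-0.5));  // prints "significant correlation"
    }
}
```

The sign of r is ignored here because the levels are defined on |r|; the direction of the relationship can be read from the sign separately.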


2. Java implementation: using Formula Two

package youling.studio.pearson;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.log4j.Logger;

public class Similarity {
    static Logger logger = Logger.getLogger(Similarity.class.getName());
    Map<String, Double> rating_map = new HashMap<String, Double>();
    List<Double> rating_map_list = new ArrayList<Double>();

    /**
     * @param args
     */
    public static void main(String[] args) {
        Similarity similarity1 = new Similarity();
        similarity1.rating_map_list.add(20d);
        similarity1.rating_map_list.add(7d);
        similarity1.rating_map_list.add(26d);
        Similarity similarity2 = new Similarity();
        similarity2.rating_map_list.add(7d);
        similarity2.rating_map_list.add(3d);
        similarity2.rating_map_list.add(6d);
        // Greater than 0.8, so the two lists are highly correlated
        logger.info("" + similarity1.getsimilarity_bydim(similarity2));

        Similarity similarity3 = new Similarity();
        similarity3.rating_map_list.add(12d);
        similarity3.rating_map_list.add(4d);
        similarity3.rating_map_list.add(8d);
        Similarity similarity4 = new Similarity();
        similarity4.rating_map_list.add(3d);
        similarity4.rating_map_list.add(1d);
        similarity4.rating_map_list.add(2d);
        // The result is 1.0 because one list is an exact multiple of the other
        logger.info("" + similarity3.getsimilarity_bydim(similarity4));
    }

    public Double getsimilarity_bydim(Similarity u) {
        // The two lists must contain the same number of values
        if (this.rating_map_list.size() != u.rating_map_list.size()) {
            return null;
        }
        double sim = 0d;                                     // the final Pearson correlation coefficient
        double common_items_len = this.rating_map_list.size(); // number of paired values
        double this_sum = 0d;    // sum of the first list
        double u_sum = 0d;       // sum of the second list
        double this_sum_sq = 0d; // sum of squares of the first list
        double u_sum_sq = 0d;    // sum of squares of the second list
        double p_sum = 0d;       // sum of the pairwise products

        for (int i = 0; i < this.rating_map_list.size(); i++) {
            double this_grade = this.rating_map_list.get(i);
            double u_grade = u.rating_map_list.get(i);
            // Accumulate the sums, the sums of squares, and the sum of products
            this_sum += this_grade;
            u_sum += u_grade;
            this_sum_sq += Math.pow(this_grade, 2);
            u_sum_sq += Math.pow(u_grade, 2);
            p_sum += this_grade * u_grade;
        }

        logger.info("common_items_len:" + common_items_len);
        logger.info("p_sum:" + p_sum);
        logger.info("this_sum:" + this_sum);
        logger.info("u_sum:" + u_sum);

        // Formula Two: numerator and denominator
        double num = common_items_len * p_sum - this_sum * u_sum;
        double den = Math.sqrt((common_items_len * this_sum_sq - Math.pow(this_sum, 2))
                * (common_items_len * u_sum_sq - Math.pow(u_sum, 2)));
        logger.info(num + ":" + den);

        // The coefficient is undefined when den is 0 (a list with zero variance);
        // this implementation returns 1 in that case.
        sim = (den == 0) ? 1 : num / den;
        return sim;
    }
}


3. Scope of application:

The correlation coefficient is defined only when the standard deviations of both variables are nonzero. The Pearson correlation coefficient applies when:

(1) The two variables are linearly related, and both are continuous data.

(2) The populations of both variables are normally distributed, or have an approximately normal, unimodal distribution.

(3) The observations of the two variables are paired, and each pair of observations is independent of the others.
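
Two of these conditions are easy to demonstrate. The sketch below uses the raw-sum formula and, as a choice made for this sketch only, returns NaN when a variable has zero standard deviation. It then shows that a perfect but nonlinear relationship (y = x² on symmetric x values) still gives r = 0. The class and method names are hypothetical.

```java
// Illustrative sketch; PearsonScope and pearson() are hypothetical names.
class PearsonScope {

    // Raw-sum Pearson coefficient; returns NaN when it is undefined
    // (that is, when either variable has zero standard deviation).
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("inputs must be non-empty and of equal length");
        }
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i];
            sy += y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double num = n * sxy - sx * sy;
        double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
        return den == 0 ? Double.NaN : num / den;
    }

    public static void main(String[] args) {
        // Zero standard deviation: the coefficient is undefined.
        double[] constant = {5, 5, 5};
        double[] rising = {1, 2, 3};
        System.out.println(pearson(constant, rising)); // prints NaN

        // Perfect nonlinear dependence (y = x^2) with no linear correlation.
        double[] x = {-2, -1, 0, 1, 2};
        double[] squared = {4, 1, 0, 1, 4};
        System.out.println(pearson(x, squared)); // prints 0.0
    }
}
```

The exact comparison `den == 0` is reliable for the small integer-valued data used here; with arbitrary floating-point data, a tolerance check may be more appropriate.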





