Python implementation: KL distance, Jensen–Shannon distance

Source: Internet
Author: User


Kullback–Leibler divergence: the KL distance, also known as relative entropy, comes from information theory and measures the difference between two probability distributions over the same event space.
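Calculation formula:

$$D_{KL}(P\|Q) = \sum_x P(x)\log\frac{P(x)}{Q(x)} = H(P,Q) - H(P)$$

that is, the cross entropy of P and Q, $H(P,Q) = -\sum_x P(x)\log Q(x)$, minus the information entropy of P, $H(P) = -\sum_x P(x)\log P(x)$.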
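As a quick numerical check that KL divergence equals the cross entropy $H(P,Q)$ minus the entropy $H(P)$, the sketch below computes all three quantities with NumPy for two small example distributions. The distributions and the helper names `entropy`, `cross_entropy` and `kl` are illustrative assumptions, not from the original. Base-2 logarithms are used, so the results are in bits.

```python
import numpy as np

def entropy(p):
    # H(P) = -sum P(x) log2 P(x); assumes every entry of p is > 0
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log2(p)))

def cross_entropy(p, q):
    # H(P, Q) = -sum P(x) log2 Q(x); assumes every entry of q is > 0
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(-np.sum(p * np.log2(q)))

def kl(p, q):
    # D_KL(P || Q) = sum P(x) log2(P(x) / Q(x))
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log2(p / q)))

P = [0.5, 0.25, 0.25]
Q = [0.25, 0.25, 0.5]
print(kl(P, Q))                          # direct definition: 0.25
print(cross_entropy(P, Q) - entropy(P))  # 1.75 - 1.5 = 0.25
```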
Properties: (1) $D_{KL}(P\|Q) \ge 0$, with equality if and only if P = Q, and there is no upper bound; (2) it is asymmetric: in general $D_{KL}(P\|Q) \ne D_{KL}(Q\|P)$; (3) it does not satisfy the triangle inequality.
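These properties can be checked numerically. The sketch below uses natural logarithms and three two-outcome distributions P, M and Q chosen purely for illustration (they are assumptions, not from the original). It shows that swapping the arguments changes the result, and that the divergence from P directly to Q is larger than the sum of the two legs through M, which would be impossible for a true metric.

```python
import numpy as np

def kl(p, q):
    # D_KL(P || Q) in nats; assumes every entry of p and q is > 0
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

P = [0.3, 0.7]
M = [0.5, 0.5]
Q = [0.7, 0.3]

# (1) non-negative, and exactly 0 when both arguments are the same distribution
print(kl(P, M) >= 0, kl(P, P) == 0.0)

# (2) asymmetric: about 0.0823 vs about 0.0872
print(kl(P, M), kl(M, P))

# (3) triangle inequality fails: about 0.339 > about 0.082 + 0.087
print(kl(P, Q), kl(P, M) + kl(M, Q))
```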

The practical problem is how to handle zeros in the distributions P and Q: the logarithm of 0 is undefined, and in Python `math.log(0)` raises a `ValueError`. A simple workaround is to add a very small constant to every probability so that none of them is exactly 0. MATLAB provides `eps` for this, and in NumPy `np.spacing(1)` gives the same value: the gap between 1.0 and the next representable double, about 2.22e-16. Adding it changes nonzero probabilities only negligibly. Wherever Q is 0 but P is not, however, the true divergence is infinite, and the finite result then depends on the size of the added constant.
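A minimal sketch of the zero problem and the workaround, using only the standard library and NumPy: `math.log(0)` fails, `np.spacing(1)` equals NumPy's machine epsilon for 64-bit floats, and after adding it the example distribution still sums to 1 up to rounding. The variable names are illustrative.

```python
import math
import numpy as np

# math.log refuses 0 with "math domain error"
try:
    math.log(0)
except ValueError as err:
    print("log(0) fails:", err)

# np.spacing(1) is the gap between 1.0 and the next double, i.e. machine epsilon
tiny = np.spacing(1)
print(tiny, tiny == np.finfo(np.float64).eps)  # 2.220446049250313e-16 True

q = np.array([0, 0, 0.5, 0.2, 0.3]) + tiny
print(math.log(q[0], 2))            # finite (about -52) instead of an error
print(abs(q.sum() - 1.0) < 1e-12)   # total probability is still about 1
```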
```python
import numpy as np
from math import log

def KLD(p, q):
    # Drop positions where both p and q are 0
    p, q = zip(*filter(lambda pair: pair[0] != 0 or pair[1] != 0, zip(p, q)))
    # Add machine epsilon so that no probability is exactly 0
    p = np.asarray(p) + np.spacing(1)
    q = np.asarray(q) + np.spacing(1)
    # KL divergence in bits (base-2 logarithm)
    return sum(_p * log(_p / _q, 2) for _p, _q in zip(p, q))

p = np.ones(5) / 5.0
q = [0, 0, 0.5, 0.2, 0.3]
print(KLD(p, q))
```

Result: 19.489850642923379
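In this example q is 0 at two positions where p is 0.2, so the exact KL divergence is infinite and the result is dominated by those two terms. The sketch below recomputes the same example with different smoothing constants to show how strongly the finite answer depends on that choice. The helper `kl_with_eps` and the alternative constants 1e-8 and 1e-4 are illustrative assumptions.

```python
import numpy as np

def kl_with_eps(p, q, eps):
    # Add eps to every entry, then compute D_KL(P || Q) in bits
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log2(p / q)))

p = np.ones(5) / 5.0
q = [0, 0, 0.5, 0.2, 0.3]
for eps in (np.spacing(1), 1e-8, 1e-4):
    print(eps, kl_with_eps(p, q, eps))
```

With `np.spacing(1)` this reproduces the 19.49 above, while 1e-8 and 1e-4 give roughly 9.3 and 4.0, so the number says more about the smoothing constant than about the distributions.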
Improvement: because KL divergence is asymmetric, it does not give a single symmetric measure of how different two distributions are. The Jensen–Shannon distance was proposed to address this: compute the KL divergence of each distribution from the average of the two distributions, then take the mean of the two values.
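$$M = \tfrac{1}{2}(P + Q), \qquad \mathrm{JSD}(P\|Q) = \tfrac{1}{2}D_{KL}(P\|M) + \tfrac{1}{2}D_{KL}(Q\|M)$$

With base-2 logarithms, JSD lies in [0, 1]; with natural logarithms, it lies in [0, ln 2]. (Strictly speaking, this quantity is the Jensen–Shannon divergence; its square root is a true metric and is what is usually called the Jensen–Shannon distance.)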
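A vectorized sketch of the JSD formula in NumPy. One design choice differs from the original code and is an assumption of this sketch: instead of adding a small constant, it uses the usual convention 0·log 0 = 0. That is safe for JSD because M(x) > 0 wherever P(x) > 0 or Q(x) > 0, so no division by zero can occur and no smoothing constant distorts the result. The names `kl_bits` and `jsd` are hypothetical.

```python
import numpy as np

def kl_bits(p, m):
    # D_KL(P || M) in bits, skipping terms where p == 0 (0 * log 0 = 0).
    # Assumes m > 0 wherever p > 0, which always holds when m = (p + q) / 2.
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / m[mask])))

def jsd(p, q):
    # Jensen-Shannon divergence in bits: 0.5*KL(P||M) + 0.5*KL(Q||M)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

p = np.ones(5) / 5.0
q = [0, 0, 0.5, 0.2, 0.3]
print(jsd(p, q), jsd(q, p))  # symmetric, and finite even though q has zeros
print(jsd([1, 0], [0, 1]))   # disjoint supports reach the upper bound: 1.0
print(np.sqrt(jsd(p, q)))    # square root of JSD (the Jensen-Shannon distance)
```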
Application example: compute the distance between the letter distributions of two strings. For example, in 'absfjowswls' and 'ahoafbaqqq' the letters occur with different probabilities.
```python
import string
from math import log
import numpy as np

# KL divergence in bits: sum of p*log2(p) - p*log2(q)
KLD = lambda p, q: sum(_p * log(_p, 2) - _p * log(_q, 2) for _p, _q in zip(p, q))

def JSD_core(p, q):
    # Drop positions where both p and q are 0
    p, q = zip(*filter(lambda pair: pair[0] != 0 or pair[1] != 0, zip(p, q)))
    # Average distribution M = (P + Q) / 2
    M = [0.5 * (_p + _q) for _p, _q in zip(p, q)]
    # Add machine epsilon so that no probability is exactly 0
    p = np.asarray(p) + np.spacing(1)
    q = np.asarray(q) + np.spacing(1)
    M = np.asarray(M) + np.spacing(1)
    return 0.5 * KLD(p, M) + 0.5 * KLD(q, M)

# Frequency of each letter a-z
reg = lambda x: [x.count(i) for i in string.ascii_lowercase]
# Probability of each letter, rounded to 4 decimal places
rate = lambda y: [round(i * 1.0 / sum(reg(y)), 4) for i in reg(y)]

s1 = 'ahaebssa'
s2 = 'awohwsess'
print(JSD_core(rate(s1), rate(s2)))
```
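The same letter-frequency comparison can be sketched with `collections.Counter` and the 0·log 0 = 0 convention instead of an added constant, applied here to the two example strings from the application example. This is an illustrative variant, not the original author's code: the function names are hypothetical, it skips the rounding to 4 decimal places, it only counts the lowercase letters a–z (as the original does), and it assumes each string contains at least one such letter.

```python
import string
from collections import Counter
import numpy as np

def letter_distribution(s):
    # Relative frequency of each lowercase letter a-z; other characters are ignored
    counts = Counter(ch for ch in s if ch in string.ascii_lowercase)
    total = sum(counts.values())
    return np.array([counts[ch] / total for ch in string.ascii_lowercase])

def jsd_bits(p, q):
    # Jensen-Shannon divergence in bits, using 0 * log 0 = 0
    m = 0.5 * (p + q)
    def kl_to_m(a):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / m[mask]))
    return float(0.5 * kl_to_m(p) + 0.5 * kl_to_m(q))

s1 = 'absfjowswls'
s2 = 'ahoafbaqqq'
print(jsd_bits(letter_distribution(s1), letter_distribution(s2)))
```

Because the two strings share some letters (a, b, f, o), the result lies strictly between 0 and 1.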