1:Cauchy distribution
Probability density function
The Cauchy distribution has the probability density function
\[
f(x; x_0, \gamma) = \frac{1}{\pi\gamma \left[ 1 + \left( \frac{x - x_0}{\gamma} \right)^2 \right]} = \frac{1}{\pi} \left[ \frac{\gamma}{(x - x_0)^2 + \gamma^2} \right],
\]
where x0 is the location parameter, specifying the location of the peak of the distribution, and γ is the scale parameter, which specifies the half-width at half-maximum (HWHM). γ is also equal to half the interquartile range and is sometimes called the probable error. Cauchy himself exploited such a density function in 1827, with infinitesimal scale parameter, in defining a Dirac delta function.
Figure: Probability density function. The purple curve is the standard Cauchy distribution.
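The reading of γ as the half-width at half-maximum can be checked numerically. The sketch below is only an illustration: the function name cauchy_pdf and the sample values x0 = 2.0, γ = 0.5 are chosen here and do not come from the text.

```python
import math

def cauchy_pdf(x, x0=0.0, gamma=1.0):
    """Density of the Cauchy distribution with location x0 and scale gamma > 0."""
    return gamma / (math.pi * ((x - x0) ** 2 + gamma ** 2))

# Illustrative parameters (assumed, not taken from the text).
x0, gamma = 2.0, 0.5

peak = cauchy_pdf(x0, x0, gamma)           # maximum of the density, 1 / (pi * gamma)
half = cauchy_pdf(x0 + gamma, x0, gamma)   # density one scale unit away from the peak

# gamma is the half-width at half-maximum: the density has dropped to half its peak.
print(math.isclose(half, peak / 2))
```

With the default arguments x0 = 0 and γ = 1, the same function evaluates the standard Cauchy density.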
The special case when x0 = 0 and γ = 1 is called the standard Cauchy distribution, with the probability density function
\[
f(x; 0, 1) = \frac{1}{\pi (1 + x^2)}.
\]
Cumulative distribution function
The cumulative distribution function (cdf) is:
\[
F(x; x_0, \gamma) = \frac{1}{\pi} \arctan\left( \frac{x - x_0}{\gamma} \right) + \frac{1}{2}
\]
Figure: Cumulative distribution function.
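As a small check of the statement that γ equals half the interquartile range, the sketch below assumes the standard closed form of the Cauchy cdf, F(x) = 1/2 + arctan((x - x0)/γ)/π, and inverts it to get quantiles. The function names and the sample parameters are chosen for illustration only.

```python
import math

def cauchy_cdf(x, x0=0.0, gamma=1.0):
    """Cumulative distribution function of the Cauchy distribution."""
    return math.atan((x - x0) / gamma) / math.pi + 0.5

def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Inverse of cauchy_cdf, valid for 0 < p < 1."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

# Illustrative parameters (assumed, not taken from the text).
x0, gamma = 2.0, 0.5

q1 = cauchy_quantile(0.25, x0, gamma)   # first quartile
q3 = cauchy_quantile(0.75, x0, gamma)   # third quartile

# The interquartile range q3 - q1 equals 2 * gamma.
print(math.isclose(q3 - q1, 2 * gamma))
```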
2:p-stable distributions
From the principle above, it is easy to show that the standard normal distribution is 2-stable.
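The 2-stable claim can be illustrated with a quick simulation. It relies on the usual definition of p-stability, which is stated here as background rather than quoted from these notes: a distribution is p-stable if, for independent draws X_i from it and any real coefficients a_i, the sum of a_i X_i has the same distribution as (sum of |a_i|^p)^(1/p) times a single draw X. The coefficients, sample size, and seed below are arbitrary choices. The check compares only the spread of the sum with that of a normal variable scaled by the l2 norm, so it is a sanity check, not a proof.

```python
import math
import random
import statistics

rng = random.Random(0)      # fixed seed so the run is repeatable
a = [3.0, 4.0]              # arbitrary coefficients; their l2 norm is 5
norm = math.sqrt(sum(c * c for c in a))

# Draw many values of a_1 * X_1 + a_2 * X_2 with X_i standard normal.
samples = [sum(c * rng.gauss(0.0, 1.0) for c in a) for _ in range(20000)]

# If the sum behaves like norm * X with X standard normal, its standard
# deviation is close to norm and about 68.3% of the values lie within +/- norm.
sample_std = statistics.pstdev(samples)
within_one_norm = sum(abs(s) <= norm for s in samples) / len(samples)

print(round(sample_std, 2), round(within_one_norm, 3))
```

The two printed values should come out near 5 and 0.683, the figures expected for a normal variable with standard deviation 5.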
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
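Questions:
1:How to precompute the value of k
Randomly draw a small number of points from the dataset and run the algorithm on them once for each candidate value of k. Increasing k step by step, pick the value of k that gives the smallest running time.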
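The tuning procedure just described can be sketched as a timing loop. Everything named here is hypothetical: draw_sample, choose_k, and the callback run_with_k (which stands for "run the algorithm on the sample with this k") are placeholders, not functions from the manual or from any LSH library. The timer is a parameter only so the loop can be exercised with a fake clock.

```python
import random
import time

def draw_sample(dataset, size, seed=0):
    """Pick `size` distinct points at random from the dataset."""
    return random.Random(seed).sample(dataset, size)

def choose_k(sample, candidate_ks, run_with_k, timer=time.perf_counter):
    """Return the k in candidate_ks whose run on `sample` took the least time.

    run_with_k(sample, k) is a hypothetical callback that builds the index
    and answers the sample queries using k hash functions per vector.
    """
    best_k, best_elapsed = None, float("inf")
    for k in candidate_ks:              # candidates are tried in increasing order
        start = timer()
        run_with_k(sample, k)
        elapsed = timer() - start
        if elapsed < best_elapsed:
            best_k, best_elapsed = k, elapsed
    return best_k
```

A call might look like choose_k(draw_sample(dataset, 100), range(2, 30), run_with_k), where the sample size and the range of k are again only example values.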
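2:How points are placed into buckets
Each point has L k-element vectors. The elements of these vectors are all of the same kind; each is just the result of hashing the point with a different hash function. How the points actually end up distributed across the buckets depends on the function h1.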
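A minimal sketch of this layout follows, under stated assumptions. Each elementary hash is taken to be floor((a·v + b) / w), with the entries of a drawn from the standard normal distribution and b drawn uniformly from [0, w], which is the usual form for 2-stable LSH; the notes do not spell it out. The bucket width w, the function names, and the use of a Python dict keyed by the k-element tuple are illustrative. The dict only stands in for the role of h1, whose actual definition is not reproduced here.

```python
import math
import random

def make_hash(dim, w, rng):
    """One elementary hash h(v) = floor((a.v + b) / w), a ~ N(0, I), b uniform on [0, w]."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    def h(v):
        return math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / w)
    return h

def build_tables(points, dim, k, L, w, seed=0):
    """Give every point L k-element vectors and store it in L tables."""
    rng = random.Random(seed)
    # L groups, each holding k independently drawn hash functions.
    groups = [[make_hash(dim, w, rng) for _ in range(k)] for _ in range(L)]
    tables = [dict() for _ in range(L)]
    for idx, v in enumerate(points):
        for group, table in zip(groups, tables):
            key = tuple(h(v) for h in group)      # the k-element vector for this table
            table.setdefault(key, []).append(idx)  # dict lookup stands in for h1
    return groups, tables
```

Points with identical k-element vectors in a table share a bucket there, so a query only needs to inspect the L buckets its own vectors map to.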
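3:How accuracy is guaranteed
The manual covers this in detail. The authors chose the standard normal distribution precisely because it is 2-stable, which gives a mathematical guarantee on accuracy.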
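The practical effect of 2-stability can be seen empirically. Since a·p - a·q is distributed like the Euclidean distance between p and q times a standard normal variable, the chance that two points receive the same hash value depends only on their distance and falls as the distance grows. The sketch below assumes the hash form floor((a·v + b) / w); the width w = 4, the two test pairs, the trial count, and the function name are arbitrary illustrative choices.

```python
import math
import random

def collision_rate(p, q, w, trials, rng):
    """Fraction of random hashes floor((a.v + b) / w) that send p and q to the same value."""
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(len(p))]
        b = rng.uniform(0.0, w)
        bucket_p = math.floor((sum(x * y for x, y in zip(a, p)) + b) / w)
        bucket_q = math.floor((sum(x * y for x, y in zip(a, q)) + b) / w)
        hits += bucket_p == bucket_q
    return hits / trials

rng = random.Random(0)   # fixed seed so the run is repeatable
origin = [0.0, 0.0]

near_rate = collision_rate(origin, [1.0, 0.0], w=4.0, trials=2000, rng=rng)    # distance 1
far_rate = collision_rate(origin, [10.0, 0.0], w=4.0, trials=2000, rng=rng)    # distance 10

# The near pair should collide far more often than the far pair.
print(near_rate, far_rate)
```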