1. numpy.random.normal

numpy.random.normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
The probability density function of the normal distribution, first derived by De Moivre and years later by both Gauss and Laplace independently [R250], is often called the bell curve because of its characteristic shape (see the example below).
The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [R250].
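As an illustrative aside (not part of the original docs), this "many tiny disturbances" picture can be demonstrated numerically: summing many independent uniform variables already yields an approximately normal sample. The seed and sample counts below are arbitrary choices.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility

# Each sample is the sum of 50 tiny, independent uniform "disturbances".
n_disturbances = 50
samples = np.random.uniform(-1, 1, size=(100000, n_disturbances)).sum(axis=1)

# By the central limit theorem the sums are approximately normal with
# mean 0 and variance n * Var(U(-1, 1)) = n * (1/3) = 50/3.
print(samples.mean())  # close to 0
print(samples.var())   # close to 50/3 ~ 16.7
```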
Parameters:
loc : float
    Mean ("centre") of the distribution.
scale : float
    Standard deviation (spread or "width") of the distribution.
size : int or tuple of ints, optional
    Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
See Also
scipy.stats.distributions.norm
    probability density function, distribution or cumulative density function, etc.
Notes
The probability density for the Gaussian distribution is

    p(x) = 1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))

where mu is the mean and sigma the standard deviation. The square of the standard deviation, sigma^2, is called the variance.
The function has its peak at the mean, and its "spread" increases with the standard deviation (the function reaches 0.607 times its maximum at x = mu + sigma and x = mu - sigma [R250]). This implies that numpy.random.normal is more likely to return samples lying close to the mean, rather than those far away.
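The 0.607 factor quoted here is just exp(-1/2), which can be checked numerically against the density formula (a small sketch, not part of the original text):

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of the normal distribution, as given in the Notes
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

mu, sigma = 0.0, 1.0
peak = normal_pdf(mu, mu, sigma)
at_one_sigma = normal_pdf(mu + sigma, mu, sigma)

# exp(-1/2) ~ 0.6065, the "0.607 times its maximum" in the text
print(at_one_sigma / peak)
```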
References
[R249] Wikipedia, "Normal distribution", http://en.wikipedia.org/wiki/normal_distribution
[R250] (1, 2, 3, 4) P. R. Peebles Jr., "Central Limit Theorem" in "Probability, Random Variables and Random Signal Principles", 4th ed., 2001, pp. 51, 51.
Examples
Draw samples from the distribution:

>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:

>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:

>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, normed=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp(-(bins - mu)**2 / (2 * sigma**2)),
...          linewidth=2, color='r')
>>> plt.show()
2. numpy.random.randn

import numpy as np
np.random.randn(2, 3)
array([[ 0.59941534,  1.0991949 ,  1.36316028],
       [-0.01979197,  1.30783162,  0.69808199]])

This means the values are drawn randomly from the standard normal distribution.
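randn always samples the standard normal (mean 0, standard deviation 1). A sample with another mean and spread can be obtained by scaling and shifting, which is equivalent in distribution to calling np.random.normal directly (a sketch; the seed and the values 5 and 2 are illustrative):

```python
import numpy as np

np.random.seed(42)  # arbitrary seed

mu, sigma = 5.0, 2.0

# sigma * randn(...) + mu has the same distribution as
# np.random.normal(mu, sigma, ...)
s = sigma * np.random.randn(100000) + mu

print(s.mean())  # close to 5
print(s.std())   # close to 2
```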
3. scipy.optimize.leastsq

Least squares:

import numpy as np
from scipy.optimize import leastsq

# function to fit: x is the variable, p holds the parameters
def fun(x, p):
    a, b = p
    return a * x + b

# error between the real data and the fitted data; p is the parameter
# vector to fit, x and y are the corresponding real data
def residuals(p, x, y):
    return fun(x, p) - y

# a set of real data, generated with a=2, b=1
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y1 = np.array([3, 5, 7, 9, 11, 13], dtype=float)

# call the fitting function: the first argument is the residual function,
# the second is the initial guess, and the third holds the extra
# arguments passed through to the residual function
r = leastsq(residuals, [1, 1], args=(x1, y1))

# print the result: r[0] stores the fitted parameters,
# r[1] holds other information
print(r[0])

After running, the fitting result is

[ 2.  1.]
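As a cross-check (not in the original text), scipy.optimize.curve_fit wraps leastsq and recovers the same parameters from the same data without writing a residual function by hand:

```python
import numpy as np
from scipy.optimize import curve_fit

def fun(x, a, b):
    # the same linear model, but with the parameters as separate arguments
    return a * x + b

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y1 = np.array([3, 5, 7, 9, 11, 13], dtype=float)

popt, pcov = curve_fit(fun, x1, y1, p0=[1, 1])
print(popt)  # approximately [2., 1.]
```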
In practice, however, the function I need to fit is not that simple. One difficulty is that it is a piecewise function: you have to test the value of the argument and then apply a different expression. For example, consider this piecewise function: when x > 3, y = ax + b; when x <= 3, y = ax - b. Written in Python code:
def fun(x, p):
    a, b = p
    if x > 3:
        return a * x + b
    else:
        return a * x - b
If we fit with the original residual function, we get an error like this:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The reason is simple: our fun function can only handle a single scalar value, so passing in an array naturally raises an error. What to do? I was quite stuck, so I asked for help on the SciPy mailing list, where the problem was quickly pointed out: I had misunderstood the residual function. The residual function passed to leastsq must return an array, so we can modify it like this:
def residuals(p, x, y):
    temp = np.array([0, 0, 0, 0, 0, 0], dtype=float)
    for i in range(0, len(x)):
        temp[i] = fun(x[i], p)
    return temp - y
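An alternative to the explicit loop (my own suggestion, not from the mailing-list answer): vectorize the piecewise function itself with np.where, so the original one-line residual function keeps working on arrays:

```python
import numpy as np
from scipy.optimize import leastsq

def fun(x, p):
    a, b = p
    # np.where evaluates both branches element-wise and selects per element,
    # so fun now accepts arrays as well as scalars
    return np.where(x > 3, a * x + b, a * x - b)

def residuals(p, x, y):
    return fun(x, p) - y

# synthetic data generated from the piecewise model with a=2, b=1
x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y1 = np.where(x1 > 3, 2 * x1 + 1, 2 * x1 - 1)

r = leastsq(residuals, [1, 1], args=(x1, y1))
print(r[0])  # approximately [2., 1.]
```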
4. Fitting a polynomial with leastsq

import numpy as np                   # conventional alias
import scipy as sp                   # conventional alias
from scipy.optimize import leastsq   # the least-squares function we use here
import pylab as pl

m = 9  # number of polynomial coefficients (degree m - 1)

def real_func(x):
    return np.sin(2 * np.pi * x)  # sin(2 pi x)

def fake_func(p, x):
    f = np.poly1d(p)  # polynomial with coefficients p
    return f(x)

# residual function
def residuals(p, y, x):
    return y - fake_func(p, x)

# pick 9 evenly spaced points as x
x = np.linspace(0, 1, 9)
# many "continuous" points, needed for plotting
x_show = np.linspace(0, 1, 1000)
y0 = real_func(x)
# y after adding normally distributed noise
y1 = [np.random.normal(0, 0.1) + y for y in y0]
# first randomly generate a set of polynomial coefficients
p0 = np.random.randn(m)
plsq = leastsq(residuals, p0, args=(y1, x))
print('Fitting Parameters:', plsq[0])  # output the fitted parameters
pl.plot(x_show, real_func(x_show), label='real')
pl.plot(x_show, fake_func(plsq[0], x_show), label='fitted curve')
pl.plot(x, y1, 'bo', label='with noise')
pl.legend()
pl.show()
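Since the model here is a polynomial, the same fit can be obtained directly with np.polyfit, which solves the linear least-squares problem in closed form (a sketch mirroring the setup above; the seed is arbitrary, and note that 9 coefficients mean a degree-8 polynomial, so with only 9 data points the fit interpolates the noisy data and overfits):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed

m = 9  # number of polynomial coefficients, as in the example above

def real_func(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0, 1, 9)
y1 = real_func(x) + np.random.normal(0, 0.1, size=x.shape)

coeffs = np.polyfit(x, y1, m - 1)  # degree-8 polynomial
fitted = np.poly1d(coeffs)

# With 9 points and 9 coefficients the polynomial passes (almost)
# exactly through the noisy data: a classic overfitting setup.
print(np.max(np.abs(fitted(x) - y1)))
```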
Day Five of Learning Big Data: Least Squares in Python (Part 2)