Earlier articles in this series covered topics such as drawing curves, scatter plots, and power-law distributions. With that background, knowing how to fit a line or curve to a set of scattered points becomes very important. This article mainly describes how to call the curve_fit function from the SciPy extension package to perform curve fitting and to obtain the fitted function and its parameters. I hope the article is helpful to you; if it contains errors or omissions, please bear with me ~
Previous articles in this series:
"Python Data Mining Course" 1. Installing Python and an introduction to crawlers
"Python Data Mining Course" 2. KMeans clustering data analysis and an introduction to Anaconda
"Python Data Mining Course" 3. KMeans clustering code implementation, operation and optimization
"Python Data Mining Course" 4. Decision tree (DTC) data analysis and analysis of the Iris data set
"Python Data Mining Course" 5. Linear regression basics and prediction of diabetes cases
"Python Data Mining Course" 6. NumPy, Pandas and Matplotlib package basics
"Python Data Mining Course" 7. PCA dimensionality reduction and subplot drawing
"Python Data Mining Course" 8. Association rule mining and shopping recommendations with Apriori
"Python Data Mining Course" 9. A simple analysis of oxide data with the LinearRegression model
"Python Data Mining Course" 10. Pandas, Matplotlib and PCA drawing: supplementary practical code
"Python Data Mining Course" 11. Visual analysis with Pandas and Matplotlib combined with SQL statements
"Python Data Mining Course" 12. Comparison chart analysis with Pandas and Matplotlib combined with SQL statements
"Python Data Mining Course" 13. WordCloud configuration process and word frequency analysis
I. Introduction to SciPy
SciPy (pronounced "Sigh Pie") is an open-source package for mathematical, scientific, and engineering computing. It is a convenient, easy-to-use Python toolkit for science and engineering, with modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ordinary differential equation solvers, and more.
Official address: https://www.scipy.org/
[Figure: commonly used SciPy modules and functions]
Highly recommended reading: Liu's article "SciPy: High-End Scientific Computing".
Optimization and fitting in SciPy use the optimize module, which provides useful algorithms for minimizing functions (scalar or multidimensional), fitting curves, and finding the roots of equations.
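To give a feel for the optimize module before turning to curve fitting, here is a minimal sketch of the other two tasks mentioned above: minimizing a scalar function and finding a root. The functions f and g are made-up examples, not taken from the article.

```python
from scipy.optimize import minimize_scalar, brentq


def f(x):
    # Hypothetical parabola with its minimum at x = 2
    return (x - 2.0) ** 2 + 1.0


def g(x):
    # Hypothetical cubic with a single real root at x = 2
    return x ** 3 - 8.0


# Scalar minimization (Brent's method is the default)
res = minimize_scalar(f)
x_min = res.x

# Root finding on a bracket [0, 5] where g changes sign
root = brentq(g, 0.0, 5.0)

print("minimum of f near x =", x_min)
print("root of g near x =", root)
```

brentq needs an interval on which the function changes sign, so the bracket [0, 5] is chosen by hand here.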
Official documentation: scipy.optimize.curve_fit
The rest of this article works through examples, including:
1. Calling the numpy.polyfit() function to perform polynomial fitting;
2. Importing data with Pandas and calling SciPy to fit a power function;
3. Fitting an exponential of the form e^(b/x) using np.exp();
4. Fitting a function with three parameters;
5. Finally, an analysis of power-law graphs, with some of my own thoughts and open questions.
II. Curve fitting
1. Polynomial fitting
First, the x and y coordinates are defined with np.arange and np.array. Then the polyfit() function is called to fit a third-degree polynomial. Finally, Matplotlib is used to draw the scatter points (x, y) and the fitted curve.
Full code:
# encoding=utf-8
import numpy as np
import matplotlib.pyplot as plt

# Define the x, y scatter coordinates
x = np.arange(1, 16, 1)
num = [4.00, 5.20, 5.900, 6.80, 7.34, 8.57, 9.86, 10.12, 12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00]
y = np.array(num)

# Fit a third-degree polynomial
f1 = np.polyfit(x, y, 3)
p1 = np.poly1d(f1)
print(p1)

# Fitted y values; yvals = np.polyval(f1, x) also works
yvals = p1(x)

# Plot
plot1 = plt.plot(x, y, 's', label='original values')
plot2 = plt.plot(x, yvals, 'r', label='polyfit values')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc=4)  # place the legend in the lower-right corner
plt.title('polyfitting')
plt.savefig('test.png')  # save before show(); saving afterwards can produce an empty image
plt.show()
The output, shown below, includes the blue square scatter points and the red fitted curve.
The fitted polynomial is: y = -0.004669 x^3 + 0.1392 x^2 + 0.04214 x + 4.313
Note: once you have the function, you can also plot it in Origin, which is convenient.
2. Fitting the exponential form a*e^(b/x)
Next, SciPy's curve_fit() is used to fit a function of the form a*e^(b/x) to the data above. The data set is as follows:
x = np.arange(1, 16, 1)
num = [4.00, 5.20, 5.900, 6.80, 7.34, 8.57, 9.86, 10.12, 12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00]
y = np.array(num)
Here x runs from 1 to 15 and y takes the corresponding values in the num array, so the first point is (1, 4.00) and the last point is (15, 20.00).
Then call the curve_fit() function. The core steps are:
(1) Define the form of the function to be fitted, for example:
def func(x, a, b):
    return a*np.exp(b/x)
(2) Call popt, pcov = curve_fit(func, x, y) to perform the fit. The fitted coefficients are stored in popt: a = popt[0], b = popt[1];
(3) Call func(x, a, b) to compute the fitted values, where x holds the horizontal coordinates and a, b are the fitted parameters.
The complete code is as follows:
# encoding=utf-8
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Custom function: exponential form a*e^(b/x)
def func(x, a, b):
    return a*np.exp(b/x)

# Define the x, y scatter coordinates
x = np.arange(1, 16, 1)
num = [4.00, 5.20, 5.900, 6.80, 7.34, 8.57, 9.86, 10.12, 12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00]
y = np.array(num)

# Nonlinear least-squares fit
popt, pcov = curve_fit(func, x, y)
# popt holds the fitted coefficients
a = popt[0]
b = popt[1]
yvals = func(x, a, b)  # fitted y values
print('coefficient a:', a)
print('coefficient b:', b)

# Plot
plot1 = plt.plot(x, y, 's', label='original values')
plot2 = plt.plot(x, yvals, 'r', label='curve_fit values')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc=4)  # place the legend in the lower-right corner
plt.title('curve_fit')
plt.savefig('test2.png')  # save before show(); saving afterwards can produce an empty image
plt.show()
The plot is shown below; the fit is not as good as the polynomial fit.
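The second return value of curve_fit, pcov, is not used in the code above. It is the estimated covariance matrix of the parameters, and the square roots of its diagonal are commonly read as one-standard-deviation errors on a and b. A minimal sketch on the same data follows; the name perr is my own, not from the article.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b):
    # Same exponential form as above: a * e^(b/x)
    return a * np.exp(b / x)


x = np.arange(1, 16, 1)
y = np.array([4.00, 5.20, 5.90, 6.80, 7.34, 8.57, 9.86, 10.12,
              12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00])

popt, pcov = curve_fit(func, x, y)

# One-standard-deviation parameter errors from the covariance diagonal
perr = np.sqrt(np.diag(pcov))

for name, value, err in zip(("a", "b"), popt, perr):
    print(f"{name} = {value:.4f} +/- {err:.4f}")
```

Large errors relative to the parameter values are a hint that the chosen functional form does not describe the data well.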
3. Fitting the power form a*x^b
The third approach imports the data through pandas, because data is usually stored in a CSV file, an Excel file, or a database. Here a function of the form a*x^b is fitted and plotted using data read from a file.
Suppose a data.csv file exists locally, containing the dataset shown below:
Then call the pandas extension package to read the data and display the x and y values. The code is as follows:
import pandas as pd

# Import the data and get the x, y scatter coordinates
data = pd.read_csv("data.csv")
print(data)
print(data.shape)
print(data.head(5))  # show the first 5 rows
x = data['x']  # get column x
y = data['y']  # get column y
print(x)
print(y)
For example, printing y outputs:
0      4.00
1      5.20
2      5.90
3      6.80
4      7.34
5      8.57
6      9.86
7     10.12
8     12.56
9     14.32
10    15.42
11    16.50
12    18.92
13    19.58
14    20.00
Name: y, dtype: float64
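If you do not have data.csv at hand, the sketch below writes one from the values used in the earlier sections, so the pandas example can be reproduced. The column names x and y match the code above; the helper name write_sample_csv is hypothetical.

```python
import pandas as pd


def write_sample_csv(path="data.csv"):
    """Write the 15 sample points used in this article to a CSV file."""
    y_values = [4.00, 5.20, 5.90, 6.80, 7.34, 8.57, 9.86, 10.12,
                12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00]
    frame = pd.DataFrame({
        "x": range(1, len(y_values) + 1),  # x runs from 1 to 15
        "y": y_values,
    })
    frame.to_csv(path, index=False)  # no index column, only x and y
    return frame


if __name__ == "__main__":
    write_sample_csv()
    print(pd.read_csv("data.csv").head(5))
```

Writing with index=False keeps the file to the two columns that the reading code expects.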
The final complete fitting code looks like this:
# encoding=utf-8
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd

# Custom function: power form a*x^b
def func(x, a, b):
    return a*pow(x, b)

# Import the data and get the x, y scatter coordinates
data = pd.read_csv("data.csv")
print(data)
print(data.shape)
print(data.head(5))  # show the first 5 rows
x = data['x']
y = data['y']
print(x)
print(y)

# Nonlinear least-squares fit
popt, pcov = curve_fit(func, x, y)
# popt holds the fitted coefficients
a = popt[0]
b = popt[1]
yvals = func(x, a, b)  # fitted y values
print('coefficient a:', a)
print('coefficient b:', b)

# Plot
plot1 = plt.plot(x, y, 's', label='original values')
plot2 = plt.plot(x, yvals, 'r', label='curve_fit values')
plt.xlabel('x')
plt.ylabel('y')
plt.legend(loc=4)  # place the legend in the lower-right corner
plt.title('curve_fit')
plt.savefig('test3.png')
plt.show()
The output looks like this:
4. Three-parameter fitting
Finally, here is the official example, which shows how to fit a function with three parameters, in this case of the form a*e^(-b*x) + c.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# Define the data to be fit with some noise
xdata = np.linspace(0, 4, 50)  # 50 points, as in the official SciPy example
y = func(xdata, 2.5, 1.3, 0.5)
y_noise = 0.2 * np.random.normal(size=xdata.size)
ydata = y + y_noise
plt.plot(xdata, ydata, 'b-', label='data')

# Fit for the parameters a, b, c of the function func
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit')

# Constrain the optimization to the region of 0 < a < 3, 0 < b < 2
# and 0 < c < 1:
popt, pcov = curve_fit(func, xdata, ydata, bounds=(0, [3., 2., 1.]))
plt.plot(xdata, func(xdata, *popt), 'g--', label='fit-with-bounds')

plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
The output looks like this:
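Because the example above adds unseeded random noise, its fitted values change from run to run. The sketch below is a variation under my own assumptions (a fixed seed, smaller noise, 100 points, and an explicit initial guess passed through the p0 parameter of curve_fit), so that the recovered parameters can be compared with the true ones.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    # Same three-parameter form as the official example
    return a * np.exp(-b * x) + c


# Assumed settings, chosen only to make the run reproducible
rng = np.random.default_rng(0)
true_params = (2.5, 1.3, 0.5)
xdata = np.linspace(0, 4, 100)
ydata = func(xdata, *true_params) + 0.05 * rng.normal(size=xdata.size)

# p0 is the starting point for the optimizer (one value per parameter)
popt, pcov = curve_fit(func, xdata, ydata, p0=(1.0, 1.0, 0.0))

for name, fitted, true in zip("abc", popt, true_params):
    print(f"{name}: fitted {fitted:.3f}, true {true}")
```

When no p0 is given, curve_fit starts every parameter at 1, which works here but can fail for functions whose parameters differ greatly in scale.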
III. Power-law distribution fitting and open questions
Here are my experiments with power-law distributions. Because the data is confidential, I only raise a few questions.
Figure 1 shows the polynomial fitting result, which basically follows the trend of the graph.
Figure 2 shows the power-law fitting result; the power exponent of 1.18 is also consistent with the basic patterns of human activity.
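The article does not say how the power exponent was computed, and the real data is not available. One common approach, sketched here on synthetic data, is a straight-line fit in log-log space: if y = c * x^(-alpha), then log y = log c - alpha * log x. The exponent 1.18 is reused only as the synthetic ground truth; the constant, the noise level, and the sign convention are my assumptions.

```python
import numpy as np

# Synthetic power-law data (NOT the author's data)
rng = np.random.default_rng(42)
alpha_true = 1.18
x = np.arange(1, 201, dtype=float)
noise = np.exp(0.1 * rng.normal(size=x.size))  # multiplicative noise
y = 500.0 * x ** (-alpha_true) * noise

# Straight-line fit of log(y) against log(x)
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
alpha_est = -slope
c_est = np.exp(intercept)

print(f"estimated exponent: {alpha_est:.3f} (true {alpha_true})")
print(f"estimated constant: {c_est:.1f}")
```

A fit in log-log space weights relative errors, while curve_fit on the raw values weights absolute errors, so the two approaches can give noticeably different exponents on the same data.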
Questions:
1. Why does the power-law distribution graph look poor even though the exponent looks very good?
2. The power exponent calculation and the fit only work well for the middle part of the data;
3. Among e^(b/x), the polynomial, and x^b, which gives the best fit?
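One rough way to approach the third question is to compare the coefficient of determination R^2 of the three models on the same data. The sketch below does this for the 15 sample points used earlier in this article; the helper r_squared and the dictionary scores are my own names, and R^2 is only one of several possible criteria.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.arange(1, 16, 1, dtype=float)
y = np.array([4.00, 5.20, 5.90, 6.80, 7.34, 8.57, 9.86, 10.12,
              12.56, 14.32, 15.42, 16.50, 18.92, 19.58, 20.00])


def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def exp_form(x, a, b):
    return a * np.exp(b / x)


def power_form(x, a, b):
    return a * np.power(x, b)


scores = {}

# Third-degree polynomial, as in section II.1
poly = np.poly1d(np.polyfit(x, y, 3))
scores["cubic polynomial"] = r_squared(y, poly(x))

# a*e^(b/x), as in section II.2
popt_exp, _ = curve_fit(exp_form, x, y)
scores["a*exp(b/x)"] = r_squared(y, exp_form(x, *popt_exp))

# a*x^b, as in section II.3
popt_pow, _ = curve_fit(power_form, x, y)
scores["a*x^b"] = r_squared(y, power_form(x, *popt_pow))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
```

Note that the cubic has four free parameters while the other two forms have two, so a higher R^2 for the polynomial does not by itself mean it is the better model.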
Finally, I hope this article helps you, especially my students and friends who are getting started with data mining and machine learning. It is mainly about fitting and records some code snippets as online notes, which I hope are useful to you as well. Once my paper is finished, I will also open-source this series of articles.
A drunk a light dance, a dream a reincarnation. A song and a Life, a dream of a lifetime.
(By:eastmount 2017-05-07 3:30 P.M. http://blog.csdn.net/eastmount/)
"Python Data Mining Course" 14. Calling SciPy's curve_fit to implement curve fitting