This article covers section 1.2, Python libraries and functions, from the first chapter of the Python Machine Learning Time Guide, and walks through the machine learning workflow.
I. Acquisition and inspection of data
Getting data with requests
Processing data with pandas
import os
import pandas as pd
import requests

PATH = r'E:/python machine learning blueprints/chap1/1.2/'
# Download the raw iris data and save it locally
r = requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
with open(PATH + 'iris.data', 'w') as f:
    f.write(r.text)
os.chdir(PATH)
# The file has no header row, so the column names are supplied explicitly
df = pd.read_csv(PATH + 'iris.data', names=['sepal length', 'sepal width', 'petal length', 'petal width', 'class'])
df.head()
Notes:
1. The requests library provides an interface for interacting with data APIs, and pandas is a data analysis tool; both are worth studying in more depth later.
2. The path passed to read_csv must not contain Chinese characters; otherwise pandas raises OSError: Initializing from file failed.
Output of the program above:
   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
...
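The role of the names argument can be checked without a network connection. The following is a minimal sketch, assuming the first five rows of the output above are pasted into an in-memory string; the variable names raw, columns and sample_df are illustrative and not part of the book's code.

```python
import io

import pandas as pd

# First five rows of iris.data, copied from the output shown above
raw = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
"""

columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

# The data has no header row, so names= supplies the column labels
sample_df = pd.read_csv(io.StringIO(raw), names=columns)

print(sample_df.shape)
print(sample_df.dtypes)
```

Without names=, pandas would treat the first data row as the header, and that row would be missing from the data.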
Notable DataFrame operations covered in the book:
(1) Filter:
df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)]
     sepal length  sepal width  petal length  petal width           class
100           6.3          3.3           6.0          2.5  Iris-virginica
109           7.2          3.6           6.1          2.5  Iris-virginica
114           5.8          2.8           5.1          2.4  Iris-virginica
115           6.4          3.2           5.3          2.3  Iris-virginica
118           7.7          2.6           6.9          2.3  Iris-virginica
120           6.9          3.2           5.7          2.3  Iris-virginica
135           7.7          3.0           6.1          2.3  Iris-virginica
136           6.3          3.4           5.6          2.4  Iris-virginica
140           6.7          3.1           5.6          2.4  Iris-virginica
141           6.9          3.1           5.1          2.3  Iris-virginica
143           6.8          3.2           5.9          2.3  Iris-virginica
144           6.7          3.3           5.7          2.5  Iris-virginica
145           6.7          3.0           5.2          2.3  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica
(2) Save the filtered data and reset the index
virginicadf = df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)].reset_index(drop=True)
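The effect of reset_index(drop=True) is easiest to see on a tiny frame. This is a minimal sketch using four rows copied from the outputs in this article; the names mask, filtered and renumbered are illustrative.

```python
import pandas as pd

# Four rows copied from the outputs shown in this article
df = pd.DataFrame({
    'petal length': [1.4, 6.0, 1.7, 6.1],
    'petal width': [0.2, 2.5, 0.4, 2.5],
    'class': ['Iris-setosa', 'Iris-virginica', 'Iris-setosa', 'Iris-virginica'],
})

# Each condition needs its own parentheses because & binds tighter than == and >
mask = (df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)

filtered = df[mask]                           # keeps the original row labels
renumbered = filtered.reset_index(drop=True)  # relabels the rows from 0

print(list(filtered.index))
print(list(renumbered.index))
```

With drop=True the old labels are discarded; without it, pandas would keep them in a new column named index.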
(3) Get summary statistics for each column
df.describe()
(4) Correlation coefficients between each pair of columns
df.corr()
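In recent pandas releases (2.0 and later), calling corr() on a frame that still contains the text column class raises an error instead of silently skipping it. One possible workaround, sketched here on the eight setosa rows shown earlier, is to drop the label column first; the names numeric, summary and corr are illustrative.

```python
import pandas as pd

# The eight setosa rows shown in the read_csv output earlier in this article
df = pd.DataFrame({
    'sepal length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    'sepal width': [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4],
    'petal length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5],
    'petal width': [0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2],
    'class': ['Iris-setosa'] * 8,
})

numeric = df.drop(columns='class')  # keep only the measurement columns
summary = numeric.describe()        # count, mean, std, min, quartiles, max
corr = numeric.corr()               # Pearson correlation by default

print(summary.loc[['count', 'mean']])
print(corr.round(3))
```

The correlation matrix is symmetric with ones on the diagonal, so only the values on one side of the diagonal carry information.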
Plotting data with Matplotlib
(1) Histogram (hist)
import matplotlib.pyplot as plt
plt.style.use('ggplot')
# %matplotlib inline
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(df['petal width'], color='black')
ax.set_ylabel('Count', fontsize=12)
ax.set_xlabel('Width', fontsize=12)
plt.title('Iris Petal Width', fontsize=14, y=1.01)
plt.show()
Note: %matplotlib inline is an IPython magic command, so it is commented out here for now. Do not leave out the = in figsize=(6,4). To display the figure, call show() at the end.
(2) Scatter plot (scatter)
fig, ax = plt.subplots(figsize=(6, 6))
# x is petal width and y is petal length, so the axis labels follow that order
ax.scatter(df['petal width'], df['petal length'], color='green')
ax.set_xlabel('Petal Width', fontsize=12)
ax.set_ylabel('Petal Length', fontsize=12)
plt.title('Petal Scatterplot')
plt.show()
(3) Line chart (plot)
fig, ax = plt.subplots(figsize=(6, 6))
# With a single argument, plot() draws the values against the row index
ax.plot(df['petal length'], color='blue')
ax.set_xlabel('Specimen Number', fontsize=12)
ax.set_ylabel('Petal Length', fontsize=12)
plt.title('Petal Length Plot')
plt.show()
(4) Stacked bar chart
fig, ax = plt.subplots(figsize=(6, 6))
bar_width = .8
labels = [x for x in df.columns if 'length' in x or 'width' in x]
ver_y = [df[df['class'] == 'Iris-versicolor'][x].mean() for x in labels]
vir_y = [df[df['class'] == 'Iris-virginica'][x].mean() for x in labels]
set_y = [df[df['class'] == 'Iris-setosa'][x].mean() for x in labels]
x = np.arange(len(labels))
ax.bar(x, vir_y, bar_width, bottom=set_y, color='darkgrey')
ax.bar(x, set_y, bar_width, bottom=ver_y, color='white')
ax.bar(x, ver_y, bar_width, color='black')
ax.set_xticks(x + (bar_width / 2))
ax.set_xticklabels(labels, rotation=-70, fontsize=12)
ax.set_title('Mean Feature Measurement By Class', y=1.01)
ax.legend(['Virginica', 'Setosa', 'Versicolor'])
plt.show()
Note: lines 8-10 (the three ax.bar calls) are ordered by class from the highest values to the lowest; if the order is reversed, the result is incorrect.
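The listing above depends on the order of the bar calls because each bottom is set to another class's means rather than to a running total. An alternative that does not depend on drawing order is to accumulate the bottom as each class is added. This is a sketch under stated assumptions, not the book's method: it uses the per-class means from the groupby output in this article, the Agg backend so that it runs without a display, and illustrative names (means, bottom, stack_top).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

labels = ['sepal length', 'sepal width', 'petal length', 'petal width']

# Per-class means taken from the groupby('class').mean() output in this article
means = {
    'versicolor': np.array([5.936, 2.770, 4.260, 1.326]),
    'setosa': np.array([5.006, 3.418, 1.464, 0.244]),
    'virginica': np.array([6.588, 2.974, 5.552, 2.026]),
}

x = np.arange(len(labels))
fig, ax = plt.subplots(figsize=(6, 6))

# Each class is drawn on top of the running total of the classes before it
bottom = np.zeros(len(labels))
for name, heights in means.items():
    ax.bar(x, heights, 0.8, bottom=bottom, label=name)
    bottom = bottom + heights

ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=-70, fontsize=12)
ax.set_title('Mean Feature Measurement By Class', y=1.01)
ax.legend()

stack_top = bottom  # total height of each stack
```

With a cumulative bottom, every segment stays visible whatever the class order, and the height of each stack equals the sum of the three class means.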
Statistical visualization with the Seaborn library
The Seaborn library makes it easy to examine the correlations and characteristics of the data.
II. Preparation
1. Map
df['class'] = df['class'].map({'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'})
   sepal length  sepal width  petal length  petal width class
0           5.1          3.5           1.4          0.2   SET
1           4.9          3.0           1.4          0.2   SET
2           4.7          3.2           1.3          0.2   SET
3           4.6          3.1           1.5          0.2   SET
4           5.0          3.6           1.4          0.2   SET
5           5.4          3.9           1.7          0.4   SET
6           4.6          3.4           1.4          0.3   SET
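One detail worth knowing about map with a dictionary is that values without a matching key become NaN, so the spelling and case of the keys matter. A minimal sketch, with illustrative names (classes, codes, wrong_case):

```python
import pandas as pd

classes = pd.Series(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor'])

# Every value has a matching key, so every value is replaced
codes = {'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'}
mapped = classes.map(codes)

# Keys are matched exactly: a lowercase 'iris-virginica' key matches nothing
wrong_case = {'Iris-setosa': 'SET', 'iris-virginica': 'VIR', 'iris-versicolor': 'VER'}
partly_mapped = classes.map(wrong_case)

print(mapped.tolist())
print(partly_mapped.isna().tolist())
```

Checking isna() on the result after a map is a quick way to catch keys that did not match.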
2. apply: operate on rows or columns
Applied to a column: df['width petal'] = df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)
This adds a column width petal to df: 1 if petal width is at least 1.3, otherwise 0. Note that the lambda argument v is each value in df['petal width'].
Applied to the DataFrame: df['width area'] = df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)
Note: axis=1 means the function is applied to each row, and axis=0 means it is applied to each column. So the lambda argument r above is one row at a time.
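The two forms of apply can be compared side by side on a few rows. The sketch below uses four rows copied from the outputs in this article and also shows vectorized equivalents, which are an addition for comparison rather than something the book's code uses; flag_vectorized and area_vectorized are illustrative names.

```python
import pandas as pd

# Four rows copied from the outputs shown in this article
df = pd.DataFrame({
    'petal length': [1.4, 1.7, 6.0, 5.3],
    'petal width': [0.2, 0.4, 2.5, 2.3],
})

# Series.apply: the lambda receives one cell value at a time
df['width petal'] = df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)

# DataFrame.apply with axis=1: the lambda receives one row (a Series) at a time
df['width area'] = df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)

# Vectorized equivalents that avoid calling a Python function per element
flag_vectorized = (df['petal width'] >= 1.3).astype(int)
area_vectorized = df['petal length'] * df['petal width']

print(df)
```

For simple arithmetic or comparisons the vectorized forms are usually faster on large frames, while apply remains convenient when the per-row logic is more involved.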
3. applymap: apply an operation to every cell
df.applymap(lambda v: np.log(v) if isinstance(v, float) else v)
4. groupby: group the data by the selected column
df.groupby('class').mean()
The data is grouped by class and the mean of each group is given.
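In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which does the same element-wise job. The following sketch picks whichever method is available; the helper name elementwise and the two-row frame (rows from the outputs above) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Two rows copied from the outputs shown earlier in this article
df = pd.DataFrame({
    'sepal length': [5.1, 4.9],
    'petal width': [0.2, 0.2],
    'class': ['Iris-setosa', 'Iris-setosa'],
})

# DataFrame.map is the newer name; fall back to applymap on older pandas
elementwise = df.map if hasattr(df, 'map') else df.applymap

# Floats are replaced by their natural log; strings pass through unchanged
logged = elementwise(lambda v: np.log(v) if isinstance(v, float) else v)

# For purely numeric columns, calling np.log on the columns is a simpler route
numeric_cols = ['sepal length', 'petal width']
logged_numeric = np.log(df[numeric_cols])

print(logged)
```

The isinstance check is what lets the function run over the text column without raising an error.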
                 sepal length  sepal width  petal length  petal width
class
Iris-setosa             5.006        3.418         1.464        0.244
Iris-versicolor         5.936        2.770         4.260        1.326
Iris-virginica          6.588        2.974         5.552        2.026
df.groupby('class').describe()
The data is split by class and descriptive statistics are given for each group.
                petal length                                                \
                       count   mean       std  min  25%   50%    75%  max
class
Iris-setosa             50.0  1.464  0.173511  1.0  1.4  1.50  1.575  1.9
Iris-versicolor         50.0  4.260  0.469911  3.0  4.0  4.35  4.600  5.1
Iris-virginica          50.0  5.552  0.551895  4.5  5.1  5.55  5.875  6.9

                petal width         ...  sepal length       sepal width         \
                      count   mean  ...           75%  max        count   mean
class                               ...
Iris-setosa            50.0  0.244  ...           5.2  5.8         50.0  3.418
Iris-versicolor        50.0  1.326  ...           6.3  7.0         50.0  2.770
Iris-virginica         50.0  2.026  ...           6.9  7.9         50.0  2.974

                      std  min    25%  50%    75%  max
class
Iris-setosa      0.381024  2.3  3.125  3.4  3.675  4.4
Iris-versicolor  0.313798  2.0  2.525  2.8  3.000  3.4
Iris-virginica   0.322497  2.2  2.800  3.0  3.175  3.8
df.groupby('class')['petal width'].agg({'delta': lambda x: x.max() - x.min(), 'max': np.max, 'min': np.min})
                 delta  max  min
class
Iris-setosa        0.5  0.6  0.1
Iris-versicolor    0.8  1.8  1.0
Iris-virginica     1.1  2.5  1.4
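In pandas 1.0 and later, passing a renaming dictionary to agg on a single grouped column, as in the call above, is no longer accepted and raises SpecificationError. A sketch of the same aggregation written with named aggregation (available since pandas 0.25) follows. It uses seven petal width values copied from the outputs earlier in this article, so its numbers differ from the full-data table above; the name stats is illustrative.

```python
import pandas as pd

# Seven petal width values copied from the outputs shown earlier in this article
df = pd.DataFrame({
    'petal width': [0.2, 0.2, 0.4, 0.3, 2.5, 2.4, 2.3],
    'class': ['Iris-setosa'] * 4 + ['Iris-virginica'] * 3,
})

# Named aggregation: each keyword becomes the name of an output column
stats = df.groupby('class')['petal width'].agg(
    delta=lambda x: x.max() - x.min(),
    max='max',
    min='min',
)

print(stats)
```

String names such as 'max' and 'min' select the built-in pandas aggregations, while the lambda covers the custom range calculation.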
III. Modelling and evaluation
This part only gives a brief demonstration of two functions; see the later sections for details.
Note: the libraries covered in this chapter that are worth learning more about are:
requests, pandas, Matplotlib, Seaborn, Statsmodels, Scikit-learn
"Python Machine learning Time Guide"-Python machine learning ecosystem