"Python Machine Learning Practice Guide" - Python machine learning ecosystem

Source: Internet
Author: User

This article covers section 1.2, "Python libraries and functions", from the first chapter of the book, and walks through the machine learning workflow.

I. Acquisition and inspection of data

Getting data with Requests

Processing data with pandas

import os
import pandas as pd
import requests

PATH = r'E:/python machine learning blueprints/chap1/1.2/'
# Download the raw Iris data and save it locally
r = requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
with open(PATH + 'iris.data', 'w') as f:
    f.write(r.text)
os.chdir(PATH)
# The file has no header row, so the column names are supplied explicitly
df = pd.read_csv(PATH + 'iris.data', names=['sepal length', 'sepal width', 'petal length', 'petal width', 'class'])
df.head()

Note: 1. The requests library provides an interface for interacting with data APIs, and pandas is a data analysis tool. Both are worth studying further.

2. The path passed to read_csv must not contain Chinese characters; otherwise read_csv may raise OSError: Initializing from file failed.

Output of the program above:

   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
...
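
As a minimal sketch of the same loading step, the downloaded text can also be parsed in memory with io.StringIO, which avoids the file path altogether. The three rows below are copied from the output above and stand in for r.text so that the snippet runs offline; the names sample_text and sample_df are hypothetical.

```python
import io

import pandas as pd

# Stand-in for r.text: three rows copied from the output shown above.
sample_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

# The data has no header row, so the column names are passed explicitly.
sample_df = pd.read_csv(io.StringIO(sample_text), names=columns)
print(sample_df.head())
```

Reading from a text buffer this way may also be a convenient workaround when the target directory name causes read_csv to fail.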

Notable DataFrame operations from the book:

(1) Filter:

df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)]

     sepal length  sepal width  petal length  petal width           class
100           6.3          3.3           6.0          2.5  Iris-virginica
109           7.2          3.6           6.1          2.5  Iris-virginica
114           5.8          2.8           5.1          2.4  Iris-virginica
115           6.4          3.2           5.3          2.3  Iris-virginica
118           7.7          2.6           6.9          2.3  Iris-virginica
120           6.9          3.2           5.7          2.3  Iris-virginica
135           7.7          3.0           6.1          2.3  Iris-virginica
136           6.3          3.4           5.6          2.4  Iris-virginica
140           6.7          3.1           5.6          2.4  Iris-virginica
141           6.9          3.1           5.1          2.3  Iris-virginica
143           6.8          3.2           5.9          2.3  Iris-virginica
144           6.7          3.3           5.7          2.5  Iris-virginica
145           6.7          3.0           5.2          2.3  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica

(2) Save the filtered data to a new DataFrame and reset the index

virginicadf = df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)].reset_index(drop=True)
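
The filter and the index reset can be tried on a small frame. The sketch below uses a hypothetical five-row sample with values copied from the tables in this article, and a slightly stricter threshold (2.3) so that one virginica row is excluded.

```python
import pandas as pd

# Hypothetical five-row sample; values are copied from the tables in this article.
sample_df = pd.DataFrame({
    'petal width': [0.2, 2.5, 2.3, 0.4, 2.4],
    'class': ['Iris-setosa', 'Iris-virginica', 'Iris-virginica',
              'Iris-setosa', 'Iris-virginica'],
})

# Each comparison yields a boolean Series. The parentheses are required
# because & binds more tightly than == and >.
mask = (sample_df['class'] == 'Iris-virginica') & (sample_df['petal width'] > 2.3)

filtered = sample_df[mask]                    # keeps the original row labels
renumbered = filtered.reset_index(drop=True)  # relabels the rows from 0

print(filtered.index.tolist())
print(renumbered.index.tolist())
```

With drop=True the old labels are discarded; without it, reset_index would keep them as an extra column named index.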

(3) Obtain summary statistics for each column

df.describe()

(4) Correlation coefficients between each pair of columns

df.corr()
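
A minimal sketch of both calls on a hypothetical five-row sample (values copied from the tables in this article). One assumption worth flagging: from pandas 2.0 onward, corr() no longer skips non-numeric columns by default, so the string column class is dropped first with select_dtypes.

```python
import pandas as pd

# Hypothetical sample; values are copied from the tables in this article.
sample_df = pd.DataFrame({
    'petal length': [1.4, 1.3, 6.0, 6.1, 5.1],
    'petal width': [0.2, 0.2, 2.5, 2.5, 2.4],
    'class': ['Iris-setosa', 'Iris-setosa', 'Iris-virginica',
              'Iris-virginica', 'Iris-virginica'],
})

# describe() summarises the numeric columns only by default.
stats = sample_df.describe()

# Keep only numeric columns before computing pairwise Pearson correlations.
numeric = sample_df.select_dtypes(include='number')
corr = numeric.corr()

print(stats)
print(corr)
```

The result of corr() is a symmetric table with 1.0 on the diagonal, since every column is perfectly correlated with itself.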

Plotting data with Matplotlib

(1) Histogram (hist)

import matplotlib.pyplot as plt
plt.style.use('ggplot')
# %matplotlib inline
import numpy as np

fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(df['petal width'], color='black')
ax.set_ylabel('Count', fontsize=12)
ax.set_xlabel('Width', fontsize=12)
plt.title('Iris Petal Width', fontsize=14, y=1.01)
plt.show()

Note: %matplotlib inline is an IPython magic command, so it is commented out here. Do not leave out the = in figsize=(6, 4). To display the figure, call show() at the end.

(2) Scatter plot (scatter)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(df['petal width'], df['petal length'], color='green')
# The first argument is plotted on the x axis, the second on the y axis
ax.set_xlabel('Petal Width', fontsize=12)
ax.set_ylabel('Petal Length', fontsize=12)
plt.title('Petal Scatterplot')
plt.show()

(3) Line plot (plot)

fig, ax = plt.subplots(figsize=(6, 6))
# Plot petal length against the row index
ax.plot(df['petal length'], color='blue')
ax.set_xlabel('Specimen Number', fontsize=12)
ax.set_ylabel('Petal Length', fontsize=12)
plt.title('Petal Length Plot')
plt.show()

(4) Stacked bar chart

fig, ax = plt.subplots(figsize=(6, 6))
bar_width = .8
labels = [x for x in df.columns if 'length' in x or 'width' in x]
ver_y = [df[df['class'] == 'Iris-versicolor'][x].mean() for x in labels]
vir_y = [df[df['class'] == 'Iris-virginica'][x].mean() for x in labels]
set_y = [df[df['class'] == 'Iris-setosa'][x].mean() for x in labels]
x = np.arange(len(labels))
ax.bar(x, vir_y, bar_width, bottom=set_y, color='darkgrey')
ax.bar(x, set_y, bar_width, bottom=ver_y, color='white')
ax.bar(x, ver_y, bar_width, color='black')
ax.set_xticks(x + (bar_width / 2))
ax.set_xticklabels(labels, rotation=-70, fontsize=12)
ax.set_title('Mean Feature Measurement by Class', y=1.01)
ax.legend(['virginica', 'setosa', 'versicolor'])

Note: Lines 8-10 of the code (the three ax.bar calls) draw the three classes ordered from highest to lowest. If the order is reversed, the result is incorrect.
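
The listing above passes another class's means as bottom. As an alternative sketch, not taken from the book, each layer can instead start at the running total of the layers beneath it, which makes the drawing order less fragile. The per-class means are copied from the groupby table in section II of this article, and the Agg backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import numpy as np

labels = ['sepal length', 'sepal width', 'petal length', 'petal width']
# Per-class means copied from the groupby table in this article.
ver_y = np.array([5.936, 2.770, 4.260, 1.326])
set_y = np.array([5.006, 3.418, 1.464, 0.244])
vir_y = np.array([6.588, 2.974, 5.552, 2.026])

x = np.arange(len(labels))
bar_width = 0.8

fig, ax = plt.subplots(figsize=(6, 6))
# Each layer starts where the layers beneath it end.
ax.bar(x, ver_y, bar_width, color='black', label='versicolor')
ax.bar(x, set_y, bar_width, bottom=ver_y, color='white', label='setosa')
ax.bar(x, vir_y, bar_width, bottom=ver_y + set_y, color='darkgrey', label='virginica')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=-70, fontsize=12)
ax.set_title('Mean Feature Measurement by Class', y=1.01)
ax.legend()
```

Passing label= to each bar call ties every legend entry to its own layer, so the legend stays correct even if the calls are reordered.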

Statistical visualization with the Seaborn library

The Seaborn library makes it easy to examine the correlations and characteristics of the data.

II. Preparation

1. map

df['class'] = df['class'].map({'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'})

   sepal length  sepal width  petal length  petal width class
0           5.1          3.5           1.4          0.2   SET
1           4.9          3.0           1.4          0.2   SET
2           4.7          3.2           1.3          0.2   SET
3           4.6          3.1           1.5          0.2   SET
4           5.0          3.6           1.4          0.2   SET
5           5.4          3.9           1.7          0.4   SET
6           4.6          3.4           1.4          0.3   SET
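
One detail of map that is easy to miss: any value without a matching dictionary key becomes NaN, and the match is case-sensitive. A minimal sketch on a hypothetical Series, where the last entry is deliberately lower-cased:

```python
import pandas as pd

# Hypothetical labels; the last one is lower-cased on purpose.
classes = pd.Series(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'iris-setosa'])
codes = {'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'}

# Values found in the dictionary are replaced; anything else becomes NaN.
mapped = classes.map(codes)
print(mapped.tolist())

# Counting the NaN entries is a quick check that every label was covered.
print(mapped.isna().sum())
```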

2. apply: operate on rows or columns

Applied to a column:

df['width petal'] = df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)

This adds a column named width petal to df, set to 1 if petal width is at least 1.3 and 0 otherwise. Note that v in the lambda receives each value of df['petal width'] in turn.

Applied to the DataFrame:

df['width area'] = df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)

Note: axis=1 means the function is applied to each row, and axis=0 means it is applied to each column. So r in the lambda above receives each row in turn.
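
The two forms of apply can be compared side by side. This is a minimal sketch on a hypothetical three-row sample (values copied from the tables in this article); the vectorised product at the end is an addition for comparison and does not come from the book.

```python
import pandas as pd

# Hypothetical three-row sample; values are copied from the tables in this article.
sample_df = pd.DataFrame({
    'petal length': [1.4, 6.0, 5.1],
    'petal width': [0.2, 2.5, 2.4],
})

# Series.apply: the lambda receives one cell value at a time.
flag = sample_df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)

# DataFrame.apply with axis=1: the lambda receives one row (a Series) at a time.
area = sample_df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)

# The same product written as a column-wise expression.
area_vec = sample_df['petal length'] * sample_df['petal width']

print(flag.tolist())
print(area.tolist())
```

For simple arithmetic like this, the column-wise expression is usually faster than a row-wise apply, which calls the Python function once per row.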

3. applymap: operate on every cell

df.applymap(lambda v: np.log(v) if isinstance(v, float) else v)
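
A version note that may matter when running this today: pandas 2.1 added DataFrame.map as the element-wise method and deprecated applymap. The sketch below picks whichever one the installed version provides; the helper name log_if_float and the two-row sample are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample with one float column and one string column.
sample_df = pd.DataFrame({
    'petal length': [1.4, 6.0],
    'class': ['Iris-setosa', 'Iris-virginica'],
})


def log_if_float(v):
    """Take the natural log of floats and pass every other value through."""
    return np.log(v) if isinstance(v, float) else v


# DataFrame.map exists from pandas 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, 'map'):
    logged = sample_df.map(log_if_float)
else:
    logged = sample_df.applymap(log_if_float)

print(logged)
```

The isinstance check is what leaves the class column untouched, since its values are strings rather than floats.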

4. groupby: group data by the selected column

df.groupby('class').mean()

The data is grouped by class and the mean of each column is given per group:

                 sepal length  sepal width  petal length  petal width
class
Iris-setosa             5.006        3.418         1.464        0.244
Iris-versicolor         5.936        2.770         4.260        1.326
Iris-virginica          6.588        2.974         5.552        2.026

df.groupby('class').describe()

The data is split by class and descriptive statistics are given for each group:

                petal length                                              \
                       count   mean       std  min  25%   50%    75%  max
class
Iris-setosa             50.0  1.464  0.173511  1.0  1.4  1.50  1.575  1.9
Iris-versicolor         50.0  4.260  0.469911  3.0  4.0  4.35  4.600  5.1
Iris-virginica          50.0  5.552  0.551895  4.5  5.1  5.55  5.875  6.9

                petal width         ...  sepal length       sepal width         \
                      count   mean  ...           75%  max        count   mean
class                               ...
Iris-setosa            50.0  0.244  ...           5.2  5.8         50.0  3.418
Iris-versicolor        50.0  1.326  ...           6.3  7.0         50.0  2.770
Iris-virginica         50.0  2.026  ...           6.9  7.9         50.0  2.974

                      std  min    25%  50%    75%  max
class
Iris-setosa      0.381024  2.3  3.125  3.4  3.675  4.4
Iris-versicolor  0.313798  2.0  2.525  2.8  3.000  3.4
Iris-virginica   0.322497  2.2  2.800  3.0  3.175  3.8

df.groupby('class')['petal width'].agg({'delta': lambda x: x.max() - x.min(), 'max': np.max, 'min': np.min})

                 delta  max  min
class
Iris-setosa        0.5  0.6  0.1
Iris-versicolor    0.8  1.8  1.0
Iris-virginica     1.1  2.5  1.4
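
Passing a dictionary of new names to agg on a single grouped column, as in the call above, is rejected by recent pandas releases. A sketch of the same summary using keyword-based named aggregation, which is an alternative spelling and not the book's code, on a hypothetical six-row sample:

```python
import pandas as pd

# Hypothetical six-row sample; values are taken from the tables in this article.
sample_df = pd.DataFrame({
    'petal width': [0.2, 0.4, 0.3, 2.5, 2.3, 2.4],
    'class': ['Iris-setosa'] * 3 + ['Iris-virginica'] * 3,
})

# Each keyword becomes an output column; its value is the function to apply.
summary = sample_df.groupby('class')['petal width'].agg(
    delta=lambda x: x.max() - x.min(),
    max='max',
    min='min',
)
print(summary)
```

The string names 'max' and 'min' refer to pandas' built-in aggregations, which avoids passing NumPy functions directly.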

III. Modelling and evaluation

This part only makes brief use of two functions; the details are covered in later content.

Note: The libraries covered in this chapter, all worth learning more about:

requests, pandas, Matplotlib, Seaborn, statsmodels, scikit-learn