Python Machine Learning Practice Guide - the Python machine learning ecosystem

Last Update:2017-08-13 Source: Internet

Author: User


This article covers section 1.2, "Python libraries and functions", from the first chapter of the Python Machine Learning Practice Guide, and follows the machine learning workflow.

I. Acquisition and inspection of data

Getting data with requests

Processing data with pandas

    import os
    import pandas as pd
    import requests

    PATH = r'E:/python machine learning blueprints/chap1/1.2/'
    r = requests.get('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
    with open(PATH + 'iris.data', 'w') as f:
        f.write(r.text)
    os.chdir(PATH)
    df = pd.read_csv(PATH + 'iris.data', names=['sepal length', 'sepal width', 'petal length', 'petal width', 'class'])
    df.head()

Note: 1. requests is a library for interacting with data APIs over HTTP, and pandas is a data analysis tool. Both are worth studying in more depth later.

2. The path passed to read_csv must not contain Chinese characters; otherwise it raises OSError: Initializing from file failed.

Output of the program above:
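One way to sidestep the path problem in note 2 is to skip the intermediate file and parse the downloaded text in memory. This is a minimal sketch, not the book's approach: `sample_text` is a hypothetical stand-in for `r.text` so that the example runs offline, and `df_sample` is an illustrative name.

```python
import io

import pandas as pd

# Hypothetical stand-in for r.text: three rows in the same layout as iris.data.
sample_text = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']

# io.StringIO wraps the string in a file-like object, so read_csv never touches a path.
df_sample = pd.read_csv(io.StringIO(sample_text), names=columns)
print(df_sample.shape)
```

With the real download, `io.StringIO(r.text)` would take the place of `io.StringIO(sample_text)`.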

   sepal length  sepal width  petal length  petal width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
5           5.4          3.9           1.7          0.4  Iris-setosa
6           4.6          3.4           1.4          0.3  Iris-setosa
7           5.0          3.4           1.5          0.2  Iris-setosa
...

Notable DataFrame operations from the book:

(1) Filter:

    df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)]

     sepal length  sepal width  petal length  petal width           class
100           6.3          3.3           6.0          2.5  Iris-virginica
109           7.2          3.6           6.1          2.5  Iris-virginica
114           5.8          2.8           5.1          2.4  Iris-virginica
115           6.4          3.2           5.3          2.3  Iris-virginica
118           7.7          2.6           6.9          2.3  Iris-virginica
120           6.9          3.2           5.7          2.3  Iris-virginica
135           7.7          3.0           6.1          2.3  Iris-virginica
136           6.3          3.4           5.6          2.4  Iris-virginica
140           6.7          3.1           5.6          2.4  Iris-virginica
141           6.9          3.1           5.1          2.3  Iris-virginica
143           6.8          3.2           5.9          2.3  Iris-virginica
144           6.7          3.3           5.7          2.5  Iris-virginica
145           6.7          3.0           5.2          2.3  Iris-virginica
148           6.2          3.4           5.4          2.3  Iris-virginica

(2) Save the filtered data and reset the index:

    virginicadf = df[(df['class'] == 'Iris-virginica') & (df['petal width'] > 2.2)].reset_index(drop=True)
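To make the effect of the boolean filter and of reset_index concrete, here is a small sketch on a hypothetical four-row frame (`df_demo` and its values are illustrative, not the real dataset). It shows that filtering keeps the original row labels, while `reset_index(drop=True)` renumbers the rows from 0.

```python
import pandas as pd

# Small hypothetical frame; the values are illustrative, not the full iris data.
df_demo = pd.DataFrame({
    'petal width': [0.2, 2.5, 1.4, 2.3],
    'class': ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-virginica'],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than == and >.
mask = (df_demo['class'] == 'Iris-virginica') & (df_demo['petal width'] > 2.2)

filtered = df_demo[mask]                      # keeps the original row labels
renumbered = filtered.reset_index(drop=True)  # relabels rows from 0, discarding the old index

print(list(filtered.index))
print(list(renumbered.index))
```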

(3) Get summary statistics for each column:

    df.describe()

(4) Get the correlation coefficient between each pair of columns:

    df.corr()
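The two calls can be tried on a tiny hypothetical frame where the answer is easy to check by hand. One assumption is flagged here: newer pandas releases can raise an error when `corr()` meets a text column such as `class`, so this sketch drops that column first. That step is not in the original code.

```python
import pandas as pd

# Hypothetical data: petal width is exactly half of petal length,
# so the two columns are perfectly correlated.
df_demo = pd.DataFrame({
    'petal length': [1.0, 2.0, 3.0, 4.0],
    'petal width': [0.5, 1.0, 1.5, 2.0],
    'class': ['a', 'a', 'b', 'b'],
})

numeric = df_demo.drop(columns='class')  # keep only the numeric columns
stats = numeric.describe()               # count, mean, std, min, quartiles, max per column
corr = numeric.corr()                    # pairwise Pearson correlation matrix

print(stats.loc['mean'])
print(corr)
```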

Plotting data with matplotlib

(1) Histogram (hist)

    import matplotlib.pyplot as plt
    plt.style.use('ggplot')
    # %matplotlib inline
    import numpy as np

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df['petal width'], color='black')
    ax.set_ylabel('Count', fontsize=12)
    ax.set_xlabel('Width', fontsize=12)
    plt.title('Iris Petal Width', fontsize=14, y=1.01)
    plt.show()

Note: %matplotlib inline is an IPython magic command, so it is commented out here. Do not leave out the = in figsize=(6,4). To display the figure when running as a script, call show() at the end.
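When no display is available, saving the figure to a file is an alternative to show(). The following is a minimal sketch under stated assumptions: the `Agg` backend choice, the sample `widths` list, and the output file name are all illustrative and not part of the original article.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt

# Hypothetical petal widths, for illustration only.
widths = [0.2, 0.2, 0.4, 1.3, 1.5, 1.8, 2.3, 2.5]

fig, ax = plt.subplots(figsize=(6, 4))
# hist returns the per-bin counts, the bin edges, and the drawn patches.
counts, bin_edges, _ = ax.hist(widths, bins=5, color='black')
ax.set_xlabel('Width', fontsize=12)
ax.set_ylabel('Count', fontsize=12)
ax.set_title('Petal Width (sample)', fontsize=14, y=1.01)

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), 'petal_width_hist.png')
fig.savefig(out_path)
plt.close(fig)
```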

(2) Scatter plot (scatter)

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(df['petal width'], df['petal length'], color='green')
    ax.set_xlabel('Petal Width', fontsize=12)
    ax.set_ylabel('Petal Length', fontsize=12)
    plt.title('Petal Scatterplot')
    plt.show()

(3) Line plot (plot)

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.plot(df['petal length'], color='blue')
    ax.set_xlabel('Specimen Number', fontsize=12)
    ax.set_ylabel('Petal Length', fontsize=12)
    plt.title('Petal Length Plot')
    plt.show()

(4) Stacked bar chart

    fig, ax = plt.subplots(figsize=(6, 6))
    bar_width = .8
    labels = [x for x in df.columns if 'length' in x or 'width' in x]
    ver_y = [df[df['class'] == 'Iris-versicolor'][x].mean() for x in labels]
    vir_y = [df[df['class'] == 'Iris-virginica'][x].mean() for x in labels]
    set_y = [df[df['class'] == 'Iris-setosa'][x].mean() for x in labels]
    x = np.arange(len(labels))
    ax.bar(x, vir_y, bar_width, bottom=set_y, color='darkgrey')
    ax.bar(x, set_y, bar_width, bottom=ver_y, color='white')
    ax.bar(x, ver_y, bar_width, color='black')
    ax.set_xticks(x + (bar_width / 2))
    ax.set_xticklabels(labels, rotation=-70, fontsize=12)
    ax.set_title('Mean Feature Measurement By Class', y=1.01)
    ax.legend(['Virginica', 'Setosa', 'Versicolor'])

Note: the three ax.bar calls draw the classes in order from highest to lowest. If the order is reversed, the result is incorrect.
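For comparison, a common way to stack bars in matplotlib is to give each layer a `bottom` equal to the running total of the layers beneath it, so the layers never overlap and drawing order no longer matters. The sketch below is a swapped-in technique, not the book's code. It uses the per-class petal means reported for this dataset, and the `Agg` backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render off-screen
import matplotlib.pyplot as plt
import numpy as np

labels = ['petal length', 'petal width']
# Per-class means for the two petal columns.
set_y = np.array([1.464, 0.244])
ver_y = np.array([4.260, 1.326])
vir_y = np.array([5.552, 2.026])

x = np.arange(len(labels))
bar_width = 0.8

fig, ax = plt.subplots(figsize=(6, 6))
# Each layer starts where the previous layers end.
ax.bar(x, set_y, bar_width, color='white', edgecolor='black', label='Setosa')
ax.bar(x, ver_y, bar_width, bottom=set_y, color='black', label='Versicolor')
ax.bar(x, vir_y, bar_width, bottom=set_y + ver_y, color='darkgrey', label='Virginica')
ax.set_xticks(x)
ax.set_xticklabels(labels, fontsize=12)
ax.set_title('Mean Feature Measurement By Class', y=1.01)
ax.legend()

# The top of each stack is the sum of the three class means.
total_height = set_y + ver_y + vir_y
plt.close(fig)
```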

Statistical visualization with the seaborn library

The seaborn library makes it easy to visualize the correlations between features and the distribution of each feature.

II. Preparation

1. map

    df['class'] = df['class'].map({'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'})

   sepal length  sepal width  petal length  petal width class
0           5.1          3.5           1.4          0.2   SET
1           4.9          3.0           1.4          0.2   SET
2           4.7          3.2           1.3          0.2   SET
3           4.6          3.1           1.5          0.2   SET
4           5.0          3.6           1.4          0.2   SET
5           5.4          3.9           1.7          0.4   SET
6           4.6          3.4           1.4          0.3   SET
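One detail worth knowing about map with a dictionary: values with no matching key become NaN, and the lookup is case-sensitive. The sketch below shows this on a hypothetical Series in which the last label is deliberately misspelled in lowercase.

```python
import pandas as pd

# Hypothetical labels; the last one is lowercase on purpose.
classes = pd.Series(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'iris-setosa'])
short = {'Iris-setosa': 'SET', 'Iris-virginica': 'VIR', 'Iris-versicolor': 'VER'}

mapped = classes.map(short)

# 'iris-setosa' has no matching key, so its mapped value is NaN.
n_missing = int(mapped.isna().sum())
print(mapped.tolist())
print(n_missing)
```

Checking `isna().sum()` after a map is a cheap way to catch keys that do not match the data exactly.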

2. apply: operate on rows or columns

Applied to a column:

    df['width petal'] = df['petal width'].apply(lambda v: 1 if v >= 1.3 else 0)

This adds a column 'width petal' to df whose value is 1 if petal width is at least 1.3 and 0 otherwise. Note that the lambda's argument v is each value of df['petal width'] in turn.
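For a simple threshold like this, the same flag can also be computed without a Python-level function by comparing the whole column at once. The sketch below runs both forms on a hypothetical Series; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical petal widths, chosen to sit on both sides of the 1.3 threshold.
petal_width = pd.Series([0.2, 1.3, 1.29, 2.5])

# apply calls the lambda once per value.
via_apply = petal_width.apply(lambda v: 1 if v >= 1.3 else 0)

# The comparison yields a boolean Series; astype(int) turns True/False into 1/0.
via_compare = (petal_width >= 1.3).astype(int)

print(via_apply.tolist())
print(via_compare.tolist())
```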

Applied to a DataFrame:

    df['width area'] = df.apply(lambda r: r['petal length'] * r['petal width'], axis=1)

Note: axis=1 applies the function to each row, and axis=0 applies it to each column. So the lambda's argument r above is one row at a time.
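The difference between the two axis values can be seen on a small hypothetical frame: with axis=1 the function receives one row and the result has one value per row, while with axis=0 it receives one column and the result has one value per column. The direct column product at the end is shown only as an equivalent of the row-wise multiplication.

```python
import pandas as pd

# Hypothetical three-row frame for illustration.
df_demo = pd.DataFrame({
    'petal length': [1.4, 4.7, 6.0],
    'petal width': [0.2, 1.4, 2.5],
})

# axis=1: r is one row (a Series indexed by column name).
row_area = df_demo.apply(lambda r: r['petal length'] * r['petal width'], axis=1)

# axis=0: c is one column (a Series indexed by row label).
col_max = df_demo.apply(lambda c: c.max(), axis=0)

# Multiplying the columns directly gives the same per-row product.
vector_area = df_demo['petal length'] * df_demo['petal width']

print(row_area.tolist())
print(col_max.to_dict())
```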

3. applymap: operate on every cell

    df.applymap(lambda v: np.log(v) if isinstance(v, float) else v)
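Newer pandas releases deprecate applymap in favour of DataFrame.map, so a version-independent way to get the same effect may be useful. This sketch is an alternative, not the book's code: it selects the float columns and applies np.log to them in one step. The frame `df_demo` is hypothetical, with values chosen so the logs are 0, 1 and 2.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one float column and one text column.
df_demo = pd.DataFrame({
    'petal length': [1.0, np.e, np.e ** 2],
    'class': ['SET', 'VER', 'VIR'],
})

logged = df_demo.copy()
# Pick out the float columns and take the natural log of all their cells at once;
# the text column is left untouched.
float_cols = logged.select_dtypes(include='float').columns
logged[float_cols] = np.log(logged[float_cols])

print(logged)
```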

4. groupby: group data by the values of a selected column

    df.groupby('class').mean()

The data is grouped by class and the mean of each column is computed per group:

                 sepal length  sepal width  petal length  petal width
class
Iris-setosa             5.006        3.418         1.464        0.244
Iris-versicolor         5.936        2.770         4.260        1.326
Iris-virginica          6.588        2.974         5.552        2.026

    df.groupby('class').describe()

The data is split by class and descriptive statistics are computed for each group:

                petal length                                              \
                       count   mean       std  min  25%   50%    75%  max
class
Iris-setosa             50.0  1.464  0.173511  1.0  1.4  1.50  1.575  1.9
Iris-versicolor         50.0  4.260  0.469911  3.0  4.0  4.35  4.600  5.1
Iris-virginica          50.0  5.552  0.551895  4.5  5.1  5.55  5.875  6.9

                petal width         ... sepal length       sepal width         \
                      count   mean  ...          75%  max        count   mean
class                               ...
Iris-setosa            50.0  0.244  ...          5.2  5.8         50.0  3.418
Iris-versicolor        50.0  1.326  ...          6.3  7.0         50.0  2.770
Iris-virginica         50.0  2.026  ...          6.9  7.9         50.0  2.974

                      std  min    25%  50%    75%  max
class
Iris-setosa      0.381024  2.3  3.125  3.4  3.675  4.4
Iris-versicolor  0.313798  2.0  2.525  2.8  3.000  3.4
Iris-virginica   0.322497  2.2  2.800  3.0  3.175  3.8

    df.groupby('class')['petal width'].agg({'delta': lambda x: x.max() - x.min(), 'max': np.max, 'min': np.min})

                 delta  max  min
class
Iris-setosa        0.5  0.6  0.1
Iris-versicolor    0.8  1.8  1.0
Iris-virginica     1.1  2.5  1.4
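Passing a dictionary of new column names to agg on a single grouped column is rejected by newer pandas releases. A form that should behave the same way there is named aggregation, where each keyword becomes an output column. The sketch below is an assumption about the reader's pandas version rather than the book's code, and it uses a small hypothetical frame.

```python
import pandas as pd

# Hypothetical data: two classes with a handful of petal widths each.
df_demo = pd.DataFrame({
    'class': ['SET', 'SET', 'VIR', 'VIR', 'VIR'],
    'petal width': [0.1, 0.6, 1.4, 2.5, 2.0],
})

# Named aggregation: keyword = output column name, value = function or function name.
summary = df_demo.groupby('class')['petal width'].agg(
    delta=lambda x: x.max() - x.min(),  # range within each class
    max='max',
    min='min',
)

print(summary)
```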

III. Modelling and evaluation

This part only makes simple use of two functions; the details are covered in later content.

Note: libraries covered in this chapter that are worth learning more about:

requests, pandas, matplotlib, seaborn, statsmodels, scikit-learn
