Analysis of Python data processing

Last Update:2018-05-02 Source: Internet

Author: User

This article summarizes the main Python data processing APIs and explains the key points. Readers interested in the topic may find it a useful reference.

NumPy and pandas are two libraries commonly used for data processing in Python. Their performance-critical parts are implemented in C, so they run fast. Matplotlib is a Python plotting library that lets you visualize the processed data. I had only read through the syntax before without summarizing it systematically, so this post summarizes the APIs of these three libraries.

Here are brief descriptions of the three libraries and how they differ:

NumPy: often used for generating data and for basic numerical operations
pandas: built on top of NumPy; can be thought of as an upgraded NumPy with labeled data structures
Matplotlib: a powerful plotting tool for Python

NumPy

For a quick start with NumPy, see: NumPy tutorial

NumPy attributes

ndarray.ndim: number of dimensions

ndarray.shape: size along each dimension, for example (3, 5) for 3 rows and 5 columns

ndarray.size: total number of elements

ndarray.dtype: element type
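
As a minimal sketch of these attributes, the snippet below builds a small 3x5 array (the array and the name a are chosen only for illustration) and prints each attribute:

```python
import numpy as np

# A small 3x5 array, chosen only for illustration
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # number of dimensions: 2
print(a.shape)  # size along each dimension: (3, 5)
print(a.size)   # total number of elements: 15
print(a.dtype)  # element type (a platform-dependent integer type here)
```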

NumPy creation

array(object, dtype=None): create an array from a Python list or tuple

zeros(shape, dtype=float): create an array filled with 0

ones(shape, dtype=None): create an array filled with 1

empty(shape, dtype=float): create an array without initializing its values

arange([start,] stop, [step,] dtype=None): create values spaced by a fixed step (stop is excluded)

linspace(start, stop, num=50, dtype=None): create num evenly spaced values over a given range (stop is included by default)
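
The creation functions above could be used roughly as follows. The shapes and values are arbitrary examples, not taken from the original article:

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # from a nested list
zeros = np.zeros((2, 3))             # all 0.0
ones = np.ones((2, 3), dtype=int)    # all 1
uninit = np.empty((2, 3))            # contents are arbitrary, do not rely on them
steps = np.arange(10, 20, 2)         # [10 12 14 16 18], stop is excluded
even = np.linspace(0, 1, num=5)      # [0.   0.25 0.5  0.75 1.  ], stop is included

print(steps)
print(even)
```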

NumPy operations

Addition, subtraction: a + b, a - b

Multiplication: b * 2, 10 * np.sin(a)

Power (square): b ** 2

Comparison: a < 35, returns an array of True/False values

Matrix multiplication: np.dot(a, b) or a.dot(b)

Other: +=, -=, sin, cos, exp
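
A short sketch of these operations, using small example arrays that are assumptions for illustration. Note that * is element-wise, while np.dot performs matrix multiplication:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)            # [0 1 2 3]

diff = a - b                # [10 19 28 37]
doubled = b * 2             # [0 2 4 6]
squared = b ** 2            # [0 1 4 9]
mask = a < 35               # [ True  True  True False]

m = np.array([[1, 1], [0, 1]])
n = np.array([[2, 0], [3, 4]])
elementwise = m * n         # [[2 0] [0 4]]
matrix = np.dot(m, n)       # [[5 4] [3 4]], same as m.dot(n)

print(mask)
print(matrix)
```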

NumPy indexing

Element indexing: a[1, 1]

Slicing: a[1, 1:3]

Iteration over all elements: for item in a.flat
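
The indexing forms above, sketched on a 3x4 example array (the array contents are an assumption for illustration):

```python
import numpy as np

a = np.arange(3, 15).reshape(3, 4)
# [[ 3  4  5  6]
#  [ 7  8  9 10]
#  [11 12 13 14]]

single = a[1, 1]     # row 1, column 1 -> 8
piece = a[1, 1:3]    # row 1, columns 1 and 2 -> [8 9]

# a.flat iterates over every element in row-major order
flat_items = [int(item) for item in a.flat]

print(single, piece, flat_items)
```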

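Other NumPy functions

reshape(a, newshape): returns an array with the new shape without changing the shape of the original; where possible the result is a view of the same data

ndarray.resize(new_shape): changes the shape of the array in place and returns None; note that the function form np.resize(a, new_shape) instead returns a new array and leaves a unchanged

ravel(a): returns the data flattened to one dimension

vstack(tup): stacks arrays vertically (top to bottom)

hstack(tup): stacks arrays horizontally (left to right)

hsplit(ary, indices_or_sections): splits an array horizontally (column-wise) into n parts

vsplit(ary, indices_or_sections): splits an array vertically (row-wise) into n parts

copy(a): returns a copy whose data is independent of the original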
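
A rough sketch of these functions on small example arrays (all names and shapes here are illustrative assumptions). The in-place method form arr.resize(...) is shown last to contrast it with np.reshape:

```python
import numpy as np

a = np.arange(6)                     # [0 1 2 3 4 5]
r = np.reshape(a, (2, 3))            # new shape; a itself keeps shape (6,)
flat = np.ravel(r)                   # back to one dimension

stacked_v = np.vstack((r, r))        # shape (4, 3)
stacked_h = np.hstack((r, r))        # shape (2, 6)

left, right = np.hsplit(stacked_h, 2)   # two (2, 3) arrays, split by columns
top, bottom = np.vsplit(stacked_v, 2)   # two (2, 3) arrays, split by rows

c = np.copy(a)
c[0] = 99                            # does not affect a

d = np.arange(6)
result = d.resize((3, 2))            # changes d in place and returns None

print(a.shape, r.shape, d.shape, result)
```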

Pandas

For a quick start with pandas, see: 10 Minutes to pandas

Pandas data structures

pandas has two main data structures: Series and DataFrame.

Series: The index is on the left and the value is on the right. Here's how to create it:

    In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

    In [5]: s
    Out[5]:
    0    1.0
    1    3.0
    2    5.0
    3    NaN
    4    6.0
    5    8.0
    dtype: float64

DataFrame: a tabular data structure with both a row index and a column index; it can be seen as a large dictionary of Series that share the same index. Here's how to create one:

    In [6]: dates = pd.date_range('20130101', periods=6)

    In [7]: dates
    Out[7]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')

    In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

Pandas viewing data

index: the row index

columns: the column index

values: the underlying values

head(n=5): returns the first n rows

tail(n=5): returns the last n rows

describe(): summary statistics such as count and mean

sort_index(axis=1, ascending=False): sorts by index labels (here by column labels, in descending order)

sort_values(by='B'): sorts by the values in column B
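
A minimal sketch of these viewing calls, rebuilding a DataFrame like the df created earlier. The fixed random seed is an assumption added only to make the run repeatable:

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so repeated runs give the same data
dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df.index)         # row labels (dates)
print(df.columns)       # column labels
print(df.values.shape)  # underlying NumPy array has shape (6, 4)
print(df.head(2))       # first 2 rows
print(df.tail(2))       # last 2 rows
print(df.describe())    # count, mean, std, min, quartiles, max per column

by_columns_desc = df.sort_index(axis=1, ascending=False)  # columns D, C, B, A
by_b = df.sort_values(by='B')                             # rows ordered by column B
```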

Pandas selecting data

Column selection: df['A']

Row slicing: df[0:3] or df['20130102':'20130104']

Selection by label: df.loc['20130102':'20130104', ['A', 'B']]

Selection by position: df.iloc[3:5, 0:2]

Mixed selection: df.ix[:3, ['A', 'C']] (.ix was deprecated in pandas 0.20 and removed in 1.0; use .loc or .iloc instead)

Boolean (conditional) selection: df[df.A > 0]
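
The selection methods above can be sketched as follows. The DataFrame uses consecutive integers (an assumption for illustration) so the results are easy to check, and the mixed selection is written with iloc plus get_indexer because .ix is no longer available in current pandas:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col_a = df['A']                                         # one column as a Series
first_rows = df[0:3]                                    # first three rows
by_label = df.loc['20130102':'20130104', ['A', 'B']]    # label slices include both ends
by_pos = df.iloc[3:5, 0:2]                              # rows 3-4, columns 0-1

# Replacement for df.ix[:3, ['A', 'C']]: positions for rows, labels for columns
mixed = df.iloc[:3, df.columns.get_indexer(['A', 'C'])]

large_a = df[df.A > 8]                                  # rows where column A exceeds 8

print(by_label)
print(mixed)
```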

Pandas handling missing data

Drop rows that contain missing data: df.dropna(how='any')

Fill in missing data: df.fillna(value=5)

Check whether values are NaN: pd.isna(df1)
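
A small sketch of the missing-data calls, using a hypothetical df1 with two NaN values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN in each column
df1 = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

dropped = df1.dropna(how='any')  # keeps only rows with no NaN (row 0 here)
filled = df1.fillna(value=5)     # replaces every NaN with 5
mask = pd.isna(df1)              # boolean frame, True where a value is NaN

print(dropped)
print(filled)
print(mask)
```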

Pandas merging data

pd.concat([df1, df2, df3], axis=0): concatenates DataFrames (axis=0 stacks rows)

pd.merge(left, right, on='key'): merges on the key column

df.append(s, ignore_index=True): appends rows (deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead)
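
A minimal sketch of concatenating and merging, with small hypothetical frames. Because DataFrame.append is gone in pandas 2.0, adding a row is shown with pd.concat:

```python
import pandas as pd

# Hypothetical frames for illustration
df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['c'], 'x': [3]})
combined = pd.concat([df1, df2], axis=0, ignore_index=True)   # stack rows

left = pd.DataFrame({'key': ['a', 'b'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['a', 'b'], 'rval': [4, 5]})
merged = pd.merge(left, right, on='key')                      # join on the key column

# Adding one row: wrap it in a one-row DataFrame and concatenate
new_row = pd.DataFrame([{'key': 'd', 'x': 4}])
appended = pd.concat([combined, new_row], ignore_index=True)

print(merged)
print(appended)
```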

Pandas import and export

df.to_csv('foo.csv'): save to a CSV file

pd.read_csv('foo.csv'): read from a CSV file

df.to_excel('foo.xlsx', sheet_name='Sheet1'): save to an Excel file

pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): read from an Excel file
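
As a sketch of a CSV round trip, the snippet below writes a hypothetical frame to a temporary directory and reads it back. The temporary path and index=False are choices made for this example; the Excel calls work similarly but need an extra engine package installed, so they are not shown:

```python
import os
import tempfile

import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'foo.csv')
    df.to_csv(path, index=False)   # index=False avoids writing the row index as a column
    loaded = pd.read_csv(path)

print(loaded)
```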

Matplotlib

Here is only the simplest way to plot:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Randomly generate 1000 data points
    data = pd.Series(np.random.randn(1000), index=np.arange(1000))

    # Take the cumulative sum so the effect is easier to see
    data = data.cumsum()

    # pandas data can be plotted directly
    data.plot()
    plt.show()

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
