Analysis of Python data processing

Source: Internet
Author: User
This article summarizes the main APIs used for data processing in Python and explains the key points, for readers who are interested in the topic.

NumPy and pandas are two libraries frequently used for data processing in Python. Their performance-critical parts are implemented in compiled C code, so they run fast. Matplotlib is a Python plotting library that lets you turn the processed data into charts. I had only read through the syntax before and never summarized it systematically, so this post summarizes the APIs of the three libraries.

Here are brief descriptions of the three libraries and how they differ:

    • NumPy: Often used for generating data and for array operations

    • Pandas: Built on NumPy; it adds labeled data structures on top of NumPy arrays

    • Matplotlib: A powerful plotting library for Python

NumPy

For a quick start, refer to the NumPy tutorial.

NumPy attributes

ndarray.ndim: Number of dimensions

ndarray.shape: Size along each dimension, for example (3, 5) for 3 rows and 5 columns

ndarray.size: Total number of elements

ndarray.dtype: Element type
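The attributes above can be inspected on any array. The following is a minimal sketch; the array `a` and its contents are made up for illustration.

```python
import numpy as np

# Illustrative array: the integers 0..14 arranged as 3 rows and 5 columns
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # number of dimensions
print(a.shape)  # (rows, columns)
print(a.size)   # total number of elements
print(a.dtype)  # element type (the exact integer type depends on the platform)
```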

NumPy array creation

array(object, dtype=None): Create an array from a Python list or tuple

zeros(shape, dtype=float): Create an array filled with 0

ones(shape, dtype=None): Create an array filled with 1

empty(shape, dtype=float): Create an array without initializing it (the contents are arbitrary)

arange([start,] stop[, step], dtype=None): Create values at a fixed interval; stop itself is not included

linspace(start, stop, num=50, dtype=None): Create num evenly spaced values in the given range; stop is included by default
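As a rough illustration of the creation functions listed above (the shapes and values are arbitrary choices for this sketch):

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # from nested lists
zeros = np.zeros((2, 3))            # all 0.0
ones = np.ones((2, 3), dtype=int)   # all 1
uninit = np.empty((2, 3))           # shape is set, contents are arbitrary
steps = np.arange(10, 20, 2)        # 10, 12, 14, 16, 18 (stop is excluded)
even = np.linspace(0, 1, num=5)     # 0, 0.25, 0.5, 0.75, 1 (stop is included)

print(steps)
print(even)
```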

NumPy operations

Addition, subtraction: a + b, a - b

Multiplication: b * 2, 10 * np.sin(a)

Power: b ** 2

Comparison: a < 35, which returns an array of True or False values

Matrix multiplication: np.dot(a, b) or a.dot(b)

Other: +=, -=, sin, cos, exp
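A short sketch of these operations, using small made-up arrays. Note that `*` is element-wise, while `np.dot` performs matrix multiplication.

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)                  # [0, 1, 2, 3]

total = a + b                     # [10, 21, 32, 43]
diff = a - b                      # [10, 19, 28, 37]
doubled = b * 2                   # [0, 2, 4, 6]
squared = b ** 2                  # [0, 1, 4, 9]
mask = a < 35                     # [True, True, True, False]
scaled_sin = 10 * np.sin(a)       # element-wise sine, scaled by 10

m = np.array([[1, 1], [0, 1]])
n = np.arange(4).reshape(2, 2)    # [[0, 1], [2, 3]]
elementwise = m * n               # [[0, 1], [0, 3]]
matrix_product = np.dot(m, n)     # [[2, 4], [2, 3]]
```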

NumPy indexing

Element indexing: a[1, 1]

Slicing: a[1, 1:3]

Iteration over every element: for item in a.flat
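The indexing forms above, sketched on an illustrative 3x4 array:

```python
import numpy as np

a = np.arange(3, 15).reshape(3, 4)
# [[ 3  4  5  6]
#  [ 7  8  9 10]
#  [11 12 13 14]]

single = a[1, 1]    # row 1, column 1 -> 8
part = a[1, 1:3]    # row 1, columns 1 and 2 -> [8, 9]

# a.flat iterates over every element in row-major order
flat_items = [int(item) for item in a.flat]
```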

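NumPy other functions

reshape(a, newshape): Changes the shape of the data; it does not modify the shape of the original array and returns a new array object (usually a view that shares the same data)

a.resize(new_shape): Changes the shape of the array in place and returns nothing (the function np.resize(a, new_shape), by contrast, returns a new array)

ravel(a): Returns the data as a one-dimensional array

vstack(tup): Stacks arrays vertically (top to bottom)

hstack(tup): Stacks arrays horizontally (left to right)

hsplit(ary, indices_or_sections): Splits an array horizontally (column-wise) into n parts

vsplit(ary, indices_or_sections): Splits an array vertically (row-wise) into n parts

copy(a): Returns a copy that does not share data with the original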
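The sketch below exercises the functions above on small made-up arrays. It assumes the in-place variant of resize is the `ndarray.resize` method, which is what the "returns nothing" description matches.

```python
import numpy as np

a = np.arange(6)
reshaped = np.reshape(a, (2, 3))   # a itself keeps its shape (6,)

c = np.arange(6)
result = c.resize((3, 2))          # in place: c is now 3x2, result is None

flat = np.ravel(reshaped)          # back to one dimension

x = np.array([1, 1, 1])
y = np.array([2, 2, 2])
stacked_v = np.vstack((x, y))      # shape (2, 3)
stacked_h = np.hstack((x, y))      # shape (6,)

grid = np.arange(12).reshape(3, 4)
left, right = np.hsplit(grid, 2)            # two 3x2 blocks
top, middle, bottom = np.vsplit(grid, 3)    # three 1x4 blocks

dup = np.copy(x)
dup[0] = 99                        # x is unaffected
```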

Pandas

For a quick start with pandas, refer to: 10 Minutes to pandas

Pandas data structures

There are two data structures in pandas: Series and DataFrame.

Series: A one-dimensional labeled array. When printed, the index is on the left and the values are on the right. Here's how to create one:

    In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

    In [5]: s
    Out[5]:
    0    1.0
    1    3.0
    2    5.0
    3    NaN
    4    6.0
    5    8.0
    dtype: float64

DataFrame: A tabular data structure that has both a row index and a column index. It can be thought of as a large dictionary of Series that share the same index. Here's how to create one:

    In [6]: dates = pd.date_range('20130101', periods=6)

    In [7]: dates
    Out[7]:
    DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
                   '2013-01-05', '2013-01-06'],
                  dtype='datetime64[ns]', freq='D')

    In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
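To make the "dictionary of Series" view concrete, here is a small sketch that builds a DataFrame from a dict of Series and pulls one column back out. The column names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=3)

# Each dict key becomes a column; the Series share the same row index
df = pd.DataFrame(
    {
        "A": pd.Series([1.0, 2.0, 3.0], index=dates),
        "B": pd.Series([4.0, np.nan, 6.0], index=dates),
    }
)

col = df["A"]   # selecting one column gives back a Series
print(type(col).__name__)
```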

Pandas viewing data

index: Row index

columns: Column index

values: The underlying values as a NumPy array

head(n=5): Returns the first n rows

tail(n=5): Returns the last n rows

describe(): Returns summary statistics such as count, mean, and standard deviation

sort_index(axis=1, ascending=False): Sort by index (with axis=1, by column labels)

sort_values(by='B'): Sort by the values in column B
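A minimal sketch of the viewing methods above. The data is random, so no specific values are shown; the seeded generator is only there to make runs repeatable.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

print(df.index)          # row labels
print(df.columns)        # column labels
print(df.values.shape)   # underlying NumPy array is 6x4

first = df.head(2)       # first 2 rows
last = df.tail(2)        # last 2 rows
summary = df.describe()  # count, mean, std, min, quartiles, max per column

by_col_label = df.sort_index(axis=1, ascending=False)  # columns ordered D, C, B, A
by_b = df.sort_values(by="B")                          # rows ordered by column B
```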

Pandas selecting data

Column selection: df['A']

Row slicing: df[0:3] or df['20130102':'20130104']

Select by label: df.loc['20130102':'20130104', ['A', 'B']]

Select by position: df.iloc[3:5, 0:2]

Mixed selection: df.ix[:3, ['A', 'C']] (df.ix was removed in pandas 1.0; in current versions use df.loc[df.index[:3], ['A', 'C']] instead)

Boolean selection: df[df.A > 0]
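The selection forms above can be sketched as follows. The DataFrame uses sequential integers so the results are easy to check, and the mixed selection uses `loc` with positional labels as a stand-in for the removed `df.ix`.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col_a = df["A"]                                        # one column as a Series
rows = df[0:3]                                         # first three rows
by_date = df["20130102":"20130104"]                    # label slice, both ends included
by_label = df.loc["20130102":"20130104", ["A", "B"]]   # rows by label, columns by name
by_pos = df.iloc[3:5, 0:2]                             # rows 3-4, columns 0-1
mixed = df.loc[df.index[:3], ["A", "C"]]               # positional rows, named columns
large_a = df[df.A > 8]                                 # rows where column A exceeds 8
```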

Pandas handling missing data

Drop rows that contain missing data: df.dropna(how='any')

Fill in missing data: df.fillna(value=5)

Check whether each value is NaN: pd.isna(df1)
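A small sketch of the three missing-data operations, on an invented DataFrame with two NaN values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

dropped = df.dropna(how="any")   # keeps only row 0, the one row without NaN
filled = df.fillna(value=5)      # every NaN replaced by 5
mask = pd.isna(df)               # boolean DataFrame, True where a value is NaN

print(mask)
```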

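Pandas merging data

pd.concat([df1, df2, df3], axis=0): Concatenate DataFrames (axis=0 stacks them vertically)

pd.merge(left, right, on='key'): Merge on the key column

df.append(s, ignore_index=True): Append rows (deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead)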
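The following sketch shows concatenation and merging on invented frames. Because `DataFrame.append` no longer exists in pandas 2.0 and later, appending a row is shown with `pd.concat`.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "x": [3, 4]})

# Stack rows; ignore_index=True renumbers the result 0..3
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

left = pd.DataFrame({"key": ["a", "b", "c"], "lval": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b", "d"], "rval": [4, 5, 6]})

# Default is an inner join: only keys present in both frames survive
merged = pd.merge(left, right, on="key")

# Appending one row, written with concat instead of the removed df.append
row = pd.DataFrame([{"key": "e", "x": 5}])
appended = pd.concat([df1, row], ignore_index=True)
```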

Pandas import and export

df.to_csv('foo.csv'): Save to a CSV file

pd.read_csv('foo.csv'): Read from a CSV file

df.to_excel('foo.xlsx', sheet_name='Sheet1'): Save to an Excel file

pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): Read from an Excel file
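A CSV round trip can be sketched as below. It writes to a temporary directory so nothing is left behind; the file name follows the text above, and `index=False` is an added choice so the row index is not written as an extra column. The Excel functions work the same way but need an extra engine package installed, so they are not run here.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")
    df.to_csv(path, index=False)   # write without the row index
    loaded = pd.read_csv(path)     # read it back into a new DataFrame

print(loaded)
```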

Matplotlib

Only the simplest way to plot is shown here:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Randomly generate 1000 data points
    data = pd.Series(np.random.randn(1000), index=np.arange(1000))

    # To make the effect easier to see, take the cumulative sum of the data
    data = data.cumsum()

    # pandas data can be plotted directly
    data.plot()
    plt.show()
