This article summarizes the key points of data processing in Python; readers interested in the topic may find it a useful reference.
NumPy and Pandas are two libraries commonly used for data processing in Python. Their performance-critical code is largely implemented in compiled languages such as C, so they run fast. Matplotlib is a Python plotting library that lets you visualize the data you have processed. I had only skimmed the syntax before without studying it systematically, so this post summarizes the APIs of all three.
Here is a brief description of each of the three libraries and how they differ:
NumPy: often used to generate data and perform numerical operations on arrays
Pandas: built on top of NumPy; it adds labeled data structures (Series and DataFrame) for tabular data
Matplotlib: a powerful plotting library for Python
NumPy
For a quick start with NumPy, see: NumPy tutorial
NumPy attributes
ndarray.ndim: number of dimensions
ndarray.shape: size along each dimension, for example (3, 5) for 3 rows and 5 columns
ndarray.size: total number of elements
ndarray.dtype: element type
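As a small illustration of the attributes above, here is a sketch using a made-up 3x5 array (the array itself is only an example, not from the original post):

```python
import numpy as np

# A small 3x5 integer array used only for illustration.
a = np.arange(15).reshape(3, 5)

print(a.ndim)   # number of dimensions: 2
print(a.shape)  # size along each dimension: (3, 5)
print(a.size)   # total number of elements: 15
print(a.dtype)  # element type; the exact integer width depends on the platform
```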
NumPy array creation
array(object, dtype=None): create an array from a Python list or tuple
zeros(shape, dtype=float): create an array filled with 0
ones(shape, dtype=None): create an array filled with 1
empty(shape, dtype=float): create an array without initializing its contents
arange([start,] stop[, step], dtype=None): create evenly spaced values with a fixed step
linspace(start, stop, num=50, dtype=None): create num evenly spaced values over a given range
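The creation functions above can be tried out roughly as follows. The shapes and values are arbitrary choices for the example:

```python
import numpy as np

from_list = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)  # from a nested list
zeros = np.zeros((2, 3))                # all 0.0
ones = np.ones((2, 3), dtype=int)       # all 1
uninit = np.empty((2, 3))               # contents are arbitrary, do not rely on them
steps = np.arange(10, 30, 5)            # 10, 15, 20, 25 (stop is excluded)
even = np.linspace(0, 1, num=5)         # 0, 0.25, 0.5, 0.75, 1 (stop is included)

print(steps)
print(even)
```

Note that arange excludes the stop value while linspace includes it by default.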
NumPy operations
Addition, subtraction: a + b, a - b
Multiplication (element-wise): b * 2, 10 * np.sin(a)
Power: b**2
Comparison: a < 35, returns an array of True/False values
Matrix multiplication: np.dot(a, b) or a.dot(b)
Other: +=, -=, sin, cos, exp
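A short sketch of these operations, using small example arrays chosen here for illustration. The main thing it shows is that `*` is element-wise, while `np.dot` performs matrix multiplication:

```python
import numpy as np

a = np.array([10, 20, 30, 40])
b = np.arange(4)                  # [0, 1, 2, 3]

diff = a - b                      # element-wise: [10, 19, 28, 37]
squared = b ** 2                  # [0, 1, 4, 9]
mask = a < 35                     # [True, True, True, False]
scaled = 10 * np.sin(a)           # sin is applied to every element

c = a.copy()
c += b                            # in-place addition: [10, 21, 32, 43]

m1 = np.array([[1, 1], [0, 1]])
m2 = np.arange(4).reshape(2, 2)   # [[0, 1], [2, 3]]
elementwise = m1 * m2             # [[0, 1], [0, 3]]
matrix = np.dot(m1, m2)           # [[2, 4], [2, 3]]
```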
NumPy indexing
Indexing a single element: a[1, 1]
Slicing: a[1, 1:3]
Iterating over every element: for item in a.flat
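For instance, with a hypothetical 3x4 array (the values are chosen only to make the results easy to follow):

```python
import numpy as np

a = np.arange(3, 15).reshape(3, 4)
# [[ 3  4  5  6]
#  [ 7  8  9 10]
#  [11 12 13 14]]

element = a[1, 1]      # row 1, column 1 -> 8
part = a[1, 1:3]       # row 1, columns 1 and 2 -> [8, 9]

# a.flat iterates over every element in row-major order.
flat_items = [int(item) for item in a.flat]
print(element, part, flat_items)
```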
NumPy other functions
reshape(a, newshape): returns the data in a new shape; the shape of the original array is unchanged
a.resize(new_shape): changes the shape of the array in place and returns None (by contrast, np.resize(a, new_shape) returns a new array)
ravel(a): returns the data flattened to one dimension
vstack(tup): stack arrays vertically (one on top of another)
hstack(tup): stack arrays horizontally (side by side)
hsplit(ary, indices_or_sections): split horizontally (column-wise) into N parts
vsplit(ary, indices_or_sections): split vertically (row-wise) into N parts
copy(a): returns a copy of the array with its own data
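The difference between reshaping, resizing in place, stacking, and splitting is easier to see in a short sketch. All arrays below are made up for the example:

```python
import numpy as np

a = np.arange(6)
reshaped = np.reshape(a, (2, 3))   # new shape; a itself keeps shape (6,)

b = np.arange(6)
result = b.resize((3, 2))          # in place: b now has shape (3, 2), result is None

flat = np.ravel(reshaped)          # back to one dimension

top = np.array([1, 1, 1])
bottom = np.array([2, 2, 2])
v = np.vstack((top, bottom))       # shape (2, 3)
h = np.hstack((top, bottom))       # shape (6,)

grid = np.arange(12).reshape(3, 4)
left, right = np.hsplit(grid, 2)   # two pieces of shape (3, 2)
rows = np.vsplit(grid, 3)          # three pieces of shape (1, 4)

c = np.copy(grid)
c[0, 0] = 99                       # modifying the copy leaves grid untouched
```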
Pandas
For a quick start with Pandas, see: 10 Minutes to pandas
Pandas data structures
There are two main data structures in Pandas: Series and DataFrame.
Series: a one-dimensional labeled array; when printed, the index appears on the left and the values on the right. Here is how to create one:
In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

In [5]: s
Out[5]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64
DataFrame: a tabular data structure with both a row index and a column index; it can be thought of as a large dictionary of Series. Here is how to create one:
In [6]: dates = pd.date_range('20130101', periods=6)

In [7]: dates
Out[7]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [8]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
Pandas viewing data
index: the row index
columns: the column index
values: the underlying data as a NumPy array
head(n=5): returns the first n rows
tail(n=5): returns the last n rows
describe(): summary statistics such as count, mean, and standard deviation
sort_index(axis=1, ascending=False): sort by index (here by column labels, in descending order)
sort_values(by='B'): sort by the values in column B
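A minimal sketch of these viewing methods follows. To keep the results predictable, it fills the frame with consecutive integers instead of random numbers, which is an assumption made only for this example:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

print(df.index)      # the DatetimeIndex of row labels
print(df.columns)    # the column labels A, B, C, D
print(df.values)     # the data as a plain NumPy array

first = df.head(2)   # first two rows
last = df.tail(2)    # last two rows
stats = df.describe()  # count, mean, std, min, quartiles, max per column

by_col_desc = df.sort_index(axis=1, ascending=False)  # columns ordered D, C, B, A
by_b = df.sort_values(by='B', ascending=False)        # rows ordered by column B, largest first
```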
Pandas selecting data
Selecting a column: df['A']
Selecting rows by slice: df[0:3] or df['20130102':'20130104']
Selecting by label: df.loc['20130102':'20130104', ['A', 'B']]
Selecting by position: df.iloc[3:5, 0:2]
Mixed selection: df.ix[:3, ['A', 'C']] (note that .ix is deprecated and has been removed in pandas 1.0 and later; use .loc or .iloc instead)
Selecting by condition: df[df.A > 0]
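The selection methods can be sketched as below, again with a deterministic example frame rather than random data. Because `.ix` is no longer available in current pandas versions, the mixed selection is written here as a position slice followed by a column selection. That is one possible substitute, not the only one:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

col = df['A']                                           # a single column as a Series
rows = df[0:3]                                          # first three rows
by_date = df['20130102':'20130104']                     # label slice, both ends included
by_label = df.loc['20130102':'20130104', ['A', 'B']]    # rows by label, columns by name
by_pos = df.iloc[3:5, 0:2]                              # rows 3-4, columns 0-1
mixed = df.iloc[:3][['A', 'C']]                         # stand-in for the old df.ix[:3, ['A', 'C']]
filtered = df[df.A > 8]                                 # rows where column A exceeds 8
```

Label-based slices include both endpoints, while position-based slices exclude the stop position, as in plain Python.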
Pandas handling missing data
Drop rows that contain missing data: df.dropna(how='any')
Fill in missing data: df.fillna(value=5)
Check whether each value is NaN: pd.isna(df1)
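As an illustration, here is a sketch on a tiny made-up frame with two missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'B': [4.0, 5.0, np.nan]})

dropped = df.dropna(how='any')   # keeps only rows with no NaN (row 0 here)
filled = df.fillna(value=5)      # replaces every NaN with 5
mask = pd.isna(df)               # boolean frame, True where a value is NaN

print(dropped)
print(filled)
print(mask)
```

Both dropna and fillna return new objects by default, so df itself still contains the NaN values afterwards.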
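Pandas merging data
pd.concat([df1, df2, df3], axis=0): concatenate DataFrames (axis=0 stacks them row-wise)
pd.merge(left, right, on='key'): join two DataFrames on the key column
df.append(s, ignore_index=True): append rows (note that DataFrame.append was removed in pandas 2.0; use pd.concat instead)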
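A minimal sketch of concatenating and merging, using small hypothetical frames. Since `DataFrame.append` is not available in recent pandas versions, adding a row is shown with `pd.concat` as an alternative:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
df2 = pd.DataFrame({'key': ['c', 'd'], 'x': [3, 4]})

# Stack the frames row-wise and renumber the index 0..3.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})

# Join rows whose 'key' values match.
merged = pd.merge(left, right, on='key')

# Add one more row by concatenating a one-row frame.
new_row = pd.DataFrame([{'key': 'e', 'x': 5}])
extended = pd.concat([stacked, new_row], ignore_index=True)

print(merged)
print(extended)
```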
Pandas import and export
df.to_csv('foo.csv'): save to a CSV file
pd.read_csv('foo.csv'): read from a CSV file
df.to_excel('foo.xlsx', sheet_name='Sheet1'): save to an Excel file
pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA']): read from an Excel file
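A CSV round trip can be sketched as follows. The example writes to a temporary directory so it leaves no files behind, and it passes index=False so the row index is not written as an extra column. Both are choices made for this sketch. Excel is left out because to_excel and read_excel need an additional engine such as openpyxl to be installed.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'foo.csv')
    df.to_csv(path, index=False)   # write without the row index
    loaded = pd.read_csv(path)     # read the same file back

print(loaded)
```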
Matplotlib
Only the simplest way to plot is shown here:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Randomly generate 1000 values
data = pd.Series(np.random.randn(1000), index=np.arange(1000))
# Accumulate the values so the effect is easier to see
data = data.cumsum()
# Pandas data can be plotted directly
data.plot()
plt.show()