Organize pandas operations


This article is original content; when reproducing it, please credit the source: http://www.cnblogs.com/xiaoxuebiye/p/7223774.html

Import Data:

pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table/database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, string, or HTML file and extract its tables
pd.read_clipboard(): read the clipboard contents and pass them to read_table()
pd.DataFrame(dict): create a DataFrame from a dict; keys are the column names, values are the data
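
A minimal sketch of a few of these readers; the CSV text, JSON string, and in-memory SQLite database are just placeholders standing in for real files and connections:

    import io
    import sqlite3
    import pandas as pd

    # read_csv accepts a path or any file-like object; StringIO stands in for a file here
    csv_text = "name,value\na,1\nb,2\n"
    df = pd.read_csv(io.StringIO(csv_text))

    # read_json parses JSON records into a DataFrame
    df_json = pd.read_json(io.StringIO('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))

    # read_sql pulls rows from a database connection
    conn = sqlite3.connect(':memory:')
    df.to_sql('my_table', conn, index=False)
    df_sql = pd.read_sql('SELECT * FROM my_table', conn)

    # DataFrame from a dict: keys become column names
    df_dict = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})
    print(df.head())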

Export data:

df.to_csv(filename): export data to a CSV file
df.to_excel(filename): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename): export data to a file in JSON format
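
A minimal sketch; the output file names are placeholders, and to_excel additionally requires an Excel engine such as openpyxl to be installed:

    import pandas as pd

    df = pd.DataFrame({'name': ['a', 'b'], 'value': [1, 2]})

    df.to_csv('out.csv', index=False)        # CSV file
    df.to_json('out.json')                   # JSON file
    # df.to_excel('out.xlsx', index=False)   # needs an Excel engine such as openpyxl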

Create test objects:

pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of 20 rows and 5 columns of random numbers
pd.Series(my_list): create a Series from an iterable object my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
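
Put together, a minimal sketch:

    import numpy as np
    import pandas as pd

    # 20 x 5 DataFrame of random floats
    df = pd.DataFrame(np.random.rand(20, 5))

    # Series from a plain Python list
    s = pd.Series([1, 2, 3, 5, 8])

    # replace the default integer index with a date index
    df.index = pd.date_range('1900/1/30', periods=df.shape[0])
    print(df.head())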

View and inspect data:

df.head(n): view the first n rows of the DataFrame
df.tail(n): view the last n rows of the DataFrame
df.shape: view the number of rows and columns
df.info(): view the index, data types, and memory information
df.describe(): view summary statistics for numeric columns
s.value_counts(dropna=False): view the unique values and counts of a Series
df.apply(pd.Series.value_counts): view the unique values and counts for each column of the DataFrame
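
A quick sketch on a random DataFrame (the column names are arbitrary); note that shape is an attribute, not a method:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(20, 5), columns=list('abcde'))

    print(df.head(3))       # first 3 rows
    print(df.shape)         # (20, 5)
    df.info()               # index, dtypes, memory usage
    print(df.describe())    # summary statistics per numeric column
    print(df['a'].round(1).value_counts(dropna=False))  # unique values and counts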

Data selection:

df[col]: select a column by name and return it as a Series
df[[col1, col2]]: return several columns as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first column
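
A minimal sketch with made-up column names and index labels:

    import pandas as pd

    df = pd.DataFrame({'a': [10, 20, 30], 'b': [1, 2, 3]},
                      index=['x', 'y', 'z'])

    col_a = df['a']           # single column as a Series
    sub = df[['a', 'b']]      # several columns as a DataFrame
    print(col_a.iloc[0])      # by position     -> 10
    print(col_a.loc['y'])     # by index label  -> 20
    print(df.iloc[0, :])      # first row
    print(df.iloc[0, 0])      # first element of the first column -> 10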

Data statistics:

df.describe(): view summary statistics for numeric columns
df.mean(): return the mean of each column
df.corr(): return the correlation coefficients between columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum of each column
df.min(): return the minimum of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
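
These all reduce column-wise by default; a quick sketch:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.rand(100, 3), columns=['a', 'b', 'c'])

    print(df.describe())    # count, mean, std, min, quartiles, max per column
    print(df.mean())        # column means
    print(df.corr())        # pairwise correlation coefficients
    print(df.count())       # non-null counts
    print(df.median())      # column medians
    print(df.std())         # column standard deviations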

Data merging:

df1.append(df2): append the rows of df2 to the end of df1 (removed in pandas 2.0; use pd.concat([df1, df2]) instead)
pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1
df1.join(df2, on=col1, how='inner'): perform a SQL-style join on the columns of df1 and df2
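
A minimal sketch of the three operations; the column names are made up:

    import pandas as pd

    df1 = pd.DataFrame({'key': ['a', 'b'], 'x': [1, 2]})
    df2 = pd.DataFrame({'key': ['a', 'b'], 'y': [3, 4]})

    rows = pd.concat([df1, df2])            # stack rows
    cols = pd.concat([df1, df2], axis=1)    # place columns side by side
    joined = df1.join(df2.set_index('key'), on='key', how='inner')  # SQL-style join on 'key'
    print(joined)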

Data processing:

df[df[col] > 0.5]: select the rows whose value in column col is greater than 0.5
df.sort_values(col1): sort the data by column col1, ascending by default
df.sort_values(col2, ascending=False): sort the data by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort by column col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1, col2]): return a GroupBy object grouped by several columns
df.groupby(col1)[col2].mean(): return the mean of column col2, grouped by column col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 that takes the maximum of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column, grouped by column col1
df.apply(np.mean): apply the function np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply the function np.max to each row of the DataFrame
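
A short sketch putting the filter / sort / groupby / pivot pieces together on an invented frame:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                       'col2': [1.0, 2.0, 3.0, 4.0],
                       'col3': [10.0, 20.0, 30.0, 40.0]})

    big = df[df['col2'] > 0.5]                            # boolean row filter
    ordered = df.sort_values(['group', 'col2'], ascending=[True, False])
    means = df.groupby('group')['col2'].mean()            # per-group mean of one column
    pivot = df.pivot_table(index='group', values=['col2', 'col3'], aggfunc='max')
    row_max = df[['col2', 'col3']].apply(np.max, axis=1)  # max of each row
    print(pivot)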


Other operations:

Change column name:

Method 1:

    a.columns = ['A', 'B', 'C']

Method 2:

    a.rename(columns={'a': 'A', 'b': 'B', 'c': 'C'}, inplace=True)

Inserting rows and columns:

http://www.jianshu.com/p/7df2593a01ce
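
The link above covers this in more detail; a minimal sketch of the two common patterns (df.insert for a column, loc with a new label for a row), with made-up values:

    import pandas as pd

    df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

    # insert a column named 'new_col' at position 1
    df.insert(1, 'new_col', [10, 20])

    # add a row by assigning to a new index label
    df.loc[2] = [5, 30, 6]
    print(df)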

Related reference links:

Reference:
http://www.qingpingshan.com/rjbc/dashuju/228593.html

10 Minutes to pandas:
http://python.jobbole.com/84416/

Official documentation:
http://pandas.pydata.org/pandas-docs/stable/index.html

Operations cheat sheet:
https://www.dataquest.io/blog/images/cheat-sheets/pandas-cheat-sheet.pdf

Advanced

Fetching data (a single element):

To take a specific value out of df:

iloc locates by integer position, for example:

    print(df.iloc[0, 0])
    print(df.iloc[1, 1])
    print(df.iloc[19, 7])

If df has a date index and columns named A, B, C, loc locates by label:

    df.loc['2017-01-01', 'A']

Fetching data (a row):

    one_row = df.iloc[4]
    one_row2 = df.loc['2013-01-02']
    print(type(one_row))

Taking a row returns a Series. You can then index into that Series, e.g. with one_row.iloc[1]:

    print(one_row.iloc[1])
    print(one_row.loc['A'])

Fetching data (a column):

    column2 = df['A']

column2 is a Series (print(type(column2)) confirms this), and its data can be accessed by position or by index label:

    print(column2.iloc[0])
    print(column2['2013-01-03'])

Fetching data (slices):

Row-wise slices:

    dfsub1 = df.iloc[4:5]
    print(type(dfsub1))
    print(dfsub1)

    dfsub2 = df.loc['2013-01-03':'2013-01-05']
    print(dfsub2)

The result of the slice is a DataFrame. It may be a view of df, so modifying dfsub can also modify df (pandas often raises a SettingWithCopyWarning in this situation).

-------------------------------------------------

Column-wise slices:

    print('Get sub by column mode')
    dfsub = df[['A', 'B']]
    print(type(dfsub))
    print(dfsub)

-------------------------------------------------

Subsets (rows x columns), mode one:

    print('Get sub by row x column mode')
    dfsub = df.loc['20130102':'20130104', ['A', 'B']]
    print(type(dfsub))
    print(dfsub)

Mode two:

    dfsub = df.iloc[1:3, 1:3]

-------------------------------------------------

Fetching data (conditional selection):

    dfsub = df[(df.A > 0) & (df.B > 0)]

The result is a DataFrame. Note that the conditions must be combined with & / |, not the Python keywords and / or. This works much like SELECT ... WHERE in a database.

A special case:

    print(df[df > 0])

Row traversal:
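
The original article breaks off here; a minimal sketch of row-wise traversal, assuming iterrows()/itertuples() is what this heading intended:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # iterrows() yields (index, row) pairs, where row is a Series
    for idx, row in df.iterrows():
        print(idx, row['A'], row['B'])

    # itertuples() is usually faster and yields namedtuples
    for row in df.itertuples():
        print(row.Index, row.A, row.B)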
