A Summary of Pandas Operations
This article is original; when reproducing it, please credit the source: http://www.cnblogs.com/xiaoxuebiye/p/7223774.html
Import data:
pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table/database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, string, or HTML file and extract its tables
pd.read_clipboard(): get the clipboard contents and pass them to read_table()
pd.DataFrame(dict): build a DataFrame from a dict; keys are the column names, values are the data
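As a quick check of two of the calls above, here is a minimal sketch; the column names and values are made up for illustration, and an in-memory buffer stands in for a CSV file on disk:

```python
import io
import pandas as pd

# Build a DataFrame from a dictionary: keys become column names.
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [90, 85]})

# pd.read_csv accepts a path or any file-like object.
csv_df = pd.read_csv(io.StringIO("name,score\nAlice,90\nBob,85"))

print(df.equals(csv_df))
```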
Export data:
df.to_csv(filename): export data to a CSV file
df.to_excel(filename): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename): export data to a JSON-formatted text file
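A small sketch of the export calls; with no path argument, to_csv and to_json return the serialized text instead of writing a file, which makes the output format easy to inspect:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Returns the CSV text instead of writing to disk.
csv_text = df.to_csv(index=False)

# Returns the JSON text instead of writing to disk.
json_text = df.to_json()

print(csv_text)
print(json_text)
```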
Create test objects:
pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of 20 rows and 5 columns of random numbers
pd.Series(my_list): create a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
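The three calls above, put together as a runnable sketch:

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers.
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from a plain list.
s = pd.Series([10, 20, 30])

# Replace the default integer index with a date index.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
print(df.index[0])
```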
View and inspect data:
df.head(n): view the first n rows of the DataFrame
df.tail(n): view the last n rows of the DataFrame
df.shape: view the number of rows and columns (an attribute, not a method)
df.info(): view the index, data types, and memory usage
df.describe(): view summary statistics for numeric columns
s.value_counts(dropna=False): view unique values and their counts for a Series
df.apply(pd.Series.value_counts): view unique values and counts for each column of the DataFrame
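A minimal sketch of the inspection calls above (the data is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({"col": ["x", "y", "x", None]})

# shape is an attribute, not a method.
rows, cols = df.shape

# dropna=False makes value_counts count missing values too.
counts = df["col"].value_counts(dropna=False)
print(counts)
```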
Data selection:
df[col]: select a column by name, returned as a Series
df[[col1, col2]]: select multiple columns, returned as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first column
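A short sketch of the selection idioms above (the labels r1/r2 and columns a/b are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

col = df["a"]                 # single column -> Series
two = df[["a", "b"]]          # list of columns -> DataFrame
first_row = df.iloc[0, :]     # first row, by position
by_label = df.loc["r2", "b"]  # row and column, by label
print(by_label)
```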
Data statistics:
df.describe(): view summary statistics for numeric columns
df.mean(): return the mean of each column
df.corr(): return the correlation coefficients between columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum of each column
df.min(): return the minimum of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
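A quick sketch of a few of the statistics calls; the two columns here are made up and perfectly correlated, so the off-diagonal entry of df.corr() is 1.0:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})

means = df.mean()      # per-column mean
corr = df.corr()       # pairwise correlation matrix
medians = df.median()  # per-column median
print(means["x"], corr.loc["x", "y"])
```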
Data merging:
df1.append(df2): append the rows of df2 to the end of df1 (removed in pandas 2.0; use pd.concat([df1, df2]) instead)
pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1 (note: concat is a pandas function, not a DataFrame method)
df1.join(df2, on=col1, how='inner'): perform a SQL-style join on the columns of df1 and df2
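A small sketch of row-wise and column-wise concatenation with pd.concat (the frames are made-up examples):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Stack rows; ignore_index renumbers the combined index.
rows = pd.concat([df1, df2], ignore_index=True)

# axis=1 places frames side by side as new columns.
cols = pd.concat([df1, pd.DataFrame({"b": [5, 6]})], axis=1)
print(rows)
print(cols)
```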
Data processing:
df[df[col] > 0.5]: select rows where column col is greater than 0.5
df.sort_values(col1): sort data by column col1, ascending by default
df.sort_values(col2, ascending=False): sort data by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort by col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean(): return the mean of col2 within each col1 group
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 with the maxima of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column, grouped by col1
df.apply(np.mean): apply np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply np.max to each row of the DataFrame
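The groupby, pivot-table, and apply idioms above, as one runnable sketch over a made-up three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [1, 3, 5],
    "col3": [10, 20, 30],
})

# Mean of col2 within each col1 group.
group_means = df.groupby("col1")["col2"].mean()

# Pivot table: maximum of col2 and col3 per col1 value.
pivot = df.pivot_table(index="col1", values=["col2", "col3"], aggfunc="max")

# apply along each axis.
col_max = df[["col2", "col3"]].apply(np.max)           # per column
row_max = df[["col2", "col3"]].apply(np.max, axis=1)   # per row
print(group_means)
print(pivot)
```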
Data cleansing:
pd.isnull(): check for null values; returns a boolean array
pd.notnull(): the opposite of pd.isnull()
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.fillna(x): replace all null values with x
s.fillna(s.mean()): replace all null values with the mean
s.astype(float): convert the data type of a Series to float
s.replace(1, 'one'): replace all values equal to 1 with 'one'
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('col1'): set column col1 as the index
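Null handling is the core of data cleansing; a minimal sketch using standard pandas calls on a made-up Series:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

filled = s.fillna(s.mean())  # replace nulls with the mean (2.0 here)
dropped = s.dropna()         # drop the null entries instead
print(filled.tolist())
```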
Other operations:
Change column names:
Method 1:
a.columns = ['A', 'B', 'C']
Method 2:
a.rename(columns={'a': 'A', 'b': 'B', 'c': 'C'}, inplace=True)
Inserting rows and columns:
http://www.jianshu.com/p/7df2593a01ce
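Both renaming methods in one sketch; note that rename without inplace=True returns a new frame and leaves the original untouched:

```python
import pandas as pd

a = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Method 1: assign a whole new list of names.
a.columns = ["A", "B", "C"]

# Method 2: rename selected columns via a mapping; returns a new frame.
b = a.rename(columns={"A": "x"})
print(list(b.columns))
```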
Related reference links:
Reference:
http://www.qingpingshan.com/rjbc/dashuju/228593.html
10 minutes to pandas:
http://python.jobbole.com/84416/
Official documentation:
http://pandas.pydata.org/pandas-docs/stable/index.html
Operation index (cheat sheet):
https://www.dataquest.io/blog/images/cheat-sheets/pandas-cheat-sheet.pdf
Advanced
Fetching data (single element):
Take a specific value from df.
Locate by iloc (positional index), for example:
print(df.iloc[0, 0])
print(df.iloc[1, 1])
print(df.iloc[19, 7])
If df has a date index and column names a, b, c, locate by loc (label index):
df.loc['2017-01-01', 'a']
Fetching data (row):
one_row = df.iloc[4]
one_row2 = df.loc['2013-01-02']
print(type(one_row))
Taking a row yields a Series, so you can index into it, e.g. one_row.iloc[1], to reach the data inside:
print(one_row.iloc[1])
print(one_row.loc['A'])
Fetching data (column):
column2 = df['A']
column2 is a Series; print(type(column2)) confirms it. Its data can be accessed by position or by label:
print(column2[0])
print(column2['2013-01-03'])
Fetching data (slice):
Row-wise slicing:
dfsub1 = df.iloc[4:5]
print(type(dfsub1))
print(dfsub1)
dfsub2 = df.loc['2013-01-03':'2013-01-05']
print(dfsub2)
The result of a slice is a DataFrame. A slice may be a view of df, so changing dfsub can change df as well; call .copy() when you need an independent copy.
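The view-versus-copy caveat can be sketched as follows (standard pandas behavior; the frame is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# A slice may share memory with the original; whether writes
# propagate back is not guaranteed.  Taking an explicit copy
# keeps the original frame untouched for certain.
sub = df.iloc[0:2].copy()
sub.loc[0, "A"] = 99

print(df.loc[0, "A"])
```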
-------------------------------------------------
Column-wise slicing:
print('Get sub by column mode')
dfsub = df[['A', 'B']]
print(type(dfsub))
print(dfsub)
-------------------------------------------------
Subset (rows x columns):
Mode one:
print('Get sub by row x column mode')
dfsub = df.loc['20130102':'20130104', ['A', 'B']]
print(type(dfsub))
print(dfsub)
Mode two:
dfsub = df.iloc[1:3, 1:3]
-------------------------------------------------
Fetching data (conditional slice):
dfsub = df[(df.A > 0) & (df.B > 0)]
The result type is a DataFrame. Note that boolean conditions are combined with & and |, not the Python keywords and/or.
This works much like SELECT ... WHERE in a database.
A special case is a condition over the whole frame, which keeps matching cells and fills the rest with NaN:
print(df[df > 0])
Row traversal:
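A minimal sketch of row traversal using df.iterrows(), which yields (index, row-as-Series) pairs; it is fine for small frames, though vectorized operations are usually much faster:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Sum every cell by walking the rows one at a time.
total = 0
for idx, row in df.iterrows():
    total += row["a"] + row["b"]
print(total)
```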