A Summary of Pandas Operations
This article is original; when reproducing it, please credit the source: http://www.cnblogs.com/xiaoxuebiye/p/7223774.html
Import data:
pd.read_csv(filename): import data from a CSV file
pd.read_table(filename): import data from a delimited text file
pd.read_excel(filename): import data from an Excel file
pd.read_sql(query, connection_object): import data from a SQL table/database
pd.read_json(json_string): import data from a JSON-formatted string
pd.read_html(url): parse a URL, string, or HTML file and extract its tables
pd.read_clipboard(): get the clipboard contents and pass them to read_table()
pd.DataFrame(dict): build a DataFrame from a dict; keys are the column names, values are the data
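As a quick check of two of the calls above, here is a minimal sketch; the column names and values are made up for illustration, and an in-memory buffer stands in for a CSV file on disk:

```python
import io
import pandas as pd

# Build a DataFrame from a dictionary: keys become column names.
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [90, 85]})

# pd.read_csv accepts a path or any file-like object.
csv_df = pd.read_csv(io.StringIO("name,score\nAlice,90\nBob,85"))

print(df.equals(csv_df))
```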
Export data:
df.to_csv(filename): export data to a CSV file
df.to_excel(filename): export data to an Excel file
df.to_sql(table_name, connection_object): export data to a SQL table
df.to_json(filename): export data to a JSON-formatted text file
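A small sketch of the export calls; with no path argument, to_csv and to_json return the serialized text instead of writing a file, which makes the output format easy to inspect:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Returns the CSV text instead of writing to disk.
csv_text = df.to_csv(index=False)

# Returns the JSON text instead of writing to disk.
json_text = df.to_json()

print(csv_text)
print(json_text)
```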
Create test objects:
pd.DataFrame(np.random.rand(20, 5)): create a DataFrame of 20 rows and 5 columns of random numbers
pd.Series(my_list): create a Series from an iterable my_list
df.index = pd.date_range('1900/1/30', periods=df.shape[0]): add a date index
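The three calls above, put together as a runnable sketch:

```python
import numpy as np
import pandas as pd

# 20 rows x 5 columns of uniform random numbers.
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from a plain list.
s = pd.Series([10, 20, 30])

# Replace the default integer index with a date index.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])
print(df.index[0])
```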
View and inspect data:
df.head(n): view the first n rows of the DataFrame
df.tail(n): view the last n rows of the DataFrame
df.shape: view the number of rows and columns (an attribute, not a method)
df.info(): view the index, data types, and memory usage
df.describe(): view summary statistics for numeric columns
s.value_counts(dropna=False): view unique values and their counts for a Series
df.apply(pd.Series.value_counts): view unique values and counts for each column of the DataFrame
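A minimal sketch of the inspection calls above (the data is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({"col": ["x", "y", "x", None]})

# shape is an attribute, not a method.
rows, cols = df.shape

# dropna=False makes value_counts count missing values too.
counts = df["col"].value_counts(dropna=False)
print(counts)
```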
Data selection:
df[col]: select a column by name, returned as a Series
df[[col1, col2]]: select multiple columns, returned as a DataFrame
s.iloc[0]: select data by position
s.loc['index_one']: select data by index label
df.iloc[0, :]: return the first row
df.iloc[0, 0]: return the first element of the first column
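A short sketch of the selection idioms above (the labels r1/r2 and columns a/b are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])

col = df["a"]                 # single column -> Series
two = df[["a", "b"]]          # list of columns -> DataFrame
first_row = df.iloc[0, :]     # first row, by position
by_label = df.loc["r2", "b"]  # row and column, by label
print(by_label)
```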
Data statistics:
df.describe(): view summary statistics for numeric columns
df.mean(): return the mean of each column
df.corr(): return the correlation coefficients between columns
df.count(): return the number of non-null values in each column
df.max(): return the maximum of each column
df.min(): return the minimum of each column
df.median(): return the median of each column
df.std(): return the standard deviation of each column
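A quick sketch of a few of the statistics calls; the two columns here are made up and perfectly correlated, so the off-diagonal entry of df.corr() is 1.0:

```python
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]})

means = df.mean()      # per-column mean
corr = df.corr()       # pairwise correlation matrix
medians = df.median()  # per-column median
print(means["x"], corr.loc["x", "y"])
```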
Data merging:
df1.append(df2): append the rows of df2 to the end of df1 (removed in pandas 2.0; use pd.concat([df1, df2]) instead)
pd.concat([df1, df2], axis=1): append the columns of df2 to the end of df1 (note: concat is a pandas function, not a DataFrame method)
df1.join(df2, on=col1, how='inner'): perform a SQL-style join on the columns of df1 and df2
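A small sketch of row-wise and column-wise concatenation with pd.concat (the frames are made-up examples):

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Stack rows; ignore_index renumbers the combined index.
rows = pd.concat([df1, df2], ignore_index=True)

# axis=1 places frames side by side as new columns.
cols = pd.concat([df1, pd.DataFrame({"b": [5, 6]})], axis=1)
print(rows)
print(cols)
```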
Data processing:
df[df[col] > 0.5]: select rows where column col is greater than 0.5
df.sort_values(col1): sort data by column col1, ascending by default
df.sort_values(col2, ascending=False): sort data by column col2 in descending order
df.sort_values([col1, col2], ascending=[True, False]): sort by col1 ascending, then by col2 descending
df.groupby(col): return a GroupBy object grouped by column col
df.groupby([col1, col2]): return a GroupBy object grouped by multiple columns
df.groupby(col1)[col2].mean(): return the mean of col2 within each col1 group
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table grouped by col1 with the maxima of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of every column, grouped by col1
df.apply(np.mean): apply np.mean to each column of the DataFrame
df.apply(np.max, axis=1): apply np.max to each row of the DataFrame
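The groupby, pivot-table, and apply idioms above, as one runnable sketch over a made-up three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [1, 3, 5],
    "col3": [10, 20, 30],
})

# Mean of col2 within each col1 group.
group_means = df.groupby("col1")["col2"].mean()

# Pivot table: maximum of col2 and col3 per col1 value.
pivot = df.pivot_table(index="col1", values=["col2", "col3"], aggfunc="max")

# apply along each axis.
col_max = df[["col2", "col3"]].apply(np.max)           # per column
row_max = df[["col2", "col3"]].apply(np.max, axis=1)   # per row
print(group_means)
print(pivot)
```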
Data cleansing:
pd.isnull(): check for null values; returns a boolean array
pd.notnull(): the opposite of pd.isnull()
df.dropna(): drop all rows that contain null values
df.dropna(axis=1): drop all columns that contain null values
df.fillna(x): replace all null values with x
s.fillna(s.mean()): replace all null values with the mean
s.astype(float): convert the data type of a Series to float
s.replace(1, 'one'): replace all values equal to 1 with 'one'
df.rename(columns={'old_name': 'new_name'}): rename selected columns
df.set_index('col1'): set column col1 as the index
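Null handling is the core of data cleansing; a minimal sketch using standard pandas calls on a made-up Series:

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

filled = s.fillna(s.mean())  # replace nulls with the mean (2.0 here)
dropped = s.dropna()         # drop the null entries instead
print(filled.tolist())
```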
Other operations:
Change column names:
Method 1:
a.columns = ['A', 'B', 'C']
Method 2:
a.rename(columns={'a': 'A', 'b': 'B', 'c': 'C'}, inplace=True)
Inserting rows and columns:
http://www.jianshu.com/p/7df2593a01ce
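Both renaming methods in one sketch; note that rename without inplace=True returns a new frame and leaves the original untouched:

```python
import pandas as pd

a = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

# Method 1: assign a whole new list of names.
a.columns = ["A", "B", "C"]

# Method 2: rename selected columns via a mapping; returns a new frame.
b = a.rename(columns={"A": "x"})
print(list(b.columns))
```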
Related reference links:
Reference:
http://www.qingpingshan.com/rjbc/dashuju/228593.html
10 minutes to pandas:
http://python.jobbole.com/84416/
Official documentation:
http://pandas.pydata.org/pandas-docs/stable/index.html
Operation index (cheat sheet):
https://www.dataquest.io/blog/images/cheat-sheets/pandas-cheat-sheet.pdf
Advanced
Fetching data (single element):
Take a specific value from df.
Locate by iloc (positional index), for example:
print(df.iloc[0, 0])
print(df.iloc[1, 1])
print(df.iloc[19, 7])
If df has a date index and column names a, b, c, locate by loc (label index):
df.loc['2017-01-01', 'a']
Fetching data (row):
one_row = df.iloc[4]
one_row2 = df.loc['2013-01-02']
print(type(one_row))
Taking a row yields a Series, so you can index into it, e.g. one_row.iloc[1], to reach the data inside:
print(one_row.iloc[1])
print(one_row.loc['A'])
Fetching data (column):
column2 = df['A']
column2 is a Series; print(type(column2)) confirms it. Its data can be accessed by position or by label:
print(column2[0])
print(column2['2013-01-03'])
Fetching data (slice):
Row-wise slicing:
dfsub1 = df.iloc[4:5]
print(type(dfsub1))
print(dfsub1)
dfsub2 = df.loc['2013-01-03':'2013-01-05']
print(dfsub2)
The result of a slice is a DataFrame. A slice may be a view of df, so changing dfsub can change df as well; call .copy() when you need an independent copy.
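The view-versus-copy caveat can be sketched as follows (standard pandas behavior; the frame is a made-up example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})

# A slice may share memory with the original; whether writes
# propagate back is not guaranteed.  Taking an explicit copy
# keeps the original frame untouched for certain.
sub = df.iloc[0:2].copy()
sub.loc[0, "A"] = 99

print(df.loc[0, "A"])
```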
-------------------------------------------------
Column-wise slicing:
print('Get sub by column mode')
dfsub = df[['A', 'B']]
print(type(dfsub))
print(dfsub)
-------------------------------------------------
Subset (rows x columns):
Mode one:
print('Get sub by row x column mode')
dfsub = df.loc['20130102':'20130104', ['A', 'B']]
print(type(dfsub))
print(dfsub)
Mode two:
dfsub = df.iloc[1:3, 1:3]
-------------------------------------------------
Fetching data (conditional slice):
dfsub = df[(df.A > 0) & (df.B > 0)]
The result type is a DataFrame. Note that boolean conditions are combined with & and |, not the Python keywords and/or.
This works much like SELECT ... WHERE in a database.
A special case is a condition over the whole frame, which keeps matching cells and fills the rest with NaN:
print(df[df > 0])
Row traversal:
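A minimal sketch of row traversal using df.iterrows(), which yields (index, row-as-Series) pairs; it is fine for small frames, though vectorized operations are usually much faster:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Sum every cell by walking the rows one at a time.
total = 0
for idx, row in df.iterrows():
    total += row["a"] + row["b"]
print(total)
```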