Learning Python Data Processing with Pandas: the DataFrame

Last Update: 2015-10-26 Source: Internet

Author: User



Please forgive me for not writing this all at once. This article records my own learning process as I build up my knowledge of pandas. Because there is little material available online, and part of the book Python for Data Analysis is already outdated, I have written this article as a running record. If my work settles down and I have time, I will continue to improve these notes on the pandas library; please bear with me! By LQJ 2015-10-25

Foreword:

First, I recommend a good resource for learning the Python pandas DataFrame:

Website: http://www.cnblogs.com/chaosimple/p/4153083.html

Explanation:
First, search Baidu for "Python pandas DataFrame". Below I list some common DataFrame operations and explain them. DataFrame and Series are the two main data structures of pandas.
If you often use SQL databases or have done data analysis or related work, you will get started with Python's pandas library faster: many pandas operations resemble SQL statements, and only the syntax is different.
Main text:
import pandas as pd  # refer to pandas by the short name pd
Viewing data with a DataFrame (similar to SELECT in SQL):
from pandas import DataFrame  # import DataFrame from the pandas library
df_obj = DataFrame()  # create a DataFrame object
df_obj.dtypes  # view the data type of each column
df_obj.head()  # view the first few rows, 5 by default
df_obj.tail()  # view the last few rows, 5 by default
df_obj.index  # view the row index
df_obj.columns  # view the column names
df_obj.values  # view the data values as a NumPy array
df_obj.describe()  # descriptive statistics
df_obj.T  # transpose
df_obj.sort(columns='')  # sort by the values of a column; deprecated and later removed, use sort_values
df_obj.sort_index(by=['', ''])  # multi-column sort; this usage was already deprecated when I used it, use sort_values
df_obj.sort_values(by=['', ''])  # the same multi-column sort with sort_values
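The viewing and sorting calls above can be tried on a small frame. This is a minimal sketch: the data and the column names `user_id`, `branch` and `fee` are hypothetical stand-ins, not taken from the article's dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names are invented for illustration
df_obj = pd.DataFrame({
    "user_id": [103, 101, 102, 101],
    "branch": ["B", "A", "A", "B"],
    "fee": [30.0, 10.0, 20.0, 15.0],
})

print(df_obj.dtypes)      # one dtype per column
print(df_obj.head(2))     # first 2 rows
print(df_obj.describe())  # count, mean, std, min, quartiles, max of numeric columns

# Sort by one column, then by two columns (branch ascending, fee descending)
by_fee = df_obj.sort_values(by="fee")
by_branch_fee = df_obj.sort_values(by=["branch", "fee"], ascending=[True, False])
print(by_branch_fee)
```

The `ascending` list pairs one direction with each column in `by`, which covers the common "ORDER BY a ASC, b DESC" case.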

Selecting data with a DataFrame (similar to LIMIT in SQL):
df_obj['Customer Name']  # get the data in one column
df_obj[1:3]  # slice rows by position: rows 1 and 2 (the end of the slice is excluded)
df_obj.loc[:0, ['user number', 'product name']]  # select a region: the row range goes before the comma, the column range after it; loc selects by label (the end label 0 is included), iloc selects by integer position
df_obj['package'].drop_duplicates()  # remove duplicate values
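The difference between positional slicing, `loc` and `iloc` is easiest to see with a string row index. A minimal sketch, assuming hypothetical data and simplified column names (`user_number`, `product_name`, `package`) in place of the article's:

```python
import pandas as pd

# Hypothetical data; string row labels make label- and position-based access differ visibly
df_obj = pd.DataFrame(
    {"user_number": ["u1", "u2", "u3", "u4"],
     "product_name": ["ADSL", "CDMA", "ADSL", "IPTV"],
     "package": ["p1", "p2", "p1", "p3"]},
    index=["a", "b", "c", "d"],
)

col = df_obj["user_number"]        # one column as a Series
rows = df_obj[1:3]                 # positional row slice: positions 1 and 2 (end excluded)
by_label = df_obj.loc[:"b", ["user_number", "product_name"]]  # label slice: end label "b" included
by_pos = df_obj.iloc[:2, [0, 1]]   # the same region by integer position (end excluded)
unique_packages = df_obj["package"].drop_duplicates()  # keeps the first occurrence of each value
```

Note that `loc[:"b"]` and `iloc[:2]` select the same two rows: label slices include their end point, positional slices do not.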
Setting values with a DataFrame:
df_obj.loc[:, 'Branch_Maintenance Line'] = 'Owned Hall'  # set a new value in every row, by label; at (by label) and iat (by position) set a single cell
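A small sketch of the three setters, assuming hypothetical data where a `branch` column stands in for the article's 'Branch_Maintenance Line'. `loc` can assign a whole column, while `at` and `iat` address exactly one cell.

```python
import pandas as pd

# Hypothetical data; "branch" stands in for 'Branch_Maintenance Line'
df_obj = pd.DataFrame({"branch": ["X", "Y", "Z"], "fee": [1, 2, 3]})

df_obj.loc[:, "branch"] = "Owned Hall"  # every row of one column, by label
df_obj.at[0, "fee"] = 10                # a single cell by row label and column label
df_obj.iat[2, 1] = 30                   # a single cell by integer position (row 2, column 1)
print(df_obj)
```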
Filtering data with a DataFrame (similar to WHERE in SQL):
alist = ['023-18996609823']
df_obj['user number'].isin(alist)  # put the values to filter by into a list and pass it to isin; the result is a boolean Series indexed by row, True where the value matches
df_obj[df_obj['user number'].isin(alist)]  # get the rows whose result is True
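The `isin` filter behaves like SQL `WHERE ... IN (...)`. In this sketch only the first phone number comes from the article; the other rows and the simplified column names are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the first user number appears in the article
df_obj = pd.DataFrame({
    "user_number": ["023-18996609823", "023-00000000000", "023-11111111111"],
    "package": ["A", "B", "C"],
})
alist = ["023-18996609823"]  # values to keep, like WHERE user_number IN (...)

mask = df_obj["user_number"].isin(alist)  # boolean Series aligned with the rows
matched = df_obj[mask]                    # only the rows where mask is True
excluded = df_obj[~mask]                  # NOT IN: invert the mask with ~
```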
Fuzzy filtering with a DataFrame (similar to LIKE in SQL):
df_obj[df_obj['package'].str.contains(r'.*?VoiceCDMA.*')]  # fuzzy match with a regular expression: * matches the preceding item 0 or more times, and the ? after * makes that match non-greedy
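Because `str.contains` searches anywhere in the string, the leading and trailing `.*` are not strictly needed. A minimal sketch with hypothetical package names, including a missing value, which `na=False` treats as a non-match:

```python
import pandas as pd

# Hypothetical package names, including one missing value
df_obj = pd.DataFrame({"package": ["Basic VoiceCDMA plan", "ADSL 2M", "VoiceCDMA", None]})

# r"VoiceCDMA" behaves like SQL LIKE '%VoiceCDMA%'; na=False maps missing values to False
mask = df_obj["package"].str.contains(r"VoiceCDMA", na=False)
matched = df_obj[mask]

# The article's pattern selects the same rows
same = df_obj["package"].str.contains(r".*?VoiceCDMA.*", na=False)
```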
Transforming data with a DataFrame (more on this later):
df_obj['Branch_Maintenance Line'] = df_obj['Branch_Maintenance Line'].str.replace(r'Wuxi Branch (.{2,}) branch', r'\1', regex=True)  # regular expressions are supported; \1 refers to the first captured group
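The capture-group replacement can be checked on a few values. This sketch assumes hypothetical column values built to fit the translated pattern; strings that do not match are left unchanged. Since pandas 2.0, `str.replace` treats the pattern as a literal string unless `regex=True` is passed.

```python
import pandas as pd

# Hypothetical values shaped to fit the article's (translated) pattern
col = "Branch_Maintenance Line"
df_obj = pd.DataFrame({col: [
    "Wuxi Branch Binhu branch",
    "Wuxi Branch Xishan branch",
    "Owned Hall",
]})

# (.{2,}) captures two or more characters; r"\1" substitutes the captured text
df_obj[col] = df_obj[col].str.replace(r"Wuxi Branch (.{2,}) branch", r"\1", regex=True)
print(df_obj)
```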
df_obj['Branch_Maintenance Line'].drop_duplicates()  # return the data with duplicate values removed
By default the first occurrence is kept; setting take_last=True kept the last one instead. Note: take_last is deprecated; use keep='last'.
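The `keep` parameter that replaced `take_last` has three settings. A minimal sketch on a hypothetical Series:

```python
import pandas as pd

# Hypothetical branch values with repeats
s = pd.Series(["A", "B", "A", "C", "B"], name="Branch_Maintenance Line")

first = s.drop_duplicates()                # default keep="first"
last = s.drop_duplicates(keep="last")      # the replacement for take_last=True
only_once = s.drop_duplicates(keep=False)  # drop every value that occurs more than once
```

The surviving rows keep their original index labels, so `last` holds the labels 2, 3 and 4.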
Reading text data with pandas:
pd.read_csv(r'D:\LQJ.csv', sep=';', nrows=2)  # give the path of the CSV file first, then options such as the separator and the number of rows to read
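To try `sep` and `nrows` without the original file, an in-memory buffer can stand in for `D:\LQJ.csv`. The CSV contents below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for D:\LQJ.csv
csv_text = "user_id;branch;fee\n101;A;10\n102;B;20\n103;A;30\n"

# Reads the header row plus the first 2 data rows, splitting fields on ';'
df = pd.read_csv(io.StringIO(csv_text), sep=";", nrows=2)
print(df)
```

For a real file, pass the path instead of the buffer; files saved in a non-UTF-8 encoding may also need the `encoding` parameter.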
Aggregating data with pandas (similar to GROUP BY or HAVING in SQL):
data_obj['User ID'].groupby(data_obj['Branch_Maintenance Line'])
data_obj.groupby('Branch_Maintenance Line')['User ID']  # shorter form of the line above
adsl_obj.groupby('Branch_Maintenance Line')['User ID'].agg([('ADSL', 'count')])
# count the user IDs in each branch and name the resulting count column ADSL
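Here is the GROUP BY step on hypothetical data, followed by a HAVING-style filter. The filter is done with ordinary boolean indexing on the aggregated result; this is one common way to express HAVING, not the only one.

```python
import pandas as pd

# Hypothetical data using the article's column names
data_obj = pd.DataFrame({
    "Branch_Maintenance Line": ["A", "A", "B", "C", "C", "C"],
    "User ID": [1, 2, 3, 4, 5, 6],
})
grouped = data_obj.groupby("Branch_Maintenance Line")["User ID"]

# A (name, function) tuple names the output column
counts = grouped.agg([("ADSL", "count")])

# HAVING COUNT(...) >= 2: filter the aggregated rows
big = counts[counts["ADSL"] >= 2]
print(big)
```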
Merging datasets with pandas (similar to JOIN in SQL):
pd.merge(mxj_obj2, mxj_obj1, on='User ID', how='inner')  # merge the two datasets on their shared key column 'User ID'; inner keeps only the keys present in both datasets (the intersection)
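An inner join keeps only the keys that both frames share, while a left join keeps every row of the first frame. A minimal sketch with hypothetical contents for `mxj_obj1` and `mxj_obj2`:

```python
import pandas as pd

# Hypothetical contents for the two datasets
mxj_obj1 = pd.DataFrame({"User ID": [1, 2, 3], "package": ["P1", "P2", "P3"]})
mxj_obj2 = pd.DataFrame({"User ID": [2, 3, 4], "fee": [20, 30, 40]})

inner = pd.merge(mxj_obj2, mxj_obj1, on="User ID", how="inner")  # only IDs present in both
left = pd.merge(mxj_obj2, mxj_obj1, on="User ID", how="left")    # all rows of mxj_obj2
print(inner)
print(left)
```

In the left join, user 4 has no match in `mxj_obj1`, so its `package` is NaN.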

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
