Several Commonly Used Methods of Python Data Analysis
Source: Internet
Author: User
Keywords: python data analysis
Python is an increasingly popular tool for data analysis. This article introduces several commonly used methods of Python data analysis.
1. If the header or Excel index is in Chinese, the output is garbled
Solution: this is a Python version problem. Python 2 strings are bytes by default, while Python 3 strings are Unicode, so switching to Python 3 usually solves it automatically. There are other methods as well, but I won't go into them here.
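As one illustration of those other methods, here is a minimal sketch for the case where the garbling comes from the file's encoding rather than from the interpreter version. The sample data, the Chinese column names, and the GBK encoding are assumptions made for this example; check which encoding your own file actually uses.

```python
import io
import pandas as pd

# Hypothetical sample: a "|"-separated file with Chinese headers, encoded as GBK.
raw = "关键词|访客数\n手机|120\n电脑|80\n".encode("gbk")

# Tell pandas which encoding the bytes use so the headers decode correctly.
df = pd.read_csv(io.BytesIO(raw), sep="|", encoding="gbk")
print(df.columns.tolist())
```

With a real file you would pass the file path instead of the in-memory buffer and keep the `encoding` argument.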
2. If there are many columns, how do you output only the specified columns?
Demand situation: Sometimes there is a lot of data, but only part of it needs to be analyzed. What should be done?
df = df.loc[:, ['keywords', 'number of visitors brought by', 'bounce rate']]  # Select the specified columns
Once the data has been read into df, this one line selects the specified columns.
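Both steps, reading and then selecting, can be sketched as follows. The in-memory sample data and the extra `exposure` column are assumptions for illustration; with a real file you would pass its path to `read_csv`.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = (
    "keywords|number of visitors brought by|bounce rate|exposure\n"
    "phone|120|35.5%|900\n"
    "laptop|80|41.2%|700\n"
)

# Step 1: read the data.
df = pd.read_csv(io.StringIO(csv_text), sep="|")
# Step 2: keep only the columns of interest, in the order listed.
df = df.loc[:, ["keywords", "number of visitors brought by", "bounce rate"]]
print(df)
```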
3. How to add a new column to the data frame?
Demand situation: There is a table with unit price and quantity, and I want to output a total price column or summarize some of the data.
Solution: here is the code directly
from pandas import read_csv

df = read_csv("1.csv", sep="|")
# Add the calculation result as a new column
df['result'] = df.price * df.num  # New column name on the left, the corresponding values on the right
print(df)
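A self-contained version of the same idea, as a sketch. The sample rows and the `item` column are made up for illustration; `price` and `num` follow the column names used above.

```python
import io
import pandas as pd

# Hypothetical sample data with a unit price and a quantity per row.
csv_text = "item|price|num\npen|2.5|4\nbook|10.0|3\n"
df = pd.read_csv(io.StringIO(csv_text), sep="|")

# Element-wise product of two columns becomes the new column.
df["result"] = df["price"] * df["num"]

# Summarize the new column, e.g. the grand total.
total = df["result"].sum()
print(df)
print(total)
```

Bracket access (`df["price"]`) is used here instead of attribute access (`df.price`) because it also works for column names that contain spaces or clash with DataFrame attributes.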
4. How to calculate with values that carry a percent sign and output them
Demand situation: This one is more painful. Much e-commerce data consists of percentages with a percent sign, which cannot be used in calculations directly. They need to be converted first and then output.
Solution:
from pandas import read_csv

df = read_csv("1.csv", sep="|")
f = df['Loss rate'].str.strip("%").astype(float) / 100  # Strip the percent sign and convert to a fraction
f = f.round(decimals=4)  # round returns a new Series, so assign it; 4 decimals on the fraction keep 2 decimals of the percentage
f_str = f.apply(lambda x: format(x, '.2%'))  # Convert back to a percent string with 2 decimals (precision can be adjusted)
df['Loss rate'] = f_str  # Reassign
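The round trip can be tried on a small in-memory example. This is a sketch with made-up values; only the `Loss rate` column name comes from the code above.

```python
import pandas as pd

# Hypothetical percentages stored as strings, as they often arrive in exports.
df = pd.DataFrame({"Loss rate": ["35.5%", "41.25%", "8%"]})

# Strip the sign and convert to fractions so the values can be calculated with.
rate = df["Loss rate"].str.strip("%").astype(float) / 100
mean_rate = rate.mean()  # any numeric calculation now works

# Format back to percent strings with two decimals and write them back.
df["Loss rate"] = rate.apply(lambda x: format(x, ".2%"))
print(df["Loss rate"].tolist())  # ['35.50%', '41.25%', '8.00%']
```

Keeping the numeric Series (`rate`) around is useful, since the formatted column holds strings again and cannot be summed or averaged.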
5. How to get the number of rows and columns of the imported data
Demand situation: Sometimes you need to write a general-purpose script, for example for random sampling analysis. If the program obtains the row and column counts automatically, the script becomes much more versatile.
Solution:
df.columns.size  # Get the number of columns
df.iloc[:, 0].size  # Get the number of rows
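An alternative sketch uses `df.shape`, which returns both counts at once as a `(rows, columns)` tuple. The sample frame and the sample size of 2 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data frame with 3 rows and 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# Example use in a generic script: sample at most 2 rows, never more than exist.
sample = df.sample(n=min(2, n_rows), random_state=0)
print(sample)
```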
6. How to sort the data
Demand situation: Needless to say, sorting is needed everywhere.
Solution:
newDF = df.sort_values(by='Loss rate')  # Sort by a single column
newDF = df.sort_values(by=['Exposure', 'Number of visitors brought'], ascending=[True, False])  # Sort by multiple columns
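A small sketch of the multi-column sort with `DataFrame.sort_values` (the older `DataFrame.sort` method was removed in pandas 0.20). The sample values are made up; the column names follow the ones used above.

```python
import pandas as pd

# Hypothetical data with ties in the first sort key.
df = pd.DataFrame({
    "Exposure": [900, 700, 900, 700],
    "Number of visitors brought": [120, 80, 150, 95],
})

# Exposure ascending; within equal Exposure, visitors descending.
newDF = df.sort_values(
    by=["Exposure", "Number of visitors brought"],
    ascending=[True, False],
)
print(newDF)
```

`sort_values` returns a new data frame and keeps the original row labels, so call `reset_index(drop=True)` afterwards if you want a fresh 0..n-1 index.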
7. How to delete a specified column?
Demand situation: If you only want a few columns in the output, method 2 works. But if you want to keep most of the columns and only 1-2 of them are unwanted, it is easier to delete the unwanted columns.
Solution:
df = df[df.columns.delete(1)]  # columns.delete(1) returns the column labels without position 1; indexing df with them drops that column
One line of code!
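As an alternative sketch, `DataFrame.drop` removes columns by name, or by position when combined with `df.columns`. The sample frame is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data frame with three columns.
df = pd.DataFrame({
    "keywords": ["phone"],
    "exposure": [900],
    "bounce rate": ["35.5%"],
})

# Drop by column name.
by_name = df.drop(columns=["exposure"])

# Drop by position: look up the label at position 1, then drop it.
by_position = df.drop(columns=df.columns[1])

print(by_name.columns.tolist())
```

`drop` returns a new data frame and leaves `df` itself unchanged, so assign the result if you want to keep it.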
Summary: On the whole, Python syntax is quite simple for data analysis, and many requirements take basically just one line of code!