False
Hangzhou False
Shanghai False
Suzhou True
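An important feature of Series is that it automatically aligns data by index label when two Series with different indexes are combined in an arithmetic operation. Labels that appear in only one of the operands produce NaN in the result.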
In [24]: obj3
Out[24]:
Beijing 40000
Hangzhou 30000
Nanjing 26000
Shanghai 35000
In [25]: obj4
Out[25]:
Beijing 40000.0
Hangzhou 30000.0
Shanghai 35000.0
Suzhou NaN
In [26]: obj3 + obj4
Out[26]:
Beijing 80000.0
Hangzhou 60000.0
Nanjing NaN
Shanghai 70000.0
Suzhou NaN
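The alignment behaviour can be reproduced with a short sketch. The values are copied from the session above; the `fill_value` variant at the end is an addition that is not part of the original text.

```python
import pandas as pd

# Values copied from the session above
obj3 = pd.Series({"Beijing": 40000, "Hangzhou": 30000,
                  "Nanjing": 26000, "Shanghai": 35000})
obj4 = pd.Series([40000.0, 30000.0, 35000.0, float("nan")],
                 index=["Beijing", "Hangzhou", "Shanghai", "Suzhou"])

# Addition aligns on the index labels; labels missing from either side give NaN
total = obj3 + obj4
print(total)

# Addition to the original text: Series.add with fill_value treats a label
# that is missing on one side as 0 instead of producing NaN
total_filled = obj3.add(obj4, fill_value=0)
print(total_filled)
```

Here Nanjing is NaN in `total` because it only exists in `obj3`, while `total_filled` keeps its value of 26000.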
The index of a Series can be modified in place by assignment.
In [27]: obj.index = ['Bob',
index-feature name-Attribute-easy to understand
2. Filter the row and column data of a DataFrame
import pandas as pd
import numpy as np
from pandas import DataFrame
df = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
1. df[] and df.<name>: select column data
df.a, df[['a', 'b']]
2. df.loc[[index], [column]]: use labels to select data
When you do not filter rows, enter "
, how='left')  #
df_right = pd.merge(df, df1, how='right')
df_outer = pd.merge(df, df1, how='outer')  # union
2. Set the index column
df_inner.set_index('id')
3. Sort by the values of a specific column:
df_inner.sort_values(by=['age'])
4. Sort by the index:
df_inner.sort_index()
5. If the value of the price column is > 3000, the group column shows high; otherwise it shows low:
df_inner['group'] = np.where(df_inner['price'] > 3000, '
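As a rough illustration of the merge, index, and sort steps listed above, the following sketch uses two small made-up tables. The names `df`, `df1`, `id`, `age`, and `price` follow the snippet; all values are hypothetical.

```python
import pandas as pd

# Hypothetical tables that share an "id" column
df = pd.DataFrame({"id": [1, 2, 3], "age": [34, 23, 45]})
df1 = pd.DataFrame({"id": [2, 3, 4], "price": [1200, 4300, 3500]})

# With no "on" argument, merge joins on the columns both tables share ("id")
df_inner = pd.merge(df, df1, how="inner")  # ids present in both tables
df_left = pd.merge(df, df1, how="left")    # every id from df
df_right = pd.merge(df, df1, how="right")  # every id from df1
df_outer = pd.merge(df, df1, how="outer")  # union of the ids

# set_index, sort_values and sort_index return new objects by default
indexed = df_inner.set_index("id")
by_age = df_inner.sort_values(by=["age"])
by_index = indexed.sort_index()
print(df_outer)
```

Rows that have no match on the other side are filled with NaN in the left, right, and outer results.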
', ' 110 ')
Replace
Data preprocessing
Sort the data
df.sort_values(by=['The number of messages sent by the customer on the day'])
Sort
Data grouping (similar to a PivotTable report in Excel): group customer chat records
# If the value of the column is > 5, the group column shows high; otherwise it shows low
df['group'] = np.where(df['customer sends messages on the day'] > 5, 'high', 'low')
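A minimal sketch of the sort and the `np.where` grouping, assuming a small made-up table. The short column name `messages` stands in for the message-count column and is not from the original.

```python
import numpy as np
import pandas as pd

# Hypothetical chat records; "messages" stands in for the message-count column
df = pd.DataFrame({"customer": ["a", "b", "c", "d"],
                   "messages": [7, 2, 5, 9]})

# Sort rows by the message count (ascending by default)
df_sorted = df.sort_values(by=["messages"])

# np.where picks "high" where the condition holds and "low" elsewhere
df["group"] = np.where(df["messages"] > 5, "high", "low")
print(df)
```

`np.where` evaluates the condition for the whole column at once, so no explicit loop over rows is needed.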
D
),(2,1),(2,2)]
# Take the rows selected before the first comma, then slice the second dimension, keeping the columns up to (but not including) 2; the output is as follows
array([[3, 4], [6, 7]])
Once stepped slices are understood, two- and three-dimensional slices are just as easy to follow, and no more complicated than stepping.
You can also assign to the elements of a slice
>>> b[1:,:2] = 1 # broadcast assignment
>>> b
array([[0, 1, 2], [1, 1, 5], [1, 1, 8]])
>>> b[1:,:2].shape
(2L, 2L)
>>> b[1:,:2] = np.arange(2,6).reshape(2,2) # element-wise assignment
>>> b
array([[0, 1, 2], [2, 3, 5], [4, 5, 8]])
Three-dimensional, the same is seq
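The slice assignments above can be run end to end as follows. The starting array `b = np.arange(9).reshape(3, 3)` is an assumption inferred from the printed outputs, since its definition is cut off in the text.

```python
import numpy as np

# Assumed starting array, inferred from the outputs shown above
b = np.arange(9).reshape(3, 3)

# Rows from 1 onward, columns before 2
block = b[1:, :2].copy()

# Broadcast assignment: the scalar fills every element of the slice
b[1:, :2] = 1
after_scalar = b.copy()

# Element-wise assignment: the right-hand side has the same (2, 2) shape
b[1:, :2] = np.arange(2, 6).reshape(2, 2)
print(block)
print(after_scalar)
print(b)
```

The `.copy()` calls are only there to keep snapshots; a plain slice is a view, so it would change along with `b`.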
the last M results
df.values: the data as a two-dimensional array, returned as a numpy.ndarray object.
df.index: the index of the DataFrame. Individual elements of the index cannot be assigned directly.
df.reindex(index=['row1', 'row2', ...], columns=['col1', 'col2', ...]): reorder based on the new index
df[m:n]: slice, select rows m to n-1
df[df['col1'] > 1]: select rows that m
operations
KeyError: 'None of [[2, 3]] are in the [columns]'
print df.loc[[2, 3]]  # .loc can select rows without specifying a column name
  sex tip total_bill
2 Male 3.50 23.68
3 Male 3.31 23.68
print df.iloc[1:3]  # .iloc can select rows by position without specifying columns
  sex tip total_bill
1 Male 1.66 10.34
2 Male 3.50 23.68
print df.iloc[1:3, 'tip': 'total_bill']
TypeError: cannot do slice indexing on
print df.at[3, 'tip']
print df.iat[3, 1
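The difference between the label-based and position-based accessors can be sketched as below, in Python 3 syntax. Rows 1 to 3 reuse the values printed above; row 0 is made up for illustration.

```python
import pandas as pd

# Rows 1-3 follow the output above; row 0 is hypothetical
df = pd.DataFrame({
    "sex": ["Female", "Male", "Male", "Male"],
    "tip": [1.01, 1.66, 3.50, 3.31],
    "total_bill": [16.99, 10.34, 23.68, 23.68],
})

by_label = df.loc[[2, 3]]    # rows whose index labels are 2 and 3
by_position = df.iloc[1:3]   # rows at positions 1 and 2 (end excluded)

# iloc takes integer positions only, so columns are sliced by position;
# loc takes labels, and its slices include both endpoints
cols_by_position = df.iloc[1:3, 1:3]
cols_by_label = df.loc[1:2, "tip":"total_bill"]

# at and iat fetch a single value by label and by position respectively
tip_by_label = df.at[3, "tip"]
tip_by_position = df.iat[3, 1]
print(cols_by_label)
```

Passing column labels to `iloc`, as in the failing call above, raises a TypeError because `iloc` never interprets labels.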
', df['v1'])  # 2 indicates the insert position, v6 indicates the column name, and df['v1'] is the inserted value
print('insert column:')
print(df, '\n')
print(' * 50)
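A small sketch of `DataFrame.insert`, assuming a made-up frame with columns `v1` to `v3`. Only the position 2, the name `v6`, and the use of `df['v1']` as the inserted values come from the text.

```python
import pandas as pd

# Hypothetical frame; only the names v1 and v6 appear in the text above
df = pd.DataFrame({"v1": [1, 2, 3], "v2": [4, 5, 6], "v3": [7, 8, 9]})

# insert(loc, column, value) modifies df in place and returns None:
# position 2, new column name "v6", values taken from column v1
df.insert(2, "v6", df["v1"])
print(df)
```

Unlike plain assignment with `df['v6'] = ...`, which appends the column at the end, `insert` places it at the requested position.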
4. General selection methods:
Operation | Method | Result
Select a column | df[col] | Series
Select a row by label
Query operations
Pandas has powerful, SQL-like query functions that are simple to use:
print tips[['total_bill', 'tip', 'smoker', 'time']]  # Displays the 'total_bill', 'tip', 'smoker', 'time' columns, functionally similar to the SELECT command in SQL
print tips[tips['time'] == 'Dinner']  # Displays the rows whose time column equals Dinner, functionally similar to the WHERE command in SQL
print tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]
print tips[(tips['time'] == 'Dinner')
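The queries above can be tried on a tiny stand-in for the tips data, in Python 3 syntax. All values below are made up, and the final `&` query is an assumed example of combining conditions, since the original line is cut off.

```python
import pandas as pd

# Hypothetical rows with the column names used above
tips = pd.DataFrame({
    "total_bill": [16.99, 48.27, 20.65, 29.80],
    "tip": [1.01, 6.73, 3.35, 4.20],
    "smoker": ["No", "No", "No", "No"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 4, 3, 6],
})

# Column selection, similar to SELECT
selected = tips[["total_bill", "tip", "smoker", "time"]]

# Row filter with a boolean mask, similar to WHERE
dinner = tips[tips["time"] == "Dinner"]

# Conditions are combined with | (or) and & (and); each needs parentheses
big = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]
dinner_big = tips[(tips["time"] == "Dinner") & (tips["total_bill"] > 45)]
print(big)
```

The column is written as `tips["size"]` because `tips.size` is the DataFrame attribute that returns the number of elements.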
value of column col2 after grouping by column col1
df.pivot_table(index=col1, values=[col2, col3], aggfunc=max): create a pivot table that groups by column col1 and calculates the maximum values of col2 and col3
df.groupby(col1).agg(np.mean): return the mean of all columns, grouped by column col1
data.apply(np.mean): apply the function np.mean to each column in the DataFrame
data.apply(np.max, axis=1): apply the function np.max to each row in the DataFrame
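The grouping and apply calls listed above might look like this on a made-up table. The column names `col1` to `col3` follow the text, the values are hypothetical, and the string aggregation names `"mean"` and `"max"` are used in place of `np.mean` and `max` as an assumption about what recent pandas versions prefer.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the placeholder column names from the text
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": [1, 3, 5, 7],
    "col3": [2.0, 4.0, 6.0, 8.0],
})

# Mean of col2 within each col1 group
mean_col2 = df.groupby("col1")["col2"].mean()

# Pivot table: one row per col1 value, maximum of col2 and col3
pivot = df.pivot_table(index="col1", values=["col2", "col3"], aggfunc="max")

# Mean of every remaining column per group
means = df.groupby("col1").agg("mean")

# apply works column by column by default, and row by row with axis=1
numeric = df[["col2", "col3"]]
col_means = numeric.apply(np.mean)
row_max = numeric.apply(np.max, axis=1)
print(pivot)
```

The `apply` calls are limited to the numeric columns because `np.mean` is not defined for the string column `col1`.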
Other operations:
Change column name:
Method 1
7
jeff -5
ryan 3
DataFrame
Pandas reading files
In [29]: df = pd.read_table('pandas_test.txt', sep=' ', names=['name', 'age'])
In [30]: df
Out[30]:
  name age
0 Bob 26
1 Loya 22
2 Denny 20
3 Mars 25
DataFrame column selection
df[name]
In [31]: df['name']
Out[31]:
0 Bob
1 Loya
2 Denny
3 Mars
Name: name, dtype: object
DataFrame row selection
df.iloc[0, :]  # The first argument is the row position and the second is the column position; this selects all columns of row 0
df.iloc[:, 0]  # All rows, column 0
In [32]: df.iloc[0,:]
Out[32]:
name Bob
age 26
Name: 0, dtype: object
In [33]: df.iloc[:,0]
Out[33]:
0 Bob
1 Loya
2 Denny
3 Mars
Name: name, dtype: object
A single element can be fetched with iloc; iat is faster.
In [34]: df.iloc[1,1]
Out[34]: 22
In [35]: df.iat[1,1]
Out[35]: 22
DataFrame block selection
In [36]: df.loc[1:2, ['na
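The reading and selection steps can be reproduced without a file on disk. This sketch assumes the file is space-separated and feeds the same four records through `io.StringIO`; `pd.read_csv` is used here in place of `pd.read_table`, and the column list in the block selection is an assumption because the original line is truncated.

```python
import io

import pandas as pd

# Stand-in for pandas_test.txt, assumed to be space-separated
text = "Bob 26\nLoya 22\nDenny 20\nMars 25\n"
df = pd.read_csv(io.StringIO(text), sep=" ", names=["name", "age"])

names = df["name"]          # column selection
first_row = df.iloc[0, :]   # row 0, all columns
first_col = df.iloc[:, 0]   # all rows, column 0

# Single element: iloc and iat return the same value
age_iloc = df.iloc[1, 1]
age_iat = df.iat[1, 1]

# Block selection by label; the column list is assumed
block = df.loc[1:2, ["name", "age"]]
print(block)
```

With the default integer index, `df.loc[1:2]` selects the rows labelled 1 and 2, both endpoints included.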
3 6
h 7 3 7
i 8 3 8
j 9 3 9
By using loc, we can select part of the data in the DataFrame.
df.loc['a']
rev 0
test 3
col 0
Name: a, dtype: int64
# df.loc[start label (inclusive):end label (inclusive)]
df.loc['a':'d']
  rev test col
a 0 3 0
b 1 3 1
c 2 3 2
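A sketch of label-based selection with `loc`. The frame below is reconstructed from the table fragments above (index `a` to `j`, columns `rev`, `test`, `col`) and should be read as an assumption about the original data.

```python
import pandas as pd

# Frame reconstructed from the fragments above (an assumption)
df = pd.DataFrame(
    {"rev": range(10), "test": 3, "col": range(10)},
    index=list("abcdefghij"),
)

# A single label returns that row as a Series
row_a = df.loc["a"]

# A label slice includes both the start and the end label
a_to_d = df.loc["a":"d"]

# The equivalent positional slice excludes its end position
first_four = df.iloc[0:4]
print(a_to_d)
```

The inclusive end is the main difference from ordinary Python slicing: `df.loc['a':'d']` returns four rows, the same as `df.iloc[0:4]`.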
the DataFrame
>>> np.sign(df)
>>> last_col = df.columns[-1]
>>> np.sign(df[last_col])
# head (take the first few rows) and tail (take the last few rows)
>>> df.head(2)
>>> df.tail(2)
# Find a row of data by index
>>> last_col = df.index[-1]
>>> last_col
>>> df.iloc[last_col]
# Find a range of rows by position
>>> df.iloc[2:9]
# iloc and iat work the same way for a single element
>>> df.iloc[2,3]
>>> df
use anonymous functions
5 Column names
df.columns
df.columns = ['a','b','C','e','D','F']  # Renaming
df.rename(columns={'A':'AA','B':'BB','C':'cc','D':'DD','E':'ee','F':'FF'}, inplace=True)
df.rename(columns=lambda x: x[1:].upper(), inplace=True)  # A function can also be used; the inplace parameter replaces the original variable (deep copy)
6 Dummy variables
pd.Series(['a|b', 'a|c']).str.get_dummies()
7 Pure df matrix, i.e. does not c
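The renaming and dummy-variable calls can be sketched as follows. The small frames and their column names are made up; only the `rename` and `str.get_dummies` calls mirror the text.

```python
import pandas as pd

# Hypothetical frame for the dictionary-based rename
df = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
renamed = df.rename(columns={"A": "aa", "B": "bb"})

# Rename with a function: drop the first character, then upper-case the rest
df2 = pd.DataFrame([[1, 2]], columns=["xa", "xb"])
upper = df2.rename(columns=lambda x: x[1:].upper())

# get_dummies splits each string on "|" (the default separator)
# and builds one indicator column per distinct token
dummies = pd.Series(["a|b", "a|c"]).str.get_dummies()
print(dummies)
```

Without `inplace=True`, `rename` returns a new DataFrame and leaves the original column names untouched.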
Pandas
Pandas is a popular open source Python project whose name comes from "panel data" and "Python data analysis".
Pandas has two important data structures: DataFrame and Series.
The DataFrame data structure in pandas
The pandas DataFrame is a labeled two-dimensional object that is very similar to an Excel spreadsheet or a relational database table.
You can create a DataFrame in the following ways:
1. Create a DataFrame from another DataFrame
2. Generate a DataFrame from a numpy array with two-dimension
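The first two construction routes in the list can be sketched like this. The array contents and the column names `x` and `y` are hypothetical.

```python
import numpy as np
import pandas as pd

# From a two-dimensional numpy array, with made-up column names
arr = np.arange(6).reshape(3, 2)
from_array = pd.DataFrame(arr, columns=["x", "y"])

# From another DataFrame; copy=True makes the new frame independent
from_frame = pd.DataFrame(from_array, copy=True)
same_before_edit = from_frame.equals(from_array)

# Changing the copy does not affect the original
from_frame.loc[0, "x"] = 99
print(from_array)
print(from_frame)
```

`from_array.copy()` would be an equivalent way to get an independent copy of an existing DataFrame.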
from:76713387
How to iterate through rows in a DataFrame in pandas (DataFrame row iteration)
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
When it comes to manipulating a DataFrame, we inevitably need to view or manipulate the data row by row, so what is an efficient and fast way to do it?
Index ordinal
import pandas as pd
inp = [{'c1':10, 'c2
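Two common ways to walk a DataFrame row by row are `iterrows` and `itertuples`. This sketch assumes a small list of records in the style of the truncated `inp` above; the values are hypothetical.

```python
import pandas as pd

# Hypothetical records; the original list is cut off in the text
inp = [{"c1": 10, "c2": 100}, {"c1": 11, "c2": 110}, {"c1": 12, "c2": 120}]
df = pd.DataFrame(inp)

# iterrows yields (index, Series) pairs
rows_iterrows = [(index, row["c1"], row["c2"]) for index, row in df.iterrows()]

# itertuples yields namedtuples, with the index available as .Index;
# it is generally faster than iterrows because no Series is built per row
rows_itertuples = [(row.Index, row.c1, row.c2) for row in df.itertuples()]
print(rows_itertuples)
```

When the per-row work can be expressed as a column operation, a vectorized expression such as `df["c1"] + df["c2"]` is usually faster than either loop.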
Recently, Analysis with Programming joined Planet Python. As my first post as a contributing blog there, I would like to share how to get started with data analysis in Python. The contents are as follows:
Data import
Import a local or web-hosted CSV file;
Data transformation;
Descriptive statistics;
Hypothesis testing
One-sample t-test;
Visualization;
Create a custom function.
Data import
This is a critical step, and for subsequent analysis we first need to import the data. In general, the data is in CSV