How to iterate over the rows of a pandas DataFrame
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
When working with a DataFrame, we inevitably need to inspect or modify the data row by row. What is the fastest and most efficient way to do that?
By index position
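    # Python 2 syntax (xrange, print statement)
    import pandas as pd

    inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
    df = pd.DataFrame(inp)

    # Loop over the integer positions and look each value up with .iloc
    for x in xrange(len(df.index)):
        print df['c1'].iloc[x]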
This seems to be the most common approach, and it lets you modify the DataFrame during the iteration.
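As a small illustration of modifying the DataFrame while iterating by position, here is a minimal sketch. It is written for Python 3 (so `range` instead of `xrange`), and the choice to overwrite `c2` with twice the value of `c1` is only an assumption made for the example.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

# Integer position of the column we want to overwrite (needed for .iloc)
c2_pos = df.columns.get_loc('c2')

for x in range(len(df.index)):
    # .iloc[row, col] writes directly into the DataFrame
    df.iloc[x, c2_pos] = df['c1'].iloc[x] * 2

print(df)
```

Because the write goes through `df.iloc`, the change lands in the DataFrame itself rather than in a temporary copy of the row.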
enumerate
    # Python 2 syntax (print statement); df is the DataFrame built above
    for i, row in enumerate(df.values):
        index = df.index[i]  # index label at position i
        print row
df.values is a numpy.ndarray.
Here i is the position within the index (not the index label), and each row is also a numpy.ndarray.
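The difference between the position and the index label is easier to see when the index is not the default 0, 1, 2. The following Python 3 sketch uses a made-up string index (`'a'`, `'b'`, `'c'`) purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical string index, chosen only to make positions and labels differ
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]},
                  index=['a', 'b', 'c'])

labels = []
for i, row in enumerate(df.values):
    label = df.index[i]          # i is the position, label is the index value
    labels.append((i, label))
    print(i, label, row, type(row).__name__)
```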
iterrows
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html
    # Python 2 syntax (print statement)
    import pandas as pd

    inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
    df = pd.DataFrame(inp)

    for index, row in df.iterrows():
        print row['c1'], row['c2']
    # 10 100
    # 11 110
    # 12 120
Each iteration of df.iterrows() yields a tuple containing the row's index and the row's data.
- With iterrows, each row is returned as a Series, so the DataFrame's per-column dtypes are not preserved across the row.
- The returned Series is generally a copy of the row's data, so writing to it should not be relied on to change the original DataFrame.
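Both points can be checked with a small experiment. The sketch below (Python 3) uses a made-up frame with one integer column `n` and one float column `x`; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# One integer column and one float column
df = pd.DataFrame({'n': [1, 2], 'x': [1.5, 2.5]})

for index, row in df.iterrows():
    # The whole row is one Series, so the integer is upcast to float
    print(index, row['n'], type(row['n']).__name__)
    # This only changes the temporary Series, not df
    row['x'] = 0.0

print(df.dtypes)         # column dtypes of the frame itself
print(df['x'].tolist())  # values in the original frame
```

If the rows really need to be changed, writing through `df.loc` or `df.iloc` (or using a vectorized operation) is the safer route.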
itertuples
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.itertuples.html
    # Python 2 syntax (print statement)
    import pandas as pd

    inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
    df = pd.DataFrame(inp)

    for row in df.itertuples():
        # Equivalent to: print row[0], row[1], row[2]
        print row.Index, row.c1, row.c2
itertuples yields one namedtuple per row; its type is shown as pandas.core.frame.Pandas.
itertuples is generally considered faster than iterrows.
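A short Python 3 sketch of what itertuples hands back, including its `index` and `name` parameters. The variable names `first` and `pairs` are just illustrative.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

# Default: namedtuples called "Pandas", with the index as the first field
first = next(df.itertuples())
print(type(first).__name__, first._fields)

# index=False drops the index field; name=None yields plain tuples
pairs = list(df.itertuples(index=False, name=None))
print(pairs)
```

Plain tuples (`name=None`) are a reasonable choice when the field names are not needed.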
zip / itertools.izip
zip and itertools.izip are used in the same way, but in Python 2 zip returns a list while izip returns an iterator, so on large data zip performs worse than izip. (In Python 3, itertools.izip no longer exists and the built-in zip returns an iterator.)
    # Python 2 only: itertools.izip and the print statement
    from itertools import izip
    import pandas as pd

    inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
    df = pd.DataFrame(inp)

    for row in izip(df.index, df['c1'], df['c2']):
        print row
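itertools.izip exists only in Python 2. In Python 3 the built-in zip already returns an iterator, so the same loop can be sketched as follows (the list `collected` is only there to make the result easy to inspect):

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)

collected = []
# In Python 3 the built-in zip is lazy, so it plays the role of izip
for row in zip(df.index, df['c1'], df['c2']):
    collected.append(row)
    print(row)
```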
Timing comparison
    # Python 2 syntax (xrange, izip)
    import time
    from itertools import izip

    import pandas as pd
    from numpy.random import randn

    df = pd.DataFrame({'a': randn(100000), 'b': randn(100000)})
    time_stat = []

    # range(index)
    test_list = []
    t = time.time()
    for r in xrange(len(df)):
        test_list.append((df.index[r], df.iloc[r, 0], df.iloc[r, 1]))
    time_stat.append(time.time() - t)

    # enumerate
    test_list = []
    t = time.time()
    for i, r in enumerate(df.values):
        test_list.append((df.index[i], r[0], r[1]))
    time_stat.append(time.time() - t)

    # iterrows
    test_list = []
    t = time.time()
    for i, r in df.iterrows():
        test_list.append((df.index[i], r['a'], r['b']))
    time_stat.append(time.time() - t)

    # itertuples
    test_list = []
    t = time.time()
    for ir in df.itertuples():
        test_list.append((ir[0], ir[1], ir[2]))
    time_stat.append(time.time() - t)

    # zip
    test_list = []
    t = time.time()
    for r in zip(df.index, df['a'], df['b']):
        test_list.append((r[0], r[1], r[2]))
    time_stat.append(time.time() - t)

    # izip
    test_list = []
    t = time.time()
    for r in izip(df.index, df['a'], df['b']):
        test_list.append((r[0], r[1], r[2]))
    time_stat.append(time.time() - t)

    # One label per measurement, in the order the timings were appended
    time_df = pd.DataFrame({'items': ['range(index)', 'enumerate', 'iterrows',
                                      'itertuples', 'zip', 'izip'],
                            'time': time_stat})
    time_df.sort_values('time')

Result (time in seconds):

              items       time
    5          izip   0.034869
    4           zip   0.040440
    3    itertuples   0.072604
    1     enumerate   0.174094
    2      iterrows   4.026293
    0  range(index)  21.921407
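The timings show the following order, from fastest to slowest: izip > zip > itertuples > enumerate > iterrows > range(index). In this run izip and zip were roughly twice as fast as itertuples, while iterrows took about 4 seconds and range(index) about 22 seconds.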
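To repeat the comparison on Python 3, something along these lines could be used. This is only a sketch under a few assumptions: the helper names (`by_range`, `by_enumerate`, and so on) are made up for this example, the frame is much smaller than the 100000 rows above so that it runs quickly, and izip is left out because it does not exist in Python 3. The absolute numbers will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

n = 1000  # far fewer rows than the original test, to keep the run short
df = pd.DataFrame({'a': np.random.randn(n), 'b': np.random.randn(n)})

# Hypothetical helpers, one per iteration technique
def by_range(df):
    return [(df.index[r], df.iloc[r, 0], df.iloc[r, 1]) for r in range(len(df))]

def by_enumerate(df):
    return [(df.index[i], r[0], r[1]) for i, r in enumerate(df.values)]

def by_iterrows(df):
    return [(i, r['a'], r['b']) for i, r in df.iterrows()]

def by_itertuples(df):
    return [(t[0], t[1], t[2]) for t in df.itertuples()]

def by_zip(df):
    return list(zip(df.index, df['a'], df['b']))

timings = {}
results = {}
for func in [by_range, by_enumerate, by_iterrows, by_itertuples, by_zip]:
    start = time.perf_counter()
    results[func.__name__] = func(df)
    timings[func.__name__] = time.perf_counter() - start

time_df = pd.DataFrame({'items': list(timings), 'time': list(timings.values())})
print(time_df.sort_values('time'))
```

For more stable numbers, each helper could be run several times (for example with the standard `timeit` module) instead of being timed once.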