How to iterate over the rows of a pandas DataFrame

Source: Internet
Author: User

from:76713387

How to iterate over the rows of a DataFrame in pandas

https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas

When working with a DataFrame, we inevitably need to inspect or process the data row by row. What is an efficient and fast way to do that?

Positional index
import pandas as pd
inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)
# Walk the row positions 0..len-1 and read column c1 at each position
for x in range(len(df.index)):
    print(df['c1'].iloc[x])

This seems to be the most common approach. Because every access goes through the DataFrame itself, the DataFrame can also be modified during the iteration.
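
As an illustration of modifying the DataFrame while iterating by position, here is a minimal sketch written for Python 3. Adding 1 to c2 is an arbitrary operation chosen for the example, and the helper name c2_pos is hypothetical.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Column position of 'c2', looked up once outside the loop (hypothetical helper name).
c2_pos = df.columns.get_loc('c2')

for x in range(len(df.index)):
    # Assigning through df.iloc writes into the DataFrame itself, not into a copy.
    df.iloc[x, c2_pos] = df.iloc[x, c2_pos] + 1

updated = df['c2'].tolist()
print(updated)
```

Writing through df.iloc[row, column] in a single indexing step avoids the chained form df['c2'].iloc[x] = ..., which may not write back to the DataFrame.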

Enumerate
for i, row in enumerate(df.values):
    index = df.index[i]  # label of the row at position i
    print(row)

df.values is a numpy.ndarray.
Here i is the position of the row (not its index label), and row is a numpy.ndarray.
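
The difference between the position i and the index label only becomes visible when the index is not the default 0, 1, 2. The sketch below assumes a DataFrame with string labels 'a', 'b', 'c', which is not part of the original example.

```python
import pandas as pd

# Assumed for illustration: a non-default string index.
df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]}, index=['a', 'b', 'c'])

pairs = []
for i, row in enumerate(df.values):
    label = df.index[i]  # i is a position; df.index[i] turns it into the label
    pairs.append((i, label, int(row[0]), int(row[1])))

print(pairs)
```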

Iterrows

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.iterrows.html

import pandas as pd
inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)
for index, row in df.iterrows():
    print(row['c1'], row['c2'])
# 10 100
# 11 110
# 12 120

Each iteration of df.iterrows() yields a tuple containing the index label and the data of that row.

    1. With iterrows, each row is returned as a Series, so the DataFrame's dtypes are not preserved across the row.
    2. The returned Series is generally a copy of the data rather than a view, so writing to it does not modify the original DataFrame.
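
Both points can be checked with a small sketch. It assumes a DataFrame with one integer column and one float column (a different frame from the example above), so that the dtype change is visible.

```python
import pandas as pd

# Assumed for illustration: mixed integer and float columns.
df = pd.DataFrame({'c1': [10, 11], 'c2': [1.5, 2.5]})

index, row = next(df.iterrows())
col_dtype = df['c1'].dtype  # integer dtype inside the DataFrame
row_dtype = row.dtype       # float dtype: the row Series holds all values under one dtype

# Writing to the row Series does not write back to this DataFrame.
row['c1'] = 999
unchanged = df.loc[0, 'c1']

print(col_dtype, row_dtype, unchanged)
```

To change values while looping, assign through the DataFrame (for example with df.loc or df.iloc) instead of through the row.
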
Itertuples

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.itertuples.html

import pandas as pd
inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)
for row in df.itertuples():
    # print(row[0], row[1], row[2]) is equivalent to:
    print(row.Index, row.c1, row.c2)

itertuples yields one namedtuple per row; its type is shown as pandas.core.frame.Pandas.

itertuples is generally faster than iterrows, since it does not build a Series for every row.
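
The namedtuple fields, and the index and name parameters of itertuples, can be explored with a short sketch like the following (Python 3; the variable names are illustrative).

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}])

first = next(df.itertuples())
fields = first._fields  # field names of the namedtuple: Index plus the column names

# Positional and attribute access read the same values.
same = (first[1], first[2]) == (first.c1, first.c2)

# index=False drops the Index field; name=None yields plain tuples instead of namedtuples.
plain = list(df.itertuples(index=False, name=None))

print(fields, same, plain)
```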

zip / itertools.izip

zip and itertools.izip are used in the same way, but in Python 2 zip returns a list while izip returns an iterator. On large data, zip therefore performs worse than izip. In Python 3, itertools.izip no longer exists and the built-in zip already returns an iterator.

from itertools import izip  # Python 2 only
import pandas as pd
inp = [{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}]
df = pd.DataFrame(inp)
for row in izip(df.index, df['c1'], df['c2']):
    print(row)
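
For readers on Python 3, where itertools.izip is not available, the same loop can be sketched with the built-in zip, which is already lazy there. The names rows, is_lazy, and collected are illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

rows = zip(df.index, df['c1'], df['c2'])  # an iterator in Python 3, nothing is built yet
is_lazy = not isinstance(rows, list)

collected = []
for row in rows:
    # Each row is a plain tuple: (index label, c1 value, c2 value).
    collected.append(tuple(row))

print(collected)
```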

Time Assessment
# Python 2 code, as originally timed (xrange, izip)
import time
import pandas as pd
from numpy.random import randn
df = pd.DataFrame({'a': randn(100000), 'b': randn(100000)})
time_stat = []
# range(index)
test_list = []
t = time.time()
for r in xrange(len(df)):
    test_list.append((df.index[r], df.iloc[r, 0], df.iloc[r, 1]))
time_stat.append(time.time() - t)
# enumerate
test_list = []
t = time.time()
for i, r in enumerate(df.values):
    test_list.append((df.index[i], r[0], r[1]))
time_stat.append(time.time() - t)
# iterrows
test_list = []
t = time.time()
for i, r in df.iterrows():
    test_list.append((df.index[i], r['a'], r['b']))
time_stat.append(time.time() - t)
# itertuples
test_list = []
t = time.time()
for ir in df.itertuples():
    test_list.append((ir[0], ir[1], ir[2]))
time_stat.append(time.time() - t)
# zip
test_list = []
t = time.time()
for r in zip(df.index, df['a'], df['b']):
    test_list.append((r[0], r[1], r[2]))
time_stat.append(time.time() - t)
# izip
test_list = []
t = time.time()
from itertools import izip
for r in izip(df.index, df['a'], df['b']):
    test_list.append((r[0], r[1], r[2]))
time_stat.append(time.time() - t)
time_df = pd.DataFrame({'items': ['range(index)', 'enumerate', 'iterrows', 'itertuples', 'zip', 'izip'], 'time': time_stat})
time_df.sort_values('time')

          items       time
5          izip   0.034869
4           zip   0.040440
3    itertuples   0.072604
1     enumerate   0.174094
2      iterrows   4.026293
0  range(index)  21.921407

Ranked by time spent, from fastest to slowest: izip, zip, itertuples, enumerate, iterrows, range(index).
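
To repeat the comparison on Python 3, a sketch along the following lines may be a reasonable starting point. It is not the original benchmark: it assumes a smaller frame of 10,000 rows to keep the run short, uses time.perf_counter, omits izip (which does not exist in Python 3), and the helper time_it is hypothetical. Absolute timings will differ by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd


def time_it(fn):
    """Run fn once and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = fn()
    return time.perf_counter() - start, result


n = 10_000  # assumption: smaller than the 100000 rows timed above
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(n), 'b': rng.standard_normal(n)})

# One callable per iteration method; each builds a list of (index, a, b) tuples.
methods = {
    'range(index)': lambda: [(df.index[r], df.iloc[r, 0], df.iloc[r, 1]) for r in range(len(df))],
    'enumerate': lambda: [(df.index[i], r[0], r[1]) for i, r in enumerate(df.values)],
    'iterrows': lambda: [(i, r['a'], r['b']) for i, r in df.iterrows()],
    'itertuples': lambda: [(t[0], t[1], t[2]) for t in df.itertuples()],
    'zip': lambda: list(zip(df.index, df['a'], df['b'])),
}

timings = {}
results = {}
for name, fn in methods.items():
    timings[name], results[name] = time_it(fn)

time_df = pd.DataFrame({'items': list(timings), 'time': list(timings.values())})
time_df = time_df.sort_values('time')
print(time_df)
```

A single run per method is noisy; repeating each measurement several times (for example with the timeit module) would give steadier numbers.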
