Query and write operations
Pandas has powerful SQL-like query functions, and they are simple to use:
```python
import numpy as np
import pandas as pd

# Show the 'total_bill', 'tip', 'smoker' and 'time' columns;
# functionally similar to SELECT in SQL
print(tips[['total_bill', 'tip', 'smoker', 'time']])

# Rows whose 'time' column equals 'Dinner';
# functionally similar to WHERE in SQL
print(tips[tips['time'] == 'Dinner'])

# | works like OR in SQL, & works like AND
print(tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)])
print(tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)])

# Index- and label-based queries
df.iloc[i:j, k:p]                          # iloc selects by position: rows i..j-1, columns k..p-1
df.loc['20130102':'20130104', ['A', 'B']]  # loc selects by label: rows '20130102'..'20130104', columns 'A' and 'B'
df.at[dates[0], 'A']                       # value at a specific row label and column label

# map operations
df['Oid'] = df['Name'].map(lambda x: int(x.split('-')[0]))

# Delete a column
del df['smoker']

# Add a column
df['smoker'] = np.nan

# Delete rows (drop rows 1..99, i.e. 99 rows)
df = df.drop([i for i in range(1, 100)], axis=0)

# Add rows (append 100 empty rows; note that DataFrame.append was
# removed in pandas 2.0, where pd.concat should be used instead)
df = df.append(pd.DataFrame(index=[i for i in range(100, 200)],
                            columns=df.columns), ignore_index=True)
```
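As a concrete illustration of the boolean-mask queries above, here is a small self-contained sketch. The miniature `tips` DataFrame below is made-up sample data standing in for the tips dataset used in the text:

```python
import pandas as pd

# Hypothetical miniature stand-in for the tips dataset
tips = pd.DataFrame({
    'total_bill': [16.99, 24.59, 48.27, 50.81],
    'tip':        [1.01, 3.61, 6.73, 10.00],
    'time':       ['Lunch', 'Dinner', 'Dinner', 'Dinner'],
    'size':       [2, 4, 4, 3],
})

# Equivalent of: SELECT * FROM tips WHERE time = 'Dinner' AND tip > 5.00
dinner_big_tips = tips[(tips['time'] == 'Dinner') & (tips['tip'] > 5.00)]
print(dinner_big_tips['total_bill'].tolist())
```

Note the parentheses around each comparison: `&` and `|` bind more tightly than `==` and `>`, so omitting them raises an error.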
The following uses pandas to convert a one-dimensional relation table into a two-dimensional table; the code is as follows:
```python
import pandas as pd

def one2two(filepath, col_value):
    """The relation table has an Oid field and a Did field; each pair of
    fields corresponds to a value col_value. This converts the values in
    the Oid and Did fields into a two-dimensional data table with Oid as
    the columns and Did as the rows."""
    df = pd.read_csv(filepath)
    newdf = pd.DataFrame(columns=df['Oid'].unique(), index=df['Did'].unique())
    time = len(newdf.index)
    for i in newdf.index:
        for c in newdf.columns:
            # Look up the value corresponding to this (Oid, Did) pair
            value = df[(df.Did == i) & (df.Oid == c)]
            newdf.loc[i, c] = value[col_value].iloc[0]
        time = time - 1
        print('After %d the app will leave.' % time)
    print('Ready to write.')
    newdf.to_csv(col_value + '.csv')
    print('Finish write, the %s.csv was generated' % col_value)
```
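The nested loops above can be replaced by pandas' built-in `DataFrame.pivot`, which performs the same long-to-wide reshaping in a single call. A minimal sketch with made-up data (the `Oid`/`Did`/`value` names mirror the function above, but the numbers are invented for illustration):

```python
import pandas as pd

# Hypothetical one-dimensional relation table: each (Oid, Did) pair has a value
df = pd.DataFrame({
    'Oid':   [1, 1, 2, 2],
    'Did':   [10, 20, 10, 20],
    'value': [0.5, 0.7, 0.2, 0.9],
})

# pivot builds the two-dimensional table in one call:
# Did becomes the row index, Oid the columns
wide = df.pivot(index='Did', columns='Oid', values='value')
print(wide)
```

Besides being shorter, `pivot` avoids the O(rows × columns) repeated filtering of the loop version, so it is much faster on large tables.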
Besides querying, pandas is also quite impressive at processing big files. For example, the following function extracts features from a large file and saves them:
```python
import pandas as pd

def save(pathfile, outpath):
    # iterator=True lets pandas read the file piece by piece
    reader = pd.read_csv(pathfile, iterator=True)
    loop = True
    chunksize = 1000000
    chunks = []
    while loop:
        try:
            # Read the file in chunks of chunksize rows
            df = reader.get_chunk(chunksize)
            chunks.append(df)
        except StopIteration:
            loop = False
            print('Iteration is stopped.')
    # Join the chunks back into one DataFrame
    df = pd.concat(chunks, ignore_index=True)
    df = df[['Name', 'Total_Length', 'total_time']]
    # Take the part of 'Name' before '-' as Oid and after '-' as Did
    df['Oid'] = df['Name'].map(lambda x: int(x.split('-')[0]))
    df['Did'] = df['Name'].map(lambda x: int(x.split('-')[1]))
    del df['Name']
    df.to_csv(outpath)
    print('Finish.')
```
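The manual `get_chunk`/`StopIteration` loop can be simplified: passing `chunksize=` to `pd.read_csv` returns an iterator of DataFrames directly, so the file can be consumed with a plain `for` loop. A minimal sketch, using an in-memory `StringIO` buffer with invented data in place of a real large file:

```python
import io
import pandas as pd

# Simulate a large CSV with an in-memory buffer (illustrative data only)
csv_data = io.StringIO("Name,Total_Length\n1-10,5\n2-20,7\n3-30,9\n")

chunks = []
# chunksize=2 makes read_csv yield DataFrames of up to 2 rows each,
# so the whole file never has to fit in memory at once
for chunk in pd.read_csv(csv_data, chunksize=2):
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
df['Oid'] = df['Name'].map(lambda x: int(x.split('-')[0]))
print(len(chunks), df['Oid'].tolist())
```

If only per-chunk aggregates are needed, each chunk can be processed and discarded inside the loop instead of accumulating them, which keeps peak memory use bounded by one chunk.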
Pandas and table processing