    Name
0   KEN
1   John
2   JIMI

OK, this is the way to define a DataFrame.

The third question: what are the limitations of using this data structure? In general, a structure of this kind is restricted to a single data type. A Python DataFrame, by contrast, can store multiple data types (one per column), with essentially no restriction on the default data types.

The fourth question: how do you access the data in this structure?
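A minimal sketch of the two points above: a DataFrame that mixes data types across columns, and a few ways to access its contents. The Name column mirrors the snippet; the Age and Score columns are invented purely for illustration.

```python
import pandas as pd

# Name comes from the snippet above; Age and Score are hypothetical columns.
df = pd.DataFrame({
    "Name": ["KEN", "John", "JIMI"],
    "Age": [31, 25, 28],
    "Score": [88.5, 92.0, 79.5],
})

# Each column keeps its own data type.
print(df.dtypes)

# Access by column label, by row label, and by integer position.
names = df["Name"]         # a Series holding the whole column
first_row = df.loc[0]      # the row whose index label is 0
last_name = df.iloc[2, 0]  # third row, first column, by position
print(last_name)
```

Label-based access (loc) and position-based access (iloc) coincide here only because the default index is 0, 1, 2.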
Access data by location.

For more information, see the Basics section.

1. Look at the head and tail rows of the frame.
2. Display the index, columns, and underlying NumPy data.
3. The describe() function gives a quick statistical summary of the data.
4. Transpose the data.
5. Sort by axis.
6. Sort by value.

Third, selection

While the standard Python/NumPy expressions for selecting and setting can come in handy, for project code we recommend the optimized pandas data access methods: .at, .iat, .loc,.
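The six basic operations listed above can be sketched as follows. The frame is filled with random numbers and the column names A to D are assumptions made for the example, not data from the original article.

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 frame of random values indexed by date.
dates = pd.date_range("2013-01-01", periods=6)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

head = df.head(2)                      # 1. first rows
tail = df.tail(3)                      #    last rows
idx, cols = df.index, df.columns       # 2. index and column labels
values = df.to_numpy()                 #    underlying NumPy data
summary = df.describe()                # 3. quick statistical summary
transposed = df.T                      # 4. transpose
by_axis = df.sort_index(axis=1, ascending=False)  # 5. sort by axis (column labels)
by_value = df.sort_values(by="B")      # 6. sort by the values in column B

print(summary)
```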
1. Ways to get the value of one column at the row where another column has its max/min value:

A.  most_bars_country = flags["name"][flags["bars"].idxmax()]

B.  bars_sorted = flags.sort_values("bars", ascending=False)
    most_bars_country = bars_sorted["name"].iloc[0]

2. The probability of a certain value in a column:

    orange_probability = flags[flags["orange"] == 1].shape[0] / flags.shape[0]

3. Calculating combinations by using factorials:

    import math
    def find_outcome_combinations(N, k):
        # Calcu
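The three recipes above can be tried on a small stand-in table. The rows of `flags` below are made up, and the body of `find_outcome_combinations` is an assumption (the standard n! / (k! (n-k)!) formula), since the original function is cut off.

```python
import math
import pandas as pd

# Hypothetical stand-in for the flags dataset.
flags = pd.DataFrame({
    "name": ["Aland", "Borduria", "Carpania"],
    "bars": [0, 3, 1],
    "orange": [1, 0, 1],
})

# 1A. Look up the name at the row label where "bars" is largest.
most_bars_country = flags.loc[flags["bars"].idxmax(), "name"]

# 1B. Sort descending and take the first row.
bars_sorted = flags.sort_values("bars", ascending=False)
most_bars_country_sorted = bars_sorted["name"].iloc[0]

# 2. Share of rows where orange == 1 (the mean of a boolean column).
orange_probability = (flags["orange"] == 1).mean()

# 3. Number of ways to choose k items out of n (assumed implementation).
def find_outcome_combinations(n, k):
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(most_bars_country, orange_probability, find_outcome_combinations(5, 2))
```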
From: 76713387

How to iterate through rows in a DataFrame in pandas (row-by-row iteration)

https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas

When manipulating a DataFrame, we inevitably need to view or process the data row by row, so what is an efficient and fast way to do it?

Index ordinal

    import pandas as pd
    inp = [{'c1': 10, 'c2
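As a rough illustration of the row-iteration question above, the sketch below compares three common approaches. The column names c1 and c2 follow the snippet; the values are invented because the original list is cut off.

```python
import pandas as pd

# Hypothetical data; only the column names come from the snippet above.
df = pd.DataFrame({"c1": [10, 11, 12], "c2": [100, 110, 120]})

# iterrows() yields (index label, row as Series) pairs; simple but slow.
totals_iterrows = [row["c1"] + row["c2"] for _, row in df.iterrows()]

# itertuples() yields namedtuples and is usually noticeably faster.
totals_itertuples = [row.c1 + row.c2 for row in df.itertuples(index=False)]

# When the operation can be vectorized, skipping the loop is fastest.
totals_vectorized = (df["c1"] + df["c2"]).tolist()

print(totals_vectorized)
```

Explicit loops are generally best kept for logic that cannot be expressed as column operations.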
'], aggfunc=[np.sum, np.mean])  # same as above

    df.groupby(['E', 'F']).agg({'A': ['mean', 'sum'], 'B': 'min'})  # groupby can also be written this way

11 Sort

    df.sort_values(['A', 'B'  # sort by column; na_position controls where NaN values are placed
    # sort by index

12 Filtering

    # value filtering
    df[df.E.str.contains(">")]  # rows containing a character; the contains filter actually takes a regular expression
    df[df.F.isin(['1'])]  # rows whose value is inside the list

13 Variable selection

    df['A']  # a single column
    df[0:3]  # rows
    df['20130102':'20130104']  # Filter b
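The grouping, sorting, and filtering calls above can be exercised on a small frame. The column names A, B, E, and F follow the snippet, while the values are assumptions chosen only to make the results easy to check.

```python
import pandas as pd

# Hypothetical values; column names follow the snippet above.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10, 20, 30, 40],
    "E": ["x>y", "x>y", "z", "z"],
    "F": ["1", "2", "1", "3"],
})

# Different aggregations per column; the result has MultiIndex columns.
grouped = df.groupby(["E", "F"]).agg({"A": ["mean", "sum"], "B": "min"})

# Sort by several columns; na_position controls where NaN rows go.
sorted_df = df.sort_values(["A", "B"], ascending=False, na_position="last")

# str.contains treats its pattern as a regular expression by default.
contains_gt = df[df["E"].str.contains(">")]

# isin keeps rows whose value appears in the given list.
in_list = df[df["F"].isin(["1"])]

print(grouped)
```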
the 'one' and 'two' rows and the a-b columns; a.loc['one', 'a'] has the same effect as a.loc[['one'], ['a']], but the former returns only the corresponding value, while the latter also shows the corresponding row and column labels.

3. iloc selects data directly by position. This is similar to selecting by label.

a.iloc[1:2, 1:2] displays the data at row position 1 and column position 1, that is, the second row and second column, because positions are zero-based (the end value of the slice is not included).

a.iloc[1:2
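A short sketch of the difference between the loc and iloc forms described above. The frame `a`, its labels, and its values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels and column labels.
a = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=["one", "two", "three"],
    columns=["a", "b", "c"],
)

scalar = a.loc["one", "a"]                  # bare value
frame = a.loc[["one"], ["a"]]               # 1x1 DataFrame that keeps its labels
label_slice = a.loc["one":"two", "a":"b"]   # label slices include both ends
pos_slice = a.iloc[1:2, 1:2]                # position slices exclude the end

print(scalar)
print(frame)
print(pos_slice)
```

The label slice returns two rows and two columns, while the position slice with the same-looking bounds returns a single cell.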
Array

One-dimensional arrays

The storage address of the array element with subscript i (0 ≤ i …) is

    LOC(arrname[i]) = LOC(arrname[0]) + i * sizeof(elemtype)    (0 ≤ i …)

Two-dimensional arrays

The storage address LOC(arrname[i][j]) of the array element arrname[i][j] is

    LOC(arrname[i][j]) = LOC(arrname[0][0]) + (i * n + j) * sizeof(elemtype)    (0 ≤ i …)

Abstract data types for multidimensional arrays

Three-dimensional integer array

    #include
    #define ERROR 0
    #define OK 1
    #define Notpresent 2
    #de
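The two address formulas above can be checked numerically. The sketch below uses Python rather than C, takes n to be the number of columns (an assumption consistent with the row-major formula), and compares the result with the byte strides NumPy reports for a C-ordered array. The helper names are hypothetical.

```python
import numpy as np

def address_1d(base, i, elem_size):
    # LOC(arrname[i]) = LOC(arrname[0]) + i * sizeof(elemtype)
    return base + i * elem_size

def address_2d(base, i, j, n, elem_size):
    # LOC(arrname[i][j]) = LOC(arrname[0][0]) + (i * n + j) * sizeof(elemtype)
    return base + (i * n + j) * elem_size

# A row-major (C-ordered) 3x4 array of 4-byte integers.
arr = np.zeros((3, 4), dtype=np.int32)
n = arr.shape[1]
elem_size = arr.itemsize

i, j = 2, 1
offset_from_formula = address_2d(0, i, j, n, elem_size)
offset_from_strides = i * arr.strides[0] + j * arr.strides[1]

print(offset_from_formula, offset_from_strides)
```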
recommend the optimized pandas data access methods, .at, .iat, .loc, .iloc and .ix (.ix was removed in pandas 1.0; use .loc or .iloc instead).
See the indexing section and below.

Getting

Selecting a single column, which yields a Series, equivalent to df.A:

In [21]: df['A']
Out[21]:
2013-01-01    0.469112
2013-01-02    1.212112
2013-01-03   -0.861849
2013-01-04    0.721555
2013-01-05   -0.424972
2013-01-06   -0.673690
Freq: D, Name: A, dtype: float64

Selecting via [], which slices the rows.

In [22]: df[0:3]
Out[22
In [22]: df.sort_values(by='B')  # written as df.sort(columns='B') in old pandas releases
Out[22]:
                   A         B         C         D
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-06 -0.673690  0.113648 -1.478427  0.524988
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
[6 rows x 4 columns]
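The session output above can be reproduced with a current pandas release. The sketch below rebuilds the frame from the values printed in the table (rearranged into date order) and assumes the index is a daily date range starting on 2013-01-01, as the row labels suggest.

```python
import pandas as pd

# Values copied from the printed table above, arranged in date order.
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(
    [
        [0.469112, -0.282863, -1.509059, -1.135632],
        [1.212112, -0.173215, 0.119209, -1.044236],
        [-0.861849, -2.104569, -0.494929, 1.071804],
        [0.721555, -0.706771, -1.039575, 0.271860],
        [-0.424972, 0.567020, 0.276232, -1.087401],
        [-0.673690, 0.113648, -1.478427, 0.524988],
    ],
    index=dates,
    columns=list("ABCD"),
)

col_a = df["A"]                        # single column, returned as a Series
first_three = df[0:3]                  # [] with a slice selects rows
sorted_by_b = df.sort_values(by="B")   # rows ordered by column B

print(sorted_by_b)
```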
Selection
Note
While standard python/numpy expressions for selecting and setting are int
).mean().iloc[1:][0] + [d[x].max()]
    s = pd.cut(d[x], group, labels=[x + str(i) for i in range(4)])
    return s

discretization_d = pd.concat([f('syndrome of liver-qi stagnation'), f('accumulation coefficient of heat toxin'),
                              f('Chong-ren imbalance syndrome type coefficient'), f('Qi and blood two deficiency syndrome'),
                              f('Spleen and stomach weakness syndrome typ
'}, inplace=True)

Inserting rows and columns
http://www.jianshu.com/p/7df2593a01ce
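The linked article's own approach is not reproduced here. As a minimal sketch of one common way to insert a column and a row, the example below uses `DataFrame.insert` and `pd.concat`; the frame and its values are assumptions.

```python
import pandas as pd

# Hypothetical starting frame with columns A and C.
df = pd.DataFrame({"A": [1, 2], "C": [5, 6]})

# Insert column B at position 1 (between A and C); this modifies df in place.
df.insert(1, "B", [3, 4])

# Append a row by concatenating a one-row frame; concat returns a new frame.
new_row = pd.DataFrame({"A": [7], "B": [8], "C": [9]})
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```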
Related reference links:

Reference
http://www.qingpingshan.com/rjbc/dashuju/228593.html

pandas in 10 minutes
http://python.jobbole.com/84416/

Official documentation
http://pandas.pydata.org/pandas-docs/stable/index.html

Operation index
https://www.dataquest.io/blog/images/cheat-sheets/pandas-cheat-sheet.pdf
Advanced

Fetching a value (element):
Take a specific value from the DataFrame
interface: dateinterface = Dateinterface()
    var data = ""
}
let da = datastring()
da.data.append("Iloc")
let file = da.interface.filename
// At this point only the lazy-decorated
/* If you have Objective-C development experience, you should know that there are two ways to store values and references in a class instance. Alternatively, you can use instance variables as backing storage for the values stored in the attribute. Swift unifies these concepts into attr
Query and write operations

pandas offers powerful query functions similar to SQL, and they are simple to use:

    print(tips[['total_bill', 'tip', 'smoker', 'time']])  # displays the 'total_bill', 'tip', 'smoker', and 'time' columns, functionally similar to the SELECT command in SQL
    print(tips[tips['time'] == 'Dinner'])  # displays the rows whose time column equals Dinner, functionally similar to the WHERE clause in SQL
    print(tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)])
    print(tips[(tips['time'] == 'Dinner')
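The SQL-like queries above can be run against a small stand-in for the tips table. The column names follow the snippet; the five-column frame and its values are invented for the example.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the snippet above.
tips = pd.DataFrame({
    "total_bill": [16.99, 48.27, 29.80, 50.81],
    "tip": [1.01, 6.73, 4.20, 10.00],
    "smoker": ["No", "No", "Yes", "Yes"],
    "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
    "size": [2, 4, 6, 3],
})

# SELECT total_bill, tip, smoker, time
selected = tips[["total_bill", "tip", "smoker", "time"]]

# WHERE time = 'Dinner'
dinner = tips[tips["time"] == "Dinner"]

# WHERE size >= 5 OR total_bill > 45
# Each condition needs its own parentheses because | binds tighter than >=.
big_or_expensive = tips[(tips["size"] >= 5) | (tips["total_bill"] > 45)]

print(big_or_expensive)
```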
columns before a column (including col2)
df.loc[m:n]  Get rows m through n (recommended)
df.iloc[m:n]  Get rows m through n-1
df.loc[m:n-1, 'col1':'coln']  Get rows m through n-1, columns col1 through coln

sr = df['col']  Retrieves a column and returns a Series
sr.values  The Series values, returned as a numpy.ndarray object
sr.index  The Series index, returned as an Index object

5. Data operation and sorting

df.T  DataFrame transpose
df1 + df2  Adds the two DataFrames element-wise, aligned on row and column labels
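A brief sketch of the cheat-sheet entries above, assuming a frame with the default integer index and three hypothetical columns col1 to col3.

```python
import pandas as pd

# Hypothetical 5x3 frame with the default integer index 0..4.
df = pd.DataFrame({
    "col1": range(5),
    "col2": range(5, 10),
    "col3": range(10, 15),
})

by_label = df.loc[1:3]                  # rows 1, 2, 3 (end label included)
by_pos = df.iloc[1:3]                   # rows 1, 2 (end position excluded)
block = df.loc[1:2, "col1":"col2"]      # rows 1..2, columns col1..col2

sr = df["col1"]     # a single column as a Series
arr = sr.values     # numpy.ndarray of the values
idx = sr.index      # Index object

transposed = df.T   # rows and columns swapped
total = df + df     # element-wise addition, aligned on labels

print(block)
```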