Let's create a DataFrame by hand.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))
This is what df looks like. So how do you choose among the three ways of selecting data? First, when every column already has a column name, df['a'] takes out a whole column of data. If you know both the column name and the index, and both are easy to type, you can use loc:
df.loc[0,
I believe that many people, like me, were thoroughly confused by data selection and modification in pandas while learning Python (perhaps under the influence of Matlab) ...
Only today did I finally figure it out completely ...
Let's start by creating a DataFrame by hand.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))
This is what df looks like.
So what are the three ways of selecting data?
First, when every column already has a column name, a full
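The construction and the first two selection styles described above can be sketched as follows. The loc call is an assumption, since the source's own loc example is cut off; it only illustrates selecting one cell by row label and column name.

```python
import numpy as np
import pandas as pd

# Same construction as in the text: the even numbers 0..58 as 10 rows x 3 columns.
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list("abc"))

# Select a whole column by its name; the result is a Series.
col_a = df["a"]

# Assumed illustration: select one cell by row label and column name with loc.
cell = df.loc[0, "a"]

print(col_a.head(3).tolist())
print(cell)
```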
Python array, list, and DataFrame index and slicing operations: July 19, 2016, zhi Lang document
List, one-dimensional array, two-dimensional array, DataFrame, loc, iloc, and ix
Introduction to NumPy array indexing and slicing: starting from basic list indexing, here are the code and the results:
a = [,]
a[:5:-1]  # step
Output:
[9, 8, 7, 6]
[]
[1, 0]
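A minimal sketch of list slicing with a negative step. The list literal is lost in the source, so the list of integers 0 to 9 is an assumption chosen to be consistent with the first output shown; the second and third slice expressions are likewise hypothetical examples that yield an empty list and [1, 0].

```python
# Assumption: the original list is not recoverable; 0..9 matches the first output.
a = list(range(10))

# Negative step, stop at index 5 (exclusive): walk backwards from the end.
tail_reversed = a[:5:-1]

# Start after stop with the default positive step: nothing is selected.
empty = a[5:1]

# Start at index 1 and walk backwards to the beginning.
head_reversed = a[1::-1]

print(tail_reversed)
print(empty)
print(head_reversed)
```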
List slicing
label, not an integer position along the index); a list or array of labels such as ['a', 'b', 'c']; a slice object with labels such as 'a':'f' (note that, contrary to usual Python slices, both the start and the stop are included); a boolean array; or a callable function (called with the Series, DataFrame or Panel) that returns valid output for indexing (one of the above). iloc is primarily integer-position based (from position 0 to length-1 of the axis), but
pandas provides three methods for such operations: loc, iloc, and ix. Of these, ix is not officially recommended (it was deprecated in pandas 0.20 and removed in pandas 1.0).
loc: selection based on labels
df.loc[row label start:row label end, [list of column names]]
iloc: selection based on integer position
df.iloc[row position start:row position end, column position start:column position en
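The difference between the two slicing forms above can be sketched on a small frame. The frame, its labels, and its values are hypothetical; the point is that a loc label slice includes its end label while an iloc position slice excludes its end position.

```python
import pandas as pd

# Hypothetical data with string row labels so labels and positions differ.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [100, 200, 300, 400]},
    index=["w", "x", "y", "z"],
)

# loc: label based; the slice "x":"y" includes both end points.
by_label = df.loc["x":"y", ["a", "c"]]

# iloc: position based; 1:3 covers positions 1 and 2, 0:2 covers columns 0 and 1.
by_position = df.iloc[1:3, 0:2]

print(by_label)
print(by_position)
```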
public override DesignerVerbCollection Verbs
{
    get
    {
        DesignerVerbCollection verbs = new DesignerVerbCollection();
        verbs.Add(new DesignerVerb("Remove page", new EventHandler(ETWizardPage_Remove)));
        return verbs;
    }
}
The ETWizardPage_Remove method is implemented as follows:
ETWizardPage page = Control as ETWizardPage;
if (page == null)
    return;
IDesignerHost host = (IDe
1 3
2012-04-12 7 6 1
2012-04-13 2 7
2012-04-14 4 7
In [16]: # Generate two specific dates
    ...: fecha_1 = dt.datetime(4, +)
    ...: fecha_2 = dt.datetime(4, ...)
    ...:
    ...: # Generate the slice of data
    ...: data_fecha.loc[fecha_1:fecha_2]
Out[16]:
rnd_1 rnd_2 rnd_3
fecha
2013-04-14 4 5
2013-04-15 1 2 18
2013-04-17 9 1
2013-04-18 7 17
Update: unless there is a special requirement, it is strongly recommended to use loc and to keep the use of [] to a minimum, because loc avoids chained ind
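As a rough illustration of that recommendation, the sketch below performs a conditional assignment in a single loc call instead of two consecutive [] lookups. The frame and its price and sign columns are hypothetical.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"price": [3200, 4500, 5100], "sign": [0, 0, 0]})

# The two-step form below may assign to a temporary copy and leave df unchanged:
#     df[df["price"] >= 4000]["sign"] = 1
# The single loc call selects rows and column at once and assigns in df itself.
df.loc[df["price"] >= 4000, "sign"] = 1

print(df)
```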
=df_inner.index,columns=['category','size']))
8. Match the split data table with the original df_inner data table.
df_inner=pd.merge(df_inner,split,right_index=True, left_index=True)
V. Data extraction
Three functions are mainly used: loc, iloc and ix. loc extracts by label value, iloc extracts by position, and ix can extract by label and position at the same time.
1. Extrac
* x3 + b
iris = pd.read_csv('iris.csv')  # read the iris.csv file
temp = iris.iloc[:,]  # in the iris matrix you defined, the rows are [0, end) and the columns are [); iloc selects directly by row and column number, left-closed and right-open
temp['x0'] = 1  # add the bias b to the vector w* = (w, b) (on page 55 of Machine Learning by Zhou Zhihua, w* has a '^' symbol on it, which is har
Selecting data in pandas: iloc and loc are not used in the same way. iloc is based on integer position; loc is based on the row label.
>>> import pandas as pd
>>> import os
>>> os.chdir("d:\\")
>>> d = pd.read_csv("Gwas_water.qassoc", delimiter=r"\s+")
>>> d.loc[1:3]
   CHR SNP  BP NMISS   BETA     SE      R2     T      P
1    1   . 447    44 0.1800 0.1783 0.02369 1.009 0.3185
2    1   . 449    44 0.2785 0.2473 0.02931 1.126 0.2665
3    1   . 452    44 0.1800 0.1783 0.02369 1.009 0.3185
>>> d.loc[0:3]
CHR SNP BP
index)
Index
Default index: serial number, position; easy to index by but hard to interpret
Custom index: feature name, attribute; easy to interpret
2. Filter the row and column data of a DataFrame
import pandas as pd, numpy as np
from pandas import DataFrame
df = DataFrame(np.arange(20).reshape((4,5)), columns=list('abcde'))
1. df[] and df.: select column data
df.
df[['a', 'b']]
2. df.loc[[index], [column]]: select data by label
When you do not filter rows, "[index]" cannot be left blank; enter ":" there, that
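The column and label selection forms listed above can be sketched on the same 4 x 5 frame. The specific rows and columns picked here are illustrative choices, not taken from the source.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape((4, 5)), columns=list("abcde"))

# df[...] with a list of names selects several columns.
two_cols = df[["a", "b"]]

# With loc the row selector is mandatory; ":" keeps every row.
all_rows = df.loc[:, ["a", "b"]]

# Filter rows and columns together by label (illustrative labels).
some_cells = df.loc[[0, 2], ["c"]]

print(two_cols.equals(all_rows))
print(some_cells)
```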
.subplots(figsize=(12, 8))
ax.plot(nums, sigmoid(nums), 'k')
plt.show()
'''
# Calculate the loss function value
def cost(theta, x, y):
    theta = np.matrix(theta)
    x = np.matrix(x)
    y = np.matrix(y)
    part1 = np.multiply(-y, np.log(sigmoid(x * theta.T)))
    part2 = np.multiply((1 - y), np.log(1 - sigmoid(x * theta.T)))
    return np.sum(part1 - part2) / len(x)
# Insert a column of all ones before the first column of the original matrix
data.insert(0, 'ones', 1)
cols = data.shape[1]
x = data.
the rows with index numbers m, n and o. However, in the regenerated new_titanic_survival the row index numbers are no longer regular, so iloc[] is used to index by position.
# Output the first five rows of the new table
= new_titanic_survival.iloc[:5,:]
# Output the fourth row of the new table; note that positions still start at 0, so pass 3 rather than 4
= new_titanic_survival.iloc
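A small sketch of why position-based indexing helps once the row labels are irregular. The data is hypothetical, and dropping missing values then sorting is only an assumed way of producing such a table; the source does not show how new_titanic_survival was built.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic table.
titanic = pd.DataFrame(
    {"name": list("ABCDEF"), "age": [22, None, 35, None, 4, 58]}
)

# Assumed preparation step: the surviving row labels become 5, 2, 0, 4.
new_titanic_survival = titanic.dropna(subset=["age"]).sort_values(
    "age", ascending=False
)

# iloc ignores the labels and counts positions from 0.
first_two_rows = new_titanic_survival.iloc[:2, :]
fourth_row = new_titanic_survival.iloc[3]

print(new_titanic_survival.index.tolist())
print(fourth_row["name"])
```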
The following shares a way to select rows with specific index values in pandas. It should be a useful reference.
As shown below:
>>> import numpy as np
>>> import pandas as pd
>>> index = np.array([2,4,6,8,10])
>>> data = np.array([3,5,7,9,11])
>>> data = pd.DataFrame({'num': data}, index=index)
>>> print(data)
    num
2     3
4     5
6     7
8     9
10   11
>>> select_index = index[index>5]
>>> print(select_index)
[ 6  8 10]
>>> data['num'].loc[sel
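The session above is cut off at its last statement. A self-contained sketch of the same idea follows; passing select_index to loc is an assumption about how the truncated line ends.

```python
import numpy as np
import pandas as pd

index = np.array([2, 4, 6, 8, 10])
data = pd.DataFrame({"num": np.array([3, 5, 7, 9, 11])}, index=index)

# Keep only the index labels greater than 5.
select_index = index[index > 5]

# Assumed completion: pass the filtered labels to loc to pick those rows.
selected = data["num"].loc[select_index]

print(selected.tolist())
```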
['price'] >= 4000), 'sign']=1
7. Split the values of the category field in turn and create a data table whose index is the index of df_inner and whose columns are named "category" and "size".
split = pd.DataFrame((x.split('-') for x in df_inner['category']), index=df_inner.index, columns=['category','size'])
8. Match the split data table with the original df_inner data table.
df_inner=pd.merge(df_inner,split,right_index=True, left_index=True)
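Steps 7 and 8 can be sketched end to end on a small table. The df_inner contents are hypothetical, and a list comprehension is used in place of the generator expression. Because both tables contain a column named category, pandas appends its default suffixes to the two copies.

```python
import pandas as pd

# Hypothetical df_inner with a "code-size" style category field.
df_inner = pd.DataFrame(
    {"category": ["100-A", "110-C", "210-B"], "price": [1200, 4300, 5400]},
    index=[10, 11, 12],
)

# Step 7: split each category value on "-" into two new columns.
split = pd.DataFrame(
    [x.split("-") for x in df_inner["category"]],
    index=df_inner.index,
    columns=["category", "size"],
)

# Step 8: join the split table back onto the original by index.
df_inner = pd.merge(df_inner, split, right_index=True, left_index=True)

print(df_inner.columns.tolist())
```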
V. Data Extraction
The following functions are mainly used: loc,
','Chen Jiu','Xiao Ming'], dtype='object')
Zhang San 1.2
Reese 1.0
Harry 2.3
Chen Jiu 5.0
Xiao Ming 6.0
Name: price, dtype: float64
Zhang San 1.2
Reese 1.0
Harry 2.3
Chen Jiu 5.0
Xiao Ming 6.0
Name: price, dtype: float64
In general, we often need to select values by column. DataFrame provides loc and iloc for this, but the two differ.
print(frame2)
print(frame2.loc['Harry'])  # loc can use the index of th
This article covers row and column selection and slicing with a pandas DataFrame, along with points to watch out for. The following is a practical case.
SELECT in SQL selects by column name. pandas is more flexible: it can select not only by column name but also by column position (a number giving the column's ordinal position; note that the position of the pandas column
20
3 Mars 25
DataFrame column selection
df[name]
In [31]: df['name']
Out[31]:
0      Bob
1     Loya
2    Denny
3     Mars
Name: name, dtype: object
DataFrame row selection
df.iloc[0,:]  # the first parameter selects rows and the second selects columns; here, row 0 with all columns
df.iloc[:,0]  # all rows, column 0
In [32]: df.iloc[0,:]
Out[32]:
name    Bob
age      26
Name: 0, dtype: object
In [33]: df.iloc[:,0]
Out[33]:
0      Bob
1     Loya
2    Denny
3     Mars
N
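The two iloc calls above can be reproduced without the input file. In this sketch the frame is built inline from the name and age values printed in the text rather than read from pandas_test.txt.

```python
import pandas as pd

# Built inline from the values shown in the text instead of reading the file.
df = pd.DataFrame(
    {"name": ["Bob", "Loya", "Denny", "Mars"], "age": [26, 22, 20, 25]}
)

# Row at position 0, all columns: returned as a Series indexed by column name.
first_row = df.iloc[0, :]

# All rows, column at position 0: returned as a Series indexed by row label.
first_col = df.iloc[:, 0]

print(first_row["name"], first_row["age"])
print(first_col.tolist())
```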
70000.0
Suzhou NaN
The index of a Series can be modified in place through assignment.
In [27]: obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
In [28]: obj
Out[28]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
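A self-contained sketch of assigning a new index to a Series. The values match the output shown above; starting from the default integer index is an assumption, since the original definition of obj is not part of this excerpt.

```python
import pandas as pd

# Assumption: obj starts with the default integer index 0..3.
obj = pd.Series([4, 7, -5, 3])

# Assigning a list of the same length replaces the index in place.
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]

print(obj["Jeff"])
print(obj.index.tolist())
```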
DataFrame
Pandas reads files
In [29]: df = pd.read_table('pandas_test.txt', sep=' ', names=['name', 'age'])
In [30]: df
Out[30]:
    name  age
0    Bob   26
1   Loya   22
2  Denny   20
3   Mars   25