Array, list, and DataFrame indexing and slicing
July 19, 2016 -- smart wave document
A simple discussion of indexing for lists, one-dimensional and two-dimensional arrays, and DataFrames, including loc, iloc and ix.
NumPy array indexing and slicing:
Starting with the most basic list index, let's begin with some code and its result:
a = [0,1,2,3,4,5,6,7,8,9]
a[:5:-1]  # step
Output:
[9, 8, 7, 6]
[]
[1, 0]
List slice: inside "[]" there are generally two ":" delimiters, meaning [start:end:step]. In the
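A minimal sketch of the [start:end:step] rule, reusing the list a from the snippet above. Only a[:5:-1] comes from the original text; the other two slices are illustrative choices added here.

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Negative step: start from the last element, walk backwards,
# and stop before reaching index 5.
tail_reversed = a[:5:-1]

# Positive step (illustrative): every second element from index 2
# up to, but not including, index 8.
every_other = a[2:8:2]

# Omitting start and end with step -1 reverses the whole list (illustrative).
reversed_all = a[::-1]

print(tail_reversed)  # [9, 8, 7, 6]
print(every_other)    # [2, 4, 6]
print(reversed_all)
```

With a negative step the defaults for start and end flip, which is why a[:5:-1] begins at the end of the list.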
Republicans), the data used here is the voting records of these members. Each row represents one member (in the party column, D stands for the Democratic Party, R stands for the Republican Party, and I stands for independents; the third column represents the vote on a particular bill: 1 stands for in favor, 0 stands for against, and 0.5 stands for abstention)
import pandas
votes = pandas.read_csv('114_congress.csv')
print(votes["party"].value_counts())
from sklearn.metrics.pairwise i
Function prototype: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
pad/ffill: fills the missing value with the previous non-missing value
backfill/bfill: fills the missing value with the next non-missing value
None: specify a value to replace the missing value
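The three fill strategies listed above can be sketched on a small Series. The sample values are made up for illustration. The sketch uses ffill() and bfill(), which apply the same forward and backward fill as the pad/ffill and backfill/bfill options; recent pandas releases steer users toward these methods rather than fillna(method=...).

```python
import numpy as np
import pandas as pd

# Hypothetical data with two consecutive missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill: each gap takes the previous non-missing value.
forward = s.ffill()

# Backward fill: each gap takes the next non-missing value.
backward = s.bfill()

# Constant fill: each gap is replaced by a value you specify.
constant = s.fillna(0)

print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
print(constant.tolist())  # [1.0, 0.0, 0.0, 4.0]
```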
DataFrame data filtering -- loc, iloc, ix, at, iat
Conditional filtering
Single-condition filter
Select the records whose col1 value is greater than n: data[data['col1']>n]
Filter for records whose col1 value is greater than n, but display only the col2 and col3 columns: data[['col2', 'col3']][data['col1']>n]
Select specific rows: use the isin function to filter records based on specific values. Filter col1 value equals record of element in l
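The filters described above can be sketched on a toy frame. The column values, the threshold n, and the list named wanted are all hypothetical. The second filter is written with loc, which selects rows and columns in one step and gives the same result here as chaining the two brackets.

```python
import pandas as pd

# Hypothetical data using the column names from the text.
data = pd.DataFrame({
    "col1": [1, 5, 8, 3],
    "col2": ["a", "b", "c", "d"],
    "col3": [10.0, 20.0, 30.0, 40.0],
})
n = 4             # hypothetical threshold
wanted = [1, 8]   # hypothetical list of values to match

# Rows where col1 is greater than n, all columns.
greater = data[data["col1"] > n]

# Same row filter, but keep only col2 and col3.
subset = data.loc[data["col1"] > n, ["col2", "col3"]]

# Rows whose col1 value is one of the elements in the list.
members = data[data["col1"].isin(wanted)]

print(greater)
print(subset)
print(members)
```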
. loc vs iloc (https://stackoverflow.com/questions/28757389/loc-vs-iloc-vs-ix-vs-at-vs-iat/47098873#47098873), you may want to see another explanation.
After studying these two parts, you should understand the components of a DataFrame and a Series, and how to select different subsets of the data. You can now read "10 minutes to pandas" for a broad overview of more useful operations. As
('Frequency')
print('Number of draws left: %d, posterior mean: %.3f, posterior median: %.3f, posterior 95%% quantile interval: %.3f-%.3f' % (len(posterior), posterior.mean(), posterior.median(), posterior.quantile(.025), posterior.quantile(.975)))
# Last row of the AB and H columns, read by position
ds_n_trials = int(dominic_smith_spring[['AB', 'H']].iloc[-1, 0])
ds_k_success = int(dominic_smith_spring[['AB', 'H']].iloc[-1, 1])
posterior
The pandas Series is more powerful than the NumPy array in many ways.
First, the pandas Series has some extra methods. For example, the describe method gives summary statistics for a Series:
import pandas as pd
s = pd.Series([1,2,3,4])
d = s.describe()
print(d)
count    4.000000
mean     2.500000
std      1.290994
min      1.000000
25%      1.750000
50%      2.500000
75%      3.250000
max      4.000000
dtype: float64
Second, the biggest difference between the pandas Series and the NumPy array is that the pandas Series has a
1. Global mean value
2. Item mean value
3. User mean value
4. User classification-item mean value
5. Item classification-user mean value
6. User activity
7. Item activity
8. Improved user activity
9. Improved item activity
...
The common feature of such models is that they classify users and items with a designed grouping method and use the average score of similar items to predict the user's score. In addition, implementing these models gives a basic understanding of the characterist
Registration channel'),
index=df.index, columns=['customer registration channel', 'Size'])
Disaggregated
Data Extraction
Extract by label
loc function
df.loc[0:3]
Extracts the rows labeled 0 through 3 (with loc, both endpoints are included)
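A minimal sketch of why df.loc[0:3] returns four rows, compared with the positional iloc slice over the same numbers. The frame and its value column are hypothetical.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..4.
df = pd.DataFrame({"value": [10, 11, 12, 13, 14]})

# Label-based slice: both endpoints are included, so labels 0, 1, 2, 3.
by_label = df.loc[0:3]

# Position-based slice: the end is excluded, so positions 0, 1, 2.
by_position = df.iloc[0:3]

print(len(by_label))     # 4
print(len(by_position))  # 3
```

The two only look interchangeable because the default index happens to match the row positions; once the index is reset, sorted, or replaced with dates, they select different things.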
Extract by Date
# Reset the index
df.reset_index()
# Set the date as the index
df = df.set_index('date')
# Extract the data for November 2, 2016
df['2016-11-2':'2016-11-02']
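A small sketch of date-based extraction, assuming the date column holds real datetime values (string slicing like this relies on a DatetimeIndex). The amount column and all values are made up, and loc is used for the slice to make the label-based intent explicit.

```python
import pandas as pd

# Hypothetical data; pd.to_datetime ensures the index becomes a DatetimeIndex.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2016-11-01", "2016-11-02", "2016-11-02", "2016-11-03"]
    ),
    "amount": [10, 20, 30, 40],
})

# Use the date as the index and keep it sorted so slicing is well defined.
df = df.set_index("date").sort_index()

# Slice from November 2 to November 2: every row on that day is returned.
nov2 = df.loc["2016-11-02":"2016-11-02"]

print(nov2)
```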
This returns the data for November 2.
Extract by position (iloc function)
Extract by region
df.iloc[:4, :5]
4 rows, 5 columns, extracted b
This article takes an in-depth look at Python pandas with code examples. It should be a useful reference, and I hope it helps.
1. Filtering
First, create a 6x4 matrix of data.
import numpy as np
import pandas as pd
dates = pd.date_range('20180830', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6,4)), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
Output:
            A  B  C  D
2018-08-30  0  1  2  3
2018-08-31  4  5  6  7
2018-09-01  8  9  1
to row 0, all columns
df.iloc[:, 0]  # all rows, column 0
In [32]: df.iloc[0, :]
Out[32]:
name    Bob
age      26
Name: 0, dtype: object
In [33]: df.iloc[:, 0]
Out[33]:
0      Bob
1     Loya
2    Denny
3     Mars
Name: name, dtype: object
To get a single element you can use iloc; iat is faster.
In [34]: df.iloc[1, 1]
Out[34]: 22
In [35]: df.iat[1, 1]
Out[35]: 22
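A self-contained sketch of single-element access, rebuilding the four-row name/age frame shown in the outputs above. The at call is an addition for comparison: it is the label-based counterpart of iat.

```python
import pandas as pd

# The same four records that appear in the session output.
df = pd.DataFrame({
    "name": ["Bob", "Loya", "Denny", "Mars"],
    "age": [26, 22, 20, 25],
})

# General positional indexer: row 1, column 1.
by_iloc = df.iloc[1, 1]

# Scalar-only positional accessor, intended for single values.
by_iat = df.iat[1, 1]

# Scalar-only label accessor (added here for comparison).
by_at = df.at[1, "age"]

print(by_iloc, by_iat, by_at)  # 22 22 22
```

iat and at accept only a single row and a single column, which is what lets them skip the more general slicing logic.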
DataFrame block selection
In [36]: df.loc[1:2, ['name', 'age']]
Out[36]:
    name  age
1   Loya   22
2  Denny   20
Filter rows by criteria
To filt
a, b columns;
a.loc[['one', 'two'], ['a', 'b']] selects the rows 'one' and 'two' and the columns 'a' and 'b';
a.loc['one', 'a'] selects the same cell as a.loc[['one'], ['a']], but the former returns only the value, while the latter also returns the corresponding row and column labels.
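The difference between scalar labels and lists of labels can be checked directly. This is a sketch on a hypothetical frame named a; its values and the third row and column are invented for illustration.

```python
import pandas as pd

# Hypothetical frame with string row labels.
a = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]},
    index=["one", "two", "three"],
)

# Scalar row label and scalar column label: returns the bare value.
scalar = a.loc["one", "a"]

# Lists of labels: returns a 1x1 DataFrame that keeps its labels.
frame = a.loc[["one"], ["a"]]

# Two rows and two columns: returns a 2x2 DataFrame.
block = a.loc[["one", "two"], ["a", "b"]]

print(scalar)
print(frame)
print(block)
```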
3. iloc selects data directly by position. This is similar to selecting by label.
a.iloc[1:2, 1:2] displays the data for the first col
30000.0
Shanghai    35000.0
Suzhou          NaN
In [26]: obj3 + obj4
Out[26]:
Beijing     80000.0
Hangzhou    60000.0
Nanjing         NaN
Shanghai    70000.0
Suzhou          NaN
The index of a Series can be modified in place by assignment
In [27]: obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
In [28]: obj
Out[28]:
Bob      4
Steve    7
Jeff    -5
Ryan     3
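A sketch of the two behaviors shown in this session: arithmetic between Series aligns on index labels, and assigning to index relabels a Series in place. The Series named left and right and their values are hypothetical stand-ins, since the full definitions of obj3 and obj4 are not shown; the obj values match the output above.

```python
import pandas as pd

# Hypothetical Series that share only some labels.
left = pd.Series({"Beijing": 1.0, "Shanghai": 2.0, "Nanjing": 3.0})
right = pd.Series({"Beijing": 10.0, "Shanghai": 20.0, "Suzhou": 30.0})

# Addition aligns on labels; a label missing on either side yields NaN.
total = left + right
print(total)

# Assigning a new list to .index relabels the Series in place.
obj = pd.Series([4, 7, -5, 3])
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]
print(obj)
```

The new index must have the same length as the Series, otherwise pandas raises an error.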
DataFrame
Reading files with pandas
In [29]: df = pd.read_table('pandas_test.txt', sep=' ', names=['name', 'age'])
In [30]: df
Out[30]:
    name  age
0    Bob   26
1   Loya   22
2  Denny   20
3   Mars   25
DataFrame column selection