Recently a friend set up a company that runs on part-time work and needs to pay its part-time staff. Because a single part-time wage is between 40 and 60, the company only allows withdrawals once the balance exceeds 200, much like the rule for Didi drivers, where a withdrawal also needs more than 200. Then a problem came up: to let the large number of part-time staff clearly see the time periods in which they earn the most, a further question arose, and we need to
the unique values of A and the unique values of B, counting the number of occurrences of each pair: (A, B) = (1, 3) appears 1 time, (A, B) = (2, 4) appears 3 times
print(pd.crosstab(df['A'], df['B'], normalize=True))  # display as frequencies
print('--------')
print(pd.crosstab(df['A'], df['B'], values=df['C'], aggfunc=np.sum))
# values: an array of values to aggregate according to the factors
# aggfunc: if the values array is not passed, a frequency table is computed; if the array is passed, the calc
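The crosstab calls in the excerpt above can be tried on a small frame. This is a minimal sketch: the column names A, B and C follow the excerpt, but the values are made up for illustration and are not the original author's data.

```python
import pandas as pd

# Hypothetical data: (A, B) = (1, 3) occurs once, (A, B) = (2, 4) occurs three times.
df = pd.DataFrame({
    "A": [1, 2, 2, 2],
    "B": [3, 4, 4, 4],
    "C": [10, 20, 30, 40],
})

# Raw counts for every (A, B) combination.
counts = pd.crosstab(df["A"], df["B"])

# Counts divided by the grand total, so all cells sum to 1.
freqs = pd.crosstab(df["A"], df["B"], normalize=True)

# Aggregate column C per (A, B) pair instead of counting rows.
sums = pd.crosstab(df["A"], df["B"], values=df["C"], aggfunc="sum")

print(counts)
print(freqs)
print(sums)
```

Combinations that never occur show up as 0 in the count table and as NaN in the aggregated table.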
Zhang San    1.2
Reese        1.0
Harry        2.3
Chen Jiu     5.0
Xiao Ming    6.0
Name: price, dtype: float64
In general, we often need to select values by row or column. DataFrame provides loc and iloc for this, and the two differ as follows.
print(frame2)
print(frame2.loc['Harry'])  # loc can use a string index label, whereas iloc accepts only integer positions
print(frame0.iloc[2])
Out[2]:
           Color  Object  Price
Zhang San  Blue   ball    1.2
Reese      Green
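The difference between loc and iloc can be sketched on a small frame. The frame below is hypothetical: the names and prices echo the output above, while the variable name `frame` and the color values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame with a string index.
frame = pd.DataFrame(
    {"color": ["blue", "green", "red"], "price": [1.2, 1.0, 2.3]},
    index=["Zhang San", "Reese", "Harry"],
)

# loc is label-based: it looks the row up by its index label.
by_label = frame.loc["Harry"]

# iloc is position-based: 2 means the third row, counting from 0.
by_position = frame.iloc[2]

print(by_label)
print(by_position)
```

Both calls return the same row here, because "Harry" happens to be the row at position 2.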
1. Create a DataFrame from a dictionary
>>> import pandas as pd
>>> dict1 = {'col1': [1, 2, 5, 7], 'col2': ['a', 'b', 'C', 'D']}
>>> df = pd.DataFrame(dict1)
>>> df
   col1 col2
0     1    a
1     2    b
2     5    C
3     7    D
2. Create a DataFrame from multiple lists (convert the lists to a dictionary, then convert the dictionary to a DataFrame)
>>> lista = [1, 2, 5, 7]
>>> listb = ['a', 'b', 'C', 'D']
>>> df = pd.DataFrame({'col1': lista, 'col2': listb})
>>> df
   col1 col2
0     1    a
1     2    b
2     5    C
3     7    D
Python pandas DataFrame oper
Sometimes we need to sort or rank a Series or DataFrame by its index or by its values.
A. Sorting
pandas provides a sort_index method that sorts by the row or column index in lexicographic order.
a. Series sorting
1. Sorting by index
# Define a Series
s = Series([1, 2, 3], index=["a", "c", "b"])
# Sort the Series by its index; the default is ascending
print(s.sort_index())
'''
a    1
b    3
c    2
'''
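The sorting described above can be sketched as follows. The sort_values call is added for contrast and is not part of the excerpt, and the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "c", "b"])

# Sort by index labels, ascending by default.
by_index = s.sort_index()

# Same, but in descending label order.
by_index_desc = s.sort_index(ascending=False)

# Sort by the stored values instead of the labels.
by_value = s.sort_values()

print(by_index)
print(by_index_desc)
print(by_value)
```

sort_index reorders by label only; the values travel with their labels, which is why the sorted values come out as 1, 3, 2.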
Official documentation:
pandas.DataFrame.unstack
DataFrame.unstack(level=-1, fill_value=None)
Pivot a level of the (necessarily hierarchical) index labels, returning a DataFrame having a new level of column labels whose inner-most level consists of the pivoted index labels. If the index is not a MultiIndex, the output will be a Series (the analogue of stack when the columns are not a MultiIndex). (When there is only one row index, the result gene
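A small sketch may make the unstack description more concrete. The data and the level names `outer` and `inner` are hypothetical; only `level` and `fill_value` come from the signature quoted above.

```python
import pandas as pd

# Hypothetical Series with a two-level (hierarchical) index.
index = pd.MultiIndex.from_tuples(
    [("one", "a"), ("one", "b"), ("two", "a")],
    names=["outer", "inner"],
)
s = pd.Series([1, 2, 3], index=index)

# Pivot the innermost index level into columns; the missing
# ("two", "b") combination becomes NaN.
wide = s.unstack()

# Same pivot, but fill the missing combination with 0 instead of NaN.
wide_filled = s.unstack(fill_value=0)

print(wide)
print(wide_filled)
```

By default `level=-1`, so the innermost level is the one that moves to the columns.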
date belongs to a leap year
import pandas as pd
df = pd.read_excel("C:/users/administrator/desktop/new Microsoft Excel worksheet.xlsx")  # read the worksheet
# Split on the first space; expand=True returns the two parts as separate columns
df[['Property', 'Description']] = df['Property Description'].str.split(" ", n=1, expand=True)
df.drop("Property Description", axis=1, inplace=True)  # drop the original column
df.to_csv("C:/users/administrator/desktop/new Microsoft Excel worksheet.csv", index=False)  # save as CSV without the index
Th
Statistical methods
pandas objects have a set of statistical methods. Most of them are reductions and summary statistics, which extract a single value from a Series or a Series from the rows or columns of a DataFrame.
For example, in DataFrame.mean(axis=0, skipna=True), NA values in the dataset are simply skipped unless the entire slice (row or column) is NA; if you do not want this behaviour, you can disable this feat
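The effect of skipna can be seen on a small example. This is a sketch with made-up data; the column names `x` and `y` are assumptions.

```python
import numpy as np
import pandas as pd

# Column x has one missing value; column y is entirely missing.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, np.nan]})

# Default behaviour: NA values are skipped, so x averages 1.0 and 3.0.
# y has nothing left to average, so its mean is NaN.
col_means = df.mean(axis=0, skipna=True)

# With skipna=False a single NA makes the whole result NaN.
col_means_strict = df.mean(axis=0, skipna=False)

print(col_means)
print(col_means_strict)
```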
1. Several ways to create a DataFrame
1.1
import pandas as pd
df1 = pd.DataFrame({'a': range(3), 'b': range(3)})
2. Traverse a column
l = [str(v) for v in df.a]
print(l)
3. Common operations
Slice
db = da.loc[:, ['a', 'b']]
Aggregation
db = da_38.groupby(['a']).sum()
Filter
da = da[(da.a == 1) | (da.b == 1)]
Add a column
d1['c'] = d1['a'] / d1['b']
Apply
d2['c'] = d2['a'].apply(lambda x: 1)
da["b"] = da.a.apply(lambda x:
pd.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)
For example:
data = pd.read_sql_table(table_name='t_line', con=engine, parse_dates=['time'], index_col='time', columns=['A', 'B', 'C'])
3: Read from a database (via an SQL statement or a table name)
For reading via an SQL statement, see my other article: http://www.cnblogs.com/cymwill/articles/7576600.html
pd.read_sql(sql, con, index_col=None, coerce_float=T
Read the contents of a table, as in the following example:
import MySQLdb
import pandas as pd
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='Root', passwd='Root', db='MyDB', port=3306)
    df = pd.read_sql('select * from test;', con=conn)
    conn.close()
    print("Finish Load DB")
except MySQLdb.Error as e:
    print(e.args[1])
Write the data to a table, as in the following example:
df = pd.DataFrame([[1, 'XXX'], [2, 'yyy']], columns=list('AB'))
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='Root', passwd='Root', db='My
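The same read and write round trip can be tried without a MySQL server. The sketch below swaps in Python's built-in sqlite3 module with an in-memory database, which is an assumption made so the example is self-contained; the table name `test` and the sample rows follow the excerpt, while `df_out` and `df_in` are illustrative names.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
try:
    # Write a small frame to a table named "test".
    df_out = pd.DataFrame([[1, "XXX"], [2, "yyy"]], columns=list("AB"))
    df_out.to_sql("test", conn, index=False)

    # Read the table back with an SQL statement.
    df_in = pd.read_sql("select * from test;", con=conn)
finally:
    # Close the connection whether or not the calls above succeed.
    conn.close()

print(df_in)
```

Passing `index=False` to to_sql keeps the DataFrame index from being written as an extra column.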
Getting started with Python for data analysis: pandas
pandas is built on top of NumPy.
from pandas import Series, DataFrame
import pandas as pd
I. Two kinds of data structures
1. Series
A Series is similar to a Python dictionary: it has an index and values.
Create a Series
# No index specified; a default index starting at 0 is created
In [54]: obj = Series([1, 2, 3, 4, 5])
In [55]: obj
Out[55]:
0
Function prototype: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
pad/ffill: fills a missing value with the previous non-missing value
backfill/bfill: fills a missing value with the next non-missing value
None: specify a value to replace the missing values
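The three filling strategies listed above can be compared on one Series. This is a minimal sketch with made-up data. It uses the dedicated ffill() and bfill() methods rather than fillna(method=...), because, as far as I know, recent pandas versions deprecate the method argument of fillna.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill: each gap takes the previous non-missing value.
forward = s.ffill()

# Backward fill: each gap takes the next non-missing value.
backward = s.bfill()

# Constant fill: every gap is replaced by the given value.
constant = s.fillna(0.0)

print(forward)
print(backward)
print(constant)
```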
Original: Chapter 7
# The usual preamble
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Make the graphs a bit bigger and prettier
pd.set_option('display.mpl_style', 'default')
plt.rcParams['figure.figsize'] = (15, 5)
plt.rcParams['font.family'] = 'sans-serif'
# This is needed to show a lot of columns in pandas 0.12
# in pandas
Abstract: This article is mainly about how to split strings in pandas. Consider the following scenario.
This is our dataset (data). One column (name) holds industry categories, with the symbol '|' separating the industries. We want to extract each part delimited by '|'. pandas can do this in a single step, which is very convenient.
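The splitting described above might look like the following. The original dataset is not shown in the excerpt, so the rows below are hypothetical; only the names `data` and `name` and the '|' separator come from the text.

```python
import pandas as pd

# Hypothetical stand-in for the dataset described above.
data = pd.DataFrame({"name": ["Finance|Insurance", "Software|Hardware|Internet"]})

# expand=True puts each piece in its own column; shorter rows are padded
# with missing values.
parts = data["name"].str.split("|", expand=True)

# Alternative: one row per industry, keeping the original row index.
long_form = data["name"].str.split("|").explode()

print(parts)
print(long_form)
```

The wide form suits cases where the position of each piece matters, while the long form is convenient for counting how often each industry appears.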
Import
1. In a pandas DataFrame, we often need to select the rows that meet a condition on some column; the isin method is particularly useful here.
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 3, 4], [2, 4, 3]], index=['one', 'two', 'three'], columns=['A', 'B', 'C'])
print(df)
#        A  B  C
# one    1  2  3
# two    1  3  4
# three  2  4  3
Let's say we choose a row w
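Since the excerpt breaks off here, the following is only a sketch of how isin is typically used on the frame defined above. The choice of column B and the values 2 and 4 is an assumption, not the original author's example.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [1, 3, 4], [2, 4, 3]],
    index=["one", "two", "three"],
    columns=["A", "B", "C"],
)

# Keep the rows whose B value is one of the listed values.
selected = df[df["B"].isin([2, 4])]

# Negate the mask with ~ to keep the rows whose B value is not listed.
excluded = df[~df["B"].isin([2, 4])]

print(selected)
print(excluded)
```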
Hierarchical indexes
Hierarchical indexing means an array can have multiple index levels, for example: (a bit like merged cells in Excel, right?)
Select a subset of the data based on the index:
Select a subset of the data from another level:
Select data by the inner-level index in the same way:
Converting a multi-index Series to a DataFrame
Hierarchical indexes play an important role in data reshaping and grouping. For example, the hierarchically indexed data above can be converted to a DataFrame:
For
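The code for these steps is missing from the excerpt, so here is a minimal sketch of what they might look like. The data and variable names are hypothetical.

```python
import pandas as pd

# A Series with two index levels: outer labels a/b, inner labels 1/2.
s = pd.Series(
    [10, 20, 30, 40],
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
)

# Select everything under one outer label.
outer = s.loc["a"]

# Select by an inner label across all outer labels.
inner = s.loc[:, 2]

# Convert the two-level Series to a DataFrame: inner labels become columns.
frame = s.unstack()

print(outer)
print(inner)
print(frame)
```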
Data analysis with Python: pandas basics 2
The reindex method of Series
In [15]: obj = Series([3, 2, 5, 7, 6, 9, 0, 1, 4, 8], index=['a', 'b', 'c', 'd', 'e', 'f', 'g',
    ...: 'h', 'i', 'j'])
In [16]: obj1 = obj.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])
In [17]: obj1
Out[17]:
a    3.0
b    2.0
c    5.0
d    7.0
e    6.0
f    9.0
g    0.0
h    1.0
i    4.0
j    8.0
k    NaN
dtype: float64
If a value in the new index is not currently present, interpolatio
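The excerpt is cut off here, so the sketch below only illustrates options that reindex offers for labels missing from the original index. The shortened Series and the choice of `fill_value` and `method="ffill"` are assumptions for illustration.

```python
import pandas as pd

obj = pd.Series([3, 2, 5], index=["a", "b", "c"])
new_index = ["a", "b", "c", "d"]

# Default: the new label "d" gets NaN, which turns the values into floats.
plain = obj.reindex(new_index)

# fill_value supplies a constant for labels that were not present.
zero_filled = obj.reindex(new_index, fill_value=0)

# method="ffill" carries the last preceding value forward; this requires
# the original index to be sorted.
forward = obj.reindex(new_index, method="ffill")

print(plain)
print(zero_filled)
print(forward)
```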