Delete one or more columns of Pandas Dataframe:method One : Direct del df[' Column-name ']method Two : Using the Drop method, there are three types of equivalent expressions:1. df= df.drop (' column_name ', 1);2. Df.drop (' column_name ', Axis=1, Inplace=true)3. Df.drop ([df.columns[[0,1, 3]], axis=1,inplace=true) # Note:zero indexedNote : Usually there is a inplace optional parameter that modifies the original array and returns a new array. If set to
I believe many people like me in the process of learning Python,pandas data selection and modification has a great deal of confusion (perhaps by the Matlab) impact ...
To this day finally completely figure out ...
Let's start with a data box manually.
Import NumPy as NP
import pandas as PD
DF = PD. Dataframe (Np.arange (0,60,2). Reshape (10,3), columns=list (' a
label as a numpy array of Python objects
Int64index
Special index for integers
Multiindex
A hierarchical Index object that represents a multi-level index on a single axis. Can be seen as an array of tuples
Datetimeindex
Memory nanosecond timestamp (denoted by NumPy's Datetime64 type)
Periodindex
Special index for period data (time interval)
2.2.d.1 Primary Inde
Let me briefly introduce the two commonly used data structures, series and daraframe in Python, which are defined by the Pandas module. The series is similar to dict in Python, but is structured, and dataframe is similar to a table in a database.1.pandas basic data Structure-pandas
About Python data analysis in the Pandas module in the output, the middle of each line will have ellipses appear, and lines and lines in the middle of the ellipsis .... Problem, most of the other sites (Baidu) are written blindly, is simply copy paste the previous version, you want to know the answer to other questions you have to read the official documents.1 #!/usr/bin/python2 #-*-coding:utf-8-*-3 ImportN
data conversion refers to filtering, cleaning, and other conversion operations on the data. Remove Duplicate data Repeating rows often appear in the Dataframe, Dataframe provides a duplicated () method to detect whether rows are duplicated, and another drop_duplicates () method to discard duplicate rows:Duplicated () and Drop_duplicates () methods defaultJudging all Columns, if you do not want to, the collection of incoming columns as a parameter can be specified as a column, for example:Dupl
The most by a friend set up a part-time operation of the company, but the need for some part-time staff pay, but due to a part-time wage between the 40~60, so the company adopted the principle is more than 200 to carry out, this rule is equivalent to drop the driver, the withdrawal needs more than 200, Then the problem came, in order to better let a large number of part-time staff can, clearly understand the time period in which they earn a lot of money, this time extended a problem, we need to
the unique value of A, the number of occurrences (a, b) of the unique value of statistics = (1,3) c appears 1 times (A, B) = (2,4) appears 3 times - the Print(Pd.crosstab (df['A'],df['B'],normalize=true))#display in a frequency-based manner - Print('--------') - Print(Pd.crosstab (df['A'],df['B'],values=df['C'],aggfunc=np.sum))#values: A value array based on a factor aggregation - #Aggfunc: If the values array is not passed, the frequency table is computed, and if the array is passed, the calc
Label:Read the contents of the table, as in the following example: ImportMySQLdbTry: Conn= MySQLdb.connect (host='127.0.0.1', user='Root', passwd='Root', db='MyDB', port=3306) DF= Pd.read_sql ('select * from test;', con=conn) Conn.close ()Print "Finish Load DB"
exceptmysqldb.error,e:PrintE.ARGS[1] Write the data to the table, as in the following example DF = PD. DataFrame ([[1,'XXX'],[2,'yyy']],columns=list ('AB'))
Try: Conn= MySQLdb.connect (host='127.0.0.1', user='Root', passwd='Root', db='My
Objective
Pandas is a numpy built with more advanced data structures and tools than the NumPy core is the Ndarray,pandas is also centered around Series and dataframe two core data structures. Series and Dataframe correspond to one-dimensional sequence and two-dimensional table structure respectively. Pandas's conventional approach to importing is as follows:
From
Getting started with Python for data analysis--pandas
Based on the NumPy established
from pandas importSeries,DataFrame,import pandas as pd
One or two kinds of data structure 1. Series
A python-like dictionary with indexes and values
Excel has a computational function skew () for skewness, but it is unclear how to traverse with Excel, which has a large amount of data.Try using Python for resolution.The first time to learn python, did not expect to overcome the installation of various packages of sadness, incredibly successful implementation.python3.3:#this is a test case#-*-coding:gbk-*-print ("Hello
1.1. Pandas Analysis steps
Loading data
COUNT the date of the access_time. SQL similar to the following:
SELECT date_format (access_time, '%H '), COUNT (*) from log GROUP by Date_format (access_time, '%H ');
1.2. Code
Cat pd_ng_log_stat.py#!/usr/bin/env python#-*-Coding:utf-8-*-From Ng_line_parser import NglineparserImport Pandas as PDImport socketImport str
Pythonpandas connection to MySQL1, Python and MySQL connection and operation, directly on the code, simple and direct efficiency:1 ImportMySQLdb2 3 Try:4 5conn = MySQLdb.connect (host='localhost', user='Root', passwd='xxxxx', db='Test', charset='UTF8')6 7Cur =conn.cursor ()8 9Cur.execute ('CREATE TABLE User (id int,name varchar )' )Ten One A -Value = [1,'Jkmiao'] - theCur.execute ("INSERT into user values (%s,%s)", value) - - - +Users = []
NaNDtype:object----[' A ', ' B ', ' C ']0 A1 12 NaN3 NaNDtype:object0 b1 22 NaN3 NaNDtype:object0 10 A B,c1 1 2,32 Nan Nan3 Nan Nan0 10 A, b C1 32 Nan Nan3 Nan NanDataframe0 a-b-c1 1-2-c2 [,-,-,]Name:key2, Dtype:object0 [A, b, c]1 [1, 2, C]2 NaNName:key2, Dtype:object#String Indexs = PD. Series (['A','b','C','Bbhello','123', Np.nan,'HJ']) DF= PD. DataFrame ({'Key1': List ('abcdef'), 'Key2':['Hee','FV','W','Hjja','123', Np.nan]})Print(S,'\ n-----')Print(S.str[0])#take the first
values appearDf.boxplot (column= ' label 1 ', by = ' Label 2 ')Plt.show ()The data under label 1 can then be plotted in a numerical distribution according to label 2As indicated below, it has been classified according to the level of education, high-level wage extremes, and other conclusions can be obtainedNote: When you want to paint, the individual input drawing instructions can not display graphics, then you need to enter Plt.show () on another line, condition: import Matplotlib.pyplot as Pl
ImportOSImportPandas as PDImportMatplotlib.pyplot as PltdefTest_run (): start_date='2017-01-01'End_data='2017-12-15'dates=Pd.date_range (start_date, End_data)#Create an empty data frameDF=PD. DataFrame (index=dates) Symbols=['SPY','AAPL','IBM','GOOG','GLD'] forSymbolinchsymbols:temp=getadjcloseforsymbol (symbol) DF=df.join (temp, how='Inner') returnDF def Normalize_data (DF): "" " normalize stock prices using the first row of the DATAFR Ame " " " df=df/df.ix[0,:] return DF defGetadj
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.