Today, I want to pandas in the row of the operation, looking for a long time to find the relevant functions
First look at a small example
From pandas import Series, dataframe
data = Dataframe ({' K ': [1, 1, 2, 2]})
print data
isduplicated = DATA.DUPL icated ()
print isduplicated
print type (isduplicated)
data = Data.drop_duplicates ()
print data
Delete one or more columns of Pandas Dataframe:method One : Direct del df[' Column-name ']method Two : Using the Drop method, there are three types of equivalent expressions:1. df= df.drop (' column_name ', 1);2. Df.drop (' column_name ', Axis=1, Inplace=true)3. Df.drop ([df.columns[[0,1, 3]], axis=1,inplace=true) # Note:zero indexedNote : Usually there is a inplace optional parameter that modifies the original array and returns a new array. If set to
American Group Shop Evaluation Language Processing and classification (NLP)
The First Data Analysis section
The second visualization section,
This article is the third of the series, text classification
The main use of the package has Jieba,sklearn,pandas, this post mainly uses the word bag model (bag of words), the text in the form of a numerical feature vector (each document constructs a ei
For example we have the dataframe like this: SPY AAPL IBM GOOG GLD2017-01-03 222.073914 114.311760 160.947433 786.140015 110.4700012017-01-04 223.395081 114.183815 162.940125 786.900024 110.8600012017-01-05 223.217606 114.764473 162.401047 794.020020 112.5800022017-01-06 224.016220 116.043915 163.200043 806.150024 111.7500002017-01-09 223.276779 117.106812 161.390244 806.650024 112.669998...Now we only we want to get highli
About Python data analysis in the Pandas module in the output, the middle of each line will have ellipses appear, and lines and lines in the middle of the ellipsis .... Problem, most of the other sites (Baidu) are written blindly, is simply copy paste the previous version, you want to know the answer to other questions you have to read the official documents.1 #!/usr/bin/python2 #-*-coding:utf-8-*-3 ImportN
data conversion refers to filtering, cleaning, and other conversion operations on the data. Remove Duplicate data Repeating rows often appear in the Dataframe, Dataframe provides a duplicated () method to detect whether rows are duplicated, and another drop_duplicates () method to discard duplicate rows:Duplicated () and Drop_duplicates () methods defaultJudging all Columns, if you do not want to, the collection of incoming columns as a parameter can be specified as a column, for example:Dupl
The most by a friend set up a part-time operation of the company, but the need for some part-time staff pay, but due to a part-time wage between the 40~60, so the company adopted the principle is more than 200 to carry out, this rule is equivalent to drop the driver, the withdrawal needs more than 200, Then the problem came, in order to better let a large number of part-time staff can, clearly understand the time period in which they earn a lot of money, this time extended a problem, we need to
Label:Read the contents of the table, as in the following example: ImportMySQLdbTry: Conn= MySQLdb.connect (host='127.0.0.1', user='Root', passwd='Root', db='MyDB', port=3306) DF= Pd.read_sql ('select * from test;', con=conn) Conn.close ()Print "Finish Load DB"
exceptmysqldb.error,e:PrintE.ARGS[1] Write the data to the table, as in the following example DF = PD. DataFrame ([[1,'XXX'],[2,'yyy']],columns=list ('AB'))
Try: Conn= MySQLdb.connect (host='127.0.0.1', user='Root', passwd='Root', db='My
Operating system: Windowspython:3.5Welcome to join the Learning Exchange QQ Group: 657341423
The previous section describes the library of data analysis and mining needs, the most important of which is pandas,matplotlib.Pandas: Mainly on data analysis, calculation and statistics, such as the average, square bad.Matplotlib: The main combination of pandas to generate images. Both are often used in combination
Import NumPy as NP import pandas as PD from pandas import series,dataframe ' If copied code, error syntaxerror:invalid character
In identifier, there is a space for the Chinese symbol in the copied code. "DATA=PD." Dataframe (Np.arange (6). Reshape ((3,2)), INDEX=PD. Index ([' A ', ' B ', ' C '],name= ' state '), COLUMNS=PD. Index ([' I ', ' II '],name= ')] Print
1.1. Pandas Analysis steps
Loading data
COUNT the date of the access_time. SQL similar to the following:
SELECT date_format (access_time, '%H '), COUNT (*) from log GROUP by Date_format (access_time, '%H ');
1.2. Code
Cat pd_ng_log_stat.py#!/usr/bin/env python#-*-Coding:utf-8-*-From Ng_line_parser import NglineparserImport Pandas as PDImport socketImport str
Pythonpandas connection to MySQL1, Python and MySQL connection and operation, directly on the code, simple and direct efficiency:1 ImportMySQLdb2 3 Try:4 5conn = MySQLdb.connect (host='localhost', user='Root', passwd='xxxxx', db='Test', charset='UTF8')6 7Cur =conn.cursor ()8 9Cur.execute ('CREATE TABLE User (id int,name varchar )' )Ten One A -Value = [1,'Jkmiao'] - theCur.execute ("INSERT into user values (%s,%s)", value) - - - +Users = []
values appearDf.boxplot (column= ' label 1 ', by = ' Label 2 ')Plt.show ()The data under label 1 can then be plotted in a numerical distribution according to label 2As indicated below, it has been classified according to the level of education, high-level wage extremes, and other conclusions can be obtainedNote: When you want to paint, the individual input drawing instructions can not display graphics, then you need to enter Plt.show () on another line, condition: import Matplotlib.pyplot as Pl
ImportOSImportPandas as PDImportMatplotlib.pyplot as PltdefTest_run (): start_date='2017-01-01'End_data='2017-12-15'dates=Pd.date_range (start_date, End_data)#Create an empty data frameDF=PD. DataFrame (index=dates) Symbols=['SPY','AAPL','IBM','GOOG','GLD'] forSymbolinchsymbols:temp=getadjcloseforsymbol (symbol) DF=df.join (temp, how='Inner') returnDF def Normalize_data (DF): "" " normalize stock prices using the first row of the DATAFR Ame " " " df=df/df.ix[0,:] return DF defGetadj
Close 2017-11-24 260.359985 2017-11-27 260.230011 2017-11-28 262.869995"""if __name__=='__main__': Test_run ()There is a simpy-to-drop the data which index is not present in Dspy:Df1=df1.join (Dspy, how='inner')We can also rename the ' Adj Close ' to prevent conflicts: # Rename the column Dspy=dspy.rename (columns={'Adj Close'SPY'})Load More stocks:ImportPandas as PDdefTest_run (): start_date='2017-11-24'End_data='2017-11-28'dates=Pd.date_range (start_date, End_data)#Create an empty data
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.