date belongs to a leap year
import pandas as pd
df = pd.read_excel("C:/users/administrator/desktop/new Microsoft Excel worksheet.xlsx")  # read the worksheet
df[['Property', 'Description']] = df['Property Description'].str.split(" ", n=1, expand=True)  # split on the first space
df.drop("Property Description", axis=1, inplace=True)  # delete the original column
df.to_csv("C:/users/administrator/desktop/new Microsoft Excel Worksheet.csv", index=False)  # save as CSV without the index
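The split step can be tried without an Excel file. The following is a minimal sketch, assuming a small in-memory DataFrame with made-up values in place of the worksheet; it uses `str.split` with `n=1` and `expand=True` so that only the first space separates the two new columns.

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel worksheet.
df = pd.DataFrame({"Property Description": ["Color dark red", "Size 42", "Weight"]})

# Split on the first space only; expand=True returns one column per piece.
parts = df["Property Description"].str.split(" ", n=1, expand=True)
df["Property"] = parts[0]
df["Description"] = parts[1]  # missing when the value contains no space

# Drop the original combined column.
df = df.drop(columns="Property Description")
print(df)
```

Rows that contain no space keep their whole text in `Property` and get a missing value in `Description`.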
Statistical methods
Pandas objects have a number of statistical methods. Most of them are reductions and summary statistics that extract a single value from a Series, or a Series from the rows or columns of a DataFrame.
For example, DataFrame.mean(axis=0, skipna=True) simply skips NA values when they are present in the dataset; the result is NA only when the entire slice (row or column) is NA. If you do not want this behavior, you can disable this feature.
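A minimal sketch of the NA handling described above, using a made-up two-column DataFrame. Passing `skipna=False` is the standard pandas switch for turning the skipping off.

```python
import numpy as np
import pandas as pd

# Column "a" has one missing value; column "b" is entirely missing.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, np.nan]})

# Default behavior: NA values are skipped, so "a" averages 1.0 and 3.0.
mean_skip = df.mean(axis=0, skipna=True)

# With skipna=False, any NA in a column makes that column's mean NA.
mean_strict = df.mean(axis=0, skipna=False)

print(mean_skip)
print(mean_strict)
```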
1. Several ways to create a DataFrame
1.1
import pandas as pd
df1 = pd.DataFrame({'A': range(3), 'B': range(3)})
2. Traverse a column
L = [str(v) for v in df.A]
print(L)
3. Common operations
Slice
db = da.loc[:, ['A', 'B']]
Aggregation
db = da_38.groupby(['a']).sum()
Filter
da = da[(da.a == 1) | (da.b == 1)]
Add a column
d1['C'] = d1['A'] / d1['B']
Apply
d2['C'] = d2['A'].apply(lambda x: 1)
da["B"] = da.a.apply(lambda x:
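The operations listed in this section can be run end to end on a small table. This is a sketch only: the DataFrame `da` and its columns `a`, `b`, `c` are made-up sample data, and the `ratio` and `flag` columns are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data.
da = pd.DataFrame({"a": [1, 1, 2, 3], "b": [1, 2, 1, 4], "c": [10, 20, 30, 40]})

# Slice: keep all rows, select two columns by label.
sliced = da.loc[:, ["a", "b"]]

# Aggregation: sum the remaining columns within each value of "a".
grouped = da.groupby("a").sum()

# Filter: keep rows where either condition holds.
filtered = da[(da.a == 1) | (da.b == 1)]

# Add a column computed from two existing columns.
with_ratio = da.assign(ratio=da["c"] / da["b"])

# Apply: run a Python function on every value of a column.
with_flag = da.assign(flag=da["a"].apply(lambda x: x > 1))

print(grouped)
print(filtered)
```

`assign` returns a new DataFrame, which keeps the original `da` unchanged between steps; writing `da["ratio"] = ...` would modify it in place instead.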
pd.read_sql_table(table_name, con, schema=None, index_col=None, coerce_float=True, parse_dates=None, columns=None, chunksize=None)
For example:
data = pd.read_sql_table(table_name='t_line', con=engine, parse_dates=['time'], index_col='time', columns=['A', 'B', 'C'])
3: Read a database (via a SQL statement or a table name)
For reading through a SQL statement, see my other article: http://www.cnblogs.com/cymwill/articles/7576600.html
pd.read_sql(sql, con, index_col=None, coerce_float=t
Read the contents of a table, as in the following example:
import MySQLdb
import pandas as pd
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='Root', passwd='Root', db='MyDB', port=3306)
    df = pd.read_sql('select * from test;', con=conn)
    conn.close()
    print("Finish Load DB")
except MySQLdb.Error as e:
    print(e.args[1])
Write the data to a table, as in the following example:
df = pd.DataFrame([[1, 'XXX'], [2, 'yyy']], columns=list('AB'))
try:
    conn = MySQLdb.connect(host='127.0.0.1', user='Root', passwd='Root', db='My
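The write-then-read round trip can be tried without a MySQL server. The sketch below swaps in Python's built-in `sqlite3` module with an in-memory database (an assumption made only so the example is self-contained); the table name `test` and the sample rows follow the snippets above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database used in place of the MySQL connection.
conn = sqlite3.connect(":memory:")
try:
    # Write a small DataFrame to a table named "test".
    df = pd.DataFrame([[1, "XXX"], [2, "yyy"]], columns=list("AB"))
    df.to_sql("test", conn, index=False, if_exists="replace")

    # Read the table back with a SQL statement.
    loaded = pd.read_sql("select * from test;", con=conn)
finally:
    # Close the connection even if reading or writing fails.
    conn.close()

print(loaded)
```

With MySQL, the same `to_sql` and `read_sql` calls would typically receive a SQLAlchemy engine instead of the SQLite connection.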
Decoding with urllib2.unquote_plus alone is not enough; you also need to remove the special characters:
illegal_characters_re = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]|\xef|\xbf')
value = illegal_characters_re.sub('', origin_value)
The \xef|\xbf alternatives are there because the string came out garbled; on inspection these bytes come from a UTF-8 BOM, which needs to be filtered out.
BOM: https://en.wikipedia.org/wiki/byte_order_mark#utf-8
ASCII characters: http://donsnotes.com/tech/charsets/ascii.html
Then, it worked for me.
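A Python 3 sketch of the same cleanup, under two stated assumptions: `urllib.parse.unquote_plus` stands in for the Python 2 call named above, and because it returns decoded text rather than bytes, the BOM is matched as the single character U+FEFF instead of the raw bytes. The helper name `clean_value` and the sample input are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Control characters 0x00-0x08, 0x0B-0x0C, 0x0E-0x1F (the same ranges as the
# octal pattern above) plus the BOM character U+FEFF.
ILLEGAL_CHARACTERS_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufeff]")


def clean_value(origin_value: str) -> str:
    """Percent-decode a value, then strip control characters and the BOM."""
    decoded = unquote_plus(origin_value)
    return ILLEGAL_CHARACTERS_RE.sub("", decoded)


# Hypothetical input: an encoded UTF-8 BOM, text, and two control characters.
raw = "%EF%BB%BFhello+world%01%0B"
print(repr(clean_value(raw)))
```

Tab, line feed, and carriage return are outside the removed ranges, so ordinary whitespace is preserved.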
import os
import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    start_date = '2017-01-01'
    end_data = '2017-12-15'
    dates = pd.date_range(start_date, end_data)
    # Create an empty data frame
    df = pd.DataFrame(index=dates)
    symbols = ['SPY', 'AAPL', 'IBM', 'GOOG', 'GLD']
    for symbol in symbols:
        temp = getadjcloseforsymbol(symbol)
        df = df.join(temp, how='inner')
    return df

def normalize_data(df):
    """Normalize stock prices using the first row of the dataframe"""
    df = df / df.iloc[0, :]
    return df

def getadj
Close
2017-11-24 260.359985
2017-11-27 260.230011
2017-11-28 262.869995
"""

if __name__ == '__main__':
    test_run()

There is a simple way to drop the data whose index is not present in dspy:
df1 = df1.join(dspy, how='inner')
We can also rename the 'Adj Close' column to prevent conflicts:
# Rename the column
dspy = dspy.rename(columns={'Adj Close': 'SPY'})
Load more stocks:
import pandas as pd

def test_run():
    start_date = '2017-11-24'
    end_data = '2017-11-28'
    dates = pd.date_range(start_date, end_data)
    # Create an empty data
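The inner-join and normalization steps can be illustrated without downloading price files. This is a sketch under assumptions: the prices below are made up, and the per-symbol frames are built in memory instead of being loaded by a helper function.

```python
import pandas as pd

# Calendar range that includes a weekend with no trading data.
dates = pd.date_range("2017-11-24", "2017-11-28")
df = pd.DataFrame(index=dates)

# Made-up prices, available on trading days only.
trading_days = pd.to_datetime(["2017-11-24", "2017-11-27", "2017-11-28"])
spy = pd.DataFrame({"SPY": [260.0, 262.6, 265.2]}, index=trading_days)
ibm = pd.DataFrame({"IBM": [150.0, 153.0, 147.0]}, index=trading_days)

# An inner join keeps only the dates present in both frames,
# which drops the weekend rows.
for frame in (spy, ibm):
    df = df.join(frame, how="inner")


def normalize_data(prices):
    """Divide every row by the first row so each series starts at 1.0."""
    return prices / prices.iloc[0, :]


normalized = normalize_data(df)
print(normalized)
```

Normalizing to the first row puts all symbols on a common scale, which makes their relative movement easier to compare on one plot.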
This post mainly covers how to sort a Series and a DataFrame by index or by value.
Code:
# coding=utf-8
import pandas as pd
import numpy as np
# The following implements the sorting.
series = pd.Series([3, 4, 1, 6], index=['B', 'A', 'd', 'C'])
frame = pd.DataFrame([[2, 4, 1, 5], [3, 1, 4, 5], [5, 1, 4, 2]], columns=['B', 'A', 'd', 'C'], index=['one', 'two', 'three'])
print(frame)
print(series)
print('series sorted by index:')
print(series.sort_index())
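A short sketch covering both kinds of sorting mentioned above, by index and by value, for a Series and a DataFrame. The all-lowercase labels are an assumption made for readability; the numbers follow the example above.

```python
import pandas as pd

series = pd.Series([3, 4, 1, 6], index=["b", "a", "d", "c"])
frame = pd.DataFrame(
    [[2, 4, 1, 5], [3, 1, 4, 5], [5, 1, 4, 2]],
    columns=["b", "a", "d", "c"],
    index=["one", "two", "three"],
)

# Sort a Series by its index labels, then by its values (largest first).
by_index = series.sort_index()
by_value = series.sort_values(ascending=False)

# Sort a DataFrame by column labels, then by the values in column "a".
frame_by_columns = frame.sort_index(axis=1)
frame_by_a = frame.sort_values(by="a")

print(by_index)
print(by_value)
print(frame_by_columns)
print(frame_by_a)
```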
Use astype as follows:
df[[column]] = df[[column]].astype(type)
Here type is int, float, and so on.
Example:
import pandas as pd
data = pd.DataFrame([[1, "2"], [2, "2"]])
data.columns = ["One", "two"]
print(data)
# Current types
print("----\nTypes before conversion:")
print(data.dtypes)
# Type conversion
data[["two"]] = da
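A complete version of the conversion can be sketched as follows, assuming the goal is to turn the string column `two` into integers; the lowercase column names are chosen here for illustration.

```python
import pandas as pd

data = pd.DataFrame([[1, "2"], [2, "2"]], columns=["one", "two"])

# Column "two" holds strings at this point.
print("Types before conversion:")
print(data.dtypes)

# Convert the string column to integers.
data["two"] = data["two"].astype(int)

print("Types after conversion:")
print(data.dtypes)
```

`astype(int)` raises an error if any value cannot be parsed as an integer, so it suits columns that are known to be clean.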
The main tasks of data preprocessing are:
First, data preprocessing
1. Data cleaning
2. Data integration
3. Data Conversion
4. Data reduction
1. Data cleaning
Real-world data is generally incomplete, noisy, and inconsistent. Data cleaning routines attempt to fill in missing values, smooth out noise, identify outliers, and correct inconsistencies in the data.
(The data used above)
① Ignore tuples: This is usually done when the class label is missing. This method is not effe
Convert to a format that can be queried with XPath
= doc.xpath('//table')
Find all the tables in the document and return them as a list
Let's look at the source code of the web page and find the table that needs to be retrieved
The first row of the table is the header and the following rows are the data; we define a function to get them separately:
def _unpack(row, kind='td'):
    elts = row.xpath('.//%s' % kind)
    # Get the data based on the tag type
    return [val.text_content() for val in elts]
# Use
In data processing, especially in big data competitions, a common problem is merging multiple tables. For example, one table has the two fields user_id and age, and another has the two fields user_id and sex. How do you merge them into a single table with only the three fields user_id, age, and sex? Ordinary concatenation does not work, because the user_id values in the two tables do not line up row by row, so stitching the tables side by side like building blocks will not do. There is a merge fun
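A minimal sketch of the scenario described above, using `pandas.merge` on two made-up tables whose `user_id` rows are in different orders and only partly overlap.

```python
import pandas as pd

# Hypothetical tables: the same key column, different row order and coverage.
ages = pd.DataFrame({"user_id": [1, 2, 3], "age": [25, 31, 47]})
sexes = pd.DataFrame({"user_id": [3, 1, 4], "sex": ["F", "M", "F"]})

# Match rows by user_id rather than by position; an inner join keeps
# only the users that appear in both tables.
merged = pd.merge(ages, sexes, on="user_id", how="inner")

print(merged)
```

Passing `how="left"` or `how="outer"` instead would keep users that are missing from one of the tables and fill the absent fields with NA.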
MySQL
Add indexes, or queries will be slow
Check whether time-type columns are refreshed after an update
Design for logical deletion and an enabled flag
For NULL values in string or numeric operations, use the function IFNULL(total, 0), or set a default value at design time
String-type values (if they contain non-numeric data) must be quoted
Columns with a default value or a NOT NULL constraint must be assigned in advance (to_sql)
If addition and subtraction have a precision problem, compare using ABS() > accuracy error
Pandas+MySQL
This article takes an in-depth look at pandas in Python, with code examples. It may be a useful reference for anyone who needs it.
First, filtering
First, create a 6x4 matrix of data.
dates = pd.date_range('20180830', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
Output:
            A  B  C  D
2018-08-30  0  1  2  3
20
Write a pandas DataFrame to a MySQL database with SQLAlchemy
import pandas as pd
from sqlalchemy import create_engine

# Write the data to the MySQL database. You need to establish the connection through sqlalchemy.create_engine and set the character encoding to utf8, otherwise some Latin characters cannot be handled
'mysql+mysqldb://roo
1. Connecting to MySQL from Python and operating on it. Straight to the code, simple and direct:
import MySQLdb
try:
    conn = MySQLdb.connect(host='localhost', user='Root', passwd='xxxxx', db='Test', charset='UTF8')
    cur = conn.cursor()
    cur.execute('CREATE TABLE User (id int,name varchar )')
    value = [1, 'Jkmiao']
    cur.execute("INSERT into user values (%s,%s)", value)
    users = []
    for i in range -):
        users.append((i, "User" + str(i)))
    cur.executemany("INSERT into user values (%s,%s)", users)
    cur.execute
For the data source, see the previous few posts.
Sort one of the columns:
data.high.sort_values(ascending=False)
data.high.sort_values(ascending=True)
data['high'].sort_values(ascending=False)
data['high'].sort_values(ascending=True)
p =