import numpy as np
import pandas as pd

# Common string methods: strip
s = pd.Series(['Jack', 'Jill', 'Jease', 'Feank'])
df = pd.DataFrame(np.random.randn(3, 2), columns=['Column A', 'Column B'], index=range(3))
print(s)
print(df.columns)

print('----')
print(s.str.lstrip().values)  # remove whitespace on the left
print(s.str.rstrip().values)  # remove whitespace on the right
df.columns = df.columns.str.strip()
print(df.columns)

Results:
0 Jack
1
Notes:
import pandas as pd
For CSV data files, open them with pd.read_csv(), such as train_data = pd.read_csv('')
Use train_data.head() to view part of the data.
train_data.describe() returns summary statistics such as the count, mean, and standard deviation (for numeric columns only).
For non-numeric (character) data, use train_data['<column label>'].value_counts() to count how many rows fall into each category.
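The notes above can be sketched as follows. The CSV content, the column names, and the use of io.StringIO in place of a file path are all assumptions made for illustration; they do not come from the original notes.

```python
import io

import pandas as pd

# Hypothetical CSV content, invented for this sketch.
csv_text = """name,age,city
Ann,23,Paris
Bob,35,Berlin
Cid,29,Paris
"""

# read_csv accepts a path or any file-like object.
train_data = pd.read_csv(io.StringIO(csv_text))

preview = train_data.head(2)                     # first two rows
stats = train_data.describe()                    # numeric columns only
city_counts = train_data['city'].value_counts()  # rows per category

print(preview)
print(stats)
print(city_counts)
```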
df1 is the test data, a DataFrame structure.
The df1 data is read from the test.xlsx document, using the sample code as follows:
# -*- coding: utf-8 -*-
import tushare as ts
import pandas as pd
df = pd.read_excel('test.xlsx')
df1 = df.head(10)
# sort the DataFrame by index in ascending order (ascending is the default)
# print df1.sort_index()
# sort the DataFrame by index in descending order
# print df1.sort_index(ascending=False)
# sort along the columns (axis=1) in ascending order, which is the default
# print df1.sort_index(axis=1
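Since test.xlsx is not available here, the three sort_index calls can be tried on a small stand-in frame. The data below is invented for illustration; only the sort_index calls correspond to the snippet above.

```python
import pandas as pd

# Stand-in for the data read from test.xlsx (values are made up).
df1 = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=[2, 0, 1])

by_index_asc = df1.sort_index()                  # rows by index label, ascending
by_index_desc = df1.sort_index(ascending=False)  # rows by index label, descending
by_column_label = df1.sort_index(axis=1)         # columns by column label

print(by_index_asc)
print(by_index_desc)
print(by_column_label)
```

Note that axis=1 reorders the columns by their labels; it does not sort the values within a row.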
1.1. Pandas Analysis steps
Loading data
Count the number of accesses per hour of access_time. The SQL is similar to the following:
SELECT date_format(access_time, '%H'), COUNT(*) FROM log GROUP BY date_format(access_time, '%H');
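A rough pandas counterpart of the query above might look like this. The log rows are invented, and the frame is built in memory rather than parsed from a real access log, so treat it as a sketch of the grouping step only.

```python
import pandas as pd

# Hypothetical access log with a single access_time column.
log = pd.DataFrame({
    'access_time': pd.to_datetime([
        '2017-01-01 10:05:00',
        '2017-01-01 10:45:00',
        '2017-01-01 11:10:00',
        '2017-01-02 10:30:00',
    ])
})

# Equivalent of date_format(access_time, '%H'): the hour as a two-digit string.
hour = log['access_time'].dt.strftime('%H').rename('hour')

# Equivalent of COUNT(*) ... GROUP BY hour.
hits_per_hour = log.groupby(hour).size()
print(hits_per_hour)
```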
1.2. Code
cat pd_ng_log_stat.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ng_line_parser import NgLineParser
import pandas as pd
import socket
import struct

class PDNgLogStat(object):
    def __init__(s
The code is as follows:
# imports
import pymysql
import pandas as pd
from sqlalchemy import create_engine
# connect to MySQL
engine = create_engine('mysql+pymysql://root:@localhost:3306/lg')
# query
sql = ''' select LGL_LSKU,LGL_NAME_LOCAL,LGL_RAW_ADDRESS from kfc limit 5;'''
df = pd.read_sql_query(sql, engine)
print(df)
Using the loc function
sql_1 = ''' select LGL_LSKU,LGL_NAME_LOCAL,LGL_RAW_ADDRESS,LGL_EPISODE, LGL_EPISODE_D
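The same read_sql_query and loc pattern can be tried without a MySQL server. The sketch below swaps in an in-memory SQLite database from the standard library; the table name, columns, and rows are hypothetical and are not the kfc table from the snippet above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the MySQL engine (an assumption of this sketch).
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE shops (sku TEXT, name TEXT, address TEXT)')
conn.executemany(
    'INSERT INTO shops VALUES (?, ?, ?)',
    [('A1', 'North', '1 Main St'),
     ('A2', 'South', '2 Oak Ave'),
     ('A3', 'East', '3 Elm Rd')],
)
conn.commit()

# Load the query result into a DataFrame.
df = pd.read_sql_query(
    'SELECT sku, name, address FROM shops ORDER BY sku LIMIT 5', conn)

# loc selects rows by a boolean mask and columns by label.
subset = df.loc[df['name'] != 'South', ['sku', 'name']]
# loc also addresses a single cell by row label and column label.
first_address = df.loc[0, 'address']

conn.close()
print(subset)
print(first_address)
```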
Python pandas connection to MySQL
1. Python and MySQL connection and operation; straight to the code, simple and direct:
import MySQLdb

try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='xxxxx', db='test', charset='utf8')
    cur = conn.cursor()
    cur.execute('create table user(id int,name varchar )')
    value = [1, 'Jkmiao']
    cur.execute("insert into user values(%s,%s)", value)
    users = []
    for i in range(20):
        use
urllib2.unquote_plus decoding alone is not enough; the special characters also need to be removed:
illegal_characters_re = re.compile(r'[\000-\010]|[\013-\014]|[\016-\037]|\xef|\xbf')
value = illegal_characters_re.sub('', origin_value)
The \xef|\xbf alternatives are there because those bytes left the string garbled; on inspection they come from a UTF-8 BOM, which has to be filtered out.
BOM: https://en.wikipedia.org/wiki/byte_order_mark#utf-8
ASCII characters: http://donsnotes.com/tech/charsets/ascii.html
Then, it worked for me.
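A possible Python 3 version of the same cleanup is sketched below. It uses urllib.parse.unquote_plus instead of the Python 2 call in the snippet, and strips the BOM as the single character U+FEFF (the decoded form of the bytes EF BB BF) rather than byte by byte. The function name clean_value and the sample input are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Same control-character ranges as the original pattern, written in hex:
# \000-\010, \013-\014 and \016-\037.
_CONTROL_CHARS_RE = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f]')


def clean_value(raw):
    """Decode a URL-encoded value, then drop the BOM and control characters."""
    text = unquote_plus(raw)
    # A UTF-8 BOM decodes to the single character U+FEFF.
    text = text.replace('\ufeff', '')
    return _CONTROL_CHARS_RE.sub('', text)


# Hypothetical input: a BOM, some text, and two stray control characters.
cleaned = clean_value('%EF%BB%BFhello+world%01%0B')
print(repr(cleaned))
```

Removing the whole BOM character avoids deleting a lone \xef or \xbf that belongs to legitimate text.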
import os
import pandas as pd
import matplotlib.pyplot as plt

def test_run():
    start_date = '2017-01-01'
    end_data = '2017-12-15'
    dates = pd.date_range(start_date, end_data)
    # Create an empty data frame
    df = pd.DataFrame(index=dates)
    symbols = ['SPY', 'AAPL', 'IBM', 'GOOG', 'GLD']
    for symbol in symbols:
        temp = getadjcloseforsymbol(symbol)
        df = df.join(temp, how='inner')
    return df

def normalize_data(df):
    """ normalize stock prices using the first row of the dataframe """
    # .ix has been removed from pandas; .iloc selects the first row by position
    df = df / df.iloc[0, :]
    return df

def getadj
Close
2017-11-24 260.359985
2017-11-27 260.230011
2017-11-28 262.869995
"""
if __name__ == '__main__':
    test_run()
There is a simple way to drop the rows whose index is not present in dspy:
df1 = df1.join(dspy, how='inner')
We can also rename the 'Adj Close' column to prevent conflicts:
# Rename the column
dspy = dspy.rename(columns={'Adj Close': 'SPY'})
Load more stocks:
import pandas as pd
def test_run():
    start_date = '2017-11-24'
    end_data = '2017-11-28'
    dates = pd.date_range(start_date, end_data)
    # Create an empty data
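The join, rename, and normalize steps can be combined in one small sketch. The three prices are the ones shown in the output above; building dspy by hand instead of reading it from a CSV file is an assumption made so that the example runs on its own.

```python
import pandas as pd

# Calendar range that includes a weekend (2017-11-25 and 2017-11-26).
dates = pd.date_range('2017-11-24', '2017-11-28')
df1 = pd.DataFrame(index=dates)  # empty frame indexed by date

# Prices from the output above, on trading days only.
dspy = pd.DataFrame(
    {'Adj Close': [260.359985, 260.230011, 262.869995]},
    index=pd.to_datetime(['2017-11-24', '2017-11-27', '2017-11-28']),
)

# Rename the column so several symbols can live side by side.
dspy = dspy.rename(columns={'Adj Close': 'SPY'})

# An inner join keeps only the dates present in both frames.
df1 = df1.join(dspy, how='inner')

# Normalize by the first row so every series starts at 1.0.
normalized = df1 / df1.iloc[0, :]
print(normalized)
```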
This article mainly covers how to sort a Series and a DataFrame by index or by value.
Code:
# coding=utf-8
import pandas as pd
import numpy as np
# The following implements sorting.
series = pd.Series([3, 4, 1, 6], index=['b', 'a', 'd', 'c'])
frame = pd.DataFrame([[2, 4, 1, 5], [3, 1, 4, 5], [5, 1, 4, 2]], columns=['b', 'a', 'd', 'c'], index=['one', 'two', 'three'])
print(frame)
print(series)
print('series is sorted by index:')
print(series.sort_index())
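Sorting by value, the other half of the topic, could be sketched with sort_values on the same series and frame. The choice of sort columns here is only an illustration.

```python
import pandas as pd

series = pd.Series([3, 4, 1, 6], index=['b', 'a', 'd', 'c'])
frame = pd.DataFrame(
    [[2, 4, 1, 5], [3, 1, 4, 5], [5, 1, 4, 2]],
    columns=['b', 'a', 'd', 'c'],
    index=['one', 'two', 'three'],
)

# Series: sort by the stored values rather than by the index labels.
series_by_value = series.sort_values()

# DataFrame: sort rows by one column, descending.
frame_by_b_desc = frame.sort_values(by='b', ascending=False)

# DataFrame: sort rows by column 'a', breaking ties with column 'c'.
frame_by_a_then_c = frame.sort_values(by=['a', 'c'])

print(series_by_value)
print(frame_by_b_desc)
print(frame_by_a_then_c)
```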
Use astype as follows:
df[[column]] = df[[column]].astype(type)
type is int, float, and so on.
Example:
import pandas as pd
data = pd.DataFrame([[1, "2"], [2, "2"]])
data.columns = ["one", "two"]
print(data)
# current types
print("----\ntypes before conversion:")
print(data.dtypes)
# type conversion
data[["two"]] = da
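Because the example above is cut off before the conversion itself, here is a minimal sketch of the step. It uses the single-column form data["two"] rather than the double-bracket form, which is a choice made for this sketch and not necessarily what the original article did.

```python
import pandas as pd

data = pd.DataFrame([[1, "2"], [2, "2"]], columns=["one", "two"])
print(data.dtypes)  # column "two" holds strings at this point

# Convert the string column to integers and assign it back.
data["two"] = data["two"].astype(int)

print(data.dtypes)  # column "two" is now an integer column
total = data["two"].sum()
print(total)
```

astype raises a ValueError if any value cannot be parsed as the target type, so it suits columns that are already clean.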
The main tasks of data preprocessing are:
First, data preprocessing
1. Data cleaning
2. Data integration
3. Data Conversion
4. Data reduction
1. Data cleaning
Real-world data is generally incomplete, noisy, and inconsistent. Data cleaning routines attempt to fill in missing values, smooth out noise, identify outliers, and correct inconsistencies in the data.
(The data used above)
① Ignore tuples: This is usually done when the class label is missing. This method is not effe
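Two of the cleaning strategies mentioned above, ignoring records whose class label is missing and filling in missing values, might be sketched like this. The data is invented, and filling with the column mean is only one possible choice that the text above does not prescribe.

```python
import pandas as pd

# Hypothetical records: one missing income and one missing class label.
records = pd.DataFrame({
    'income': [30.0, None, 50.0, 40.0],
    'label': ['yes', 'no', None, 'yes'],
})

# Ignore tuples: drop the rows whose class label is missing.
labelled = records.dropna(subset=['label'])

# Fill the remaining missing attribute values with the column mean (assumption).
cleaned = labelled.fillna({'income': labelled['income'].mean()})
print(cleaned)
```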
Convert to a format that can be searched using XPath
= doc.xpath('//table')
Find all the tables in the document and return a list
Let's look at the source code of the web page and find the table that needs to be retrieved
The first row of the table is the header and the following rows are the data; we define a function to get them separately:
def _unpack(row, kind='td'):
    elts = row.xpath('.//%s' % kind)
    # Get data based on the tag type
    return [val.text_content() for val in elts]
# Use
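A self-contained way to exercise _unpack is sketched below. It assumes the third-party lxml package is installed and parses a small inline HTML string, invented for illustration, instead of a downloaded page; the variable names tables, rows, header, and data are hypothetical.

```python
from lxml.html import fromstring

# Hypothetical HTML with one table: a header row and two data rows.
html = """<html><body><table>
<tr><th>Symbol</th><th>Price</th></tr>
<tr><td>AAA</td><td>1.5</td></tr>
<tr><td>BBB</td><td>2.5</td></tr>
</table></body></html>"""


def _unpack(row, kind='td'):
    # Collect the text of every <kind> cell under this row.
    elts = row.xpath('.//%s' % kind)
    return [val.text_content() for val in elts]


doc = fromstring(html)
tables = doc.xpath('//table')        # list of all tables in the document
rows = tables[0].xpath('.//tr')      # all rows of the first table

header = _unpack(rows[0], kind='th')         # header cells
data = [_unpack(row) for row in rows[1:]]    # data cells, row by row

print(header)
print(data)
```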
In data processing, especially in big data contests, a common problem is merging multiple tables. For example, one table has the fields user_id and age, and another has user_id and sex; how do you merge them into a single table with only user_id, age, and sex? Ordinary concatenation does not work, because the user_id rows of the two tables do not line up, so stitching them side by side like building blocks is bound to fail. There is a merge fun
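The situation described above can be sketched with pandas.merge, which matches rows by key instead of by position. The two small tables are invented for illustration.

```python
import pandas as pd

# Two tables whose user_id rows are deliberately in a different order.
ages = pd.DataFrame({'user_id': [3, 1, 2], 'age': [30, 25, 41]})
sexes = pd.DataFrame({'user_id': [1, 2, 3], 'sex': ['F', 'M', 'F']})

# Match rows on user_id; row order in the inputs does not matter.
merged = pd.merge(ages, sexes, on='user_id')
print(merged)
```

The default is an inner join, so a user_id that appears in only one table is dropped; passing how='left' or how='outer' keeps such rows.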
Mysql
Add indexes, otherwise queries are slow
Check whether time-typed columns are refreshed after an update
Design logical deletion with an enable flag
For NULL values and numeric operations on strings, use the function IFNULL(total, 0), or set a default value at design time
String-typed values (if they contain data that is not purely numeric) must be quoted
Columns with a default value or a NOT NULL constraint must be assigned in advance (to_sql)
If addition and subtraction run into precision problems, compare with ABS() > precision error
Pandas+mysql
Writes a pandas DataFrame to the MySQL database + sqlalchemy
import pandas as pd
from sqlalchemy import create_engine

## Write the data to the MySQL database. A connection has to be established through sqlalchemy.create_engine, and the character encoding is set to utf8, otherwise some Latin characters cannot be handled
'mysql+mysqldb://roo
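The write step can be sketched with DataFrame.to_sql. Since the MySQL connection string above is cut off, this sketch substitutes an in-memory SQLite connection from the standard library, which to_sql also accepts; the table name and rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Sample data with non-ASCII characters, invented for this sketch.
df = pd.DataFrame({'id': [1, 2], 'name': ['José', 'Zoë']})

# In-memory SQLite stands in for the MySQL engine (an assumption).
conn = sqlite3.connect(':memory:')

# Write the frame as a table; replace it if it already exists.
df.to_sql('people', conn, index=False, if_exists='replace')

# Read it back to confirm the round trip.
round_trip = pd.read_sql_query('SELECT id, name FROM people ORDER BY id', conn)
conn.close()
print(round_trip)
```

With MySQL, the same to_sql call would take the SQLAlchemy engine in place of conn.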
1. Python and MySQL connection and operation; straight to the code, simple and direct:
import MySQLdb
try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='xxxxx', db='test', charset='utf8')
    cur = conn.cursor()
    cur.execute('create table user(id int,name varchar )')
    value = [1, 'Jkmiao']
    cur.execute("insert into user values(%s,%s)", value)
    users = []
    for i in range(20):
        users.append((i, "User" + str(i)))
    cur.executemany("insert into user values(%s,%s)", users)
    cur.execute
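The same cursor workflow (execute, executemany, fetch) can be run without a MySQL server using the standard library's sqlite3 module, which follows the same DB-API. This is a stand-in, not the MySQLdb code itself: the placeholder style is ? instead of %s, and the VARCHAR(20) length is an assumption because the original omits it.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
try:
    # The column length 20 is assumed; the source leaves it out.
    cur.execute('CREATE TABLE user (id INT, name VARCHAR(20))')

    # Single parameterized insert.
    cur.execute('INSERT INTO user VALUES (?, ?)', (1, 'jkmiao'))

    # Batch insert of 20 generated rows.
    users = [(i, 'user' + str(i)) for i in range(20)]
    cur.executemany('INSERT INTO user VALUES (?, ?)', users)
    conn.commit()

    cur.execute('SELECT COUNT(*) FROM user')
    row_count = cur.fetchone()[0]
finally:
    cur.close()
    conn.close()

print(row_count)
```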
The function pandas.pivot_table can be used to create spreadsheet-style pivot tables.
It takes a number of arguments:
data: a DataFrame object
values: a column or a list of columns to aggregate
index: a column, Grouper, array which has the same length as
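A small sketch of pivot_table with the arguments listed above follows. The sales data is invented, and the columns and aggfunc arguments are additional pivot_table parameters that are not part of the truncated list.

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west', 'east'],
    'product': ['a', 'b', 'a', 'b', 'a'],
    'amount': [10, 20, 30, 40, 50],
})

# One row per region, one column per product, summing the amounts.
table = pd.pivot_table(
    sales,
    values='amount',
    index='region',
    columns='product',
    aggfunc='sum',
)
print(table)
```

Without aggfunc, pivot_table aggregates with the mean.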