Modify DataFrame column names in Pandas
The data are as follows:
>>> import pandas as pd
>>> a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
>>> a
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
Method one: the brute-force method
>>> a.columns = ['a', 'b', 'c']
>>> a
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
But the disadvantage is that you
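Assigning to `columns` replaces every name at once, so the list has to contain all column names in the right order. A minimal sketch of one commonly used alternative, `DataFrame.rename`, which changes only the columns named in a mapping. This is not necessarily the method the original article goes on to describe, and the variable names `b` and `c` are chosen here for illustration:

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Brute force: every column name must be listed, in order.
b = a.copy()
b.columns = ['a', 'b', 'c']

# rename() takes an old -> new mapping and returns a new DataFrame;
# columns that are not in the mapping keep their names.
c = a.rename(columns={'A': 'a'})

print(list(b.columns))
print(list(c.columns))
```

Because `rename` returns a new object by default, `a` itself keeps its original column names.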
Writing data to a CSV file is very simple when the data is manipulated with the Pandas library. A brief example follows:
In [1]: import pandas as pd
In [2]: data = {'row1': [1, 2, 3, 'biubiu'], 'row2': [3, 1, 3, 'kaka']}
In [3]: data
Out[3]: {'row1': [1, 2, 3, 'biubiu'], 'row2': [3, 1, 3, 'kaka']}
In [4]: data_df = pd.DataFrame(data)
In [5]: data_df
Out[5]:
  row1 row2
0
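The write step itself can be sketched with `DataFrame.to_csv`. To keep the sketch self-contained it writes into an in-memory `io.StringIO` buffer rather than a file on disk; that buffer, and the choice of `index=False`, are assumptions made here for illustration:

```python
import io

import pandas as pd

data = {'row1': [1, 2, 3, 'biubiu'], 'row2': [3, 1, 3, 'kaka']}
data_df = pd.DataFrame(data)

# to_csv accepts a path or any file-like object.
# index=False leaves the row index (0, 1, 2, 3) out of the output.
buffer = io.StringIO()
data_df.to_csv(buffer, index=False)

csv_text = buffer.getvalue()
print(csv_text)
```

Passing a file name such as `'data.csv'` (a hypothetical path) in place of `buffer` writes the same text to disk.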
import numpy as np
import pandas as pd
# Common string methods: strip
s = pd.Series(['Jack', 'Jill', 'Jease', 'Feank'])
df = pd.DataFrame(np.random.randn(3, 2), columns=['Column A', 'Column B'], index=range(3))
print(s)
print(df.columns)

print('----')
print(s.str.lstrip().values)  # Remove the spaces on the left
print(s.str.rstrip().values)  # Remove the spaces on the right
df.columns = df.columns.str.strip()
print(df.columns)
Results:
0 Jack
1
Notes:
import pandas as pd
For CSV data files, open them with pd.read_csv(), such as train_data = pd.read_csv('')
Use train_data.head() to view part of the data.
train_data.describe() returns summary statistics such as the count, mean, and standard deviation (for numeric data only, of course).
For non-numeric (character) data, use train_data['label name'].value_counts() to count how many rows fall into each category.
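The notes above can be sketched end to end as follows. The CSV content and the column names `age`, `fare`, and `sex` are made up for illustration, and the data is read from an in-memory string instead of a real file:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real training file.
csv_text = """age,fare,sex
22,7.25,male
38,71.28,female
26,7.92,female
35,53.10,female
"""

train_data = pd.read_csv(io.StringIO(csv_text))

# First rows of the table.
print(train_data.head(2))

# Summary statistics; only the numeric columns are included by default.
stats = train_data.describe()
print(stats)

# Number of rows per category for a non-numeric column.
counts = train_data['sex'].value_counts()
print(counts)
```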
df1 is test data with a DataFrame structure.
The df1 data is read from the test.xlsx document, using the sample code as follows:
# -*- coding: utf-8 -*-
import tushare as ts
import pandas as pd
df = pd.read_excel('test.xlsx')
df1 = df.head(10)
# Sort the DataFrame by index in ascending order (ascending is the default)
# print df1.sort_index()
# Sort the DataFrame by index in descending order
# print df1.sort_index(ascending=False)
# Sort along the first row (the column labels) in ascending order, which is the default
# print df1.sort_index(axis=1
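The three `sort_index` calls in the snippet above can be tried without test.xlsx. A minimal sketch using a small hand-made DataFrame (the data and the names `asc`, `desc`, and `by_cols` are assumptions for illustration):

```python
import pandas as pd

# Small stand-in for the data read from test.xlsx.
df1 = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=[2, 0, 1])

# Sort rows by index label, ascending (the default).
asc = df1.sort_index()

# Sort rows by index label, descending.
desc = df1.sort_index(ascending=False)

# axis=1 sorts the column labels instead of the row labels.
by_cols = df1.sort_index(axis=1)

print(asc)
print(desc)
print(by_cols)
```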
1.1. Pandas Analysis steps
Loading data
Count the records grouped by the formatted access_time. The SQL is similar to the following:
SELECT DATE_FORMAT(access_time, '%H'), COUNT(*) FROM log GROUP BY DATE_FORMAT(access_time, '%H');
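A rough pandas counterpart of that SQL statement, assuming the log has already been loaded into a DataFrame with a datetime column named `access_time`. The sample timestamps are invented, and this is a sketch rather than the article's own script:

```python
import pandas as pd

# Hypothetical parsed log: one row per request.
log = pd.DataFrame({
    'access_time': pd.to_datetime([
        '2016-06-01 10:05:00',
        '2016-06-01 10:45:00',
        '2016-06-01 11:15:00',
        '2016-06-02 10:30:00',
    ])
})

# Equivalent of GROUP BY DATE_FORMAT(access_time, '%H') with COUNT(*):
# format each timestamp as a two-digit hour string, then count rows per hour.
hour = log['access_time'].dt.strftime('%H')
per_hour = log.groupby(hour).size()

print(per_hour)
```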
1.2. Code
cat pd_ng_log_stat.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ng_line_parser import NgLineParser
import pandas as pd
import socket
import struct
class PDNgLogStat(object):
    def __init__(s
The code is as follows:
# Imports
import pymysql
import pandas as pd
from sqlalchemy import create_engine
# Connect to MySQL
engine = create_engine('mysql+pymysql://root:@localhost:3306/lg')
# Query
sql = ''' select LGL_LSKU,LGL_NAME_LOCAL,LGL_RAW_ADDRESS from kfc limit 5;'''
df = pd.read_sql_query(sql, engine)
print(df)
Using the loc function
sql_1 = ''' select LGL_LSKU,LGL_NAME_LOCAL,LGL_RAW_ADDRESS,LGL_EPISODE, LGL_EPISODE_D
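The query-then-`loc` pattern can be tried without a MySQL server. The sketch below swaps in an in-memory SQLite database from the standard library purely so that it runs anywhere; the table contents are invented, and only the table and column names are taken from the snippet above:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server.
conn = sqlite3.connect(':memory:')
conn.execute(
    'create table kfc (LGL_LSKU text, LGL_NAME_LOCAL text, LGL_RAW_ADDRESS text)'
)
conn.executemany(
    'insert into kfc values (?, ?, ?)',
    [
        ('k1', 'Store One', 'Address 1'),
        ('k2', 'Store Two', 'Address 2'),
        ('k3', 'Store Three', 'Address 3'),
    ],
)
conn.commit()

# Run the query and load the result into a DataFrame.
sql = 'select LGL_LSKU, LGL_NAME_LOCAL, LGL_RAW_ADDRESS from kfc limit 5;'
df = pd.read_sql_query(sql, conn)

# loc selects rows with a boolean condition and columns by label.
subset = df.loc[df['LGL_LSKU'] == 'k2', ['LGL_NAME_LOCAL', 'LGL_RAW_ADDRESS']]
print(subset)

conn.close()
```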
Python pandas connection to MySQL
1. Connecting Python to MySQL and operating on it. Straight to the code, which is simple and direct:
import MySQLdb

try:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='xxxxx', db='test', charset='utf8')
    cur = conn.cursor()
    cur.execute('create table user (id int, name varchar )')
    value = [1, 'jkmiao']
    cur.execute("insert into user values (%s, %s)", value)
    users = []
    for i in range(20):
        Use
Hierarchical indexes
Hierarchical indexing means you can have multiple index levels on one axis, for example:
A bit like a merged cell in Excel, right?
Select a subset of the data based on the outer index level:
Select data by the inner index level in the same way:
Converting a multi-index Series to a DataFrame
Hierarchical indexes play an important role in data reshaping and grouping. For example, the hierarchically indexed data above can be converted to a DataFrame:
For
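A minimal sketch of these operations on a small Series with a two-level index. The data and the names `outer`, `inner`, and `frame` are hypothetical:

```python
import pandas as pd

# Two index levels: an outer letter and an inner number.
s = pd.Series(
    [1, 2, 3, 4, 5, 6],
    index=[['a', 'a', 'b', 'b', 'c', 'c'], [1, 2, 1, 2, 1, 2]],
)

# Select by the outer level: everything under 'b'.
outer = s['b']

# Select by the inner level: inner label 2 under every outer label.
inner = s.loc[:, 2]

# Convert to a DataFrame: the inner level becomes the columns.
frame = s.unstack()

print(outer)
print(inner)
print(frame)
```

Calling `frame.stack()` goes the other way and returns a Series with the two-level index again.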
Using Python for data analysis: Pandas basics 2
The reindex method of Series
In [15]: obj = Series([3,2,5,7,6,9,0,1,4,8], index=['a','b','c','d','e','f','g',
    ...: 'h','i','j'])
In [16]: obj1 = obj.reindex(['a','b','c','d','e','f','g','h','i','j','k'])
In [17]: obj1
Out[17]:
a    3.0
b    2.0
c    5.0
d    7.0
e    6.0
f    9.0
g    0.0
h    1.0
i    4.0
j    8.0
k    NaN
dtype: float64
If the current value of the new index is missing, interpolatio
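How labels that are missing from the original index can be handled at reindex time is sketched below with a shorter, hypothetical Series. It shows the default behavior plus the `fill_value` and `method` parameters of `Series.reindex`:

```python
import pandas as pd

obj = pd.Series([3, 2, 5], index=['a', 'b', 'c'])
new_index = ['a', 'b', 'c', 'd']

# Default: the new label 'd' gets NaN, which forces a float dtype.
plain = obj.reindex(new_index)

# fill_value supplies a constant for labels that were missing.
filled = obj.reindex(new_index, fill_value=0)

# method='ffill' carries the last valid value forward
# (this requires a monotonically ordered index).
ffilled = obj.reindex(new_index, method='ffill')

print(plain)
print(filled)
print(ffilled)
```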
Use easy_install to install numpy, pandas, matplotlib, and various third-party modules
After a whole night, I finally got the environment in the title set up. The following is a brief description, kept for reference and for sharing.
1. Install Python. After adding the Python path to the system PATH, you can enter the Python environment from cmd.
2. Install easy_install (setuptools). Download the appropriate version of the compressed package on
The difference between resample and groupby:
resample: resampling within a given time unit
groupby: statistics on a given data entry
Function prototype:
DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0)
Some of these parameters (for example how and fill_method) are deprecated.
Let's start practicing.
import numpy as np
import pandas as pd
Start by creating a series with 9 one minute timestamp
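The contrast between the two can be sketched with a Series of nine one-minute timestamps. The start date, the values 0 to 8, the 3-minute bin size, and the odd/even grouping key are all assumptions made for illustration:

```python
import pandas as pd

# Nine one-minute timestamps with the values 0..8.
index = pd.date_range('2000-01-01', periods=9, freq='min')
series = pd.Series(range(9), index=index)

# resample: bins are defined by time (here 3-minute windows),
# then an aggregation is applied to each bin.
resampled = series.resample('3min').sum()

# groupby: bins are defined by a key computed from the data
# (here whether the value is even or odd), not by time.
grouped = series.groupby(series % 2).sum()

print(resampled)
print(grouped)
```

The aggregation is chained as a method (`.sum()`) instead of being passed through the deprecated `how` parameter.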
Executing df.shift(-1) gives:
Index  value1
A      1
B      2
C      3
D      NaN
freq: DateOffset, timedelta, or time rule string; an optional parameter whose default value is None. It applies only to time series. If this parameter is given, the index is shifted by the given offset and the data values are left unchanged. For example, suppose df1 is as follows:
Index
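Both behaviors of `shift` can be sketched as follows. The starting values 0, 1, 2, 3 and the daily dates are assumptions chosen so that `shift(-1)` reproduces the 1, 2, 3, NaN column shown earlier:

```python
import pandas as pd

# Without freq: the values move, the index stays in place.
df = pd.DataFrame({'value1': [0, 1, 2, 3]}, index=list('ABCD'))
shifted = df.shift(-1)
print(shifted)

# With freq (time series only): the index moves, the values stay in place.
dates = pd.date_range('2020-01-01', periods=4, freq='D')
ts = pd.DataFrame({'value1': [0, 1, 2, 3]}, index=dates)
moved = ts.shift(1, freq='D')
print(moved)
```

In the first case the last row becomes NaN because there is nothing left to pull in; in the second case no values are lost, since only the timestamps change.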
Replacing values in an array
There are values in the alcohol consumption column that are preventing us from converting the column from strings to floats. In order to fix this, we first have to learn how to replace values. We can replace values in a NumPy array by just assigning to them with the equals sign.
The code above would replace any item in the alcohol consumption column that contains '0' (remember that the world_alcohol matrix is all string values) with '10'.
Convert the alcohol consumption column to flo
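A minimal sketch of replacing values by assignment and then converting the column. The rows below are invented, and placing the consumption figures in column 3 is an assumption; the real world_alcohol data may be laid out differently:

```python
import numpy as np

# Hypothetical all-string matrix: year, country, beverage type, consumption.
world_alcohol = np.array([
    ['1986', 'Viet Nam', 'Wine', '0'],
    ['1986', 'Uruguay', 'Other', '0.5'],
    ['1985', 'Colombia', 'Beer', '4.27'],
])

# Boolean mask of the rows whose consumption entry is exactly the string '0'.
is_zero = world_alcohol[:, 3] == '0'

# Assignment through the mask replaces those entries in place.
world_alcohol[is_zero, 3] = '10'

# Once every entry is a valid number string, the column converts cleanly.
consumption = world_alcohol[:, 3].astype(float)
print(consumption)
```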
Series: a one-dimensional array, similar to a one-dimensional array in NumPy. Both are similar to the list, Python's basic data structure. The difference is that the elements of a list can have different data types, while an array and a Series only store elements of the same data type, which uses memory more efficiently and speeds up operations.
Time-Series: a Series that is indexed by time.
DataFrame: a two-dimensional tabular data structure. Many functions are similar to the Data
Using Python for data analysis (13): pandas basics, data reshaping and pivoting
Definition of reshaping
Reshaping refers to rearranging data. It is also called pivoting.
DataFrame provides two methods:
stack: rotates the columns of the data into rows.
unstack: rotates the rows of the data into columns.
For example:
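A minimal sketch of both methods on a small, hypothetical DataFrame (the labels `r1`, `r2`, `c1`, `c2` are made up):

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=['r1', 'r2'],
    columns=['c1', 'c2'],
)

# stack: the column labels become the inner level of the row index,
# giving a Series with one value per (row, column) pair.
stacked = df.stack()
print(stacked)

# unstack: the inner index level goes back to being the columns.
restored = stacked.unstack()
print(restored)
```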
Processing the stacked format
The stacked format is also called the long format. Generally, time series data stored in a relational dat