Objective
Pandas is a numpy built with more advanced data structures and tools than the NumPy core is the Ndarray,pandas is also centered around Series and dataframe two core data structures. Series and Dataframe correspond to one-dimensional sequence and two-dimensional table structure respectively. Pandas's conventional approach to importing is as follows:
From
Getting started with Python for data analysis--pandas
Based on the NumPy established
from pandas importSeries,DataFrame,import pandas as pd
One or two kinds of data structure 1. Series
A python-like dictionary with indexes and values
Two data structure series and dataframe.SeriesThe series is the same as a list in Python, with data and index values.Here we create a series object. Data values and indexes for series objects:The index of the list starts at 0, and the series is indexed by default, similar to the list starting with 0. However, you can also customize the index:Indexes can be redefined:Operation elements according to index:Series is also used in the form of dictionaries:
Hierarchical Indexes Hierarchical indexing means you can have multiple indexes on an array, for example: a bit like a merged cell in Excel, right?Select a subset of the data based on the index to select a subset of the data from the other layer:Select data in the same way as the index in the layer:Multi-index series conversion to Dataframe hierarchical indexes play an important role in data reshaping and grouping, for example, the hierarchical index data above can be converted to a dataframe:For
Use Python for data analysis _ Pandas _ basic _ 2, _ pandas_2Reindex method of Series reindex
In [15]: obj = Series([3,2,5,7,6,9,0,1,4,8],index=['a','b','c','d','e','f','g', ...: 'h','i','j'])In [16]: obj1 = obj.reindex(['a','b','c','d','e','f','g','h','i','j','k'])In [17]: obj1Out[17]:a 3.0b 2.0c 5.0d 7.0e 6.0f 9.0g 0.0h 1.0i 4.0j 8.0k NaNdtype: float64
If the current va
Using Python for data analysis (13) pandas basics: Data remodeling/axial rotation, pythonpandas Remodeling DefinitionRemodeling refers to re-arranging data, also called axial rotation.DataFrame provides two methods:
Stack: rotate the column of data into rows.
Unstack: "Rotate" data rows as columns.
For example:
Process stack formatThe stack format is also called the long format. Generally, the data
For example we have the dataframe like this: SPY AAPL IBM GOOG GLD2017-01-03 222.073914 114.311760 160.947433 786.140015 110.4700012017-01-04 223.395081 114.183815 162.940125 786.900024 110.8600012017-01-05 223.217606 114.764473 162.401047 794.020020 112.5800022017-01-06 224.016220 116.043915 163.200043 806.150024 111.7500002017-01-09 223.276779 117.106812 161.390244 806.650024 112.669998...Now we only we want to get highli
Excel has a computational function skew () for skewness, but it is unclear how to traverse with Excel, which has a large amount of data.Try using Python for resolution.The first time to learn python, did not expect to overcome the installation of various packages of sadness, incredibly successful implementation.python3.3:#this is a test case#-*-coding:gbk-*-print ("Hello
1.1. Pandas Analysis steps
Loading data
COUNT the date of the access_time. SQL similar to the following:
SELECT date_format (access_time, '%H '), COUNT (*) from log GROUP by Date_format (access_time, '%H ');
1.2. Code
Cat pd_ng_log_stat.py#!/usr/bin/env python#-*-Coding:utf-8-*-From Ng_line_parser import NglineparserImport Pandas as PDImport socketImport str
Pythonpandas connection to MySQL1, Python and MySQL connection and operation, directly on the code, simple and direct efficiency:1 ImportMySQLdb2 3 Try:4 5conn = MySQLdb.connect (host='localhost', user='Root', passwd='xxxxx', db='Test', charset='UTF8')6 7Cur =conn.cursor ()8 9Cur.execute ('CREATE TABLE User (id int,name varchar )' )Ten One A -Value = [1,'Jkmiao'] - theCur.execute ("INSERT into user values (%s,%s)", value) - - - +Users = []
NaNDtype:object----[' A ', ' B ', ' C ']0 A1 12 NaN3 NaNDtype:object0 b1 22 NaN3 NaNDtype:object0 10 A B,c1 1 2,32 Nan Nan3 Nan Nan0 10 A, b C1 32 Nan Nan3 Nan NanDataframe0 a-b-c1 1-2-c2 [,-,-,]Name:key2, Dtype:object0 [A, b, c]1 [1, 2, C]2 NaNName:key2, Dtype:object#String Indexs = PD. Series (['A','b','C','Bbhello','123', Np.nan,'HJ']) DF= PD. DataFrame ({'Key1': List ('abcdef'), 'Key2':['Hee','FV','W','Hjja','123', Np.nan]})Print(S,'\ n-----')Print(S.str[0])#take the first
values appearDf.boxplot (column= ' label 1 ', by = ' Label 2 ')Plt.show ()The data under label 1 can then be plotted in a numerical distribution according to label 2As indicated below, it has been classified according to the level of education, high-level wage extremes, and other conclusions can be obtainedNote: When you want to paint, the individual input drawing instructions can not display graphics, then you need to enter Plt.show () on another line, condition: import Matplotlib.pyplot as Pl
ImportOSImportPandas as PDImportMatplotlib.pyplot as PltdefTest_run (): start_date='2017-01-01'End_data='2017-12-15'dates=Pd.date_range (start_date, End_data)#Create an empty data frameDF=PD. DataFrame (index=dates) Symbols=['SPY','AAPL','IBM','GOOG','GLD'] forSymbolinchsymbols:temp=getadjcloseforsymbol (symbol) DF=df.join (temp, how='Inner') returnDF def Normalize_data (DF): "" " normalize stock prices using the first row of the DATAFR Ame " " " df=df/df.ix[0,:] return DF defGetadj
Close 2017-11-24 260.359985 2017-11-27 260.230011 2017-11-28 262.869995"""if __name__=='__main__': Test_run ()There is a simpy-to-drop the data which index is not present in Dspy:Df1=df1.join (Dspy, how='inner')We can also rename the ' Adj Close ' to prevent conflicts: # Rename the column Dspy=dspy.rename (columns={'Adj Close'SPY'})Load More stocks:ImportPandas as PDdefTest_run (): start_date='2017-11-24'End_data='2017-11-28'dates=Pd.date_range (start_date, End_data)#Create an empty data
This article brings the content is about Python pandas in-depth understanding (code example), there is a certain reference value, the need for friends can refer to, I hope to help you.
First, screening
First, create a 6X4 matrix data.
Dates = Pd.date_range (' 20180830 ', periods=6) df = PD. DataFrame (Np.arange) reshape ((6,4)), index=dates, columns=[' A ', ' B ', ' C ', ' D ']) print (DF)
Print:
1. Python and MySQL connection and operation, directly on the code, simple and direct efficiency:Import MySQLdbTry: Conn= MySQLdb.connect (host='localhost', user='Root', passwd='xxxxx', db='Test', charset='UTF8') cur=conn.cursor () Cur.execute ('CREATE TABLE User (id int,name varchar )') Value= [1,'Jkmiao'] Cur.execute ("INSERT into user values (%s,%s)", value) users= [] forIinchRange -): Users.append ((i,"User"+str (i))) Cur.executemany ("INSERT
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.