Generating time series with pandas:
import pandas as pd
import numpy as np
Time series types
- Timestamp (timestamp)
- Fixed period (period)
- Time interval (interval)
date_range
- You can specify a start time and the number of periods
- H: hours
- D: days
- M: months
# Several ways to write the date: 2016 Jul 1; 7/1/2016; 1/7/2016 (day first); 2016-07-01; 2016/07/01
rng = pd.date_range('2016-07-01', periods=10, freq='3D')  # if freq is not passed, the default is D
rng
Results:
DatetimeIndex(['2016-07-01', '2016-07-04', '2016-07-07', '2016-07-10',
               '2016-07-13', '2016-07-16', '2016-07-19', '2016-07-22',
               '2016-07-25', '2016-07-28'],
              dtype='datetime64[ns]', freq='3D')
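As a quick check of the date spellings listed above, the sketch below parses each one and compares the results. The variable names are illustrative and not from the original post. One point worth noticing: pandas reads 1/7/2016 month-first (7 January) unless dayfirst=True is passed.

```python
import pandas as pd

# Different spellings of 1 July 2016; each should parse to the same Timestamp.
spellings = ["2016 Jul 1", "7/1/2016", "2016-07-01", "2016/07/01"]
parsed = [pd.Timestamp(s) for s in spellings]

# "1/7/2016" is ambiguous: the default is month-first, i.e. 7 January 2016.
month_first = pd.to_datetime("1/7/2016")
day_first = pd.to_datetime("1/7/2016", dayfirst=True)

# Leaving out freq gives daily steps.
rng_default = pd.date_range("2016-07-01", periods=3)

print(parsed)
print(month_first, day_first)
print(rng_default)
```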
import datetime as dt
time = pd.Series(np.random.randn(20), index=pd.date_range(dt.datetime(2016, 1, 1), periods=20))
print(time)
Results:
2016-01-01   -0.129379
2016-01-02    0.164480
2016-01-03   -0.639117
2016-01-04   -0.427224
2016-01-05    2.055133
2016-01-06    1.116075
2016-01-07    0.357426
2016-01-08    0.274249
2016-01-09    0.834405
2016-01-10   -0.005444
2016-01-11   -0.134409
2016-01-12    0.249318
2016-01-13   -0.297842
2016-01-14   -0.128514
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
Filtering with truncate
time.truncate(before='2016-1-10')  # everything before January 10 is filtered out
Results:
2016-01-10   -0.005444
2016-01-11   -0.134409
2016-01-12    0.249318
2016-01-13   -0.297842
2016-01-14   -0.128514
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
time.truncate(after='2016-1-10')  # everything after January 10 is filtered out
Results:
2016-01-01   -0.129379
2016-01-02    0.164480
2016-01-03   -0.639117
2016-01-04   -0.427224
2016-01-05    2.055133
2016-01-06    1.116075
2016-01-07    0.357426
2016-01-08    0.274249
2016-01-09    0.834405
2016-01-10   -0.005444
Freq: D, dtype: float64
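To make the effect of before and after easier to verify than with random data, here is a small sketch on a series whose values are just 0 to 19. The names s, kept_from, kept_until and window are made up for this example. It also shows that passing both arguments gives the same rows as an inclusive label slice with .loc.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2016-01-01", periods=20, freq="D")
s = pd.Series(np.arange(20), index=idx)

kept_from = s.truncate(before="2016-01-10")   # drops rows before 10 January
kept_until = s.truncate(after="2016-01-10")   # drops rows after 10 January
window = s.truncate(before="2016-01-05", after="2016-01-10")

# Label slicing on a DatetimeIndex is inclusive on both ends,
# so it selects the same rows as truncate with both bounds.
same_window = s.loc["2016-01-05":"2016-01-10"]

print(window)
```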
print(time['2016-01-15'])  # 0.063690487247
print(time['2016-01-15':'2016-01-20'])
Results:
2016-01-15    0.063690
2016-01-16   -2.246031
2016-01-17    0.359552
2016-01-18    0.383030
2016-01-19    0.402717
2016-01-20   -0.694068
Freq: D, dtype: float64
data = pd.date_range('2010-01-01', '2011-01-01', freq='M')
print(data)
Results:
DatetimeIndex(['2010-01-31', '2010-02-28', '2010-03-31', '2010-04-30',
               '2010-05-31', '2010-06-30', '2010-07-31', '2010-08-31',
               '2010-09-30', '2010-10-31', '2010-11-30', '2010-12-31'],
              dtype='datetime64[ns]', freq='M')
# Timestamps
pd.Timestamp('2016-07-10')  # Timestamp('2016-07-10 00:00:00')
# You can specify more detail
pd.Timestamp('2016-07-10 10')  # Timestamp('2016-07-10 10:00:00')
pd.Timestamp('2016-07-10 10:15')  # Timestamp('2016-07-10 10:15:00')
# How much detail can you add?
t = pd.Timestamp('2016-07-10 10:15')
# Time periods
pd.Period('2016-01')  # Period('2016-01', 'M')
pd.Period('2016-01-01')  # Period('2016-01-01', 'D')
# Time offsets
pd.Timedelta('1 day')  # Timedelta('1 days 00:00:00')
pd.Period('2016-01-01 10:10') + pd.Timedelta('1 day')  # Period('2016-01-02 10:10', 'T')
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('1 day')  # Timestamp('2016-01-02 10:10:00')
pd.Timestamp('2016-01-01 10:10') + pd.Timedelta('15 ns')  # Timestamp('2016-01-01 10:10:00.000000015')
p1 = pd.period_range('2016-01-01 10:10', freq='25H', periods=10)
p2 = pd.period_range('2016-01-01 10:10', freq='1D1H', periods=10)
p1
p2
Results:
PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='period[25H]', freq='25H')
PeriodIndex(['2016-01-01 10:00', '2016-01-02 11:00', '2016-01-03 12:00',
             '2016-01-04 13:00', '2016-01-05 14:00', '2016-01-06 15:00',
             '2016-01-07 16:00', '2016-01-08 17:00', '2016-01-09 18:00',
             '2016-01-10 19:00'],
            dtype='period[25H]', freq='25H')
# Specify an index
rng = pd.date_range('2016-07-01', periods=10, freq='D')
rng
pd.Series(range(len(rng)), index=rng)
Results:
2016-07-01    0
2016-07-02    1
2016-07-03    2
2016-07-04    3
2016-07-05    4
2016-07-06    5
2016-07-07    6
2016-07-08    7
2016-07-09    8
2016-07-10    9
Freq: D, dtype: int32
periods = [pd.Period('2016-01'), pd.Period('2016-02'), pd.Period('2016-03')]
ts = pd.Series(np.random.randn(len(periods)), index=periods)
ts
Results:
2016-01   -0.015837
2016-02   -0.923463
2016-03   -0.485212
Freq: M, dtype: float64
type(ts.index)  # pandas.core.indexes.period.PeriodIndex
# Timestamps and time periods can be converted into each other
ts = pd.Series(range(10), pd.date_range('07-10-16 8:00', periods=10, freq='H'))
ts
Results:
2016-07-10 08:00:00    0
2016-07-10 09:00:00    1
2016-07-10 10:00:00    2
2016-07-10 11:00:00    3
2016-07-10 12:00:00    4
2016-07-10 13:00:00    5
2016-07-10 14:00:00    6
2016-07-10 15:00:00    7
2016-07-10 16:00:00    8
2016-07-10 17:00:00    9
Freq: H, dtype: int32
ts_period = ts.to_period()
ts_period
Results:
2016-07-10 08:00    0
2016-07-10 09:00    1
2016-07-10 10:00    2
2016-07-10 11:00    3
2016-07-10 12:00    4
2016-07-10 13:00    5
2016-07-10 14:00    6
2016-07-10 15:00    7
2016-07-10 16:00    8
2016-07-10 17:00    9
Freq: H, dtype: int32
Difference between a time period and a timestamp
ts_period['2016-07-10 08:30':'2016-07-10 11:45']  # the period slice includes 08:00, because that period covers 08:30
Results:
2016-07-10 08:00    0
2016-07-10 09:00    1
2016-07-10 10:00    2
2016-07-10 11:00    3
Freq: H, dtype: int32
ts['2016-07-10 08:30':'2016-07-10 11:45']  # the timestamp slice starts at 08:30, so 08:00 is excluded
Results:
2016-07-10 09:00:00    1
2016-07-10 10:00:00    2
2016-07-10 11:00:00    3
Freq: H, dtype: int32
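The reason the two slices differ can be seen by looking at what a single period covers. The following is a minimal sketch with made-up variable names: the hourly period labelled 08:00 spans the whole hour, so the bound 08:30 falls inside it, whereas the timestamp 08:00 is a single instant that lies before 08:30. The hourly frequency is passed as a Timedelta here only to sidestep alias spelling differences between pandas versions.

```python
import pandas as pd

# Hourly timestamps 08:00 .. 17:00.
stamps = pd.date_range("2016-07-10 08:00", periods=10,
                       freq=pd.Timedelta(hours=1))
ts = pd.Series(range(10), index=stamps)
ts_period = ts.to_period()  # same values, indexed by hourly periods

# The first period covers 08:00:00 up to just before 09:00:00.
first = ts_period.index[0]
print(first.start_time, first.end_time)

start, stop = "2016-07-10 08:30", "2016-07-10 11:45"
by_period = ts_period.loc[start:stop]  # keeps the period containing 08:30
by_stamp = ts.loc[start:stop]          # keeps only instants >= 08:30

print(by_period)
print(by_stamp)
```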
Data resampling:
- Time series data is converted from one frequency to another
- Downsampling
- Upsampling
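Before the example with random data, a tiny deterministic sketch may help show what the two directions do. The values 0 to 8 and the variable names are chosen for illustration only: downsampling to 3-day bins aggregates three daily values into one, and upsampling back to daily frequency creates rows that have no value yet.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=9, freq="D")
ts = pd.Series(np.arange(9, dtype=float), index=rng)

# Downsampling: nine daily values become three 3-day bins.
down_sum = ts.resample("3D").sum()    # 0+1+2, 3+4+5, 6+7+8
down_mean = ts.resample("3D").mean()

# Upsampling: going back to daily frequency leaves gaps as NaN.
up = down_mean.resample("D").asfreq()

print(down_sum)
print(up)
```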
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011', periods=90, freq='D')  # daily data
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.head()
Results:
2011-01-01   -1.025562
2011-01-02    0.410895
2011-01-03    0.660311
2011-01-04    0.710293
2011-01-05    0.444985
Freq: D, dtype: float64
ts.resample('M').sum()  # downsample to months; the statistic here is the sum, but it could also be the mean; you choose it yourself
Results:
2011-01-31    2.510102
2011-02-28    0.583209
2011-03-31    2.749411
Freq: M, dtype: float64
ts.resample('3D').sum()  # downsample to 3-day bins
Results:
2011-01-01    0.045643
2011-01-04   -2.255206
2011-01-07    0.571142
2011-01-10    0.835032
2011-01-13   -0.396766
2011-01-16   -1.156253
2011-01-19   -1.286884
2011-01-22    2.883952
2011-01-25    1.566908
2011-01-28    1.435563
2011-01-31    0.311565
2011-02-03   -2.541235
2011-02-06    0.317075
2011-02-09    1.598877
2011-02-12   -1.950509
2011-02-15    2.928312
2011-02-18   -0.733715
2011-02-21    1.674817
2011-02-24   -2.078872
2011-02-27    2.172320
2011-03-02   -2.022104
2011-03-05   -0.070356
2011-03-08    1.276671
2011-03-11   -2.835132
2011-03-14   -1.384113
2011-03-17    1.517565
2011-03-20   -0.550406
2011-03-23    0.773430
2011-03-26    2.244319
2011-03-29    2.951082
Freq: 3D, dtype: float64
day3ts = ts.resample('3D').mean()
day3ts
Results:
2011-01-01    0.015214
2011-01-04   -0.751735
2011-01-07    0.190381
2011-01-10    0.278344
2011-01-13   -0.132255
2011-01-16   -0.385418
2011-01-19   -0.428961
2011-01-22    0.961317
2011-01-25    0.522303
2011-01-28    0.478521
2011-01-31    0.103855
2011-02-03   -0.847078
2011-02-06    0.105692
2011-02-09    0.532959
2011-02-12   -0.650170
2011-02-15    0.976104
2011-02-18   -0.244572
2011-02-21    0.558272
2011-02-24   -0.692957
2011-02-27    0.724107
2011-03-02   -0.674035
2011-03-05   -0.023452
2011-03-08    0.425557
2011-03-11   -0.945044
2011-03-14   -0.461371
2011-03-17    0.505855
2011-03-20   -0.183469
2011-03-23    0.257810
2011-03-26    0.748106
2011-03-29    0.983694
Freq: 3D, dtype: float64
print(day3ts.resample('D').asfreq())  # upsampling; the new rows are empty and need to be filled by interpolation
Results:
2011-01-01    0.015214
2011-01-02         NaN
2011-01-03         NaN
2011-01-04   -0.751735
2011-01-05         NaN
2011-01-06         NaN
2011-01-07    0.190381
2011-01-08         NaN
2011-01-09         NaN
2011-01-10    0.278344
2011-01-11         NaN
2011-01-12         NaN
2011-01-13   -0.132255
2011-01-14         NaN
2011-01-15         NaN
2011-01-16   -0.385418
2011-01-17         NaN
2011-01-18         NaN
2011-01-19   -0.428961
2011-01-20         NaN
2011-01-21         NaN
2011-01-22    0.961317
2011-01-23         NaN
2011-01-24         NaN
2011-01-25    0.522303
2011-01-26         NaN
2011-01-27         NaN
2011-01-28    0.478521
2011-01-29         NaN
2011-01-30         NaN
                ...
2011-02-28         NaN
2011-03-01         NaN
2011-03-02   -0.674035
2011-03-03         NaN
2011-03-04         NaN
2011-03-05   -0.023452
2011-03-06         NaN
2011-03-07         NaN
2011-03-08    0.425557
2011-03-09         NaN
2011-03-10         NaN
2011-03-11   -0.945044
2011-03-12         NaN
2011-03-13         NaN
2011-03-14   -0.461371
2011-03-15         NaN
2011-03-16         NaN
2011-03-17    0.505855
2011-03-18         NaN
2011-03-19         NaN
2011-03-20   -0.183469
2011-03-21         NaN
2011-03-22         NaN
2011-03-23    0.257810
2011-03-24         NaN
2011-03-25         NaN
2011-03-26    0.748106
2011-03-27         NaN
2011-03-28         NaN
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
Interpolation methods:
- ffill: a null value takes the preceding value
- bfill: a null value takes the following value
- interpolate: a null value is filled by linear interpolation
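The three methods above can be compared side by side on a small series where the expected values are easy to work out by hand. This is an illustrative sketch; the series coarse and its values 1, 4, 7 are invented for the example. The optional limit argument caps how many consecutive gaps are filled.

```python
import pandas as pd

# Three values spaced 3 days apart; upsampling to daily leaves two gaps
# between each pair of known values.
coarse = pd.Series(
    [1.0, 4.0, 7.0],
    index=pd.date_range("2011-01-01", periods=3, freq="3D"),
)

forward = coarse.resample("D").ffill()          # copy the previous value
backward = coarse.resample("D").bfill()         # copy the next value
limited = coarse.resample("D").ffill(limit=1)   # fill at most one gap in a row
linear = coarse.resample("D").interpolate(method="linear")

print(pd.DataFrame({
    "ffill": forward,
    "bfill": backward,
    "ffill(limit=1)": limited,
    "linear": linear,
}))
```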
day3ts.resample('D').ffill(1)
Results:
2011-01-01    0.015214
2011-01-02    0.015214
2011-01-03         NaN
2011-01-04   -0.751735
2011-01-05   -0.751735
2011-01-06         NaN
2011-01-07    0.190381
2011-01-08    0.190381
2011-01-09         NaN
2011-01-10    0.278344
2011-01-11    0.278344
2011-01-12         NaN
2011-01-13   -0.132255
2011-01-14   -0.132255
2011-01-15         NaN
2011-01-16   -0.385418
2011-01-17   -0.385418
2011-01-18         NaN
2011-01-19   -0.428961
2011-01-20   -0.428961
2011-01-21         NaN
2011-01-22    0.961317
2011-01-23    0.961317
2011-01-24         NaN
2011-01-25    0.522303
2011-01-26    0.522303
2011-01-27         NaN
2011-01-28    0.478521
2011-01-29    0.478521
2011-01-30         NaN
                ...
2011-02-28    0.724107
2011-03-01         NaN
2011-03-02   -0.674035
2011-03-03   -0.674035
2011-03-04         NaN
2011-03-05   -0.023452
2011-03-06   -0.023452
2011-03-07         NaN
2011-03-08    0.425557
2011-03-09    0.425557
2011-03-10         NaN
2011-03-11   -0.945044
2011-03-12   -0.945044
2011-03-13         NaN
2011-03-14   -0.461371
2011-03-15   -0.461371
2011-03-16         NaN
2011-03-17    0.505855
2011-03-18    0.505855
2011-03-19         NaN
2011-03-20   -0.183469
2011-03-21   -0.183469
2011-03-22         NaN
2011-03-23    0.257810
2011-03-24    0.257810
2011-03-25         NaN
2011-03-26    0.748106
2011-03-27    0.748106
2011-03-28         NaN
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
day3ts.resample('D').bfill(1)
Results:
2011-01-01    0.015214
2011-01-02         NaN
2011-01-03   -0.751735
2011-01-04   -0.751735
2011-01-05         NaN
2011-01-06    0.190381
2011-01-07    0.190381
2011-01-08         NaN
2011-01-09    0.278344
2011-01-10    0.278344
2011-01-11         NaN
2011-01-12   -0.132255
2011-01-13   -0.132255
2011-01-14         NaN
2011-01-15   -0.385418
2011-01-16   -0.385418
2011-01-17         NaN
2011-01-18   -0.428961
2011-01-19   -0.428961
2011-01-20         NaN
2011-01-21    0.961317
2011-01-22    0.961317
2011-01-23         NaN
2011-01-24    0.522303
2011-01-25    0.522303
2011-01-26         NaN
2011-01-27    0.478521
2011-01-28    0.478521
2011-01-29         NaN
2011-01-30    0.103855
                ...
2011-02-28         NaN
2011-03-01   -0.674035
2011-03-02   -0.674035
2011-03-03         NaN
2011-03-04   -0.023452
2011-03-05   -0.023452
2011-03-06         NaN
2011-03-07    0.425557
2011-03-08    0.425557
2011-03-09         NaN
2011-03-10   -0.945044
2011-03-11   -0.945044
2011-03-12         NaN
2011-03-13   -0.461371
2011-03-14   -0.461371
2011-03-15         NaN
2011-03-16    0.505855
2011-03-17    0.505855
2011-03-18         NaN
2011-03-19   -0.183469
2011-03-20   -0.183469
2011-03-21         NaN
2011-03-22    0.257810
2011-03-23    0.257810
2011-03-24         NaN
2011-03-25    0.748106
2011-03-26    0.748106
2011-03-27         NaN
2011-03-28    0.983694
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
day3ts.resample('D').interpolate('linear')  # fill by linear interpolation
Results:
2011-01-01    0.015214
2011-01-02   -0.240435
2011-01-03   -0.496085
2011-01-04   -0.751735
2011-01-05   -0.437697
2011-01-06   -0.123658
2011-01-07    0.190381
2011-01-08    0.219702
2011-01-09    0.249023
2011-01-10    0.278344
2011-01-11    0.141478
2011-01-12    0.004611
2011-01-13   -0.132255
2011-01-14   -0.216643
2011-01-15   -0.301030
2011-01-16   -0.385418
2011-01-17   -0.399932
2011-01-18   -0.414447
2011-01-19   -0.428961
2011-01-20    0.034465
2011-01-21    0.497891
2011-01-22    0.961317
2011-01-23    0.814979
2011-01-24    0.668641
2011-01-25    0.522303
2011-01-26    0.507709
2011-01-27    0.493115
2011-01-28    0.478521
2011-01-29    0.353632
2011-01-30    0.228744
                ...
2011-02-28    0.258060
2011-03-01   -0.207988
2011-03-02   -0.674035
2011-03-03   -0.457174
2011-03-04   -0.240313
2011-03-05   -0.023452
2011-03-06    0.126218
2011-03-07    0.275887
2011-03-08    0.425557
2011-03-09   -0.031310
2011-03-10   -0.488177
2011-03-11   -0.945044
2011-03-12   -0.783820
2011-03-13   -0.622595
2011-03-14   -0.461371
2011-03-15   -0.138962
2011-03-16    0.183446
2011-03-17    0.505855
2011-03-18    0.276080
2011-03-19    0.046306
2011-03-20   -0.183469
2011-03-21   -0.036376
2011-03-22    0.110717
2011-03-23    0.257810
2011-03-24    0.421242
2011-03-25    0.584674
2011-03-26    0.748106
2011-03-27    0.826636
2011-03-28    0.905165
2011-03-29    0.983694
Freq: D, Length: 88, dtype: float64
Pandas sliding window:
A sliding window frames the time series with a box of a specified length and computes statistics over the data inside the box. It works like a slider of fixed length moving along a scale: each time it advances by one unit, it reports a statistic of the data it currently covers.
Sliding windows make the data smoother: the range of fluctuation becomes smaller and each value is more representative. A single data point may be an outlier or carry some error, whereas a statistic computed over a window is less sensitive to such points.
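The idea can be checked by hand on a short series before moving to random data. The sketch below uses invented names and a window of 3 for readability: each rolling mean is the average of the current value and the two before it, and the first window - 1 positions have no full window and are therefore NaN. The second half illustrates the smoothing claim on seeded random noise.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2016-07-01", periods=6, freq="D"),
)

means = s.rolling(window=3).mean()
manual = s.iloc[0:3].mean()  # the first full window, computed by hand

# Smoothing effect: the rolling mean of noise fluctuates less than the noise.
gen = np.random.default_rng(0)
noisy = pd.Series(
    gen.standard_normal(600),
    index=pd.date_range("2016-07-01", periods=600, freq="D"),
)
smooth = noisy.rolling(window=10).mean()

print(means)
print(noisy.std(), smooth.std())
```

If the leading NaN values are unwanted, rolling also accepts min_periods, for example min_periods=1, to emit a value as soon as one observation is available.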
%matplotlib inline
import matplotlib.pylab
import numpy as np
import pandas as pd
df = pd.Series(np.random.randn(600), index=pd.date_range('7/1/2016', freq='D', periods=600))
df.head()
Results:
2016-07-01   -0.192140
2016-07-02    0.357953
2016-07-03   -0.201847
2016-07-04   -0.372230
2016-07-05    1.414753
Freq: D, dtype: float64
r = df.rolling(window=10)
r
# Rolling [window=10,center=False,axis=0]
# Available statistics include r.max, r.median, r.std, r.skew (skewness), r.sum, r.var
print(r.mean())
Results:
2016-07-01         NaN
2016-07-02         NaN
2016-07-03         NaN
2016-07-04         NaN
2016-07-05         NaN
2016-07-06         NaN
2016-07-07         NaN
2016-07-08         NaN
2016-07-09         NaN
2016-07-10    0.300133
2016-07-11    0.284780
2016-07-12    0.252831
2016-07-13    0.220699
2016-07-14    0.167137
2016-07-15    0.018593
2016-07-16   -0.061414
2016-07-17   -0.134593
2016-07-18   -0.153333
2016-07-19   -0.218928
2016-07-20   -0.169426
2016-07-21   -0.219747
2016-07-22   -0.181266
2016-07-23   -0.173674
2016-07-24   -0.130629
2016-07-25   -0.166730
2016-07-26   -0.233044
2016-07-27   -0.256642
2016-07-28   -0.280738
2016-07-29   -0.289893
2016-07-30   -0.379625
                ...
2018-01-22   -0.211467
2018-01-23    0.034996
2018-01-24   -0.105910
2018-01-25   -0.145774
2018-01-26   -0.089320
2018-01-27   -0.164370
2018-01-28   -0.110892
2018-01-29   -0.205786
2018-01-30   -0.101162
2018-01-31   -0.034760
2018-02-01    0.229333
2018-02-02    0.043741
2018-02-03    0.052837
2018-02-04    0.057746
2018-02-05   -0.071401
2018-02-06   -0.011153
2018-02-07   -0.045737
2018-02-08   -0.021983
2018-02-09   -0.196715
2018-02-10   -0.063721
2018-02-11   -0.289452
2018-02-12   -0.050946
2018-02-13   -0.047014
2018-02-14    0.048754
2018-02-15    0.143949
2018-02-16    0.424823
2018-02-17    0.361878
2018-02-18    0.363235
2018-02-19    0.517436
2018-02-20    0.368020
Freq: D, Length: 600, dtype: float64
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(15, 5))
df.plot(style='r--')
df.rolling(window=10).mean().plot(style='b')
# <matplotlib.axes._subplots.AxesSubplot at 0x249627fb6d8>
Results: [figure: the original series as a red dashed line with its 10-day rolling mean as a blue line]
Data stationarity and differencing:
The second-order difference is obtained by taking the first-order difference of the first-order difference.
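In pandas this amounts to calling diff twice. The sketch below is an illustration with invented data (the squares 1, 4, 9, 16, 25), chosen because their second-order difference is constant, which is the kind of trend removal differencing is used for. It also contrasts the second-order difference with diff(periods=2), which is a lag-2 difference and not the same thing.

```python
import pandas as pd

s = pd.Series(
    [1.0, 4.0, 9.0, 16.0, 25.0],
    index=pd.date_range("2016-07-01", periods=5, freq="D"),
)

# First-order difference: x[t] - x[t-1]
d1 = s.diff()
# Second-order difference: d1[t] - d1[t-1] = x[t] - 2*x[t-1] + x[t-2]
d2 = d1.diff()
# Not the same: lag-2 difference x[t] - x[t-2]
lag2 = s.diff(periods=2)

print(pd.DataFrame({"x": s, "d1": d1, "d2": d2, "lag2": lag2}))
```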
Evaluation with correlation functions:
Python Time series Analysis