When working with large amounts of data, we often need to deal with time: timestamps, fixed periods, or time intervals. pandas provides a standard set of tools and algorithms for processing time series.
The datetime module is one of the most commonly used modules in Python. For example, datetime.datetime.now() returns the current time, such as 2018-04-14 14:12:31.888964. This value contains the year, month, day, hour, minute, second, and microsecond.
The datetime module can also compute the difference between two times:
from datetime import datetime
t1 = datetime(2018, 4, 11)
t2 = datetime(2018, 3, 3)
print(t1 - t2)
Result:
39 days, 0:00:00
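The value printed above is a datetime.timedelta object. A minimal sketch of reading its parts, reusing the same two dates (the variable name diff is only for illustration):

```python
from datetime import datetime

t1 = datetime(2018, 4, 11)
t2 = datetime(2018, 3, 3)
diff = t1 - t2  # subtracting two datetimes gives a datetime.timedelta

print(diff.days)             # whole days in the difference: 39
print(diff.seconds)          # leftover seconds beyond whole days: 0 (both times are midnight)
print(diff.total_seconds())  # the full difference in seconds: 3369600.0
```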
You can also do date arithmetic with timedelta:
from datetime import datetime, timedelta
t1 = datetime(2018, 4, 11)
delta = timedelta(12)  # 12 days
print(t1 + delta)
Results:
2018-04-23 00:00:00
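The first positional argument of timedelta is a number of days, which is why timedelta(12) adds 12 days. A small sketch using keyword arguments instead, which makes the unit explicit; the one-week-and-six-hours offset is an arbitrary value chosen for illustration:

```python
from datetime import datetime, timedelta

t1 = datetime(2018, 4, 11)

# timedelta(12) and timedelta(days=12) are the same duration
assert timedelta(12) == timedelta(days=12)

later = t1 + timedelta(days=12)
earlier = t1 - timedelta(weeks=1, hours=6)  # subtraction works the same way

print(later)    # 2018-04-23 00:00:00
print(earlier)  # 2018-04-03 18:00:00
```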
In practice, though, times are often written as strings, and these must be converted into datetime objects. This is done with the strptime function:
from datetime import datetime
value = '2018-4-12'
datetime.strptime(value, '%Y-%m-%d')  # %Y is a four-digit year; %y would expect two digits
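strptime only succeeds when the string matches the format exactly. A minimal sketch showing the round trip with strftime (which turns a datetime back into a string) and what happens with a mismatched format; the '%Y/%m/%d' output format is just an example:

```python
from datetime import datetime

value = '2018-4-12'
d = datetime.strptime(value, '%Y-%m-%d')  # string -> datetime
print(d)  # 2018-04-12 00:00:00

s = d.strftime('%Y/%m/%d')  # datetime -> string, zero-padded month and day
print(s)  # 2018/04/12

# A format that does not match the string raises ValueError
failed = False
try:
    datetime.strptime(value, '%y-%m-%d')  # %y matches only two digits, so '2018' does not fit
except ValueError as exc:
    failed = True
    print('ValueError:', exc)
```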
However, calling strptime every time is tedious, and dates are written in many different ways, each of which needs its own format string. For example, a string like 'April 12, 2018' does not match the format above. In such cases, the parse function from dateutil.parser is more convenient:
from dateutil.parser import parse
parse('April 12, 2018 12:00 PM')
Result:
2018-04-12 12:00:00
For strings such as '12/4/2018', setting dayfirst=True tells the parser that the day comes first, not the month:
parse('12/4/2018', dayfirst=True)
2018-04-12 00:00:00
If dayfirst is not set, the first number is treated as the month:
parse('12/4/2018')
2018-12-04 00:00:00
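To see the ambiguity side by side, here is a short sketch that parses the same string both ways and checks that an unambiguous date parses identically from two different layouts:

```python
from dateutil.parser import parse

# The same ambiguous string read both ways
us_style = parse('12/4/2018')                 # month first (the default)
eu_style = parse('12/4/2018', dayfirst=True)  # day first

print(us_style.month, us_style.day)  # 12 4
print(eu_style.month, eu_style.day)  # 4 12

# Unambiguous strings give the same datetime whatever the layout
same = parse('2018-04-12') == parse('April 12, 2018')
print(same)  # True
```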
Next, let's look at how dates are handled in pandas.
import pandas as pd
datestr = ['4/12/2018', '3/12/2018']
pd.to_datetime(datestr)
The result is a DatetimeIndex:
DatetimeIndex(['2018-04-12', '2018-03-12'], dtype='datetime64[ns]', freq=None)
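pd.to_datetime also accepts an explicit format and an errors policy. A minimal sketch, assuming month-first strings as in the example above; the 'not a date' entry is a made-up bad value:

```python
import pandas as pd

datestr = ['4/12/2018', '3/12/2018']

# Spelling out the format removes any guessing about month/day order
idx = pd.to_datetime(datestr, format='%m/%d/%Y')
print(idx)

# With errors='coerce', strings that cannot be parsed become NaT (pandas' missing time)
mixed = pd.to_datetime(['4/12/2018', 'not a date'], format='%m/%d/%Y', errors='coerce')
print(mixed)
print(mixed.isna())  # [False  True]
```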
Time series:
The most basic time series type in pandas is a Series indexed by timestamps:
from datetime import datetime
import numpy as np
from pandas import Series
datestr = [datetime(2018, 4, 12), datetime(2018, 4, 11), datetime(2018, 4, 10), datetime(2018, 4, 9)]
ts = Series(np.random.randn(4), index=datestr)
2018-04-12    0.282997
2018-04-11    0.775905
2018-04-10   -1.039524
2018-04-09    1.946392
dtype: float64
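Under the hood the list of datetime objects becomes a DatetimeIndex, and arithmetic between two such series lines values up by timestamp rather than by position. A small sketch (the sorted copy ts_sorted exists only to demonstrate alignment):

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas import Series

datestr = [datetime(2018, 4, 12), datetime(2018, 4, 11),
           datetime(2018, 4, 10), datetime(2018, 4, 9)]
ts = Series(np.random.randn(4), index=datestr)

print(type(ts.index).__name__)  # DatetimeIndex

# The dates were given newest first; sort_index puts them in time order
ts_sorted = ts.sort_index()
print(ts_sorted.index.is_monotonic_increasing)  # True

# Addition aligns on the timestamps, so each date is added to itself: every value doubles
print(ts + ts_sorted)
```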
Indexing, selection, and subsetting
Because the series is indexed by time, you can look up a value by passing a timestamp from its index:
stamp = ts.index[2]
ts[stamp]
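The lookup does not require a Timestamp object: a date string or a datetime works too. A sketch that rebuilds the four-day series from above and checks that three ways of asking for 2018-04-10 return the same value:

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas import Series

datestr = [datetime(2018, 4, 12), datetime(2018, 4, 11),
           datetime(2018, 4, 10), datetime(2018, 4, 9)]
ts = Series(np.random.randn(4), index=datestr)

stamp = ts.index[2]                      # a pandas Timestamp for 2018-04-10
by_stamp = ts[stamp]
by_string = ts['2018-04-10']             # the string is parsed into a date for you
by_loc = ts.loc[datetime(2018, 4, 10)]   # .loc makes the label-based lookup explicit

print(by_stamp == by_string == by_loc)  # True
```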
For longer series, such as data spanning many days, months, or years, you can build the index with pd.date_range by giving a start date and a length. Here periods is the number of dates to generate.
ts = Series(np.random.randn(100), index=pd.date_range('4/12/2018', periods=100))
This gives 100 consecutive days starting on 2018-04-12:
2018-04-12   -0.148937
2018-04-13    0.937058
2018-04-14   -2.096196
2018-04-15    0.916470
2018-04-16   -0.697598
2018-04-17    0.643925
2018-04-18   -0.307314
2018-04-19   -0.141321
2018-04-20   -0.175498
2018-04-21   -0.829793
2018-04-22   -0.024155
2018-04-23   -1.051386
2018-04-24    0.540014
2018-04-25    0.154808
2018-04-26    1.358971
2018-04-27    0.525493
2018-04-28   -0.669124
2018-04-29   -0.207421
2018-04-30   -0.228202
2018-05-01    0.816570
2018-05-02   -0.877241
2018-05-03    0.772659
2018-05-04    0.554481
2018-05-05   -0.714872
2018-05-06    1.773668
2018-05-07    0.326872
2018-05-08   -1.079632
2018-05-09    1.024192
2018-05-10   -0.646678
2018-05-11   -1.515030
...
2018-06-21   -0.053543
2018-06-22    2.118719
2018-06-23    0.106124
2018-06-24    0.659720
2018-06-25   -0.991692
2018-06-26   -0.556483
2018-06-27   -0.819689
2018-06-28    0.031711
2018-06-29    0.543342
2018-06-30    0.009368
2018-07-01    1.141678
2018-07-02    0.222943
2018-07-03    0.303460
2018-07-04   -0.815658
2018-07-05    1.291347
2018-07-06   -0.681728
2018-07-07   -0.327148
2018-07-08    1.385592
2018-07-09    1.302346
2018-07-10    1.179094
2018-07-11   -0.465722
2018-07-12   -0.351399
2018-07-13    0.059268
2018-07-14   -0.235086
2018-07-15    0.983399
2018-07-16   -1.767474
2018-07-17    0.596053
2018-07-18   -2.022643
2018-07-19    0.539513
2018-07-20    0.421791
Freq: D, Length: 100, dtype: float64
With this series, you can pass a year or a month as the index to select all the data in that period. For example, ts['2018-4'] returns the data for April 2018; ts['2018/4'] works as well.
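This is often called partial string indexing. A short sketch on the same 100-day series, using .loc to make the label-based selection explicit and counting how many days each period covers:

```python
import numpy as np
import pandas as pd
from pandas import Series

ts = Series(np.random.randn(100), index=pd.date_range('4/12/2018', periods=100))

april = ts.loc['2018-04']  # every day in April 2018 that the series covers
year = ts.loc['2018']      # the whole year, here all 100 days

print(len(april))  # 19 (April 12 through April 30)
print(len(year))   # 100
print(april.index[0].date(), april.index[-1].date())  # 2018-04-12 2018-04-30
```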
A range of dates can be selected with a slice:
ts['2018/4/12':'2018/4/23']
Result:
2018-04-12   -1.080229
2018-04-13    1.231485
2018-04-14    0.725456
2018-04-15    0.029311
2018-04-16    0.331900
2018-04-17    0.921682
2018-04-18   -0.822750
2018-04-19   -0.569305
2018-04-20    0.589461
2018-04-21    1.405626
2018-04-22   -0.049872
2018-04-23   -0.144766
Freq: D, dtype: float64
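Two properties of date slices are worth checking: both endpoints are included (unlike ordinary Python slicing), and on a sorted index the endpoints do not have to exist in the index. A sketch on the same kind of 100-day series:

```python
import numpy as np
import pandas as pd
from pandas import Series

ts = Series(np.random.randn(100), index=pd.date_range('4/12/2018', periods=100))

window = ts['2018/4/12':'2018/4/23']
print(len(window))  # 12, because both 2018-04-12 and 2018-04-23 are included

# 2018-04-01 is before the first date, yet the slice still works
early = ts.loc['2018-04-01':'2018-04-15']
print(early.index[0].date(), len(early))  # 2018-04-12 4
```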
You can also keep only the data before or after a given date with truncate:
ts.truncate(after='2018/4/15')   # data up to and including 2018-04-15
ts.truncate(before='2018/4/15')  # data from 2018-04-15 onward
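Because truncate keeps the boundary date, 2018-04-15 appears in both results. A quick sketch confirming the lengths on the 100-day series:

```python
import numpy as np
import pandas as pd
from pandas import Series

ts = Series(np.random.randn(100), index=pd.date_range('4/12/2018', periods=100))

head = ts.truncate(after='2018/4/15')   # 2018-04-12 through 2018-04-15
tail = ts.truncate(before='2018/4/15')  # 2018-04-15 through 2018-07-20

print(len(head), len(tail))             # 4 97
print(head.index[-1] == tail.index[0])  # True: the boundary date is in both
```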
The series created earlier has a daily interval. To use a monthly or yearly interval instead, set freq: 'D' means daily, 'M' means month-end, and 'Y' means year-end. (pandas 2.2 and later deprecate 'M' and 'Y' in favor of 'ME' and 'YE'.)
pd.date_range('4/12/2018', periods=100, freq='D')
pd.date_range('4/12/2018', periods=100, freq='M')
pd.date_range('4/12/2018', periods=100, freq='Y')
Many other frequency values are available as well, for example 'W' (weekly) and 'B' (business days).
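A sketch comparing a few frequencies from the same start date. It uses 'MS' (month start) rather than the month-end alias, since that alias differs between pandas versions; note that 2018-04-12 is a Thursday, so each anchored frequency rolls forward to its first matching date:

```python
import pandas as pd

start = '4/12/2018'  # a Thursday

daily = pd.date_range(start, periods=3, freq='D')
weekly = pd.date_range(start, periods=3, freq='W')         # weekly, anchored on Sundays
month_starts = pd.date_range(start, periods=3, freq='MS')  # first day of each following month
business = pd.date_range(start, periods=3, freq='B')       # Monday to Friday only

for name, idx in [('D', daily), ('W', weekly), ('MS', month_starts), ('B', business)]:
    print(name, [d.strftime('%Y-%m-%d') for d in idx])
```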
Time series with duplicate indices
In some scenarios, multiple observations may fall on the same point in time:
import numpy as np
from pandas import DatetimeIndex, Series
dates = DatetimeIndex(['2018-04-12', '2018-04-13', '2018-04-14', '2018-04-14'])
dup_ts = Series(np.arange(4), index=dates)
2018-04-12    0
2018-04-13    1
2018-04-14    2
2018-04-14    3
dtype: int64
The is_unique attribute of the index tells you whether it contains duplicates:
dup_ts.index.is_unique  # False here, because 2018-04-14 appears twice
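A common next step is to collapse the duplicates by grouping on the index itself (level=0). A minimal sketch, assuming the mean and the count are the aggregates you want; any other groupby aggregation works the same way:

```python
import numpy as np
import pandas as pd
from pandas import Series

dates = pd.DatetimeIndex(['2018-04-12', '2018-04-13', '2018-04-14', '2018-04-14'])
dup_ts = Series(np.arange(4), index=dates)

print(dup_ts.index.is_unique)  # False
print(dup_ts['2018-04-14'])    # a duplicated date returns a Series with both rows

grouped_mean = dup_ts.groupby(level=0).mean()    # 2018-04-14 becomes (2 + 3) / 2 = 2.5
grouped_count = dup_ts.groupby(level=0).count()  # 2018-04-14 has 2 observations
print(grouped_mean)
print(grouped_count)
```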
Date ranges, frequencies, and shifting
pd.date_range('4/12/2018', '5/12/2018')
This returns every day from 2018-04-12 through 2018-05-12, inclusive. As before, freq can be set to change the interval.
DatetimeIndex(['2018-04-12', '2018-04-13', '2018-04-14', '2018-04-15',
               '2018-04-16', '2018-04-17', '2018-04-18', '2018-04-19',
               '2018-04-20', '2018-04-21', '2018-04-22', '2018-04-23',
               '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27',
               '2018-04-28', '2018-04-29', '2018-04-30', '2018-05-01',
               '2018-05-02', '2018-05-03', '2018-05-04', '2018-05-05',
               '2018-05-06', '2018-05-07', '2018-05-08', '2018-05-09',
               '2018-05-10', '2018-05-11', '2018-05-12'],
              dtype='datetime64[ns]', freq='D')
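date_range takes any two of start, end, and periods. A sketch of two variations on the range above: a start/end range with a 7-day step, and a range defined by its end date and a count:

```python
import pandas as pd

# start + end, one date every 7 days
every_week = pd.date_range('4/12/2018', '5/12/2018', freq='7D')
print(len(every_week))  # 5: Apr 12, Apr 19, Apr 26, May 3, May 10

# end + periods counts backwards from the end date
last_three = pd.date_range(end='5/12/2018', periods=3)
print([d.strftime('%Y-%m-%d') for d in last_three])  # ['2018-05-10', '2018-05-11', '2018-05-12']
```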
To move the data of a time series forward or backward in time, use the shift method:
import numpy as np
import pandas as pd
from pandas import Series
ts = Series(np.random.randn(4), index=pd.date_range('4/12/2018', periods=4, freq='M'))
print(ts)
print(ts.shift(2))
In the result, the index stays the same while the values move down two positions, so the first two entries become NaN:
2018-04-30   -0.065679
2018-05-31   -0.163013
2018-06-30    0.501377
2018-07-31    0.856595
Freq: M, dtype: float64
2018-04-30         NaN
2018-05-31         NaN
2018-06-30   -0.065679
2018-07-31   -0.163013
Freq: M, dtype: float64
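One typical use of shift is computing period-over-period change: shift(1) lines each value up with the previous one. A sketch with hypothetical month-end prices, chosen so the arithmetic is easy to check by hand (pandas also provides pct_change for this calculation):

```python
import pandas as pd
from pandas import Series

# Hypothetical prices, for illustration only
idx = pd.DatetimeIndex(['2018-04-30', '2018-05-31', '2018-06-30', '2018-07-31'])
prices = Series([100.0, 110.0, 99.0, 99.0], index=idx)

# Each value divided by the previous month's value, minus 1
change = prices / prices.shift(1) - 1
print(change)  # NaN, then about 0.1, -0.1, 0.0
```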
Because a plain shift leaves the index unchanged, values shifted past the end are discarded. If the frequency is known, you can pass it to shift so that the timestamps are moved instead of the data:
ts.shift(2, freq='M')
2018-06-30   -0.235855
2018-07-31    1.189707
2018-08-31    0.005851
2018-09-30   -0.134599
Freq: M, dtype: float64
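The difference between the two kinds of shift is easiest to see with fixed values. A sketch using a daily frequency (the values 1.0 to 4.0 are arbitrary): shifting the data leaves two NaN slots, while shifting the timestamps keeps every value and moves the dates:

```python
import pandas as pd
from pandas import Series

idx = pd.DatetimeIndex(['2018-04-12', '2018-04-13', '2018-04-14', '2018-04-15'])
ts = Series([1.0, 2.0, 3.0, 4.0], index=idx)

moved_data = ts.shift(2)             # same index, values pushed down, NaN at the top
moved_index = ts.shift(2, freq='D')  # same values, every timestamp 2 days later

print(moved_data.isna().sum())      # 2
print(moved_index.isna().sum())     # 0
print(moved_index.index[0].date())  # 2018-04-14
```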
Python Data analysis: Time series One