Python Data Analysis: Time Series, Part One

Source: Internet
Author: User
Tags: timedelta

When we work with large amounts of data, we almost always have to deal with time in some form, such as timestamps, fixed periods, or time intervals. Pandas provides a standard set of time series tools and algorithms for this.

The datetime module is one of the most commonly used modules in Python, and its datetime.datetime class is the type you will use most. For example, datetime.datetime.now() returns the current time, such as 2018-04-14 14:12:31.888964. This value contains the year, month, day, hour, minute, second, and microsecond.
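As a small sketch of the point above, the components of the current time can be read as attributes of the datetime object. The variable name now is only illustrative.

```python
from datetime import datetime

now = datetime.now()

# Each component is available as an attribute.
print(now.year, now.month, now.day)
print(now.hour, now.minute, now.second)

# The fractional part of the second is stored in microseconds (0 to 999999).
print(now.microsecond)
```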

The difference between two times can also be obtained with the datetime module:

from datetime import datetime
t1 = datetime(2018, 4, 11)
t2 = datetime(2018, 3, 3)
print(t1 - t2)

Result:

39 days, 0:00:00
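Subtracting two datetime objects gives a timedelta. A minimal sketch of how to read the interval as numbers rather than printing it (the name diff is illustrative):

```python
from datetime import datetime

t1 = datetime(2018, 4, 11)
t2 = datetime(2018, 3, 3)

# Subtracting two datetimes yields a timedelta object.
diff = t1 - t2

print(diff.days)             # whole days in the interval
print(diff.total_seconds())  # the same interval expressed in seconds
```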

You can also do date arithmetic with timedelta. The first argument is a number of days:

from datetime import datetime, timedelta
t1 = datetime(2018, 4, 11)
delta = timedelta(12)
print(t1 + delta)

Result:

2018-04-23 00:00:00
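timedelta also accepts keyword arguments for other units, and it can be subtracted as well as added. A short sketch, with the names later and earlier chosen only for illustration:

```python
from datetime import datetime, timedelta

t1 = datetime(2018, 4, 11)

# The first positional argument of timedelta is days, so these are equal.
same = timedelta(12) == timedelta(days=12)

# Other units can be combined through keyword arguments.
later = t1 + timedelta(weeks=1, hours=6)

# Subtraction moves backward in time.
earlier = t1 - timedelta(days=12)

print(same)
print(later)
print(earlier)
```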

In real code, however, times often arrive as strings. To convert a string to a datetime object, use the strptime function with a format string. Note that %Y (uppercase) is the four-digit year:

value = '2018-4-12'
datetime.strptime(value, '%Y-%m-%d')
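The sketch below applies strptime to several strings and then formats the results back with strftime, which is the reverse operation. The list values is made-up sample data.

```python
from datetime import datetime

values = ['2018-4-12', '2018-12-4']

# %Y is the four-digit year; lowercase %y would expect a two-digit year.
parsed = [datetime.strptime(v, '%Y-%m-%d') for v in values]

# strftime goes the other way and formats a datetime as a string.
formatted = [d.strftime('%Y-%m-%d') for d in parsed]

print(parsed)
print(formatted)
```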

But writing a format string for strptime every time is tedious, and dates often come in different formats. For example, 'April 12,2018' cannot be converted by strptime without writing a new format string. For these cases, use the parse function from dateutil.parser:

from dateutil.parser import parse
parse('April 12, 2018 12:00 PM')

Result:

2018-04-12 12:00:00

Some formats are ambiguous. If dayfirst is set to True, the first number is read as the day, not the month:

parse('12/4/2018', dayfirst=True)

2018-04-12 00:00:00

If dayfirst is not set, the first number is read as the month:

parse('12/4/2018')

2018-12-04 00:00:00
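The two readings of the same string can be compared side by side. This sketch also includes a string whose first field is larger than 12, which dateutil reads as a day even without dayfirst. The variable names are illustrative.

```python
from dateutil.parser import parse

ambiguous = '12/4/2018'

# Default: month first, so this is December 4.
month_first = parse(ambiguous)

# dayfirst=True: day first, so this is April 12.
day_first = parse(ambiguous, dayfirst=True)

# 25 cannot be a month, so it is taken as the day.
unambiguous = parse('25/4/2018')

print(month_first, day_first, unambiguous)
```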

The following describes how dates are handled in pandas:

import pandas as pd
datestr = ['4/12/2018', '3/12/2018']
pd.to_datetime(datestr)

The result is a DatetimeIndex:

DatetimeIndex(['2018-04-12', '2018-03-12'], dtype='datetime64[ns]', freq=None)
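When the input format is known, it can be passed explicitly, and entries that cannot be parsed can be turned into NaT instead of raising an error. A minimal sketch, assuming a made-up list that contains one bad entry:

```python
import pandas as pd

datestr = ['4/12/2018', '3/12/2018', 'not a date']

# An explicit format avoids guessing; errors='coerce' turns entries
# that do not match into NaT (the missing value for timestamps).
idx = pd.to_datetime(datestr, format='%m/%d/%Y', errors='coerce')

print(idx)
print(idx.isna())
```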

Time series:

The most basic time series type in pandas is a Series indexed by timestamps:

import numpy as np
from pandas import Series
datestr = [datetime(2018, 4, 12), datetime(2018, 4, 11), datetime(2018, 4, 10), datetime(2018, 4, 9)]
ts = Series(np.random.randn(4), index=datestr)

2018-04-12    0.282997
2018-04-11    0.775905
2018-04-10   -1.039524
2018-04-09    1.946392
dtype: float64

Indexing, selection, and subsetting

Now that the series is indexed by time, you can use a timestamp from the index to look up the corresponding value:

stamp = ts.index[2]
ts[stamp]
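A value can be looked up with a Timestamp taken from the index or with a date string that pandas can interpret. The sketch below uses fixed values instead of random ones so the result is predictable; that substitution is an assumption made for illustration.

```python
from datetime import datetime

import numpy as np
import pandas as pd

dates = [datetime(2018, 4, 12), datetime(2018, 4, 11),
         datetime(2018, 4, 10), datetime(2018, 4, 9)]
# Fixed values 0.0 .. 3.0 instead of random numbers.
ts = pd.Series(np.arange(4.0), index=dates)

# Look up by a Timestamp taken from the index.
stamp = ts.index[2]
by_stamp = ts[stamp]

# Look up the same entry by a date string.
by_string = ts['2018-04-10']

print(by_stamp, by_string)
```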

For longer series that span many days, months, or years, build the index with the pd.date_range method, which takes a start time and a length. Here periods is the number of timestamps to generate.

ts = Series(np.random.randn(100), index=pd.date_range('4/12/2018', periods=100))

This gives 100 days of data starting on April 12, 2018:

2018-04-12   -0.148937
2018-04-13    0.937058
2018-04-14   -2.096196
2018-04-15    0.916470
2018-04-16   -0.697598
2018-04-17    0.643925
2018-04-18   -0.307314
2018-04-19   -0.141321
2018-04-20   -0.175498
2018-04-21   -0.829793
2018-04-22   -0.024155
2018-04-23   -1.051386
2018-04-24    0.540014
2018-04-25    0.154808
2018-04-26    1.358971
2018-04-27    0.525493
2018-04-28   -0.669124
2018-04-29   -0.207421
2018-04-30   -0.228202
2018-05-01    0.816570
2018-05-02   -0.877241
2018-05-03    0.772659
2018-05-04    0.554481
2018-05-05   -0.714872
2018-05-06    1.773668
2018-05-07    0.326872
2018-05-08   -1.079632
2018-05-09    1.024192
2018-05-10   -0.646678
2018-05-11   -1.515030
...
2018-06-21   -0.053543
2018-06-22    2.118719
2018-06-23    0.106124
2018-06-24    0.659720
2018-06-25   -0.991692
2018-06-26   -0.556483
2018-06-27   -0.819689
2018-06-28    0.031711
2018-06-29    0.543342
2018-06-30    0.009368
2018-07-01    1.141678
2018-07-02    0.222943
2018-07-03    0.303460
2018-07-04   -0.815658
2018-07-05    1.291347
2018-07-06   -0.681728
2018-07-07   -0.327148
2018-07-08    1.385592
2018-07-09    1.302346
2018-07-10    1.179094
2018-07-11   -0.465722
2018-07-12   -0.351399
2018-07-13    0.059268
2018-07-14   -0.235086
2018-07-15    0.983399
2018-07-16   -1.767474
2018-07-17    0.596053
2018-07-18   -2.022643
2018-07-19    0.539513
2018-07-20    0.421791
Freq: D, Length: 100, dtype: float64

In the series generated above, you can select a whole year or a whole month by passing a partial date string as the index. For example, ts['2018-4'] returns the data for April 2018. The string can also be written as ts['2018/4'].

To get the data for a range of dates, slice with two date strings:

ts['2018/4/12':'2018/4/23']

Result:

2018-04-12   -1.080229
2018-04-13    1.231485
2018-04-14    0.725456
2018-04-15    0.029311
2018-04-16    0.331900
2018-04-17    0.921682
2018-04-18   -0.822750
2018-04-19   -0.569305
2018-04-20    0.589461
2018-04-21    1.405626
2018-04-22   -0.049872
2018-04-23   -0.144766
Freq: D, dtype: float64
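Selecting by a partial date string and slicing by two date strings can be checked on a series with predictable values. The sketch below uses the integers 0 to 99 in place of random numbers, which is an assumption made so the row counts are easy to verify. Unlike ordinary Python slices, a date slice includes both endpoints.

```python
import numpy as np
import pandas as pd

# 100 daily timestamps starting on April 12, 2018, with values 0 .. 99.
ts = pd.Series(np.arange(100), index=pd.date_range('4/12/2018', periods=100))

# A year-month string selects every row in that month.
april = ts['2018-04']

# A slice of date strings includes both the start and the end date.
window = ts['2018/4/12':'2018/4/23']

print(len(april))
print(len(window))
```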

You can also keep only the data before or after a given date with truncate:

ts.truncate(after='2018/4/15')   # keep the data up to and including 2018/4/15
ts.truncate(before='2018/4/15')  # keep the data from 2018/4/15 onward
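The naming of the two parameters is easy to mix up: after drops everything after the given date, and before drops everything before it. A small sketch on a ten-day series with made-up values shows that the boundary date itself is kept in both cases.

```python
import numpy as np
import pandas as pd

# Ten daily timestamps: 2018-04-12 through 2018-04-21.
ts = pd.Series(np.arange(10), index=pd.date_range('4/12/2018', periods=10))

# Drop everything after April 15; April 15 itself is kept.
head = ts.truncate(after='2018/4/15')

# Drop everything before April 15; April 15 itself is kept.
tail = ts.truncate(before='2018/4/15')

print(head)
print(tail)
```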

The time series created earlier uses a daily interval. To use a monthly or yearly interval instead, set the freq parameter. D, M, and Y stand for daily, month-end, and year-end frequency, respectively. Note that pandas 2.2 and later spell the month-end and year-end aliases 'ME' and 'YE' and deprecate 'M' and 'Y'.

pd.date_range('4/12/2018', periods=100, freq='D')
pd.date_range('4/12/2018', periods=100, freq='M')
pd.date_range('4/12/2018', periods=100, freq='Y')

Many other freq settings are available; see the list of offset aliases in the pandas documentation.
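As a small sketch of a few frequency settings, the example below generates three timestamps per frequency. For the month-end and year-end cases it passes offset objects (pd.offsets.MonthEnd and pd.offsets.YearEnd) rather than string aliases; that choice is an assumption made here so the example does not depend on which alias spelling the installed pandas version expects.

```python
import pandas as pd

start = '4/12/2018'

# Daily and business-day frequencies via string aliases.
daily = pd.date_range(start, periods=3, freq='D')
business = pd.date_range(start, periods=3, freq='B')

# Month-end and year-end frequencies via offset objects.
monthly = pd.date_range(start, periods=3, freq=pd.offsets.MonthEnd())
yearly = pd.date_range(start, periods=3, freq=pd.offsets.YearEnd())

print(daily)
print(business)
print(monthly)
print(yearly)
```

With a month-end or year-end frequency, the generated dates are the period ends on or after the start date, not the start date itself.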

Time series with duplicate indices

In some scenarios, multiple observations may fall on the same point in time.

dates = pd.DatetimeIndex(['4/12/2018', '4/13/2018', '4/14/2018', '4/14/2018'])
dup_ts = Series(np.arange(4), index=dates)

2018-04-12    0
2018-04-13    1
2018-04-14    2
2018-04-14    3
dtype: int64

The is_unique attribute of the index tells you whether the index contains duplicates. Here it returns False:

dup_ts.index.is_unique
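Once duplicates are detected, one common way to handle them is to group by the index and aggregate. This is a sketch of that approach, not something the text above prescribes; the names per_day_mean and per_day_count are illustrative.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(['4/12/2018', '4/13/2018', '4/14/2018', '4/14/2018'])
dup_ts = pd.Series(np.arange(4), index=dates)

# False, because 2018-04-14 appears twice.
unique = dup_ts.index.is_unique

# Marks the second and later occurrences of each timestamp.
repeats = dup_ts.index.duplicated()

# level=0 groups rows that share the same timestamp.
per_day_mean = dup_ts.groupby(level=0).mean()
per_day_count = dup_ts.groupby(level=0).count()

print(unique)
print(repeats)
print(per_day_mean)
print(per_day_count)
```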

Date ranges, frequencies, and shifting

pd.date_range('4/12/2018', '5/12/2018')

This returns the dates from April 12 to May 12, 2018. As before, you can pass freq to set the interval.

DatetimeIndex(['2018-04-12', '2018-04-13', '2018-04-14', '2018-04-15',
               '2018-04-16', '2018-04-17', '2018-04-18', '2018-04-19',
               '2018-04-20', '2018-04-21', '2018-04-22', '2018-04-23',
               '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27',
               '2018-04-28', '2018-04-29', '2018-04-30', '2018-05-01',
               '2018-05-02', '2018-05-03', '2018-05-04', '2018-05-05',
               '2018-05-06', '2018-05-07', '2018-05-08', '2018-05-09',
               '2018-05-10', '2018-05-11', '2018-05-12'],
              dtype='datetime64[ns]', freq='D')
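A range can be described by a start and an end, by a start and a count, or by an end and a count. The sketch below shows these combinations together with a custom step; the '7D' step and the variable names are illustrative choices.

```python
import pandas as pd

# Start and end: every day in the closed interval.
days = pd.date_range('4/12/2018', '5/12/2018')

# Start and end with a seven-day step.
weekly = pd.date_range('4/12/2018', '5/12/2018', freq='7D')

# End and a count: the last three days up to the end date.
last_three = pd.date_range(end='5/12/2018', periods=3)

print(len(days))
print(weekly)
print(last_three)
```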

To move the data in a time series forward or backward in time, use the shift function.

ts = Series(np.random.randn(4), index=pd.date_range('4/12/2018', periods=4, freq='M'))
print(ts)
print(ts.shift(2))

In the result, the values move forward by two positions while the index stays the same, so the first two entries become NaN.

2018-04-30   -0.065679
2018-05-31   -0.163013
2018-06-30    0.501377
2018-07-31    0.856595
Freq: M, dtype: float64

2018-04-30         NaN
2018-05-31         NaN
2018-06-30   -0.065679
2018-07-31   -0.163013
Freq: M, dtype: float64

Because a plain shift does not modify the index, some of the data is discarded. If the frequency is known, you can pass it to shift so that the timestamps are moved instead of the data:

ts.shift(2, freq='M')

2018-06-30   -0.235855
2018-07-31    1.189707
2018-08-31    0.005851
2018-09-30   -0.134599
Freq: M, dtype: float64
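The two kinds of shift can be compared on a series with fixed values. In this sketch the month-end frequency is given as a pd.offsets.MonthEnd object rather than a string alias; that is an assumption made so the example does not depend on the alias spelling of the installed pandas version.

```python
import pandas as pd

month_end = pd.offsets.MonthEnd()
idx = pd.date_range('2018-04-30', periods=4, freq=month_end)
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

# Plain shift: the index is unchanged, values move down two rows,
# and the last two values fall off the end.
shifted_values = ts.shift(2)

# Shift with a frequency: values are unchanged, each timestamp moves
# forward by two month ends.
shifted_index = ts.shift(2, freq=month_end)

print(shifted_values)
print(shifted_index)
```

The first form loses the last two observations, while the second keeps every value and relabels it with a later date.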
