R language: working with time series data in xts

Source: Internet
Author: User

zoo is the basic time series package in R and has a general-purpose design. xts (eXtensible Time Series) extends zoo: the xts class inherits from the zoo class and adds richer tools for processing time series data.

I. Structure and definition of xts objects

1. An xts object is a matrix of observations with a time index. Its structure is:

xts = matrix + time index

2. An xts object is created with the following function:

xts(x = , order.by = , ...)

x: the data, which must be a vector or a matrix;

order.by: the index, a vector of dates or times with one element per row of x; the observations are stored in ascending order of this index.

Example:

library(xts)

data <- rnorm(5)
dates <- seq(as.Date("2016-01-01"), length.out = 5, by = "days")
smith <- xts(x = data, order.by = dates)

3. Attributes

xts lets you attach arbitrary key-value attributes to the data. These can store metadata about the object. To add an attribute when creating an xts object, pass a name = value argument to xts().

# Create bday as a POSIXct date-time object
bday <- as.POSIXct("1899-05-08")

# Create an xts object and add a "born" attribute
hayek <- xts(x = data, order.by = dates, born = bday)
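The sketch below checks the two claims made so far: an xts object also inherits from zoo, and extra name = value arguments become attributes. It uses small hypothetical data. xtsAttributes() is the xts function that returns these user-defined attributes.

```r
library(xts)

# Hypothetical data: five daily observations
dates <- seq(as.Date("2016-01-01"), length.out = 5, by = "days")
data  <- 1:5

bday  <- as.POSIXct("1899-05-08", tz = "UTC")
hayek <- xts(x = data, order.by = dates, born = bday)

# An xts object is also a zoo object
class(hayek)

# User-defined attributes are kept alongside the data
xtsAttributes(hayek)$born
```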

4. Deconstructing an xts object

At its core, an xts (or zoo) object is a plain R matrix plus some additional attributes. The most important attribute is the index, which holds all the information needed to treat the data as a time series.

coredata() extracts the matrix part of an xts object.

index() extracts the index part of an xts object.
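A minimal sketch of taking an xts object apart with coredata() and index(). The three-day series is made-up example data.

```r
library(xts)

x <- xts(x = c(10, 20, 30),
         order.by = as.Date(c("2016-01-01", "2016-01-02", "2016-01-03")))

m   <- coredata(x)   # plain matrix holding the observations
idx <- index(x)      # Date vector holding the timestamps

is.matrix(m)
class(idx)
```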

5. Converting to an xts object

as.xts() converts objects of other classes, such as ts or zoo, to xts.
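As an illustration, the sketch below converts two kinds of objects: the built-in monthly ts dataset AirPassengers (12 years, 1949 to 1960) and a small hypothetical zoo series.

```r
library(xts)

# A built-in monthly ts object converted to xts
air <- as.xts(AirPassengers)
nrow(air)

# A zoo object converted to xts
z  <- zoo(1:3, as.Date("2016-01-01") + 0:2)
zx <- as.xts(z)
class(zx)
```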

6. How xts differs from other time series classes

Most other R time series classes are tied to particular time classes. xts can use any class that represents time, whether POSIXct, Date or another class. It converts all of them to a common internal format, which lets the user select subsets as naturally as possible.

a <- xts(x = 1:2, as.Date("2012-01-01") + 0:1)
a[index(a)]
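To illustrate the point about time classes, the sketch below indexes the same two values once by Date and once by POSIXct, then subsets both with the same date string. The times are placed at noon UTC, an arbitrary choice for the example.

```r
library(xts)

vals   <- 1:2
a_date <- xts(vals, as.Date("2012-01-01") + 0:1)
a_time <- xts(vals, as.POSIXct("2012-01-01 12:00", tz = "UTC") + 86400 * 0:1)

# The same ISO 8601 date string works for both index classes
a_date["2012-01-02"]
a_time["2012-01-02"]

# Subsetting by the index itself returns every row
nrow(a_date[index(a_date)])
```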

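7. Index attributes

indexClass() shows the class of the index.

indexTZ() shows the time zone of the index.

indexFormat() shows or changes the format used to display the index.

In recent xts versions (0.12 and later), indexClass(), indexTZ() and indexFormat() are deprecated in favor of tclass(), tzone() and tformat().

# Change how the index is displayed
indexFormat(temps) <- "%m/%d/%y"

tzone() extracts or sets the time zone.

tzone(x) <- "Time_zone"   # replace with a time zone name, e.g. "America/New_York"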
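A minimal sketch using the newer accessor names. It assumes xts 0.12 or later, where tclass(), tformat() and tzone() (and their replacement forms) are available. The series are hypothetical.

```r
library(xts)

temps <- xts(1:3, as.Date("2016-01-01") + 0:2)
tclass(temps)                  # class of the index

# Change only how the index is printed; the stored values are unchanged
tformat(temps) <- "%m/%d/%y"
temps

# Set the time zone of a POSIXct-indexed series
x <- xts(1:2, as.POSIXct("2016-01-01 09:00", tz = "UTC") + c(0, 3600))
tzone(x) <- "America/New_York"
tzone(x)
```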

Internally, the raw index of an xts object is the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).

.index() returns this raw numeric vector.

The following functions extract time components from the index, similar to the fields of a POSIXlt object:

.indexday()

.indexmon()

.indexyear()

.indexwday()

# Create an index of weekend dates (.indexwday(): 0 = Sunday, 6 = Saturday)
weekends <- which(.indexwday(temps) == 0 | .indexwday(temps) == 6)
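The sketch below checks the raw index and the weekday lookup on one hypothetical week of daily data. 2016-01-01 was a Friday, so Saturday and Sunday are the 2nd and 3rd rows. 1451606400 is the Unix time of 2016-01-01 00:00:00 UTC.

```r
library(xts)

temps <- xts(1:7, as.Date("2016-01-01") + 0:6)

# Raw index: seconds since 1970-01-01 UTC
as.numeric(.index(temps))[1]

# Positions of Saturdays (6) and Sundays (0)
weekends <- which(.indexwday(temps) == 0 | .indexwday(temps) == 6)
temps[weekends]
```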

II. Importing and exporting xts data

1. Importing data. In practice, data are usually read from disk or from the network.

For example, suppose the file tmp_file on disk contains the lines below. The header has one field fewer than the data rows, so the first column becomes the row names.

a,b
1/02/2015,1,3
2/03/2015,2,4

Import example 1:

# Read tmp_file
dat <- read.csv(tmp_file)

# Convert dat to xts; the dates have a four-digit year, hence %Y
xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))

Import example 2:

# Read tmp_file with read.zoo(); index.column = 0 uses the row names as the index
dat_zoo <- read.zoo(tmp_file, index.column = 0, sep = ",", format = "%m/%d/%Y")

# Convert dat_zoo to xts
dat_xts <- xts(dat_zoo)
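The two import examples can be tried end to end with the self-contained version below. It first writes the sample file shown above to a temporary path from tempfile(); the file name itself is arbitrary.

```r
library(xts)

# Write the sample file to a temporary path
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("a,b",
             "1/02/2015,1,3",
             "2/03/2015,2,4"), tmp_file)

# Example 1: read.csv() takes the first column as row names
dat <- read.csv(tmp_file)
x1  <- xts(dat, order.by = as.Date(rownames(dat), "%m/%d/%Y"))

# Example 2: read.zoo() with index.column = 0 uses the row names as the index
dat_zoo <- read.zoo(tmp_file, index.column = 0, sep = ",", header = TRUE,
                    format = "%m/%d/%Y")
x2 <- xts(dat_zoo)
x1
```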

Import example 3:

# FUN = as.yearmon converts the index strings to a more suitable time class
# (year-month); this assumes the first column holds values such as "2015-01"
sun <- read.zoo(tmp_file, sep = ",", FUN = as.yearmon)

# Convert to an xts object
sun_xts <- xts(sun)

2. Exporting xts objects

There are two main ways to do this:

1. Use saveRDS() and readRDS() to serialize and restore a single R object.

2. Use write.zoo() from the zoo package:

# Get a temporary file name
tmp <- tempfile()

# Write the xts object to tmp with zoo's write.zoo()
write.zoo(data_xts, sep = ",", file = tmp)
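A round-trip sketch of both export options on a hypothetical series. The column is given a name so that write.zoo() writes a header line, which read.zoo(header = TRUE) then expects.

```r
library(xts)

data_xts <- xts(cbind(value = c(1.5, 2.5, 3.5)), as.Date("2016-01-01") + 0:2)

# Option 1: serialize the whole R object (index and attributes included)
rds_file <- tempfile(fileext = ".rds")
saveRDS(data_xts, file = rds_file)
restored <- readRDS(rds_file)

# Option 2: write plain-text CSV with write.zoo() and read it back
tmp <- tempfile()
write.zoo(data_xts, sep = ",", file = tmp)
back <- as.xts(read.zoo(tmp, sep = ",", header = TRUE, format = "%Y-%m-%d"))
```

saveRDS() keeps everything exactly. The CSV route produces a text file that other tools can read, but any custom attributes are lost.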

III. Querying time ranges

1. Querying a date range

xts can quickly and efficiently select the observations that fall within a given date or time range.

Combine dates with the special character "/" to extract a date range from an xts object:

a["20090825"]         # 2009-08-25 only
a["201203/201212"]    # March 2012 through December 2012
a["/201601"]          # from the start of the data through January 2016
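The sketch below checks the range syntax on one hypothetical year of daily data. The dates are written in the dashed ISO 8601 form. 2016 is a leap year, so the series has 366 rows.

```r
library(xts)

a <- xts(1:366, as.Date("2016-01-01") + 0:365)

nrow(a["2016-08-25"])        # a single day
nrow(a["2016-03/2016-05"])   # March to May: 31 + 30 + 31 = 92 days
nrow(a["/2016-01"])          # start of the data through January: 31 days
nrow(a["2016-12/"])          # December through the end of the data: 31 days
```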

2. Extracting an intraday time window

# Select the observations between 9:30 and 16:00 on every day
nyse["T09:30/T16:00"]
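A sketch of time-of-day subsetting on hypothetical hourly data covering two days in UTC. The window T09:00/T11:00 keeps the 09:00, 10:00 and 11:00 observations of each day.

```r
library(xts)

times <- seq(as.POSIXct("2016-01-04 00:00", tz = "UTC"), by = "hour", length.out = 48)
nyse  <- xts(seq_along(times), order.by = times)

session <- nyse["T09:00/T11:00"]
nrow(session)   # 3 hours x 2 days
```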

3. Updating or replacing observations

# Set the observations at the dates in the dates vector to NA
x[dates] <- NA

# Set all observations from 2016-06-09 onward to 0
x["2016-06-09/"] <- 0
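Both replacement styles can be applied to a hypothetical ten-day series. The first replaces values at a vector of Date objects, the second at an open-ended date range.

```r
library(xts)

x <- xts(1:10, as.Date("2016-06-01") + 0:9)   # 2016-06-01 to 2016-06-10

# Replace the observations at specific dates with NA
dates <- as.Date(c("2016-06-02", "2016-06-04"))
x[dates] <- NA

# Replace everything from 2016-06-09 onward with 0
x["2016-06-09/"] <- 0
x
```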

4. Locating the start and end of a time period

lastweek <- last(temps, "1 week")   # the last week of data
last(lastweek, 2)                   # the last 2 observations of that week
first(lastweek, "-2 days")          # all but the first 2 days of that week

first() and last() can be combined:

# The last 3 days of the first week
last(first(temps, "1 week"), "3 days")
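To see what first() and last() select, the sketch below uses four full hypothetical weeks of daily data, starting on Monday 2016-01-04 and ending on Sunday 2016-01-31. Starting on a Monday keeps the calendar weeks aligned with the data.

```r
library(xts)

temps <- xts(1:28, as.Date("2016-01-04") + 0:27)

lastweek <- last(temps, "1 week")   # 2016-01-25 to 2016-01-31
last(lastweek, 2)                   # values 27 and 28
first(lastweek, "-2 days")          # the five days after the first two

# Last 3 days of the first week: 2016-01-08 to 2016-01-10
first_week_tail <- last(first(temps, "1 week"), "3 days")
```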

5. Checking periodicity and counting periods

periodicity() reports the periodicity (for example daily or monthly) of a time series.

ndays(), nmonths() and nquarters() count the number of days, months and quarters the series covers.
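A quick check of these helpers on hypothetical daily data for the first quarter of 2016, which has 31 + 29 + 31 = 91 days.

```r
library(xts)

x <- xts(1:91, as.Date("2016-01-01") + 0:90)

p <- periodicity(x)
p$scale        # "daily"
ndays(x)       # 91
nmonths(x)     # 3
nquarters(x)   # 1
```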

IV. Merging xts objects

When xts objects are used in arithmetic, observations are matched by time and only the timestamps present in all operands are returned.
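A small sketch of this alignment with two hypothetical series that overlap only on January 2 and 3.

```r
library(xts)

a <- xts(1:3, as.Date("2016-01-01") + 0:2)             # Jan 1 to Jan 3
b <- xts(c(10, 20, 30), as.Date("2016-01-02") + 0:2)   # Jan 2 to Jan 4

# Only the shared dates appear: 2 + 10 and 3 + 20
s <- a + b
s
```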

1. Merging by column with merge()

merge() combines one or more series by column. It can be used, for example, to align observations to a fixed set of dates.

merge(a, b, join = "right", fill = 9999)

Three key arguments:

...: the objects to be merged;

join: how the series are joined, e.g. "inner", "outer" (the default), "left" or "right";

fill: the value used for the missing observations created by the merge.
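The effect of join and fill can be compared on the same two hypothetical series as above.

```r
library(xts)

a <- xts(1:3, as.Date("2016-01-01") + 0:2)             # Jan 1 to Jan 3
b <- xts(c(10, 20, 30), as.Date("2016-01-02") + 0:2)   # Jan 2 to Jan 4

m_outer <- merge(a, b)                                # all 4 dates, NAs where missing
m_inner <- merge(a, b, join = "inner")                # only Jan 2 and Jan 3
m_right <- merge(a, b, join = "right", fill = 9999)   # b's dates; missing a values become 9999
m_right
```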

2. Merging by row with rbind()

rbind() stacks xts objects by row; the result is sorted in ascending time order.
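A sketch showing that the argument order does not matter. The later block is passed first, yet the combined hypothetical series comes back in time order.

```r
library(xts)

early <- xts(1:2, as.Date("2016-01-01") + 0:1)
late  <- xts(3:4, as.Date("2016-01-03") + 0:1)

r <- rbind(late, early)
r
```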

V. Handling missing (NA) values

1. Filling with the previous or next observation

Filling a missing value with the previous observation uses only information available at that point in time, which avoids look-ahead bias.

# Fill with the last observed value
na.locf(x)

# Set fromLast = TRUE to fill with the next observed value instead
na.locf(x, fromLast = TRUE)
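A minimal comparison of the two fill directions on a hypothetical series with a gap of two days.

```r
library(xts)

x <- xts(c(1, NA, NA, 4), as.Date("2016-01-01") + 0:3)

na.locf(x)                   # 1 1 1 4: carry the last observation forward
na.locf(x, fromLast = TRUE)  # 1 4 4 4: carry the next observation backward
```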

2. Interpolating with na.approx()

na.approx() performs simple linear interpolation between the two neighboring observations. Each missing value is estimated from the distances between index values, so the estimates are linear in time.
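To show that the interpolation is linear in time rather than in row position, the hypothetical series below is irregularly spaced. January 2 lies one third of the way from January 1 to January 4, so the estimate is 1 + (4 - 1) / 3 = 2, not the midpoint 2.5.

```r
library(xts)

x <- xts(c(1, NA, 4), as.Date(c("2016-01-01", "2016-01-02", "2016-01-04")))

na.approx(x)
```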

VI. Time series operations

1. Shifting with lag()

k is the number of periods to shift. In xts, a positive k shifts the observations forward in time (down the printed series), and a negative k shifts them backward (up the series). zoo uses the opposite sign convention.

> a
           [,1]
2016-01-01    1
2016-01-02    2
2016-01-03    3
> lag(a)
           [,1]
2016-01-01   NA
2016-01-02    1
2016-01-03    2
> lag(a, k = -1)
           [,1]
2016-01-01    2
2016-01-02    3
2016-01-03   NA
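The sketch below reproduces the output above and contrasts it with zoo's convention. For a plain zoo object, lag() with a positive k moves values earlier in time. By default it also drops the unmatched row instead of padding with NA.

```r
library(xts)

a <- xts(1:3, as.Date("2016-01-01") + 0:2)

lag(a)           # NA 1 2
lag(a, k = -1)   # 2 3 NA

z <- as.zoo(a)
lag(z, k = 1)    # 2 3, on 2016-01-01 and 2016-01-02
```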

2. Differencing with diff()

A simple difference is x(t) - x(t - k), where k is the lag. A higher-order difference applies differencing repeatedly to the result of the previous difference.

diff(x, lag = , differences = )

Arguments:

lag: the number of periods to offset;

differences: the order of differencing (i.e., how many times diff is applied).

# The following two commands give the same result
diff(x, differences = 2)
diff(diff(x))
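A worked check on the hypothetical series of squares 1, 4, 9, 16, 25. The first differences are 3, 5, 7, 9 and the second differences are all 2. xts pads the lost leading positions with NA.

```r
library(xts)

x <- xts(c(1, 4, 9, 16, 25), as.Date("2016-01-01") + 0:4)

diff(x)                          # NA 3 5 7 9
d2a <- diff(x, differences = 2)  # NA NA 2 2 2
d2b <- diff(diff(x))             # same as d2a
diff(x, lag = 2)                 # x(t) - x(t - 2): NA NA 8 12 16
```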

3. Splitting data by time interval with endpoints()

endpoints(x, on = , k = )

The function takes a time series and returns the position of the last observation in each interval. The returned vector starts with 0 and ends with the length of the data (the total number of rows).

The on argument supports various periods, including "years", "quarters", "months", "weeks", "days", "hours" and "minutes".

The k argument selects every k-th period. For example, on = "weeks" with k = 2 returns the last observation of every second week. The last value returned is always the length of the data, even if it does not coincide with the end of a period.

For example, the following code returns the position of the last observation of each year:

endpoints(air, on = "years")
 [1]   0  12  24  36  48  60  72  84  96 108 120 132 144
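The output above matches 12 years of monthly data. A way to reproduce it, assuming air is the built-in AirPassengers dataset (1949 to 1960) converted to xts:

```r
library(xts)

air <- as.xts(AirPassengers)   # 144 monthly observations

ep <- endpoints(air, on = "years")
ep   # position of each December, plus the leading 0
```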

4. Splitting data by time and computing with period.apply()

period.apply(x, INDEX, FUN, ...)

INDEX is a vector of endpoints, such as the one returned by endpoints().

Example:

# Compute the weekly endpoints
ep <- endpoints(temps, on = "weeks")

# Compute and display the weekly means
period.apply(temps, INDEX = ep, FUN = mean)
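A self-contained version of the example. It uses hypothetical daily values 1 to 28 spanning four Monday-to-Sunday weeks, so the weekly means are 4, 11, 18 and 25.

```r
library(xts)

temps <- xts(1:28, as.Date("2016-01-04") + 0:27)

ep <- endpoints(temps, on = "weeks")   # 0 7 14 21 28
weekly_mean <- period.apply(temps, INDEX = ep, FUN = mean)
weekly_mean                             # one value per week, stamped at its last day
```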

5. Splitting and computing with split-lapply-rbind

# Split the data by week; f is a string describing the interval (e.g. "months", "years")
data_weekly <- split(data, f = "weeks")

# Create a list of weekly means
temps_avg <- lapply(X = data_weekly, FUN = mean)

x_list_rbind <- do.call(rbind, temps_avg)

do.call(rbind, ...) passes every element of the list to rbind() in a single call, instead of one object at a time.
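Applied to the same hypothetical four weeks, the split-lapply-rbind steps give the same means. Because mean() returns a plain number, the result is a matrix without a time index. The second variant is one possible way to keep the index: it stamps each mean with the last date of its week, which is a design choice rather than the only option.

```r
library(xts)

data <- xts(1:28, as.Date("2016-01-04") + 0:27)

data_weekly  <- split(data, f = "weeks")            # list of 4 xts objects
temps_avg    <- lapply(X = data_weekly, FUN = mean)
x_list_rbind <- do.call(rbind, temps_avg)           # 4 x 1 matrix of weekly means

# Variant that keeps a time index
weekly_xts <- do.call(rbind, lapply(data_weekly, function(w) xts(mean(w), end(w))))
```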

6. Converting a univariate series to OHLC (open-high-low-close) data

Aggregating series of different frequencies onto regular windows can make analysis easier.

to.period() has the following form. Its arguments include the series x, the period (a character string such as "months") and k, the number of periods per window:

to.period(x,
          period = "months",
          k = 1,
          indexAt,
          name = NULL,
          OHLC = TRUE,
          ...)

Example:

usd_eur_weekly <- to.period(usd_eur, period = "weeks")
usd_eur_yearly <- to.period(usd_eur, period = "years", OHLC = FALSE)
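A sketch with a hypothetical stand-in for usd_eur: daily values 1 to 28 over four Monday-to-Sunday weeks. With OHLC = TRUE each week becomes one Open/High/Low/Close row. With OHLC = FALSE only the last observation of each period is kept.

```r
library(xts)

usd_eur <- xts(1:28, as.Date("2016-01-04") + 0:27)   # hypothetical data

usd_eur_weekly <- to.period(usd_eur, period = "weeks")
usd_eur_weekly[1, ]   # Open 1, High 7, Low 1, Close 7

usd_eur_weekly_close <- to.period(usd_eur, period = "weeks", OHLC = FALSE)
```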

7. Converting to a lower frequency

to.period() can also convert a series to lower-frequency data, similar to downsampling.

# Convert to quarterly OHLC format
mkt_quarterly <- to.period(eq_mkt, period = "quarters")

# Convert to quarterly OHLC format with a shortcut function
mkt_quarterly2 <- to.quarterly(eq_mkt, name = "edhec_equity", indexAt = "firstof")

Setting indexAt = "firstof" timestamps each interval at its starting point. The name argument changes the base name of each column.
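A sketch with a hypothetical stand-in for eq_mkt: daily values for all of 2016. The quarter-end closes are the day numbers of 31 March, 30 June, 30 September and 31 December. The second call shows how name sets the column prefix.

```r
library(xts)

eq_mkt <- xts(1:366, as.Date("2016-01-01") + 0:365)   # hypothetical data

mkt_quarterly  <- to.period(eq_mkt, period = "quarters")
mkt_quarterly2 <- to.quarterly(eq_mkt, name = "edhec_equity", indexAt = "firstof")

colnames(mkt_quarterly2)
```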

8. Applying a function over a rolling window (e.g. a rolling standard deviation)

Another common task with time series data is applying a function over a rolling window of the data.

For xts objects this can be done with the zoo function rollapply().

Its main arguments are the time series object x, the window size width, and the function FUN applied to each rolling window.

width is the number of observations in the window. For example, a 10-day rolling maximum of a daily series:

rollapply(x, width = 10, FUN = max, na.rm = TRUE)

Note: width counts observations, not calendar time. For a daily series width = 10 means 10 days; for a monthly series it means 10 months.
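A sketch of the rolling standard deviation named in the heading, on a hypothetical five-point series. align = "right" makes each value use the current and previous observations, so no future data is used. fill = NA keeps the output the same length as x. Each 3-point window here has standard deviation 1.

```r
library(xts)

x <- xts(c(1, 2, 3, 4, 5), as.Date("2016-01-01") + 0:4)

roll_sd <- rollapply(x, width = 3, FUN = sd, align = "right", fill = NA)
roll_sd    # NA NA 1 1 1
```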

VII. Modifying timestamps

1. High-frequency data sometimes contain several observations with the same timestamp. A common and effective fix is to force the timestamps to be unique by adding a tiny offset (e.g. a fraction of a millisecond) to the duplicates.

make.index.unique(x, eps = , drop = , ...)

Arguments:

eps: short for epsilon (a small change); controls how far duplicate timestamps are shifted.

drop = TRUE: removes observations with duplicate timestamps instead of shifting them.

Example:

make.index.unique(x, eps = 1e-4)    # shift duplicate timestamps by a small offset
make.index.unique(x, drop = TRUE)   # remove duplicates
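A sketch with hypothetical tick data in which two observations share the timestamp 10:00:00. Shifting keeps all three rows with distinct times; dropping leaves two rows.

```r
library(xts)

times <- as.POSIXct(c("2016-01-04 10:00:00", "2016-01-04 10:00:00",
                      "2016-01-04 10:00:01"), tz = "UTC")
x <- xts(1:3, order.by = times)

u <- make.index.unique(x, eps = 1e-4)   # duplicate moved by 0.0001 seconds
d <- make.index.unique(x, drop = TRUE)  # duplicate removed
```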

2. In some cases timestamps are more precise than necessary, and it is better to round them to a fixed interval. For example, observations may occur at any point within an hour, but only the next whole hour needs to be recorded.

align.time() rounds the index up to the next whole second, minute, hour and so on.

In align.time(x, n = ), n is the length in seconds of the interval to round to.

align.time(x, n = 60)   # round up to the next minute
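A sketch with two hypothetical timestamps that are not on a minute boundary. With n = 60 each is moved up to the next full minute.

```r
library(xts)

times <- as.POSIXct(c("2016-01-04 10:00:30", "2016-01-04 10:05:10"), tz = "UTC")
x <- xts(1:2, order.by = times)

aligned <- align.time(x, n = 60)
index(aligned)   # 10:01:00 and 10:06:00
```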
