A Cheat Sheet of Time Series Analysis With R
Tianqi Li*, Muyang Ren†
                                                  IAS, Wuhan University
                                                 Last Updated: June 4, 2018
Introduction
      This is a cheat sheet of Time Series Analysis which uses language R for programming. In this paper, we
briefly introduce theory of time series and provide respective R coding.
    The cheat sheet is based on An Introduction to Analysis of Financial Data with R(Ruey S. Tsay, 2013). We
learned the course Applied Time Series Analysis in Wuhan University teaching by professor O Chia Chuang
and he chose this book as our textbook. Many coding and scripts come from the book.
                                                                                      Tianqi Li, Muyang Ren
                                                                                        May 21, 2018 in IAS.
Contents
1       Rudimental Coding                                                                                                                                                                    2
        1.1 Basic Commands . . . . . . . . . .               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   2
        1.2 Get financial data from the Internet             .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   2
        1.3 Statistical Commands . . . . . . .               .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   2
        1.4 Simple Return and Log Return . .                 .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   2
2       ARMA                                                                                                                                                                                 3
        2.1 Stationary Test . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   3
            2.1.1 ACF and PACF Plot          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   3
            2.1.2 Ljung-Box Test . . .       .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   3
        2.2 General ARMA(p,q) model          .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   4
        2.3 Unit-root . . . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   5
        2.4 Seasonal . . . . . . . . . . .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   5
        2.5 Long-memory time series .        .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   6
        2.6 Model testing . . . . . . . .    .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   .   6
3       ARCH                                                                                                                                                                                 8
4       VaR and ES                                                                                                                                                                           8
    *
        email: dicklitianqi@outlook.com
    †
        email: @outlook.com
                                                                             1
    A Cheat Sheet of Time Series Analysis With R                                                                - 2/8 -
    1 Rudimental Coding
    1.1 Basic Commands
    Here we list some basic commands for using R.
1   setwd('C:/TimeSeriesAnalysis/') # Set working directory
2   install.packages('quantmod') # Install library
3   library(quantmod) # Load Package
4   mydata = read.table("AAPL.txt", header=T) # Load txt file data
5   head(mydata) # Show head of data
6   dim(mydata) # Describe dimension of data
7   data$date = as.Date(as.character(data$date),"%Y%m%d") #Change numerical date 20180521 into Date form
        2018-05-21, this is important in timeseries analysis
8   source("MyScript.R") # Load your R Script
    To complete our works, all packages we need can be installed at once by the following coding:
1   install.packages(c("quantmod","fBasics","quantreg","ggplot2","fGarch","fUnitRoots","fracdiff"))
    1.2 Get financial data from the Internet
    We usually use getSymbols in quantmod to get data such as stocks from the Internet.
1   library(quantmod)
2   getSymbols("AAPL", from="2007-01-03", to="2011-12-02") # Get Data of Apple Stock
3   getSymbols("DEXUSEU", src="FRED") # Exchange rates from FRED
    1.3 Statistical Commands
    Before we start our analysis in time series data, we should learn some basic statistical commands.
1   library(fBasics) # Load package
2   basicStats(vector) # Show summary statistics of vector such as log(return)
3   mean(vector) # Sample mean
4   var(vector) # Sample variance
5   stdev(vector) # Standard deviation
6   t.test(vector) # Testing mean equals to zero
7   skewness(vector) # Skewness
8   kurtosis(vector) # Kurtosis
9   normalTest(vector, method="jb") # JB-test
    1.4 Simple Return and Log Return
    Simple return Rt has an intuitional understanding.
                                                           Pt − Pt−1
                                                    Rt =
                                                              Pt−1
    It is usually used in our daily life. However, in time series analysis, we mainly use log return rt = log (Rt + 1)
    instead for the following reason:
        • The continuously compounded multiperiod return is simply the sum of continuously compounded one-
          period returns involved.
                                       rt [k] = rt−1 + rt−2 + · · · + rt−k+1
        • Statistical properties of log returns are more tractable because of summation.
    A Cheat Sheet of Time Series Analysis With R                                                           - 3/8 -
        • Some time series data, for example, GDP, have an exponential growth, and take log form can transform
          it into linear form.
        • Log return transformation usually makes time series stationary.
    In R, we use the following code to do log return transform.
1   lrtn = log(rtn+1) # rtn is the simple return
    2 ARMA
    2.1 Stationary Test
    A necessary condition for using ARMA model to capture log return {rt } is that {rt } is weakly stationary. In
    detail, the mean and variance of {rt } are supposed to be time-invariant. We can check the covariance γk :=
    Cov(rt , rt−k ) between rt and rt−k to see whether a time series {rt } is weakly stationary or not.
    2.1.1 ACF and PACF Plot
    The definition of Autocorrelation Function (ACF) is
                                                           Cov(xt , xt−k )   γk
                                                    ρk =                   =
                                                             V ar(xt )       γ0
    Partial AutoCorrelation Function (PACF) : PACF(i) = ϕi,i is obtained by multi-regression:
                                   xt = ϕ0,1 + ϕ1,1 xt−1 + e1t
                                   xt = ϕ0,2 + ϕ1,2 xt−1 + ϕ2,2 xt−2 + e2t
                                   xt = ϕ0,3 + ϕ1,3 xt−1 + ϕ2,3 xt−2 + ϕ3,3 xt−3 + e3t
                                      ..
                                       .
    In R, we use acf and pacf to realize the plot.
1   # acf(x, lag.max = NULL, type = c("correlation", "covariance", "partial"), plot = TRUE, na.action =
        na.fail, demean = TRUE)
2   # pacf(x, lag.max, plot, na.action)
3   acf(lrtn, lag=10)
4   pacf(lrtn, lag=10)
    2.1.2 Ljung-Box Test
    The null hypothesis is
                                                     H0 : ρ1 = · · · = ρm = 0
    and the corresponding statistics Q∗ (m) is
                                                                      ∑
                                                                      m
                                                                         ρ̂2l
                                                   Q∗ (m) = T (T + 2)
                                                                        T −l
                                                                       l=1
    When Q(m) > χ2α we reject H0 and χ2α is the (1 − α)th quantile of χ2 distribution with freedom m. The
    coding is
1   # Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"), fitdf = 0)
2   Box.test(lrtn, lag = 1, type = "Ljung-Box")
    A Cheat Sheet of Time Series Analysis With R                                                            - 4/8 -
    2.2 General ARMA(p,q) model
    Here we directly consider the ARMA(p,q) model instead of separately discuss AR(p) and MA(q) model. Gen-
    erally, an ARMA(p,q) model is
                                                       ∑
                                                       p                      ∑
                                                                              q
                                           xt = ϕ0 +         ϕi xt−i + at −         θj at−j
                                                       i=1                    j=1
    where {at } is a white noise sequence, p and q are non-negative integer.
       The order of (p,q) should be discussed at first. The following two methods can help us decide our model.
    However, these statistics can be calculated only after data is fitted by corresponding model.
    ACF and PACF For a stationary AR(p) model, there is a proposition:
                                                   {
                                                       not zero if j ≤ p
                                 PACF(j) = ϕj,j =
                                                       0          if j > p
    For an MA(q) model (MA model is definitely stationary), we have:
                                                  {
                                                     not zero if j ≤ p
                                     ACF(j) =
                                                     0          if j > p
    Notice that ARMA model doesn            t have property above.
    Information Criterion Now we introduce two information criterion, AIC and BIC, in order to determine the
    order (p,q). Denote that T is the number of samples. likelihood function means maximal likelihood function.
                 −2                           2
        • AIC =     ln(likelihood f unction) + × Number of parameter                         (Generally)
                 T                            T
                               2l
        • AIC(l) = ln(σ̃l2 ) +                                  (AR(l), σ̃l2 is the MLE estimator of σa2 .)
                               T
                             l × ln(T )
        • BIC(l) = ln(σ̃l2 ) +                                                                            (AR(l))
                                   T
    Smaller AIC or BIC are better.
       We have several commands to estimate an ARMA(p,q) model.
1   # AR Model
2   model_ar = ar(lrtn, method="mle") # The command ar normalizes minimum AIC to 0
3   model_ar$order # Print the identified order
4
5   # ARMA Model
6   # arima(x, order = c(0L, 0L, 0L), seasonal = list(order = c(0L, 0L, 0L), period = NA), xreg = NULL,
        include.mean = TRUE, transform.pars = TRUE, fixed = NULL, init = NULL, method = c("CSS-ML", "ML
        ", "CSS"), n.cond, SSinit = c("Gardner1980", "Rossignol2011"), optim.method = "BFGS", optim.
        control = list(), kappa = 1e6)
7   model_arma = arima(lrtn, order=c(3,0,2)) # An ARMA(3,2) model
    If we want to fix the coefficient of some variable, we need the option fixed. For instance, we need a modified
    MA(3) model:
                                       xt = c0 + at − θ1 at−1 − 0 ∗ at−2 − θ3 at−3
1   model_ma3 = arima(lrtn, order=c(0,0,3), fixed=c(NA,0,NA)) # An MA(3) model with specific coefficients
    Prediction is also a piece of cake by R.
1   predict(model_ma3,10) # Using your model_ma3 to forecast the next 10 periods
    A Cheat Sheet of Time Series Analysis With R                                                               - 5/8 -
    2.3 Unit-root
    Unit-root makes time series nonstationary. There are three typical examples of unit-root.
    Random walk
                                                          pt = pt−1 + at
    Random walk with drift
                                                        pt = µ + pt−1 + at
    Trend stationary time series
                                                        pt = β0 + β1 t + xt
    where {at } is a white noise series with zero mean.
        We extend ARMA model such that it has a unit characteristic root, then it is called Autoregressive Integrated
    Moving Average (ARIMA) which has unit-root causing long-memory. An efficient way to treat with ARIMA is
    differencing. Here we give a formal definition of ARIMA(p,1,q). Assume a time series {yt }. If ct = yt −yt−1 =
    (1 − B)yt satisfying ARMA(p,q), then {yt } is called ARIMA(p,1,q). Differencing makes {yt } stationary. Not
    surprisingly, ARIMA(p,n,q) (n ≥ 2) exists.
1   ct = diff(yt,1) # order 1 differencing
    A reliable way to test unit-root is Augmented Dickey-Fuller Test (ADF-Test):
                                                                  ∑
                                                                  p−1
                                             xt = ct + βxt−1 +          ϕi ∆xt−i + et
                                                                  i=1
    and the null hypothesis is H0 : β = 1. The alternative hypothesis is H1 : β < 1. ct is an explicit function
    depending on time. ∆xj = xj − xj−1 is a differencing sequence. Actually, ct usually has three forms:
                                        
                                         0           , random walk
                                   ct =   c           , random walk with drift
                                        
                                          w0 + w1 t , exponential growth
    The choice of ct is based on your data.
                                                                         β̂ − 1
                                                   ADF-statistics =
                                                                      Std. dev of β̂
    where β̂ is the OLS estimator of β.
1   library(fUnitRoots) # Loading package for unit-root
2   adfTest(lrtn, lags=10, type=c("c")) # Here type can be 0, c and ct
    2.4 Seasonal
    Actually, there are some time series which have periodic or cyclical behavior. This type of series is called a
    seasonal time series. We usually eliminate the seasonal behavior by differencing preceding analysis of time
    series, which is called seasonal adjustment. For example, if our data has a strong evidence of autocorrelation at
    lags of 4, we need to consider a seasonal differencing
                                                           ∆4 = 1 − B 4
    A Cheat Sheet of Time Series Analysis With R                                                                   - 6/8 -
    From the ACF plot, we can observe whether there are significant and non-zero values at each lag. If so, we may
    make difference to make the time series stationary.
       Furthermore, multiplicative seasonal models should be taken into consideration.
                                        (1 − B)(1 − B s )xt = (1 − θB)(1 − ΘB s )at
    To do with seasonal series, we can use arima with option seasonal
1   #(1-B)(1-B^4)x_t=(1+ma1*B)(1+sma1*B^4)a_t
2   model = arima(lrtn, order=c(0,1,1), seasonal=list(order=c(0,1,1), period=4)
    In addition, we use dummy variable to capture exact seasonality (For monthly return, ACF is significantly
    non-zero at lags of 12, 24 and 36, etc.). We can use dummies in OLS to check it.
1   Jan = rep(c(1, rep(0,11), n) # generate a sequence of one 1 and eleven 0 for n times
2   model_test = lm(lrtn~Jan)
3   summary(model_test)
    2.5 Long-memory time series
    For a unit-root nonstationary time series, it can be shown that the sample ACF converges to 1 for all fixed lags
    as the sample size increases. However, there are other time series whose ACF decreases at a polynomial rate
    while lags increasing. These time series are called long-memory time series. To fit these time series, we define
    ARFIMA(p,d,q) model.
        As is known,
                                               ∑∞
                                                          d(d − 1) . . . (d − k + 1) k
                                  (1 − B)d =        (−1)k                           B
                                                                      k!
                                                    k=0
    If a differencing sequence (1 − B)d xt follows an ARMA(p,q) model, we called it ARFIMA(p,d,q) process. Here
    d is not necessarily integer.
         We use package fracdiff to estimate AFRIMA(p,d,q).
1   model = fracdiff(lrtn, nar=1, nma=1) #AFRIMA(1,d,1), d is estimated.
    2.6 Model testing
    If we focus on the dynamic structure of a time series, we can use AIC, BIC and variance of residual to examine
    our model.
        If our aim is to forecast, a good way is to use Mean Square of Forecast Error (MSFE) to measure its ability
    of prediction. Here we briefly introduce the procedure of backtesting.
       1. Split our dataset for training {xt |t = 1, . . . , h} and testing{xt |t = h + 1, . . . , T };
       2. Use {xt |t = 1, . . . , h} to train model in order to estimate x̂h (1) and calculate eh (1) = xh+1 − x̂h (1);
       3. Expand training set to {xt |t = 1, . . . , h + 1} and, similar to step 2, estimate x̂h+1 (1) and calculate
          eh+1 (1) = xh+2 − x̂h+2 (1);
       4. Iterate step 3 until the last sample.
    And smaller MSFE is better.                                 ∑T −1
                                                                   j=k   [ej (1)]2
                                                   MSFE(m) =
                                                                     T −h
     A Cheat Sheet of Time Series Analysis With R                                                            - 7/8 -
     Also we can use Mean absolute Forecast Error (MAFE) and Bias followed.
                                            ∑T −1                       ∑T −1
                                               j=k |ej (1)|               j=h ej (1)
                             MAFE(m) =                      , Bias(m) =
                                                T −h                       T −h
     Notice: Here we use a R Script to realize backtest. backtest.R can be downloaded on Tsay’s homepage.1
 1   source("backtest.R")
 2   backtest_model = backtest(model, lrtn, 215, 1) # Return MSFE and MAFE
 3
 4
 5   # Raw code of backtest.R
 6   "backtest" <- function(m1,rt,orig,h,xre=NULL,fixed=NULL,inc.mean=TRUE){
 7   # m1: is a time-series model object
 8   # orig: is the starting forecast origin
 9   # rt: the time series
10   # xre: the independent variables
11   # h: forecast horizon
12   # fixed: parameter constriant
13   # inc.mean: flag for constant term of the model.
14   #
15   regor=c(m1$arma[1],m1$arma[6],m1$arma[2])
16   seaor=list(order=c(m1$arma[3],m1$arma[7],m1$arma[4]),period=m1$arma[5])
17   T=length(rt)
18   if(!is.null(xre) && !is.matrix(xre))xre=as.matrix(xre)
19   ncx=ncol(xre)
20   if(orig > T)orig=T
21   if(h < 1) h=1
22   rmse=rep(0,h)
23   mabso=rep(0,h)
24   nori=T-orig
25   err=matrix(0,nori,h)
26   jlast=T-1
27   for (n in orig:jlast){
28    jcnt=n-orig+1
29    x=rt[1:n]
30    if (!is.null(xre)){
31    pretor=xre[1:n,]
32    mm=arima(x,order=regor,seasonal=seaor,xreg=pretor,fixed=fixed,include.mean=inc.mean)
33    nx=xre[(n+1):(n+h),]
34    if(h==1)nx=matrix(nx,1,ncx)
35    fore=predict(mm,h,newxreg=nx)
36   }
37   else {
38     mm=arima(x,order=regor,seasonal=seaor,xreg=NULL,fixed=fixed,include.mean=inc.mean)
39     fore=predict(mm,h,newxreg=NULL)
40   }
41    kk=min(T,(n+h))
42   # nof is the effective number of forecats at the forecast origin n.
43    nof=kk-n
44    pred=fore$pred[1:nof]
45    obsd=rt[(n+1):kk]
46    err[jcnt,1:nof]=obsd-pred
47   }
48   #
49   for (i in 1:h){
50   iend=nori-i+1
51   tmp=err[1:iend,i]
52   mabso[i]=sum(abs(tmp))/iend
        1
            http://faculty.chicagobooth.edu/ruey.tsay/teaching/introTS/backtest.R
     A Cheat Sheet of Time Series Analysis With R                    - 8/8 -
53   rmse[i]=sqrt(sum(tmp^2)/iend)
54   }
55   print("RMSE␣of␣out-of-sample␣forecasts")
56   print(rmse)
57   print("Mean␣absolute␣error␣of␣out-of-sample␣forecasts")
58   print(mabso)
59   backtest <- list(origin=orig,error=err,rmse=rmse,mabso=mabso)
60   }
     3 ARCH
     4 VaR and ES