
TIME SERIES ANALYSIS

Alexander Aue
University of California, Davis
This text is disseminated via the Open Education Resource (OER) LibreTexts Project (https://LibreTexts.org) and like the hundreds
of other texts available within this powerful platform, it is freely available for reading, printing and "consuming." Most, but not all,
pages in the library have licenses that may allow individuals to make changes, save, and print this book. Carefully
consult the applicable license(s) before pursuing such effects.
Instructors can adopt existing LibreTexts texts or Remix them to quickly build course-specific resources to meet the needs of their
students. Unlike traditional textbooks, LibreTexts’ web based origins allow powerful integration of advanced features and new
technologies to support learning.

The LibreTexts mission is to unite students, faculty and scholars in a cooperative effort to develop an easy-to-use online platform
for the construction, customization, and dissemination of OER content to reduce the burdens of unreasonable textbook costs to our
students and society. The LibreTexts project is a multi-institutional collaborative venture to develop the next generation of open-
access texts to improve postsecondary education at all levels of higher learning by developing an Open Access Resource
environment. The project currently consists of 14 independently operating and interconnected libraries that are constantly being
optimized by students, faculty, and outside experts to supplant conventional paper-based books. These free textbook alternatives are
organized within a central environment that is both vertically (from advanced to basic level) and horizontally (across different fields)
integrated.
The LibreTexts libraries are Powered by NICE CXOne and are supported by the Department of Education Open Textbook Pilot
Project, the UC Davis Office of the Provost, the UC Davis Library, the California State University Affordable Learning Solutions
Program, and Merlot. This material is based upon work supported by the National Science Foundation under Grant No. 1246120,
1525057, and 1413739.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not
necessarily reflect the views of the National Science Foundation nor the US Department of Education.
Have questions or comments? For information about adoptions or adaptions contact info@LibreTexts.org. More information on our
activities can be found via Facebook (https://facebook.com/Libretexts), Twitter (https://twitter.com/libretexts), or our blog
(http://Blog.Libretexts.org).
This text was compiled on 08/17/2023
TABLE OF CONTENTS
Licensing

1: Basic Concepts in Time Series


1.1: Introduction and Examples
1.2: Stationary Time Series
1.3: Eliminating Trend Components
1.4: Eliminating Trend and Seasonal Components
1.5: Assessing the Residuals
1.6: Summary

2: The Estimation of Mean and Covariances


2.1: Estimation of the Mean
2.2: Estimation of the Autocovariance Function

3: ARMA Processes
3.1: Introduction to Autoregressive Moving Average (ARMA) Processes
3.2: Causality and Invertibility
3.3: The PACF of a Causal ARMA Process
3.4: Forecasting
3.5: Parameter Estimation
3.6: Model Selection
3.7: Summary

4: Spectral Analysis
4.1: Introduction to Spectral Analysis
4.2: The Spectral Density and the Periodogram
4.3: Large Sample Properties
4.4: Linear Filtering
4.5: Summary

Index
Glossary
Detailed Licensing

Licensing
A detailed breakdown of this resource's licensing can be found in Back Matter/Detailed Licensing.

CHAPTER OVERVIEW

1: Basic Concepts in Time Series


The first chapter explains the basic notions and highlights some of the objectives of time series analysis. Section 1.1 gives several
important examples, discusses their characteristic features and deduces a general approach to the data analysis. In Section 1.2,
stationary processes are identified as a reasonably broad class of random variables which are able to capture the main features
extracted from the examples. Finally, it is discussed how to treat deterministic trends and seasonal components in Sections 1.3 and
1.4, and how to assess the residuals in Section 1.5. Section 1.6 concludes.
1.1: Introduction and Examples
1.2: Stationary Time Series
1.3: Eliminating Trend Components
1.4: Eliminating Trend and Seasonal Components
1.5: Assessing the Residuals
1.6: Summary

This page titled 1: Basic Concepts in Time Series is shared under a not declared license and was authored, remixed, and/or curated by Alexander
Aue.

1.1: Introduction and Examples
The first definition clarifies the notion of time series analysis.

Definition 1.1.1: Time Series


Let T ≠ ∅ be an index set, conveniently being thought of as "time". A family (Xt : t ∈ T) of random variables (random functions) is called a stochastic process. A realization of (Xt : t ∈ T) is called a time series. We will use the notation (xt : t ∈ T) in the discourse.

The most common choices for the index set T include the integers Z = {0, ±1, ±2, …}, the positive integers N = {1, 2, …}, the nonnegative integers N0 = {0, 1, 2, …}, the real numbers R = (−∞, ∞) and the positive halfline R+ = [0, ∞). This class is mainly concerned with the first three cases, which are subsumed under the notion discrete time series analysis.
Oftentimes the stochastic process (Xt : t ∈ T) is itself referred to as a time series, in the sense that a realization is identified with

the probabilistic generating mechanism. The objective of time series analysis is to gain knowledge of this underlying random
phenomenon through examining one (and typically only one) realization. This separates time series analysis from, say, regression
analysis for independent data.
In the following a number of examples are given emphasizing the multitude of possible applications of time series analysis in
various scientific fields.
Example 1.1.1 (Wölfer's sunspot numbers). In Figure 1.1, the number of sunspots (that is, dark spots visible on the surface of the
sun) observed annually are plotted against time. The horizontal axis labels time in years, while the vertical axis represents the
observed values xt of the random variable

Xt = # of sunspots at time t,  t = 1700, …, 1994.

The figure is called a time series plot. It is a useful device for a preliminary analysis. Sunspot numbers are used to explain magnetic
oscillations on the sun surface.

Figure 1.1: Wölfer's sunspot numbers from 1700 to 1994.


To reproduce a version of the time series plot in Figure 1.1 using the free software package R (downloads are available at
http://cran.r-project.org), download the file sunspots.dat from the course webpage and type the following commands:
> spots = read.table("sunspots.dat")
> spots = ts(spots, start=1700, frequency=1)
> plot(spots, xlab="time", ylab="", main="Number of Sun spots")
In the first line, the file sunspots.dat is read into the object spots, which is then in the second line transformed into a time
series object using the function ts(). Using start sets the starting value for the x-axis to a prespecified number, while
frequency presets the number of observations for one unit of time. (Here: one annual observation.) Finally, plot is the standard
plotting command in R, where xlab and ylab determine the labels for the x-axis and y-axis, respectively, and main gives the
headline.

Example 1.1.2 (Canadian lynx data). The time series plot in Figure 1.2 comes from a biological data set. It contains the annual
returns of lynx at auction in London by the Hudson's Bay Company from 1821 to 1934 (on a log10 scale). These are viewed as observations of the stochastic process

Xt = log10(number of lynx trapped at time 1820 + t),  t = 1, …, 114.

Figure 1.2: Number of lynx trapped in the MacKenzie River district between 1821 and 1934.
The data is used as an estimate for the number of all lynx trapped along the MacKenzie River in Canada. This estimate, in turn, is
often taken as a proxy for the true population size of the lynx. A similar time series plot could be obtained for the snowshoe rabbit,
the primary food source of the Canadian lynx, hinting at an intricate predator-prey relationship.
Assuming that the data is stored in the file lynx.dat, the corresponding R commands leading to the time series plot in Figure 1.2
are:
> lynx = read.table("lynx.dat")
> lynx = ts(log10(lynx), start=1821, frequency=1)
> plot(lynx, xlab="", ylab="", main="Number of trapped lynx")
Example 1.1.3 (Treasury bills). Another important field of application for time series analysis lies in the area of finance. To hedge
the risks of portfolios, investors commonly use short-term risk-free interest rates such as the yields of three-month, six-month, and
twelve-month Treasury bills plotted in Figure 1.3. The (multivariate) data displayed consists of 2,386 weekly observations from
July 17, 1959, to December 31, 1999. Here,

Xt = (Xt,1 , Xt,2 , Xt,3 ), t = 1, … , 2386,

where Xt,1, Xt,2 and Xt,3 denote the three-month, six-month, and twelve-month yields at time t, respectively. It can be seen from the graph that all three Treasury bills are moving very similarly over time, implying a high correlation between the components of Xt.

Figure 1.3: Yields of Treasury bills from July 17, 1959, to December 31, 1999.

Figure 1.4: S&P 500 from January 3, 1972, to December 31, 1999.

To produce the three-variate time series plot in Figure 1.3, use the R code
> bills03 = read.table("bills03.dat");
> bills06 = read.table("bills06.dat");
> bills12 = read.table("bills12.dat");
> par(mfrow=c(3,1))
> plot.ts(bills03, xlab="(a)", ylab="",
main="Yields of 3-month Treasury Bills")
> plot.ts(bills06, xlab="(b)", ylab="",
main="Yields of 6-month Treasury Bills")
> plot.ts(bills12, xlab="(c)", ylab="",
main="Yields of 12-month Treasury Bills")

It is again assumed that the data can be found in the corresponding files bills03.dat, bills06.dat and bills12.dat.
The command line par(mfrow=c(3,1)) is used to set up the graphics. It enables you to save three different plots in the same
file.
Example 1.1.4 (S&P 500). The Standard and Poor's 500 index (S&P 500) is a value-weighted index based on the prices of 500
stocks that account for approximately 70% of the U.S. equity market capitalization. It is a leading economic indicator and is also
used to hedge market portfolios. Figure 1.4 contains the 7,076 daily S&P 500 closing prices from January 3, 1972, to December 31,
1999, on a natural logarithm scale. It is consequently the time series plot of the process

Xt = ln(closing price of S&P 500 at time t), t = 1, … , 7076.

Note that the logarithm transform has been applied to make the returns directly comparable to the percentage of investment return.
The time series plot can be reproduced in R using the file sp500.dat.
There are countless other examples from all areas of science. To develop a theory capable of handling broad applications, the
statistician needs to rely on a mathematical framework that can explain phenomena such as
trends (apparent in Example 1.1.4);
seasonal or cyclical effects (apparent in Examples 1.1.1 and 1.1.2);
random fluctuations (all Examples);
dependence (all Examples?).
The classical approach taken in time series analysis is to postulate that the stochastic process (Xt : t ∈ T) under investigation can be divided into deterministic trend and seasonal components plus a centered random component, giving rise to the model

Xt = mt + st + Yt,  t ∈ T,   (1.1.1)

where (mt : t ∈ T) denotes the trend function ("mean component"), (st : t ∈ T) the seasonal effects and (Yt : t ∈ T) a (zero mean)
stochastic process. After an appropriate model has been chosen, the statistician may aim at
estimating the model parameters for a better understanding of the time series;
predicting future values, for example, to develop investing strategies;
checking the goodness of fit to the data to confirm that the chosen model is appropriate.
Estimation procedures and prediction techniques are dealt with in detail in later chapters of the notes. The rest of this chapter will
be devoted to introducing the classes of strictly and weakly stationary stochastic processes (in Section 1.2) and to providing tools to
eliminate trends and seasonal components from a given time series (in Sections 1.3 and 1.4), while some goodness of fit tests will
be presented in Section 1.5.

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)

This page titled 1.1: Introduction and Examples is shared under a not declared license and was authored, remixed, and/or curated by Alexander
Aue.

1.2: Stationary Time Series
Fitting solely independent and identically distributed random variables to data is too narrow a concept. While, on one hand, they
allow for a somewhat nice and easy mathematical treatment, their use is, on the other hand, often hard to justify in applications.
Our goal is therefore to introduce a concept that keeps some of the desirable properties of independent and identically distributed
random variables ("regularity''), but that also considerably enlarges the class of stochastic processes to choose from by allowing
dependence as well as varying distributions. Dependence between two random variables X and Y is usually measured in terms of
the covariance function

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]

and the correlation function

Corr(X, Y) = Cov(X, Y) / √(Var(X)Var(Y)).

With these notations at hand, the classes of strictly and weakly stationary stochastic processes can be introduced.
Definition 1.2.1 (Strict Stationarity). A stochastic process (Xt : t ∈ T) is called strictly stationary if, for all t1, …, tn ∈ T and h such that t1 + h, …, tn + h ∈ T, it holds that

(Xt1, …, Xtn) =ᴰ (Xt1+h, …, Xtn+h).

That is, the so-called finite-dimensional distributions of the process are invariant under time shifts. Here =ᴰ indicates equality in distribution.
The definition in terms of the finite-dimensional distribution can be reformulated equivalently in terms of the cumulative joint
distribution function equalities

P(Xt1 ≤ x1, …, Xtn ≤ xn) = P(Xt1+h ≤ x1, …, Xtn+h ≤ xn)

holding true for all x1, …, xn ∈ R, t1, …, tn ∈ T and h such that t1 + h, …, tn + h ∈ T. This can be quite difficult to check

for a given time series, especially if the generating mechanism of a time series is far from simple, since too many model parameters
have to be estimated from the available data, rendering concise statistical statements impossible. A possible exception is provided
by the case of independent and identically distributed random variables.
To get around these difficulties, a time series analyst will commonly only specify the first- and second-order moments of the joint
distributions. Doing so then leads to the notion of weak stationarity.
Definition 1.2.2 (Weak Stationarity). A stochastic process (Xt : t ∈ T) is called weakly stationary if
the second moments are finite: E[Xt²] < ∞ for all t ∈ T;
the means are constant: E[Xt] = m for all t ∈ T;
the covariance of Xt and Xt+h depends on h only:

γ(h) = γX(h) = Cov(Xt, Xt+h),  h ∈ T such that t + h ∈ T,

is independent of t ∈ T and is called the autocovariance function (ACVF). Moreover,

ρ(h) = ρX(h) = γ(h)/γ(0),  h ∈ T,

is called the autocorrelation function (ACF).


Remark 1.2.1. If (Xt : t ∈ T) is a strictly stationary stochastic process with finite second moments, then it is also weakly stationary. The converse is not necessarily true. If (Xt : t ∈ T), however, is weakly stationary and Gaussian, then it is also strictly stationary. Recall that a stochastic process is called Gaussian if, for any t1, …, tn ∈ T, the random vector (Xt1, …, Xtn) is multivariate normally distributed.

This section is concluded with examples of stationary and nonstationary stochastic processes.

Figure 1.5: 100 simulated values of the cyclical time series (left panel), the stochastic amplitude (middle panel), and the sine
part (right panel).
Example 1.2.1 (White Noise). Let (Zt : t ∈ Z) be a sequence of real-valued, pairwise uncorrelated random variables with E[Zt] = 0 and 0 < Var(Zt) = σ² < ∞ for all t ∈ Z. Then (Zt : t ∈ Z) is called white noise, abbreviated by (Zt : t ∈ Z) ∼ WN(0, σ²). It defines a centered, weakly stationary process with ACVF and ACF given by

γ(h) = σ² if h = 0 and γ(h) = 0 if h ≠ 0,   and   ρ(h) = 1 if h = 0 and ρ(h) = 0 if h ≠ 0,

respectively. If the (Zt : t ∈ Z) are moreover independent and identically distributed, they are called iid noise, shortly (Zt : t ∈ Z) ∼ IID(0, σ²). The left panel of Figure 1.6 displays 1000 observations of an iid noise sequence (Zt : t ∈ Z) based on standard normal random variables. The corresponding R commands to produce the plot are
> z = rnorm(1000,0,1)
> plot.ts(z, xlab="", ylab="", main="")

The command rnorm simulates here 1000 normal random variables with mean 0 and variance 1. There are various other built-in random variable generators in R such as the functions runif(n,a,b) and rbinom(n,m,p), which simulate n values of a uniform distribution on the interval (a, b) and of a binomial distribution with repetition parameter m and success probability p, respectively.

Figure 1.6: 1000 simulated values of iid N(0, 1) noise (left panel) and a random walk with iid N(0, 1) innovations (right panel).
Example 1.2.2 (Cyclical Time Series). Let A and B be uncorrelated random variables with zero mean and variances Var(A) = Var(B) = σ², and let λ ∈ R be a frequency parameter. Define

Xt = A cos(λt) + B sin(λt),  t ∈ R.

The resulting stochastic process (Xt : t ∈ R) is then weakly stationary. Since sin(λt + φ) = sin(φ)cos(λt) + cos(φ)sin(λt), the process can be represented as

Xt = R sin(λt + φ),  t ∈ R,

so that R is the stochastic amplitude and φ ∈ [−π, π] the stochastic phase of a sinusoid. Some computations show that one must have A = R sin(φ) and B = R cos(φ). In the left panel of Figure 1.5, 100 observed values of a series (Xt)t∈Z are displayed. Therein, λ = π/25 was used, while R and φ were random variables uniformly distributed on the interval (−.5, 1) and (0, 1), respectively. The middle panel shows the realization of R, the right panel the realization of sin(λt + φ). Using cyclical time series
bears great advantages when seasonal effects, such as annually recurrent phenomena, have to be modeled. The following R
commands can be applied:
> t = 1:100; R = runif(100,-.5,1); phi = runif(100,0,1); lambda = pi/25
> cyc = R*sin(lambda*t+phi)
> plot.ts(cyc, xlab="", ylab="")
This produces the left panel of Figure 1.5. The middle and right panels follow in a similar fashion.
Example 1.2.3 (Random Walk). Let (Zt : t ∈ N) ∼ WN(0, σ²). Let S0 = 0 and

St = Z1 + … + Zt,  t ∈ N.

The resulting stochastic process (St : t ∈ N0) is called a random walk and is the most important nonstationary time series. Indeed, it holds here that, for h > 0,

Cov(St, St+h) = Cov(St, St + Rt,h) = tσ²,

where Rt,h = Zt+1 + … + Zt+h, and the ACVF obviously depends on t. In R, one may construct a random walk, for example, with the following simple command that utilizes the 1000 normal observations stored in the array z of Example 1.2.1.
> rw = cumsum(z)
The function cumsum takes as input an array and returns as output an array of the same length that contains as its jth entry the sum
of the first j input entries. The resulting time series plot is shown in the right panel of Figure 1.6.
Chapter 3 discusses in detail so-called autoregressive moving average processes which have become a central building block in
time series analysis. They are constructed from white noise sequences by an application of a set of stochastic difference equations
similar to the ones defining the random walk (St : t ∈ N0) of Example 1.2.3.

In general, the true parameters of a stationary stochastic process (Xt : t ∈ T) are unknown to the statistician. Therefore, they have to be estimated from a realization x1, …, xn. The following set of estimators will be used here. The sample mean of x1, …, xn is defined as

x̄ = (1/n) ∑_{t=1}^{n} xt.

The sample autocovariance function (sample ACVF) is given by

γ^(h) = (1/n) ∑_{t=1}^{n−h} (xt+h − x̄)(xt − x̄),  h = 0, 1, …, n − 1.   (1.2.1)

Finally, the sample autocorrelation function (sample ACF) is

ρ^(h) = γ^(h)/γ^(0),  h = 0, 1, …, n − 1.

Example 1.2.4. Let (Zt : t ∈ Z) be a sequence of independent standard normally distributed random variables (see the left panel of

Figure 1.6 for a typical realization of size n = 1,000). Then, clearly, γ(0) = ρ(0) = 1 and γ(h) = ρ(h) = 0 whenever h ≠ 0 .
Table 1.1 gives the corresponding estimated values γ^(h) and ρ^(h) for h = 0, 1, … , 5.

The estimated values are all very close to the true ones, indicating that the estimators work reasonably well for n = 1,000. Indeed it
can be shown that they are asymptotically unbiased and consistent. Moreover, the sample autocorrelations ρ^(h) are approximately
normal with zero mean and variance 1/1000. See also Theorem 1.2.1 below. In R, the function acf can be used to compute the
sample ACF.
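As an illustrative sketch (not part of the original example), the following commands reproduce estimates of the kind shown in Table 1.1; they assume the vector z simulated in Example 1.2.1 is still in the workspace. With type="covariance", acf returns the sample ACVF of display (1.2.1).
> acf(z, lag.max=5, type="covariance", plot=FALSE)   # sample ACVF gamma-hat(0),...,gamma-hat(5)
> acf(z, lag.max=5, plot=FALSE)                      # sample ACF rho-hat(0),...,rho-hat(5)
> acf(z, lag.max=20)                                 # ACF plot with the 95% white noise bounds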
Theorem 1.2.1. Let (Zt : t ∈ Z) ∼ WN(0, σ²) and let h ≠ 0. Under a general set of conditions, it holds that the sample ACF at lag h, ρ^(h), is for large n approximately normally distributed with zero mean and variance 1/n.
Theorem 1.2.1 and Example 1.2.4 suggest a first method to assess whether or not a given data set can be modeled conveniently by a white noise sequence: for a white noise sequence, approximately 95% of the sample ACF values should be within the confidence interval ±2/√n. Using the data files on the course webpage, one can compute with R the corresponding sample ACFs to check for

whiteness of the underlying time series. The properties of the sample ACF are revisited in Chapter 2.

Figure 1.7: Annual water levels of Lake Huron (left panel) and the residual plot obtained from fitting a linear trend to the data (right panel).

This page titled 1.2: Stationary Time Series is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

1.3: Eliminating Trend Components
In this section three different methods are developed to estimate the trend of a time series model. It is assumed that it makes sense to postulate the model (1.1.1) with st = 0 for all t ∈ T, that is,

Xt = mt + Yt,  t ∈ T,   (1.3.1)

where (without loss of generality) E[Yt] = 0. In particular, three different methods are discussed: (1) the least squares estimation of mt, (2) smoothing by means of moving averages and (3) differencing.

Method 1 (Least squares estimation) It is often useful to assume that a trend component can be modeled appropriately by a polynomial,

mt = b0 + b1 t + … + bp t^p,  p ∈ N0.

In this case, the unknown parameters b0, …, bp can be estimated by the least squares method. Combined, they yield the estimated polynomial trend

m^t = b^0 + b^1 t + … + b^p t^p,  t ∈ T,

where b^0, …, b^p denote the corresponding least squares estimates. Note that the order p is not estimated. It has to be selected by the statistician, for example, by inspecting the time series plot. The residuals Y^t can be obtained as

Y^t = Xt − m^t = Xt − b^0 − b^1 t − … − b^p t^p,  t ∈ T.

How to assess the goodness of fit of the fitted trend will be the subject of Section 1.5 below.
Example 1.3.1 (Level of Lake Huron). The left panel of Figure 1.7 contains the time series of the annual average water levels in
feet (reduced by 570) of Lake Huron from 1875 to 1972. It is a realization of the process

Xt = (Average water level of Lake Huron in the year 1874 + t) − 570, t = 1, … , 98.

There seems to be a linear decline in the water level and it is therefore reasonable to fit a polynomial of order one to the data.
Evaluating the least squares estimators provides us with the values
b^0 = 10.202 and b^1 = −0.0242

for the intercept and the slope, respectively. The resulting observed residuals y^t = Y^t(ω) are plotted against time in the right panel

of Figure 1.7. There is no apparent trend left in the data. On the other hand, the plot does not strongly support the stationarity of the
residuals. Additionally, there is evidence of dependence in the data.
To reproduce the analysis in R, assume that the data is stored in the file lake.dat. Then use the following commands.

> lake = read.table("lake.dat")


> lake = ts(lake, start=1875)
> t = 1:length(lake)
> lsfit = lm(lake~t)
> plot(t, lake, xlab="", ylab="", main="")
> lines(lsfit$fit)

The function lm fits a linear model or regression line to the Lake Huron data. To plot both the original data set and the fitted
regression line into the same graph, you can first plot the water levels and then use the lines function to superimpose the fit. The
residuals corresponding to the linear model fit can be accessed with the command lsfit$resid.

Method 2 (Smoothing with Moving Averages) Let (Xt : t ∈ Z) be a stochastic process following model (1.3.1). Choose q ∈ N0 and define the two-sided moving average

Wt = (1/(2q + 1)) ∑_{j=−q}^{q} Xt+j,  t ∈ Z.   (1.3.2)

The random variables Wt can be utilized to estimate the trend component mt in the following way. First note that

Wt = (1/(2q + 1)) ∑_{j=−q}^{q} mt+j + (1/(2q + 1)) ∑_{j=−q}^{q} Yt+j ≈ mt,

assuming that the trend is locally approximately linear and that the average of the Yt over the interval [t − q, t + q] is close to zero. Therefore, mt can be estimated by

m^t = Wt,  t = q + 1, …, n − q.

Notice that there is no possibility of estimating the first q and last q drift terms due to the two-sided nature of the moving averages. In contrast, one can also define one-sided moving averages by letting

m^1 = X1,   m^t = aXt + (1 − a)m^t−1,  t = 2, …, n.

Figure 1.8: The two-sided moving average filters Wt for the Lake Huron data (upper panel) and their residuals (lower panel)
with bandwidth q = 2 (left), q = 10 (middle) and q = 35 (right).
Figure 1.8 contains estimators m^t based on the two-sided moving averages for the Lake Huron data of Example 1.3.1 for selected choices of q (upper panel) and the corresponding estimated residuals (lower panel).
The moving average filters for this example can be produced in R in the following way:
> t = 1:length(lake)
> ma2 = filter(lake, sides=2, rep(1,5)/5)
> ma10 = filter(lake, sides=2, rep(1,21)/21)
> ma35 = filter(lake, sides=2, rep(1,71)/71)
> plot(t, ma2, xlab="", ylab="",type="l")
> lines(t,ma10); lines(t,ma35)

Therein, sides determines if a one- or two-sided filter is going to be used. The phrase rep(1,5) creates a vector of length 5
with each entry being equal to 1.
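The one-sided moving average m^1 = X1, m^t = aXt + (1 − a)m^t−1 defined above can be computed with a short loop. A minimal sketch, assuming the Lake Huron series is stored in lake as before and using the purely illustrative weight a = 0.4:
> a = 0.4
> m1 = numeric(length(lake))
> m1[1] = lake[1]                                      # m^_1 = X_1
> for (t in 2:length(lake)) m1[t] = a*lake[t] + (1-a)*m1[t-1]   # one-sided recursion
> plot.ts(lake, xlab="", ylab=""); lines(ts(m1, start=1875), col="red")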

More general versions of the moving average smoothers can be obtained in the following way. Observe that in the case of the two-sided version Wt each variable Xt−q, …, Xt+q obtains a "weight" aj = 1/(2q + 1). The sum of all weights thus equals one. The same is true for the one-sided moving averages with weights a and 1 − a. Generally, one can hence define a smoother by letting

m^t = ∑_{j=−q}^{q} aj Xt+j,  t = q + 1, …, n − q,   (1.3.3)

where a−q + … + aq = 1. These general moving averages (two-sided and one-sided) are commonly referred to as linear filters. There are countless choices for the weights. The one here, aj = 1/(2q + 1), has the advantage that linear trends pass undistorted.

In the next example, a filter is introduced which passes cubic trends without distortion.
Example 1.3.2 (Spencer's 15-point moving average). Suppose that the filter in display (1.3.3) is defined by weights satisfying aj = 0 if |j| > 7, aj = a−j and

(a0, a1, …, a7) = (1/320)(74, 67, 46, 21, 3, −5, −6, −3).

Then, the corresponding filter passes cubic trends mt = b0 + b1 t + b2 t² + b3 t³ undistorted. To see this, observe that

∑_{j=−7}^{7} aj = 1  and  ∑_{j=−7}^{7} j^r aj = 0,  r = 1, 2, 3.

Now apply Proposition 1.3.1 below to arrive at the conclusion. Assuming that the observations are in data, use the R commands
> a = c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3)/320
> s15 = filter(data, sides=2, a)
to apply Spencer's 15-point moving average filter. This example also explains how to specify a general tailor-made filter for a given
data set.
Proposition 1.3.1. A linear filter (1.3.3) passes a polynomial of degree p if and only if

∑_j aj = 1  and  ∑_j j^r aj = 0,  r = 1, …, p.

Proof. It suffices to show that ∑_j aj (t + j)^r = t^r for r = 0, …, p. Using the binomial theorem, write

∑_j aj (t + j)^r = ∑_j aj ∑_{k=0}^{r} (r choose k) t^k j^{r−k}
                = ∑_{k=0}^{r} (r choose k) t^k (∑_j aj j^{r−k})
                = t^r

for any r = 0, …, p if and only if the above conditions hold.
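The two conditions can be checked numerically for Spencer's weights from Example 1.3.2; a short verification added here for illustration:
> a = c(-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3)/320
> j = -7:7
> sum(a)                                  # equals 1
> c(sum(j*a), sum(j^2*a), sum(j^3*a))     # all zero, so cubic trends pass undistorted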

Figure 1.9: Time series plots of the observed sequences (∇xt) in the left panel and (∇²xt) in the right panel of the differenced Lake Huron data described in Example 1.3.1.
Method 3 (Differencing) A third possibility to remove drift terms from a given time series is differencing. To this end, introduce
the difference operator ∇ as

∇Xt = Xt − Xt−1 = (1 − B)Xt , t ∈ T,

where B denotes the backshift operator BXt = Xt−1. Repeated application of ∇ is defined in the intuitive way:

∇²Xt = ∇(∇Xt) = ∇(Xt − Xt−1) = Xt − 2Xt−1 + Xt−2

and, recursively, the representations follow also for higher powers of ∇. Suppose that the difference operator is applied to the linear trend mt = b0 + b1 t; then

∇mt = mt − mt−1 = b0 + b1 t − b0 − b1(t − 1) = b1,

which is a constant. Inductively, this leads to the conclusion that, for a polynomial drift of degree p, namely mt = ∑_{j=0}^{p} bj t^j, the p-th difference ∇^p mt = p! bp is constant. Applying this technique to a stochastic process of the form (1.3.1) with a polynomial drift mt then yields

∇^p Xt = p! bp + ∇^p Yt,  t ∈ T.

This is a stationary process with mean p! bp. The plots in Figure 1.9 contain the first and second differences for the Lake Huron data. In R, they may be obtained from the commands

> d1 = diff(lake)
> d2 = diff(d1)
> par(mfrow=c(1,2))
> plot.ts(d1, xlab="", ylab="")
> plot.ts(d2, xlab="", ylab="")
The next example shows that the difference operator can also be applied to a random walk to create stationary data.
Example 1.3.3. Let (St : t ∈ N0 ) be the random walk of Example 1.2.3. If the difference operator ∇ is applied to this stochastic
process, then

∇St = St − St−1 = Zt , t ∈ N.

In other words, ∇ does nothing else but recover the original white noise sequence that was used to build the random walk.
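This can be confirmed numerically, assuming the objects z and rw from Examples 1.2.1 and 1.2.3 are still in the workspace:
> d = diff(rw)          # first difference of the random walk
> all.equal(d, z[-1])   # TRUE: differencing recovers z_2, ..., z_1000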

This page titled 1.3: Eliminating Trend Components is shared under a not declared license and was authored, remixed, and/or curated by
Alexander Aue.

1.4: Eliminating Trend and Seasonal Components
Recall the classical decomposition (1.1.1),

Xt = mt + st + Yt , t ∈ T,

with E[Yt] = 0. In this section, three methods are discussed that aim at estimating both the trend and seasonal components in the data. As an additional requirement on (st : t ∈ T), it is assumed that

st+d = st,   ∑_{j=1}^{d} sj = 0,

where d denotes the period of the seasonal component. (If dealing with yearly data sampled monthly, then obviously d = 12.) It is convenient to relabel the observations x1, …, xn in terms of the seasonal period d as

xj,k = xk+d(j−1).

In the case of yearly data, observation xj,k thus represents the data point observed for the k-th month of the j-th year. For convenience the data is always referred to in this fashion even if the actual period is something other than 12.

Method 1 (Small trend method) If the changes in the drift term appear to be small, then it is reasonable to assume that the drift in year j, say mj, is constant. As a natural estimator one can therefore apply

m^j = (1/d) ∑_{k=1}^{d} xj,k.

To estimate the seasonality in the data, one can in a second step utilize the quantities

s^k = (1/N) ∑_{j=1}^{N} (xj,k − m^j),

where N is determined by the equation n = Nd, provided that data has been collected over N full cycles. Direct calculations show that these estimators possess the property s^1 + … + s^d = 0 (as in the case of the true seasonal components st). To further assess the quality of the fit, one needs to analyze the observed residuals

y^j,k = xj,k − m^j − s^k.

Note that due to the relabeling of the observations and the assumption of a slowly changing trend, the drift component is solely described by the "annual" subscript j, while the seasonal component only contains the "monthly" subscript k.

Figure 1.10: Time series plots of the red wine sales in Australia from January 1980 to October 1991 (left) and its log
transformation with yearly mean estimates (right).
Example 1.4.1 (Australian Wine Sales). The left panel of Figure 1.10 shows the monthly sales of red wine (in kiloliters) in
Australia from January 1980 to October 1991. Since there is an apparent increase in the fluctuations over time, the right panel of
the same figure shows the natural logarithm transform of the data. There is clear evidence of both trend and seasonality. In the
following, the log transformed data is studied. Using the small trend method as described above, the annual means are estimated
first. They are already incorporated in the right time series plot of Figure 1.10. Note that there are only ten months of data available
for the year 1991, so that the estimation has to be adjusted accordingly. The detrended data is shown in the left panel of Figure
1.11. The middle plot in the same figure shows the estimated seasonal component, while the right panel displays the residuals.
Even though the assumption of small changes in the drift is somewhat questionable, the residuals appear to look quite nice. They
indicate that there is dependence in the data (see Section 1.5 below for more on this subject).
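A rough sketch of the small trend computations for data of this kind, assuming the monthly red wine sales are stored in a ts object called wine with frequency 12 and restricting attention to complete years only (both the object name and the full-year restriction are assumptions made here for illustration):
> x = log(wine)                                   # log transform as in the example
> N = length(x)/12                                # number of complete years
> m = tapply(x, rep(1:N, each=12), mean)          # annual drift estimates m^_j
> detr = x - rep(m, each=12)                      # detrended series x_{j,k} - m^_j
> s = tapply(detr, rep(1:12, times=N), mean)      # seasonal estimates s^_1,...,s^_12
> resid = detr - rep(s, times=N)                  # residuals y^_{j,k}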

Figure 1.11: The detrended log series (left), the estimated seasonal component (center) and the corresponding residuals series
(right) of the Australian red wine sales data.
Method 2 (Moving average estimation) This method is to be preferred over the first one whenever the underlying trend
component cannot be assumed constant. Three steps are to be applied to the data.
1st Step: Trend estimation. At first, focus on the removal of the trend component with the linear filters discussed in the previous section. If the period d is odd, then one can directly use m^t = Wt as in (1.3.2) with q specified by the equation d = 2q + 1. If the period d = 2q is even, then slightly modify Wt and use

m^t = (1/d)(.5 xt−q + xt−q+1 + … + xt+q−1 + .5 xt+q),  t = q + 1, …, n − q.

2nd Step: Seasonality estimation. To estimate the seasonal component, let

μk = (1/(N − 1)) ∑_{j=2}^{N} (xk+d(j−1) − m^k+d(j−1)),  k = 1, …, q,

μk = (1/(N − 1)) ∑_{j=1}^{N−1} (xk+d(j−1) − m^k+d(j−1)),  k = q + 1, …, d.

Define now

s^k = μk − (1/d) ∑_{ℓ=1}^{d} μℓ,  k = 1, …, d,

and set s^k = s^k−d whenever k > d. This will provide us with deseasonalized data which can be examined further. In the final step, any remaining trend can be removed from the data.
3rd Step: Trend Reestimation. Apply any of the methods from Section 1.3.
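A minimal sketch of Steps 1 and 2 for monthly data (d = 12), again assuming a ts object x with frequency 12; for simplicity the monthly averages below are taken over all years where the trend estimate is available (via na.rm=TRUE) rather than over the exact index ranges displayed above:
> d = 12
> w = c(.5, rep(1, d - 1), .5)/d                  # Step 1 filter for even period d
> trend = filter(x, sides=2, w)
> detr = as.numeric(x - trend)                    # NA at the first and last q = d/2 positions
> mu = tapply(detr, cycle(x), mean, na.rm=TRUE)   # monthly averages mu_k
> s = mu - mean(mu)                               # seasonal estimates s^_1,...,s^_12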
Method 3 (Differencing at lag d) Introducing the lag-d difference operator ∇d, defined by letting

∇d Xt = Xt − Xt−d = (1 − B^d)Xt,  t = d + 1, …, n,

and assuming model (1.1.1), one arrives at the transformed random variables

∇d Xt = mt − mt−d + Yt − Yt−d,  t = d + 1, …, n.

Note that the seasonality is removed, since st = st−d. The remaining noise variables Yt − Yt−d are stationary and have zero mean. The new trend component mt − mt−d can be eliminated using any of the methods developed in Section 1.3.

Figure 1.12: The differenced observed series ∇12 xt (left), ∇xt (middle) and ∇∇12 xt = ∇12∇xt (right) for the Australian red wine sales data.
Example 1.4.2 (Australian wine sales). Revisit the Australian red wine sales data of Example 1.4.1 and apply the differencing techniques just established. The left plot of Figure 1.12 shows the data after an application of the operator ∇12. If the remaining trend in the data is estimated with the differencing method from Section 1.3, the residual plot given in the right panel of Figure 1.12 is obtained. Note that the order of application does not change the residuals, that is, ∇∇12 xt = ∇12∇xt. The middle panel of

Figure 1.12 displays the differenced data which still contains the seasonal component.
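The differenced series of Figure 1.12 can be sketched with the diff function, assuming the log-transformed wine sales are stored in a ts object x with frequency 12:
> d12 = diff(x, lag=12)                 # seasonal differencing, removes s_t
> d1 = diff(x)                          # ordinary differencing, seasonality remains
> dd = diff(d12)                        # both: equals diff(diff(x), lag=12)
> par(mfrow=c(1,3)); plot.ts(d12, xlab="", ylab=""); plot.ts(d1, xlab="", ylab=""); plot.ts(dd, xlab="", ylab="")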

This page titled 1.4: Eliminating Trend and Seasonal Components is shared under a not declared license and was authored, remixed, and/or
curated by Alexander Aue.

1.5: Assessing the Residuals
In this subsection, several goodness-of-fit tests are introduced to further analyze the residuals obtained after the elimination of trend
and seasonal components. The main objective is to determine whether or not these residuals can be regarded as obtained from a
sequence of independent, identically distributed random variables or if there is dependence in the data. Throughout, Y1, …, Yn denote the residuals and y1, …, yn a typical realization.

Method 1 (The sample ACF) It could be seen in Example 1.2.4 that, for j ≠ 0, the estimators ρ^(j) of the ACF ρ(j) are asymptotically independent and normally distributed with mean zero and variance 1/n, provided the underlying residuals are independent and identically distributed with a finite variance. Therefore, plotting the sample ACF for a certain number of lags, say h, it is expected that approximately 95% of these values are within the bounds ±1.96/√n. The R function acf helps to perform this analysis (see Theorem 1.2.1).

Method 2 (The Portmanteau test) The Portmanteau test is based on the test statistic

Q = n ∑_{j=1}^{h} ρ^(j)².

Using the fact that the variables √n ρ^(j) are asymptotically standard normal, it becomes apparent that Q itself can be approximated with a chi-squared distribution possessing h degrees of freedom. The hypothesis of independent and identically distributed residuals is rejected at the level α if Q > χ²_{1−α}(h), where χ²_{1−α}(h) is the 1 − α quantile of the chi-squared distribution with h degrees of freedom. Several refinements of the original Portmanteau test have been established in the literature. We refer here only to the papers Ljung and Box (1978), and McLeod and Li (1983) for further information.
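In R, the statistic Q and the Ljung-Box refinement are available through the function Box.test; a brief sketch, assuming the residuals are stored in y and using the hypothetical lag choice h = 20:
> Box.test(y, lag=20, type="Box-Pierce")   # Q = n * sum of the first 20 squared sample ACFs
> Box.test(y, lag=20, type="Ljung-Box")    # Ljung and Box (1978) refinement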

Method 3 (The rank test) This test is very useful for finding linear trends. Denote by

Π = #{(i, j) : Yi > Yj , i > j, i = 2, … , n}

the random number of pairs (i, j) satisfying the conditions Yi > Yj and i > j. There are (n choose 2) = n(n − 1)/2 pairs (i, j) such that i > j. If Y1, …, Yn are independent and identically distributed, then P(Yi > Yj) = 1/2 (assuming a continuous distribution). Now it follows that μΠ = E[Π] = n(n − 1)/4 and, similarly, σΠ² = Var(Π) = n(n − 1)(2n + 5)/72. Moreover, for large enough sample sizes n, Π has an approximate normal distribution with mean μΠ and variance σΠ². Consequently, the hypothesis of independent, identically distributed data would be rejected at the level α if

P = |Π − μΠ| / σΠ > z_{1−α/2},

where z_{1−α/2} denotes the 1 − α/2 quantile of the standard normal distribution.
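Since the rank test is not built into base R, it can be coded directly from the definitions above; a minimal sketch, assuming the residuals are in a vector y:
> n = length(y)
> M = outer(y, y, ">")                    # M[i,j] is TRUE if y_i > y_j
> Pi = sum(M[lower.tri(M)])               # number of pairs with i > j and y_i > y_j
> mu = n*(n - 1)/4
> sigma = sqrt(n*(n - 1)*(2*n + 5)/72)
> abs(Pi - mu)/sigma > qnorm(0.975)       # TRUE suggests rejecting iid at level 0.05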

Method 4 (Tests for normality) If there is evidence that the data are generated by Gaussian random variables, one can create the qq plot to check for normality. It is based on a visual inspection of the data. To this end, denote by Y(1) < … < Y(n) the order statistics of the residuals Y1, …, Yn, which are normally distributed with expected value μ and variance σ². It holds that

E[Y(j)] = μ + σE[X(j)],   (1.5.1)

where X(1) < … < X(n) are the order statistics of a standard normal distribution. The qq plot is defined as the graph of the pairs (E[X(1)], Y(1)), …, (E[X(n)], Y(n)). According to display (1.5.1), the resulting graph will be approximately linear with the squared correlation R² of the points being close to 1. The assumption of normality will thus be rejected if R² is "too" small. It is common to approximate E[X(j)] ≈ Φj = Φ⁻¹((j − .5)/n) (Φ being the distribution function of the standard normal distribution). The previous statement is made precise by letting

R² = [∑_{j=1}^{n} (Y(j) − Ȳ)Φj]² / (∑_{j=1}^{n} (Y(j) − Ȳ)² ∑_{j=1}^{n} Φj²),

where Ȳ = (1/n)(Y1 + … + Yn). The critical values for R² are tabulated and can be found, for example, in Shapiro and Francia (1972). The corresponding R function is qqnorm.
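A short sketch of this check in R, assuming residuals y; qqnorm draws the plot, and the squared correlation is computed with the Φj approximation given above (qqnorm itself uses slightly different plotting positions):
> qqnorm(y)                        # normal qq plot of the residuals
> n = length(y)
> Phi = qnorm(((1:n) - .5)/n)      # Phi_j = Phi^{-1}((j - .5)/n)
> R2 = cor(sort(y), Phi)^2         # squared correlation of the qq plot
> R2                               # reject normality if R2 is too small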

This page titled 1.5: Assessing the Residuals is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

1.6: Summary
In this chapter, the classical decomposition (1.1.1) of a time series into a drift component, a seasonal component and a sequence of
residuals was introduced. Methods to estimate the drift and the seasonality were provided. Moreover, the class of stationary
processes was identified as a reasonably broad class of random variables. Several ways were introduced to check whether or not the
resulting residuals can be considered to be independent, identically distributed. In Chapter 3, the class of autoregressive moving
average (ARMA) processes is discussed in depth, a parametric class of random variables that are at the center of linear time series
analysis because they are able to capture a wide range of dependence structures and allow for a thorough mathematical treatment.
Before that, properties of the sample mean and of the sample ACVF and ACF are considered in the next chapter.

This page titled 1.6: Summary is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

CHAPTER OVERVIEW

2: The Estimation of Mean and Covariances


In this brief second chapter, some results concerning asymptotic properties of the sample mean and the sample ACVF are collected.
Throughout, (Xt : t ∈ Z) denotes a weakly stationary stochastic process with mean μ and ACVF γ. In Section 1.2 it was shown

that such a process is completely characterized by these two quantities. The mean μ was estimated by the sample mean x̄ , and the
ACVF γ by the sample ACVF γ^ defined in (1.2.1). In the following, some properties of these estimators are discussed in more
detail.
2.1: Estimation of the Mean
2.2: Estimation of the Autocovariance Function

This page titled 2: The Estimation of Mean and Covariances is shared under a not declared license and was authored, remixed, and/or curated by
Alexander Aue.

2.1: Estimation of the Mean
Assume that an appropriate guess for the unknown mean μ of some weakly stationary stochastic process (Xt : t ∈ Z) has to be found. The sample mean x̄, easily computed as the average of n observations x1, …, xn of the process, has been identified as suitable in Section 1.2. To investigate its theoretical properties, one needs to analyze the random variable associated with it, that is,

X̄n = (1/n)(X1 + … + Xn).

Two facts can be quickly established.

X̄n is an unbiased estimator for μ, since

E[X̄n] = E[(1/n) ∑_{t=1}^{n} Xt] = (1/n) ∑_{t=1}^{n} E[Xt] = (1/n) nμ = μ.

This means that "on average'', the true but unknown μ is correctly estimated. Notice that there is no difference in the computations
between the standard case of independent and identically distributed random variables and the more general weakly stationary
process considered here.
If γ(n) → 0 as n → ∞, then X̄n is a consistent estimator for μ, since

Var(X̄n) = Cov((1/n) ∑_{s=1}^{n} Xs, (1/n) ∑_{t=1}^{n} Xt) = (1/n²) ∑_{s=1}^{n} ∑_{t=1}^{n} Cov(Xs, Xt)
         = (1/n²) ∑_{s−t=−n}^{n} (n − |s − t|)γ(s − t) = (1/n) ∑_{h=−n}^{n} (1 − |h|/n) γ(h).

Now, the quantity on the right-hand side converges to zero as n → ∞ because γ(n) → 0 as n → ∞ by assumption. The first equality sign in the latter equation array follows from the fact that Var(X) = Cov(X, X) for any random variable X, the second equality sign uses that the covariance function is linear in both arguments. For the third equality, one can use that Cov(Xs, Xt) = γ(s − t) and that each γ(s − t) appears exactly n − |s − t| times in the double summation. Finally, the right-hand side is obtained by replacing s − t with h and pulling one 1/n inside the summation.

In the standard case of independent and identically distributed random variables, nVar(X̄n) = σ². The condition γ(n) → 0 is automatically satisfied. However, in the general case of weakly stationary processes, it cannot be omitted.
More can be proved using an appropriate set of assumptions. The results are formulated as a theorem without giving the proofs.

Theorem 2.1.1

Let (Xt : t ∈ Z) be a weakly stationary stochastic process with mean μ and ACVF γ. Then, the following statements hold true as n → ∞.
a. If ∑_{h=−∞}^{∞} |γ(h)| < ∞, then

nVar(X̄n) → ∑_{h=−∞}^{∞} γ(h) = τ²;

b. If the process is "close to Gaussianity", then

√n(X̄n − μ) ∼ AN(0, τn²),   τn² = ∑_{h=−n}^{n} (1 − |h|/n) γ(h).

Here, ∼ AN(0, τn²) stands for approximately normally distributed with mean zero and variance τn².

Theorem 2.1.1 can be utilized to construct confidence intervals for the unknown mean parameter μ. To do so, one must, however, estimate the unknown variance parameter τn². For a large class of stochastic processes, it holds that τn² converges to τ² as n → ∞. Therefore, we can use τ² as an approximation for τn². Moreover, τ² can be estimated by

τ̂n² = ∑_{h=−√n}^{√n} (1 − |h|/n) γ^(h),

where γ^(h) denotes the ACVF estimator defined in (1.2.1). An approximate 95% confidence interval for μ can now be constructed as

( X̄n − 1.96 τ̂n/√n ,  X̄n + 1.96 τ̂n/√n ).
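A sketch of this construction in R, assuming the observations are stored in a numeric vector or univariate ts x; the truncation of the sum at √n follows the display above:
> n = length(x); L = floor(sqrt(n))
> g = acf(x, lag.max=L, type="covariance", plot=FALSE)$acf[, 1, 1]   # gamma-hat(0),...,gamma-hat(L)
> tau2 = g[1] + 2*sum((1 - (1:L)/n)*g[-1])                           # tau-hat_n^2, using symmetry of the ACVF
> mean(x) + c(-1.96, 1.96)*sqrt(tau2)/sqrt(n)                        # approximate 95% interval for mu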

Example 2.1.1: Autoregressive Processes

Let (Xt : t ∈ Z) be given by the equations

Xt − μ = ϕ(Xt−1 − μ) + Zt,  t ∈ Z,   (2.1.1)

where (Zt : t ∈ Z) ∼ WN(0, σ²) and |ϕ| < 1. It will be shown in Chapter 3 that (Xt : t ∈ Z) defines a weakly stationary process. Utilizing the stochastic difference equations (2.1.1), both mean and autocovariances can be determined. It holds that E[Xt] = ϕE[Xt−1] + μ(1 − ϕ). Since, by stationarity, E[Xt−1] can be substituted with E[Xt], it follows that

E[Xt] = μ,  t ∈ Z.

In the following we shall work with the centered process (Xᶜt : t ∈ Z) given by letting Xᶜt = Xt − μ. Clearly, E[Xᶜt] = 0. From the definition, it follows also that the covariances of (Xt : t ∈ Z) and (Xᶜt : t ∈ Z) coincide. First computing the second moment of Xᶜt gives

E[{Xᶜt}²] = E[(ϕXᶜt−1 + Zt)²] = ϕ²E[{Xᶜt−1}²] + σ²

and consequently, since E[{Xᶜt−1}²] = E[{Xᶜt}²] by weak stationarity of (Xᶜt : t ∈ Z),

E[{Xᶜt}²] = σ²/(1 − ϕ²),  t ∈ Z.

It becomes apparent from the latter equation why the condition |ϕ| < 1 was needed in display (2.1.1). In the next step, the autocovariance function is computed. For h > 0, it holds that

γ(h) = E[Xᶜt+h Xᶜt] = E[(ϕXᶜt+h−1 + Zt+h)Xᶜt] = ϕE[Xᶜt+h−1 Xᶜt] = ϕγ(h − 1) = ϕ^h γ(0)

after h iterations. But since γ(0) = E[{Xᶜt}²], by symmetry of the ACVF, it follows that

γ(h) = σ²ϕ^|h| / (1 − ϕ²),  h ∈ Z.

After these theoretical considerations, a 95% (asymptotic) confidence interval for the mean parameter μ can be constructed. To check if Theorem 2.1.1 is applicable here, one needs to check if the autocovariances are absolutely summable:

τ² = ∑_{h=−∞}^{∞} γ(h) = σ²/(1 − ϕ²) (1 + 2 ∑_{h=1}^{∞} ϕ^h) = σ²/(1 − ϕ²) (1 + 2/(1 − ϕ) − 2)
   = σ²/(1 − ϕ²) (1 + ϕ)/(1 − ϕ) = σ²/(1 − ϕ)² < ∞.

Therefore, a 95% confidence interval for μ which is based on the observed values x1, …, xn is given by

( x̄ − 1.96 σ/(√n(1 − ϕ)) ,  x̄ + 1.96 σ/(√n(1 − ϕ)) ).

Therein, the parameters σ and ϕ have to be replaced with appropriate estimators. These will be introduced in Chapter 3.

This page titled 2.1: Estimation of the Mean is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

2.2: Estimation of the Autocovariance Function
This section deals with the estimation of the ACVF and ACF at lag h. Recall from equation (1.2.1) that the estimator

γ^(h) = (1/n) ∑_{t=1}^{n−|h|} (Xt+|h| − X̄n)(Xt − X̄n),  h = 0, ±1, …, ±(n − 1),

may be utilized as a proxy for the unknown γ(h). As estimator for the ACF ρ(h),

ρ^(h) = γ^(h)/γ^(0),  h = 0, ±1, …, ±(n − 1),

was identified. Some of the theoretical properties of ρ^(h) are briefly collected in the following. They are not as obvious to derive
as in the case of the sample mean, and all proofs are omitted. Note also that similar statements hold for γ^(h) as well.
The estimator ρ^(h) is generally biased, that is, E[ρ^(h)] ≠ ρ(h). It holds, however, under non-restrictive assumptions that

E[ρ^(h)] → ρ(h)   (n → ∞).

This property is called asymptotic unbiasedness.


The estimator ρ^(h) is consistent for ρ(h) under an appropriate set of assumptions, that is, Var(ρ^(h) − ρ(h)) → 0 as n → ∞ .
It was already established in Section 1.5 how the sample ACF ρ^ can be used to test if residuals consist of white noise variables. For
more general statistical inference, one needs to know the sampling distribution of ρ^ . Since the estimation of ρ(h) is based on only
a few observations for h close to the sample size n , estimates tend to be unreliable. As a rule of thumb, given by Box and Jenkins
(1976), n should at least be 50 and h less than or equal to n/4.

Theorem 2.2.1

For m ≥ 1, let ρm = (ρ(1), …, ρ(m))ᵀ and ρ̂m = (ρ^(1), …, ρ^(m))ᵀ, where ᵀ denotes the transpose of a vector. Under a set of suitable assumptions, it holds that

√n(ρ̂m − ρm) ∼ AN(0, Σ)   (n → ∞),

where ∼ AN(0, Σ) stands for approximately normally distributed with mean vector 0 and covariance matrix Σ = (σij) given by Bartlett's formula

σij = ∑_{k=1}^{∞} [ρ(k + i) + ρ(k − i) − 2ρ(i)ρ(k)][ρ(k + j) + ρ(k − j) − 2ρ(j)ρ(k)].

The section is concluded with two examples. The first one recollects the results already known for independent, identically distributed random variables, the second deals with the autoregressive process of Example 2.1.1.

Example 2.2.1

Let (Xt : t ∈ Z) ∼ IID(0, σ²). Then, ρ(0) = 1 and ρ(h) = 0 for all h ≠ 0. The covariance matrix Σ is therefore given by

σij = 1 if i = j and σij = 0 if i ≠ j.

This means that Σ is a diagonal matrix. In view of Theorem 2.2.1 it thus holds that the estimators ρ^(1), …, ρ^(k) are approximately independent and identically distributed normal random variables with mean 0 and variance 1/n. This was the basis for Methods 1 and 2 in Section 1.5 (see also Theorem 1.2.1).

Example 2.2.2

Reconsider the autoregressive process (Xt : t ∈ Z) from Example 2.1.1 with μ = 0. Dividing γ(h) by γ(0) yields that

ρ(h) = ϕ^|h|,  h ∈ Z.

Now the diagonal entries of Σ are computed as

σii = ∑_{k=1}^{∞} [ρ(k + i) + ρ(k − i) − 2ρ(i)ρ(k)]²
    = ∑_{k=1}^{i} ϕ^{2i} (ϕ^{−k} − ϕ^k)² + ∑_{k=i+1}^{∞} ϕ^{2k} (ϕ^{−i} − ϕ^i)²
    = (1 − ϕ^{2i})(1 + ϕ²)(1 − ϕ²)^{−1} − 2i ϕ^{2i}.
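For a concrete impression, the closed form can be evaluated and turned into approximate 95% bands for the sample ACF of an AR(1) series; the values ϕ = .8 and n = 200 below are hypothetical choices for illustration:
> phi = .8; n = 200; i = 1:10
> sii = (1 - phi^(2*i))*(1 + phi^2)/(1 - phi^2) - 2*i*phi^(2*i)   # Bartlett variances sigma_ii
> rho = phi^i                                                     # true ACF of the AR(1) process
> cbind(lag=i, lower=rho - 1.96*sqrt(sii/n), upper=rho + 1.96*sqrt(sii/n))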

This page titled 2.2: Estimation of the Autocovariance Function is shared under a not declared license and was authored, remixed, and/or curated
by Alexander Aue.

CHAPTER OVERVIEW

3: ARMA Processes
In this chapter autoregressive moving average processes are discussed. They play a crucial role in specifying time series models for applications. As the solutions of stochastic difference equations with constant coefficients, these processes possess a linear structure.
3.1: Introduction to Autoregressive Moving Average (ARMA) Processes
3.2: Causality and Invertibility
3.3: The PACF of a Causal ARMA Process
3.4: Forecasting
3.5: Parameter Estimation
3.6: Model Selection
3.7: Summary

This page titled 3: ARMA Processes is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

3.1: Introduction to Autoregressive Moving Average (ARMA) Processes
In this chapter autoregressive moving average processes are discussed. They play a crucial role in specifying time series models for applications. As the solutions of stochastic difference equations with constant coefficients, these processes possess a linear structure.

Definition 3.1.1: ARMA processes


(a) A weakly stationary process (Xt : t ∈ Z) is called an autoregressive moving average time series of order p, q, abbreviated by ARMA(p, q), if it satisfies the difference equations

Xt = ϕ1 Xt−1 + … + ϕp Xt−p + Zt + θ1 Zt−1 + … + θq Zt−q,  t ∈ Z,   (3.1.1)

where ϕ1, …, ϕp and θ1, …, θq are real constants, ϕp ≠ 0 ≠ θq, and (Zt : t ∈ Z) ∼ WN(0, σ²).
(b) A weakly stationary stochastic process (Xt : t ∈ Z) is called an ARMA(p, q) time series with mean μ if the process (Xt − μ : t ∈ Z) satisfies the equation system.

A more concise representation of Equation (3.1.1) can be obtained with the use of the backshift operator B. To this end, define the autoregressive polynomial and the moving average polynomial by

ϕ(z) = 1 − ϕ1 z − ϕ2 z² − … − ϕp z^p,  z ∈ C,

and

θ(z) = 1 + θ1 z + θ2 z² + … + θq z^q,  z ∈ C,

respectively, where C denotes the set of complex numbers. Inserting the backshift operator into these polynomials, the equations in (3.1.1) become

ϕ(B)Xt = θ(B)Zt,  t ∈ Z.   (3.1.2)

Figure 3.1: Realizations of three autoregressive moving average processes.

Example 3.1.1 Figure 3.1 displays realizations of three different autoregressive moving average time series based on independent, standard normally distributed (Zt : t ∈ Z). The left panel is an ARMA(2,2) process with parameter specifications ϕ1 = .2, ϕ2 = −.3, θ1 = −.5 and θ2 = .3. The middle plot is obtained from an ARMA(1,4) process with parameters ϕ1 = .3, θ1 = −.2, θ2 = −.3, θ3 = .5, and θ4 = .2, while the right plot is from an ARMA(4,1) with parameters ϕ1 = −.2, ϕ2 = −.3, ϕ3 = .5 and ϕ4 = .2 and θ1 = .6. The plots indicate that ARMA models can provide a flexible tool for modeling diverse residual sequences. It will turn out in the next section that all three realizations here come from (strictly) stationary processes. Similar time series plots can be produced in R using the commands
>arima22 = arima.sim(list(order=c(2,0,2), ar=c(.2,-.3), ma=c(-.5,.3)), n=100)
>arima14 = arima.sim(list(order=c(1,0,4), ar=.3, ma=c(-.2,-.3,.5,.2)), n=100)
>arima41 = arima.sim(list(order=c(4,0,1), ar=c(-.2,-.3,.5,.2), ma=.6), n=100)
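One way to display the three simulated paths side by side is shown below; the layout is our own choice and not prescribed by the text:

>par(mfrow=c(1,3))
>plot.ts(arima22); plot.ts(arima14); plot.ts(arima41)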

Some special cases covered in the following two examples have particular relevance in time series analysis.

Figure 3.2: Realizations of three autoregressive processes.


Example 3.1.2 (AR Processes) If the moving average polynomial in (3.1.2) is equal to one, that is, if θ(z) ≡ 1, then the resulting
(X_t: t ∈ Z) is referred to as autoregressive process of order p, AR(p). These time series interpret the value of the current variable
X_t as a linear combination of the p previous variables X_{t−1}, …, X_{t−p} plus an additional distortion by the white noise Z_t. Figure 3.2
displays two AR(1) processes with respective parameters ϕ_1 = −.9 (left) and ϕ_1 = .8 (middle) as well as an AR(2) process with
parameters ϕ_1 = −.5 and ϕ_2 = .3. The corresponding R commands are

>ar1neg = arima.sim(list(order=c(1,0,0), ar=-.9), n=100)


>ar1pos = arima.sim(list(order=c(1,0,0), ar=.8), n=100)
>ar2 = arima.sim(list(order=c(2,0,0), ar=c(-.5,.3)), n=100)

Figure 3.3: Realizations of three moving average processes.


Example 3.1.3 (MA Processes) If the autoregressive polynomial in (3.1.2) is equal to one, that is, if ϕ(z) ≡ 1, then the resulting
(X_t: t ∈ Z) is referred to as moving average process of order q, MA(q). Here the present variable X_t is obtained as a superposition
of q white noise terms Z_t, …, Z_{t−q}. Figure 3.3 shows two MA(1) processes with respective parameters θ_1 = .5 (left) and
θ_1 = −.8 (middle). The right plot is observed from an MA(2) process with parameters θ_1 = −.5 and θ_2 = .3. In R one may use

> ma1pos = arima.sim(list(order=c(0,0,1), ma=.5), n=100)


> ma1neg = arima.sim(list(order=c(0,0,1), ma=-.8), n=100)
> ma2 = arima.sim(list(order=c(0,0,2), ma=c(-.5,.3)), n=100)
For the analysis upcoming in the next chapters, we now introduce moving average processes of infinite order (q = ∞) . They are an
important tool for determining stationary solutions to the difference equations (3.1.1).
Definition 3.1.2 Linear processes
A stochastic process (X_t: t ∈ Z) is called linear process or MA(∞) time series if there is a sequence (ψ_j: j ∈ N_0) with
∑_{j=0}^∞ |ψ_j| < ∞ such that

X_t = ∑_{j=0}^∞ ψ_j Z_{t−j},  t ∈ Z,  (3.1.3)

where (Z_t: t ∈ Z) ∼ WN(0, σ²).

Moving average time series of any order q are special cases of linear processes. Just pick ψ_j = θ_j for j = 1, …, q and set ψ_j = 0
if j > q. It is common to introduce the power series

ψ(z) = ∑_{j=0}^∞ ψ_j z^j,  z ∈ C,

to express a linear process in terms of the backshift operator. Display (3.1.3) can now be rewritten in the compact form

X_t = ψ(B)Z_t,  t ∈ Z.

With the definitions of this section at hand, properties of ARMA processes, such as stationarity and invertibility, are investigated in
the next section. The current section closes by giving meaning to the notation X_t = ψ(B)Z_t. Note that one is possibly dealing with
an infinite sum of random variables. For completeness and later use, in the following example the mean and ACVF of a linear
process are derived.

Example 3.1.4 Mean and ACVF of a linear process

Let (X_t: t ∈ Z) be a linear process according to Definition 3.1.2. Then, it holds that

E[X_t] = E[∑_{j=0}^∞ ψ_j Z_{t−j}] = ∑_{j=0}^∞ ψ_j E[Z_{t−j}] = 0,  t ∈ Z.

Next observe also that

γ(h) = Cov(X_{t+h}, X_t)
     = E[∑_{j=0}^∞ ψ_j Z_{t+h−j} ∑_{k=0}^∞ ψ_k Z_{t−k}]
     = σ² ∑_{k=0}^∞ ψ_{k+h} ψ_k < ∞

by assumption on the sequence (ψ_j: j ∈ N_0).

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)

This page titled 3.1: Introduction to Autoregressive Moving Average (ARMA) Processes is shared under a not declared license and was authored,
remixed, and/or curated by Alexander Aue.

3.2: Causality and Invertibility
While a moving average process of order q will always be stationary without conditions on the coefficients θ_1, …, θ_q, some deeper
thoughts are required in the case of AR(p) and ARMA(p, q) processes. For simplicity, we start by investigating the autoregressive
process of order one, which is given by the equations X_t = ϕX_{t−1} + Z_t (writing ϕ = ϕ_1). Repeated iterations yield that

X_t = ϕX_{t−1} + Z_t = ϕ²X_{t−2} + Z_t + ϕZ_{t−1} = … = ϕ^N X_{t−N} + ∑_{j=0}^{N−1} ϕ^j Z_{t−j}.

Letting N → ∞, it can be shown that, with probability one,

X_t = ∑_{j=0}^∞ ϕ^j Z_{t−j}  (3.2.2)

is the weakly stationary solution to the AR(1) equations, provided that |ϕ| < 1. These calculations indicate, moreover, that an
autoregressive process of order one can be represented as a linear process with coefficients ψ_j = ϕ^j.

Example 3.2.1: Mean and ACVF of an AR(1) process

Since an autoregressive process of order one has been identified as an example of a linear process, one can easily determine its
expected value as

E[X_t] = ∑_{j=0}^∞ ϕ^j E[Z_{t−j}] = 0,  t ∈ Z.

For the ACVF, it is obtained that

γ(h) = Cov(X_{t+h}, X_t)
     = E[∑_{j=0}^∞ ϕ^j Z_{t+h−j} ∑_{k=0}^∞ ϕ^k Z_{t−k}]
     = σ² ∑_{k=0}^∞ ϕ^{k+h} ϕ^k = σ²ϕ^h ∑_{k=0}^∞ ϕ^{2k} = σ²ϕ^h / (1 − ϕ²),

where h ≥ 0. This determines the ACVF for all h using that γ(−h) = γ(h). It is also immediate that the ACF satisfies
ρ(h) = ϕ^h. See also Example 3.1.1 for comparison.
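The formula ρ(h) = ϕ^h can also be confirmed numerically with the function ARMAacf; the value .8 chosen for ϕ below is arbitrary and only serves as an illustration:

>phi = .8
>ARMAacf(ar=phi, lag.max=5)    # ACF at lags 0,...,5
>phi^(0:5)                     # theoretical values

Both commands return the same sequence of numbers.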

Example 3.2.2: Nonstationary AR(1) processes

In Example 1.2.3 we have introduced the random walk as a nonstationary time series. It can also be viewed as a nonstationary
AR(1) process with parameter ϕ = 1. In general, autoregressive processes of order one with coefficients |ϕ| > 1 are called
explosive, for they do not admit a weakly stationary solution that could be expressed as a linear process. However, one may
proceed as follows. Rewrite the defining equations of an AR(1) process as

X_t = −ϕ^{−1} Z_{t+1} + ϕ^{−1} X_{t+1},  t ∈ Z.

Apply now the same iterations as before to arrive at

X_t = ϕ^{−N} X_{t+N} − ∑_{j=1}^N ϕ^{−j} Z_{t+j},  t ∈ Z.

Note that in the weakly stationary case, the present observation has been described in terms of past innovations. The
representation in the last equation, however, contains only future observations with time lags larger than the present time t.
From a statistical point of view this does not make much sense, even though by identical arguments as above we may obtain

X_t = −∑_{j=1}^∞ ϕ^{−j} Z_{t+j},  t ∈ Z,

as the weakly stationary solution in the explosive case.

The result of the previous example leads to the notion of causality, which means that the process (X_t: t ∈ Z) has a representation
in terms of the white noise (Z_s: s ≤ t) and is hence uncorrelated with the future as given by (Z_s: s > t). We give the
definition for the general ARMA case.

Definition: Causality
An ARMA(p, q) process given by (3.1.1) is causal if there is a sequence (ψ_j: j ∈ N_0) such that ∑_{j=0}^∞ |ψ_j| < ∞ and

X_t = ∑_{j=0}^∞ ψ_j Z_{t−j},  t ∈ Z.

Causality means that an ARMA time series can be represented as a linear process. It was seen earlier in this section how an AR(1)
process whose coefficient satisfies the condition |ϕ| < 1 can be converted into a linear process. It was also shown that this is
impossible if |ϕ| > 1 . The conditions on the autoregressive parameter ϕ can be restated in terms of the corresponding
autoregressive polynomial ϕ(z) = 1 − ϕz as follows. It holds that
|ϕ| < 1 if and only if ϕ(z) ≠ 0 for all |z| ≤ 1,

|ϕ| > 1 if and only if ϕ(z) ≠ 0 for all |z| ≥ 1 .


It turns out that the characterization in terms of the zeroes of the autoregressive polynomials carries over from the AR(1) case to the
general ARMA(p, q) case. Moreover, the ψ -weights of the resulting linear process have an easy representation in terms of the
polynomials ϕ(z) and θ(z) . The result is summarized in the next theorem.

Theorem 3.2.1
Let (X_t: t ∈ Z) be an ARMA(p, q) process such that the polynomials ϕ(z) and θ(z) have no common zeroes. Then
(X_t: t ∈ Z) is causal if and only if ϕ(z) ≠ 0 for all z ∈ C with |z| ≤ 1. The coefficients (ψ_j: j ∈ N_0) are determined by the
power series expansion

ψ(z) = ∑_{j=0}^∞ ψ_j z^j = θ(z)/ϕ(z),  |z| ≤ 1.

A concept closely related to causality is invertibility. This notion is motivated with the following example that studies properties of
a moving average time series of order 1.

Example 3.2.3

Let (X_t: t ∈ Z) be an MA(1) process with parameter θ = θ_1. It is an easy exercise to compute the ACVF and the ACF as

γ(h) = { (1 + θ²)σ²,  h = 0,        ρ(h) = { 1,               h = 0,
       { θσ²,         h = 1,               { θ(1 + θ²)^{−1},  h = 1,
       { 0,           h > 1,               { 0,               h > 1.

These results lead to the conclusion that ρ(h) does not change if the parameter θ is replaced with θ^{−1}. Moreover, there exist
pairs (θ, σ²) that lead to the same ACVF, for example (5, 1) and (1/5, 25). Consequently, we arrive at the fact that the two
MA(1) models

X_t = Z_t + (1/5) Z_{t−1},  t ∈ Z,  (Z_t: t ∈ Z) ∼ iid N(0, 25),

and

X_t = Z̃_t + 5 Z̃_{t−1},  t ∈ Z,  (Z̃_t: t ∈ Z) ∼ iid N(0, 1),

are indistinguishable because we only observe X_t but not the noise variables Z_t and Z̃_t.

For convenience, the statistician will pick the model which satisfies the invertibility criterion which is to be defined next. It
specifies that the noise sequence can be represented as a linear process in the observations.

Definition: Invertibility
An ARMA(p, q) process given by (3.1.1) is invertible if there is a sequence (π_j: j ∈ N_0) such that ∑_{j=0}^∞ |π_j| < ∞ and

Z_t = ∑_{j=0}^∞ π_j X_{t−j},  t ∈ Z.

Theorem 3.2.2
Let (X_t: t ∈ Z) be an ARMA(p, q) process such that the polynomials ϕ(z) and θ(z) have no common zeroes. Then
(X_t: t ∈ Z) is invertible if and only if θ(z) ≠ 0 for all z ∈ C with |z| ≤ 1. The coefficients (π_j: j ∈ N_0) are determined by the
power series expansion

π(z) = ∑_{j=0}^∞ π_j z^j = ϕ(z)/θ(z),  |z| ≤ 1.

From now on it is assumed that all ARMA sequences specified in the sequel are causal and invertible unless explicitly stated
otherwise. The final example of this section highlights the usefulness of the established theory. It deals with parameter redundancy
and the calculation of the causality and invertibility sequences (ψ_j: j ∈ N_0) and (π_j: j ∈ N_0).

Example 3.2.4: Parameter redundancy

Consider the ARMA equations

Xt = .4 Xt−1 + .21 Xt−2 + Zt + .6 Zt−1 + .09 Zt−2 ,

which seem to generate an ARMA(2,2) sequence. However, the autoregressive and moving average polynomials have a
common zero:
ϕ̃(z) = 1 − .4z − .21z² = (1 − .7z)(1 + .3z),

θ̃(z) = 1 + .6z + .09z² = (1 + .3z)².

Therefore, one can reset the ARMA equations to a sequence of order (1,1) and obtain

Xt = .7 Xt−1 + Zt + .3 Zt−1 .

Now, the corresponding polynomials have no common roots. Note that the roots of ϕ(z) = 1 − .7z and θ(z) = 1 + .3z are
10/7 > 1 and −10/3 < −1, respectively. Thus Theorems 3.2.1 and 3.2.2 imply that causal and invertible solutions exist. In

the following, the corresponding coefficients in the expansions

X_t = ∑_{j=0}^∞ ψ_j Z_{t−j}  and  Z_t = ∑_{j=0}^∞ π_j X_{t−j},  t ∈ Z,

are calculated. Start with the causality sequence (ψ_j: j ∈ N_0). Writing, for |z| ≤ 1,

∑_{j=0}^∞ ψ_j z^j = ψ(z) = θ(z)/ϕ(z) = (1 + .3z)/(1 − .7z) = (1 + .3z) ∑_{j=0}^∞ (.7z)^j,

it can be obtained from a comparison of coefficients that

ψ_0 = 1  and  ψ_j = (.7 + .3)(.7^{j−1}) = .7^{j−1},  j ∈ N.

Similarly one computes the invertibility coefficients (π_j: j ∈ N_0) from the equation

∑_{j=0}^∞ π_j z^j = π(z) = ϕ(z)/θ(z) = (1 − .7z)/(1 + .3z) = (1 − .7z) ∑_{j=0}^∞ (−.3z)^j

(|z| ≤ 1) as

π_0 = 1  and  π_j = (−1)^j (.3 + .7)(.3^{j−1}) = (−1)^j (.3^{j−1}).

Together, the previous calculations yield the explicit representations

X_t = Z_t + ∑_{j=1}^∞ .7^{j−1} Z_{t−j}  and  Z_t = X_t + ∑_{j=1}^∞ (−1)^j (.3^{j−1}) X_{t−j}.

In the remainder of this section, a general way is provided to determine the weights (ψ_j: j ≥ 1) for a causal ARMA(p, q) process
given by ϕ(B)X_t = θ(B)Z_t, where ϕ(z) ≠ 0 for all z ∈ C such that |z| ≤ 1. Since ψ(z) = θ(z)/ϕ(z) for these z, the weight ψ_j
can be computed by matching the corresponding coefficients in the equation ψ(z)ϕ(z) = θ(z), that is,

(ψ_0 + ψ_1 z + ψ_2 z^2 + …)(1 − ϕ_1 z − … − ϕ_p z^p) = 1 + θ_1 z + … + θ_q z^q.

Recursively solving for ψ_0, ψ_1, ψ_2, … gives

ψ_0 = 1,
ψ_1 − ϕ_1 ψ_0 = θ_1,
ψ_2 − ϕ_1 ψ_1 − ϕ_2 ψ_0 = θ_2,

and so on as long as j < max{p, q + 1}. The general solution can be stated as

ψ_j − ∑_{k=1}^j ϕ_k ψ_{j−k} = θ_j,  0 ≤ j < max{p, q + 1},  (3.2.1)

ψ_j − ∑_{k=1}^p ϕ_k ψ_{j−k} = 0,  j ≥ max{p, q + 1},  (3.2.2)

if we define ϕ_j = 0 if j > p and θ_j = 0 if j > q. To obtain the coefficients ψ_j one therefore has to solve the homogeneous linear
difference equation (3.2.2) subject to the initial conditions specified by (3.2.1). For more on this subject, see Section 3.6 of
Brockwell and Davis (1991) and Section 3.3 of Shumway and Stoffer (2006).

R calculations

In R, these computations can be performed using the command ARMAtoMA. For example, one can use the commands
>ARMAtoMA(ar=.7,ma=.3,25)
>plot(ARMAtoMA(ar=.7,ma=.3,25))
which will produce the output displayed in Figure 3.4. The plot shows nicely the exponential decay of the ψ-weights which is
typical for ARMA processes. The table shows row-wise the weights ψ_1, …, ψ_25 (the leading coefficient ψ_0 = 1 is not part of the
output). This is enabled by the choice of 25 in the argument of the function ARMAtoMA.

1.0000000000 0.7000000000 0.4900000000 0.3430000000 0.2401000000

0.1680700000 0.1176490000 0.0823543000 0.0576480100 0.0403536070

0.0282475249 0.0197732674 0.0138412872 0.0096889010 0.0067822307

0.0047475615 0.0033232931 0.0023263051 0.0016284136 0.0011398895

0.0007979227 0.0005585459 0.0003909821 0.0002736875 0.0001915812

Figure 3.4: The R output for the ARMA(1,1) process of Example 3.2.4
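The invertibility coefficients of Example 3.2.4 can be obtained with the same function: since π(z) = ϕ(z)/θ(z), exchanging the roles of the two polynomials (with the signs adjusted accordingly) expands the right ratio. This is a convenient trick rather than a separate R facility:

>ARMAtoMA(ar=-.3, ma=-.7, 25)

The output alternates in sign, starting with −1, .3, −.09, …, in agreement with π_j = (−1)^j (.3^{j−1}).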

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)

This page titled 3.2: Causality and Invertibility is shared under a not declared license and was authored, remixed, and/or curated by Alexander
Aue.

3.3: The PACF of a Causal ARMA Process
In this section, the partial autocorrelation function (PACF) is introduced to further assess the dependence structure of stationary
processes in general and causal ARMA processes in particular. To start with, let us compute the ACVF of a moving average
process of order q.

Example 3.3.1: The ACVF of an MA(q ) process

Let (X_t: t ∈ Z) be an MA(q) process specified by the polynomial θ(z) = 1 + θ_1 z + … + θ_q z^q. Then, letting θ_0 = 1, it holds
that

E[X_t] = ∑_{j=0}^q θ_j E[Z_{t−j}] = 0.

Solution
To compute the ACVF, suppose that h ≥ 0 and write

γ(h) = Cov(X_{t+h}, X_t) = E[X_{t+h} X_t]
     = E[(∑_{j=0}^q θ_j Z_{t+h−j})(∑_{k=0}^q θ_k Z_{t−k})]
     = ∑_{j=0}^q ∑_{k=0}^q θ_j θ_k E[Z_{t+h−j} Z_{t−k}]
     = { σ² ∑_{k=0}^{q−h} θ_{k+h} θ_k,  0 ≤ h ≤ q,
       { 0,                             h > q.

The result here is a generalization of the MA(1) case, which was treated in Example 3.2.3. It is also a special case of the linear
process in Example 3.1.4. The structure of the ACVF for MA processes indicates a possible strategy to determine in practice
the unknown order q: plot the sample ACF and select as order q the largest lag such that ρ(h) is significantly different from
zero.

While the sample ACF can potentially reveal the true order of an MA process, the same is not true anymore in the case of AR
processes. Even for the AR(1) time series it has been shown in Example 3.2.1 that its ACF ρ(h) = ϕ^{|h|} is nonzero for all lags. As
further motivation, however, we discuss the following example.


Example 3.3.2
Let (X_t: t ∈ Z) be a causal AR(1) process with parameter |ϕ| < 1. It holds that

γ(2) = Cov(X_2, X_0) = Cov(ϕ²X_0 + ϕZ_1 + Z_2, X_0) = ϕ²γ(0) ≠ 0.

To break the linear dependence between X_0 and X_2, subtract ϕX_1 from both variables. Calculating the resulting covariance yields

Cov(X_2 − ϕX_1, X_0 − ϕX_1) = Cov(Z_2, X_0 − ϕX_1) = 0,

since, due to the causality of this AR(1) process, X_0 − ϕX_1 is a function of Z_1, Z_0, Z_{−1}, … and therefore uncorrelated with
X_2 − ϕX_1 = Z_2.

The previous example motivates the following general definition.


Definition 3.3.1 Partial autocorrelation function
Let (X_t: t ∈ Z) be a weakly stationary stochastic process with zero mean. Then, the sequence (ϕ_{hh}: h ∈ N) given by

ϕ_{11} = ρ(1) = Corr(X_1, X_0),
ϕ_{hh} = Corr(X_h − X_h^{h−1}, X_0 − X_0^{h−1}),  h ≥ 2,

is called the partial autocorrelation function (PACF) of (X_t: t ∈ Z).

Therein,

X_h^{h−1} = regression of X_h on (X_{h−1}, …, X_1) = β_1 X_{h−1} + β_2 X_{h−2} + … + β_{h−1} X_1,
X_0^{h−1} = regression of X_0 on (X_1, …, X_{h−1}) = β_1 X_1 + β_2 X_2 + … + β_{h−1} X_{h−1}.

Notice that there is no intercept coefficient β_0 in the regression parameters, since it is assumed that E[X_t] = 0. The following
example demonstrates how to calculate the regression parameters in the case of an AR(1) process.

Figure 3.5: The ACFs and PACFs of an AR(2) process (upper panel), an MA(3) process (middle panel) and an ARMA(1,1)
process (lower panel).
Example 3.3.3 PACF of an AR(1) process
If (X_t: t ∈ Z) is a causal AR(1) process, then ϕ_{11} = ρ(1) = ϕ. To calculate ϕ_{22}, first compute X_2^1 = βX_1, that is, determine β. This
coefficient is found by minimizing the mean-squared error between X_2 and βX_1:

E[X_2 − βX_1]² = γ(0) − 2βγ(1) + β²γ(0),

which is minimized by β = ρ(1) = ϕ. (This follows easily by taking the derivative and setting it to zero.) Therefore X_2^1 = ϕX_1.
Similarly, one computes X_0^1 = ϕX_1 and it follows from Example 3.3.2 that ϕ_{22} = 0. Indeed all lags h ≥ 2 of the PACF are zero.

More generally, consider briefly a causal AR(p) process given by ϕ(B)X_t = Z_t with ϕ(z) = 1 − ϕ_1 z − … − ϕ_p z^p.
Then, for h > p,

X_h^{h−1} = ∑_{j=1}^p ϕ_j X_{h−j}

and consequently

ϕ_{hh} = Corr(X_h − X_h^{h−1}, X_0 − X_0^{h−1}) = Corr(Z_h, X_0 − X_0^{h−1}) = 0

if h > p by causality (the same argument used in Example 3.3.2 applies here as well). Observe, however, that ϕ_{hh} is not
necessarily zero if h ≤ p. The foregoing suggests that the sample version of the PACF can be utilized to identify the order of an
autoregressive process from data: use as p the largest lag h such that ϕ_{hh} is significantly different from zero.

On the other hand, for an invertible MA(q) process, one can write Z_t = π(B)X_t or, equivalently,

X_t = −∑_{j=1}^∞ π_j X_{t−j} + Z_t,

which shows that the PACF of an MA(q) process will be nonzero for all lags, since for a "perfect" regression one would have to
use all past variables (X_s: s < t) instead of only the quantity X_t^{t−1} given in Definition 3.3.1.

In summary, the PACF reverses the behavior of the ACVF for autoregressive and moving average processes. While the latter have
an ACVF that vanishes after lag q and a PACF that is nonzero (though decaying) for all lags, AR processes have an ACVF that is
nonzero (though decaying) for all lags but a PACF that vanishes after lag p.
ACVF (ACF) and PACF hence provide useful tools in assessing the dependence of given ARMA processes. If the estimated ACVF
(the estimated PACF) is essentially zero after some time lag, then the underlying time series can be conveniently modeled with an
MA (AR) process, and no general ARMA sequence has to be fitted. These conclusions are summarized in Table 3.1.

Table 3.1: The behavior of ACF and PACF for AR, MA, and ARMA processes.
        AR(p)                  MA(q)                  ARMA(p, q)
ACF:    tails off              cuts off after lag q   tails off
PACF:   cuts off after lag p   tails off              tails off
Example 3.3.4
Figure 3.5 collects the ACFs and PACFs of three ARMA processes. The upper panel is taken from the AR(2) process with
parameters ϕ_1 = 1.5 and ϕ_2 = −.75. It can be seen that the ACF tails off and displays cyclical behavior (note that the
corresponding autoregressive polynomial has complex roots). The PACF, however, cuts off after lag 2. Thus, inspecting ACF and
PACF, we would correctly specify the order of the AR process.
The middle panel shows the ACF and PACF of the MA(3) process given by the parameters θ_1 = 1.5, θ_2 = −.75 and θ_3 = 3. The
plots confirm that q = 3 because the ACF cuts off after lag 3 and the PACF tails off.
Finally, the lower panel displays the ACF and PACF of the ARMA(1,1) process of Example 3.2.4. Here, the assessment is much
harder. While the ACF tails off as predicted (see Table 3.1), the PACF basically cuts off after lag 4 or 5. This could lead to the
wrong conclusion that the underlying process is actually an AR process of order 4 or 5. (The reason for this behavior lies in the fact
that the dependence in this particular ARMA(1,1) process can be well approximated by that of an AR(4) or AR(5) time series.)
To reproduce the graphs in R, you can use the commands
>ar2.acf=ARMAacf(ar=c(1.5,-.75), ma=0, 25)
>ar2.pacf=ARMAacf(ar=c(1.5,-.75), ma=0, 25, pacf=T)
for the AR(2) process. The other two cases follow from straightforward adaptations of this code.

Figure 3.6: The recruitment series of Example 3.3.5 (left), its sample ACF (middle) and sample PACF (right).

Figure 3.7: Scatterplot matrix relating current recruitment to past recruitment for the lags h = 1, … , 12 .
Example 3.3.5 Recruitment Series
The data considered in this example consists of 453 months of observed recruitment (number of new fish) in a certain part of the
Pacific Ocean collected over the years 1950--1987. The corresponding time series plot is given in the left panel of Figure 3.6. The
corresponding ACF and PACF displayed in the middle and right panel of the same figure recommend fitting an AR process of
order p = 2 to the recruitment data. Assuming that the data is in rec, the R code to reproduce Figure 3.6 is
> rec = ts(rec, start=1950, frequency=12)
> plot(rec, xlab="", ylab="")
> acf(rec, lag=48)
> pacf(rec, lag=48)
This assertion is also consistent with the scatterplots that relate current recruitment to past recruitment at several time lags, namely
h = 1, … , 12. For lag 1 and 2, there seems to be a strong linear relationship, while this is not the case anymore for h ≥ 3 . The

corresponding R commands are


> lag.plot(rec, lags=12, layout=c(3,4), diag=F)
Denote by X_t the recruitment at time t. To estimate the AR(2) parameters, run a regression on the observed data triplets included
in the set {(x_t, x_{t−1}, x_{t−2}): t = 3, …, 453} to fit a model of the form

X_t = ϕ_0 + ϕ_1 X_{t−1} + ϕ_2 X_{t−2} + Z_t,  t = 3, …, 453,

where (Z_t) ∼ WN(0, σ²). This task can be performed in R as follows.

> fit.rec = ar.ols(rec, aic=F, order.max=2, demean=F, intercept=T)
These estimates can be assessed with the command fit.rec and the corresponding standard errors with fit.rec$asy.se.
Here the parameter estimates ϕ̂_0 = 6.737(1.111), ϕ̂_1 = 1.3541(.042), ϕ̂_2 = −.4632(.0412) and σ̂² = 89.72 are obtained. The
standard errors are given in parentheses.

This page titled 3.3: The PACF of a Causal ARMA Process is shared under a not declared license and was authored, remixed, and/or curated by
Alexander Aue.

3.4: Forecasting
Suppose that the variables X_1, …, X_n of a weakly stationary time series (X_t: t ∈ Z) have been observed with the goal to predict
or forecast the future values of X_{n+1}, X_{n+2}, …. The focus is here on so-called one-step best linear predictors (BLP). These are, by
definition, linear combinations

X̂_{n+1} = ϕ_{n0} + ϕ_{n1} X_n + … + ϕ_{nn} X_1  (3.4.1)

of the observed variables X_1, …, X_n that minimize the mean-squared error

E[{X_{n+1} − g(X_1, …, X_n)}²]

for linear functions g of X_1, …, X_n. Straightforward generalizations yield definitions for the m-step best linear predictors X̂_{n+m} of
X_{n+m} for arbitrary m ∈ N in the same fashion. Using Hilbert space theory, one can prove the following theorem, which will be the
starting point for our considerations.

Theorem 3.4.1: Best linear prediction (BLP)

Let (X_t: t ∈ Z) be a weakly stationary stochastic process of which X_1, …, X_n are observed. Then, the one-step BLP X̂_{n+1}
of X_{n+1} is determined by the equations

E[(X_{n+1} − X̂_{n+1}) X_{n+1−j}] = 0

for all j = 1, …, n + 1, where X_0 = 1.

The equations specified in Theorem 3.4.1 can be used to calculate the coefficients ϕ_{n0}, …, ϕ_{nn} in Equation 3.4.1. It suffices to
focus on mean zero processes (X_t: t ∈ Z) and thus to set ϕ_{n0} = 0 as the following calculations show. Assume that E[X_t] = μ for
all t ∈ Z. Then, Theorem 3.4.1 gives that E[X̂_{n+1}] = E[X_{n+1}] = μ (using the equation with j = n + 1). Consequently, it holds
that

μ = E[X̂_{n+1}] = E[ϕ_{n0} + ∑_{ℓ=1}^n ϕ_{nℓ} X_{n+1−ℓ}] = ϕ_{n0} + ∑_{ℓ=1}^n ϕ_{nℓ} μ.

Using now that ϕ_{n0} = μ(1 − ϕ_{n1} − … − ϕ_{nn}), Equation 3.4.1 can be rewritten as

Ŷ_{n+1} = ϕ_{n1} Y_n + … + ϕ_{nn} Y_1,

where Ŷ_{n+1} = X̂_{n+1} − μ has mean zero.
With the ACVF γ of (X_t: t ∈ Z), the equations in Theorem 3.4.1 can be expressed as

∑_{ℓ=1}^n ϕ_{nℓ} γ(j − ℓ) = γ(j),  j = 1, …, n.  (3.4.2)

Note that due to the convention ϕ_{n0} = 0, the last equation in Theorem 3.4.1 (for which j = n + 1) is omitted. More conveniently,
this is restated in matrix notation. To this end, let Γ_n = (γ(j − ℓ))_{j,ℓ=1,…,n}, ϕ_n = (ϕ_{n1}, …, ϕ_{nn})^T and γ_n = (γ(1), …, γ(n))^T,
where ^T denotes the transpose. With these notations, (3.4.2) becomes

Γ_n ϕ_n = γ_n  ⟺  ϕ_n = Γ_n^{−1} γ_n,  (3.4.3)

provided that Γ_n is nonsingular.

The determination of the coefficients ϕ_{nℓ} has thus been reduced to solving a linear equation system and depends only on second-
order properties of (X_t: t ∈ Z) which are given by the ACVF γ.

Let X_n = (X_n, X_{n−1}, …, X_1)^T. Then, X̂_{n+1} = ϕ_n^T X_n. To assess the quality of the prediction, one computes the mean-squared
error with the help of Equation 3.4.3 as follows:

P_{n+1} = E[(X_{n+1} − X̂_{n+1})²]
        = E[(X_{n+1} − ϕ_n^T X_n)²]
        = E[(X_{n+1} − γ_n^T Γ_n^{−1} X_n)²]
        = E[X_{n+1}² − 2γ_n^T Γ_n^{−1} X_n X_{n+1} + γ_n^T Γ_n^{−1} X_n X_n^T Γ_n^{−1} γ_n]
        = γ(0) − 2γ_n^T Γ_n^{−1} γ_n + γ_n^T Γ_n^{−1} Γ_n Γ_n^{−1} γ_n
        = γ(0) − γ_n^T Γ_n^{−1} γ_n.  (3.4.4)
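Equations (3.4.3) and (3.4.4) can be evaluated directly in R with solve. The sketch below uses the ACVF of a causal AR(1) process; the values of ϕ and n are arbitrary choices made only for illustration:

phi = .8; n = 5
acvf = function(h) phi^abs(h) / (1 - phi^2)        # ACVF of the AR(1) process (sigma^2 = 1)
Gamma.n = outer(1:n, 1:n, function(j, l) acvf(j - l))
gamma.n = acvf(1:n)
phi.n = solve(Gamma.n, gamma.n)                    # BLP coefficients phi_n1, ..., phi_nn
phi.n                                              # equals (phi, 0, ..., 0) for an AR(1) process
acvf(0) - sum(gamma.n * phi.n)                     # mean squared error P_{n+1} from (3.4.4)

The output confirms that, for an AR(1) process, only the most recent observation receives a nonzero weight.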

As an initial example, we explain the prediction procedure for an autoregressive process of order 2.

Example 3.4.1: Prediction of an AR(2) Process

Let (X_t: t ∈ Z) be the causal AR(2) process X_t = ϕ_1 X_{t−1} + ϕ_2 X_{t−2} + Z_t. Suppose that only an observation of X_1 is
available to forecast the value of X_2. In this simplified case, the single prediction Equation 3.4.2 is

ϕ_{11} γ(0) = γ(1),

so that ϕ_{11} = ρ(1) and X̂_{1+1} = ρ(1)X_1.

In the next step, assume that observed values of X_1 and X_2 are at hand to forecast the value of X_3. Then, one similarly obtains
from (3.4.2) that the predictor can be computed from

X̂_{2+1} = ϕ_{21} X_2 + ϕ_{22} X_1 = ϕ_2^T X_2 = (Γ_2^{−1} γ_2)^T X_2
        = (γ(1), γ(2)) ( γ(0)  γ(1) )^{−1} ( X_2 )
                       ( γ(1)  γ(0) )      ( X_1 ).

However, applying the arguments leading to the definition of the PACF in Section 3.3, one finds that

E[{X_3 − (ϕ_1 X_2 + ϕ_2 X_1)} X_1] = E[Z_3 X_1] = 0,
E[{X_3 − (ϕ_1 X_2 + ϕ_2 X_1)} X_2] = E[Z_3 X_2] = 0.

Hence, X̂_{2+1} = ϕ_1 X_2 + ϕ_2 X_1 and even X̂_{n+1} = ϕ_1 X_n + ϕ_2 X_{n−1} for all n ≥ 2, exploiting the particular autoregressive
structure.
Since similar results can be proved for general causal AR(p) processes, the one-step predictors have the form

X̂_{n+1} = ϕ_1 X_n + … + ϕ_p X_{n−p+1}

whenever the number of observed variables n is at least p.

The major drawback of this approach is immediately apparent from the previous example: For larger sample sizes n, the prediction
procedure requires the calculation of the inverse matrix Γ_n^{−1}, which is computationally expensive. In the remainder of this section,
two recursive prediction methods are introduced that bypass the inversion altogether. They are known as the Durbin-Levinson
algorithm and the innovations algorithm. Finally, predictors based on the infinite past are introduced which are often easily applicable
for the class of causal and invertible ARMA processes.

Method 1: The Durbin-Levinson algorithm

If (X_t: t ∈ Z) is a zero mean weakly stationary process with ACVF γ such that γ(0) > 0 and γ(h) → 0 as h → ∞, then the
coefficients ϕ_{nℓ} in (3.4.2) and the mean squared errors P_n in (3.4.4) satisfy the recursions

ϕ_{11} = γ(1)/γ(0),  P_0 = γ(0),

and, for n ≥ 1,

ϕ_{nn} = (1/P_{n−1}) (γ(n) − ∑_{ℓ=1}^{n−1} ϕ_{n−1,ℓ} γ(n − ℓ)),

(ϕ_{n1}, …, ϕ_{n,n−1})^T = (ϕ_{n−1,1}, …, ϕ_{n−1,n−1})^T − ϕ_{nn} (ϕ_{n−1,n−1}, …, ϕ_{n−1,1})^T

and

P_n = P_{n−1}(1 − ϕ_{nn}²).
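The recursions translate almost line by line into R. The following function is a minimal sketch written for this text; its name durbin.levinson and its interface are our own choices and not part of base R. It expects a vector gamma whose entry gamma[h+1] contains γ(h):

durbin.levinson = function(gamma, n) {
  phi = matrix(0, n, n)                 # phi[k, l] will hold phi_{kl}
  P = numeric(n + 1)                    # P[k + 1] will hold P_k
  P[1] = gamma[1]                       # P_0 = gamma(0)
  phi[1, 1] = gamma[2] / gamma[1]
  P[2] = P[1] * (1 - phi[1, 1]^2)
  for (k in 2:n) {
    phi[k, k] = (gamma[k + 1] - sum(phi[k - 1, 1:(k - 1)] * gamma[k:2])) / P[k]
    phi[k, 1:(k - 1)] = phi[k - 1, 1:(k - 1)] - phi[k, k] * phi[k - 1, (k - 1):1]
    P[k + 1] = P[k] * (1 - phi[k, k]^2)
  }
  list(phi = phi, P = P)
}

g = .8^(0:10) / (1 - .8^2)              # ACVF of a causal AR(1) process with phi = .8
durbin.levinson(g, 5)$phi[5, ]          # returns (.8, 0, 0, 0, 0)

For this AR(1) example the last row of the coefficient matrix reproduces the known one-step predictor, which uses only the most recent observation.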

It can be shown that under the assumptions made on the process (X_t: t ∈ Z), it holds indeed that ϕ_{nn} is equal to the value of the
PACF of (X_t: t ∈ Z) at lag n. The result is formulated as Corollary 5.2.1 in Brockwell and Davis (1991). This fact is highlighted in
an example.

The PACF of an AR(2) process


Let (X_t: t ∈ Z) be a causal AR(2) process. Then, ρ(1) = ϕ_1/(1 − ϕ_2) and all other values can be computed recursively from

ρ(h) − ϕ_1 ρ(h − 1) − ϕ_2 ρ(h − 2) = 0,  h ≥ 2.

Note that the ACVF γ satisfies a difference equation with the same coefficients, which is seen by multiplying the latter
equation with γ(0). Applying the Durbin-Levinson algorithm gives first that

ϕ_{11} = γ(1)/γ(0) = ρ(1)  and  P_1 = P_0(1 − ϕ_{11}²) = γ(0)(1 − ρ(1)²).

Ignoring the recursion for the error terms P_n in the following, the next ϕ_{nℓ} values are obtained as

ϕ_{22} = (1/P_1)[γ(2) − ϕ_{11} γ(1)] = [ρ(2) − ρ(1)²]/[1 − ρ(1)²]
       = {ϕ_1²(1 − ϕ_2)^{−1} + ϕ_2 − [ϕ_1(1 − ϕ_2)^{−1}]²}/{1 − [ϕ_1(1 − ϕ_2)^{−1}]²} = ϕ_2,

ϕ_{21} = ϕ_{11} − ϕ_{22} ϕ_{11} = ρ(1)(1 − ϕ_2) = ϕ_1,

ϕ_{33} = (1/P_2)[γ(3) − ϕ_{21} γ(2) − ϕ_{22} γ(1)] = (1/P_2)[γ(3) − ϕ_1 γ(2) − ϕ_2 γ(1)] = 0.

Now, referring to the remarks after Example 3.3.3, no further computations are necessary to determine the PACF because
ϕ_{nn} = 0 for all n > p = 2.

Method 2: The innovations algorithm

In contrast to the Durbin-Levinson algorithm, this method can also be applied to nonstationary processes. It should thus, in
general, be preferred over Method 1. The innovations algorithm gets its name from the fact that one directly uses the form of
the prediction equations in Theorem 3.4.1, which are stated in terms of the innovations (X_{t+1} − X̂_{t+1})_{t∈Z}. Observe that the
sequence consists of uncorrelated random variables.

The one-step predictors X̂_{n+1} can be calculated from the recursions

X̂_{0+1} = 0,  P_1 = γ(0),

and, for n ≥ 1,

X̂_{n+1} = ∑_{ℓ=1}^n θ_{nℓ}(X_{n+1−ℓ} − X̂_{n+1−ℓ}),

P_{n+1} = γ(0) − ∑_{ℓ=0}^{n−1} θ_{n,n−ℓ}² P_{ℓ+1},

where the coefficients are obtained from the equations

θ_{n,n−ℓ} = (1/P_{ℓ+1}) [γ(n − ℓ) − ∑_{i=0}^{ℓ−1} θ_{ℓ,ℓ−i} θ_{n,n−i} P_{i+1}],  ℓ = 0, 1, …, n − 1.

As an example we show how the innovations algorithm is applied to a moving average time series of order 1.

Example 3.4.3: Prediction of an MA(1) Process

Let (X_t: t ∈ Z) be the MA(1) process X_t = Z_t + θZ_{t−1}. Note that

γ(0) = (1 + θ²)σ²,  γ(1) = θσ²  and  γ(h) = 0  (h ≥ 2).

Using the innovations algorithm, one can compute the one-step predictor from the values

θ_{n1} = θσ²/P_n,  θ_{nℓ} = 0  (ℓ = 2, …, n − 1),

and

P_1 = (1 + θ²)σ²,
P_{n+1} = (1 + θ² − θθ_{n1})σ²

as

X̂_{n+1} = (θσ²/P_n)(X_n − X̂_n).
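The error recursions of this example can be iterated numerically in a few lines of R; the values of θ and σ² below are arbitrary choices made for illustration only:

theta = .5; sigma2 = 1; N = 10
P = numeric(N)
P[1] = (1 + theta^2) * sigma2
for (n in 1:(N - 1)) {
  theta.n1 = theta * sigma2 / P[n]
  P[n + 1] = (1 + theta^2 - theta * theta.n1) * sigma2
}
P                                       # decreases towards sigma2 as n grows

The sequence of prediction errors converges to the innovation variance σ², reflecting that more and more of the MA(1) dependence is exploited as observations accumulate.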

Method 3: Prediction based on the infinite past


Suppose that a causal and invertible ARMA(p,q) process is analyzed. Assume further that (unrealistically) the complete history
of the process can be stored and that thus all past variables (X_t: t ≤ n) can be accessed. Define then

X̃_{n+m} = E[X_{n+m} | X_n, X_{n−1}, …]

as the m-step ahead predictor based on the infinite past. It can be shown that, for large sample sizes n, the difference between
the values of X̂_{n+m} and X̃_{n+m} vanishes at an exponential rate. Exploiting causality and invertibility of the ARMA process,
one can transform the predictor X̃_{n+m} so that it is in a computationally more feasible form. To do so, note that by causality

X̃_{n+m} = E[X_{n+m} | X_n, X_{n−1}, …]
        = E[∑_{j=0}^∞ ψ_j Z_{n+m−j} | X_n, X_{n−1}, …]
        = ∑_{j=m}^∞ ψ_j Z_{n+m−j}  (3.4.5)

because E[Z_t | X_n, X_{n−1}, …] equals zero if t > n and equals Z_t if t ≤ n (due to invertibility!). The representation in (3.4.5)
can be used to compute the mean squared prediction error P̃_{n+m}. It follows from causality that

P̃_{n+m} = E[(X_{n+m} − X̃_{n+m})²] = E[(∑_{j=0}^{m−1} ψ_j Z_{n+m−j})²] = σ² ∑_{j=0}^{m−1} ψ_j².  (3.4.6)

On the other hand, Equation 3.4.5 does not allow one to calculate the forecasts directly, because X̃_{n+m} is given in terms of the
noise variables Z_{n+m−j}. Instead invertibility will be utilized. Observe first that

E[X_{n+m−j} | X_n, X_{n−1}, …] = { X̃_{n+m−j},  j < m,
                                 { X_{n+m−j},  j ≥ m.

By invertibility (the "0 =" part follows again from causality),

0 = E[Z_{n+m} | X_n, X_{n−1}, …]  (3.4.7)
  = E[∑_{j=0}^∞ π_j X_{n+m−j} | X_n, X_{n−1}, …]  (3.4.8)
  = ∑_{j=0}^∞ π_j E[X_{n+m−j} | X_n, X_{n−1}, …].  (3.4.9)

Combining the previous two statements yields

X̃_{n+m} = −∑_{j=1}^{m−1} π_j X̃_{n+m−j} − ∑_{j=m}^∞ π_j X_{n+m−j}.  (3.4.10)

The equations can now be solved recursively for m = 1, 2, …. Note, however, that for any m ≥ 1 the sequence
(X_{n+m+t} − X̃_{n+m+t}: t ∈ Z) does not consist of uncorrelated random variables. In fact, if h ∈ N_0, it holds that

E[(X_{n+m} − X̃_{n+m})(X_{n+m+h} − X̃_{n+m+h})]  (3.4.11)
  = E[∑_{j=0}^{m−1} ψ_j Z_{n+m−j} ∑_{i=0}^{m+h−1} ψ_i Z_{n+m+h−i}]  (3.4.12)
  = σ² ∑_{j=0}^{m−1} ψ_j ψ_{j+h}.  (3.4.13)

Finally, for practical purposes the given forecast needs to be truncated. This is accomplished by setting

∑_{j=n+m}^∞ π_j X_{n+m−j} = 0.

The resulting equations (see Equation 3.4.10 for comparison) yield recursively the truncated m-step predictors X*_{n+m}:

X*_{n+m} = −∑_{j=1}^{m−1} π_j X*_{n+m−j} − ∑_{j=m}^{n+m−1} π_j X_{n+m−j}.  (3.4.14)
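In applications one rarely iterates these equations by hand; once an ARMA model has been fitted, multi-step forecasts and their standard errors are available from R's built-in routines. As an illustration, the data set rec and the order (2,0,0) are borrowed from the recruitment example, and the particular call is only one possible choice:

>rec.fit = arima(rec, order=c(2,0,0))
>predict(rec.fit, n.ahead=12)

The second command returns twelve-step-ahead point forecasts and their prediction standard errors; these are computed by R's state-space machinery and agree closely with the truncated predictors described above.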

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 3.4: Forecasting is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

3.5: Parameter Estimation
Let (X_t: t ∈ Z) be a causal and invertible ARMA(p,q) process with known orders p and q, possibly with mean μ. This section is
concerned with estimation procedures for the unknown parameter vector

β = (μ, ϕ_1, …, ϕ_p, θ_1, …, θ_q, σ²)^T.  (3.5.1)

To simplify the estimation procedure, it is assumed that the data has already been adjusted by subtraction of the mean and the
discussion is therefore restricted to zero mean ARMA models.
In the following, three estimation methods are introduced. The method of moments works best in case of pure AR processes, while
it does not lead to optimal estimation procedures for general ARMA processes. For the latter, more efficient estimators are
provided by the maximum likelihood and least squares methods which will be discussed subsequently.
Method 1 (Method of Moments) Since this method is only efficient for autoregressive processes, the presentation here is restricted to AR(p)
processes

X_t = ϕ_1 X_{t−1} + … + ϕ_p X_{t−p} + Z_t,  t ∈ Z,

where (Z_t: t ∈ Z) ∼ WN(0, σ²). The parameter vector β consequently reduces to (ϕ, σ²)^T with ϕ = (ϕ_1, …, ϕ_p)^T and can be
estimated using the Yule-Walker equations

Γ_p ϕ = γ_p  and  σ² = γ(0) − ϕ^T γ_p,

where Γ_p = (γ(k − j))_{k,j=1,…,p} and γ_p = (γ(1), …, γ(p))^T. Observe that the equations are obtained by the same arguments
applied to derive the Durbin-Levinson algorithm in the previous section. The method of moments suggests replacing every
quantity in the Yule-Walker equations with its estimated counterpart, which yields the Yule-Walker estimators
ϕ̂ = Γ̂_p^{−1} γ̂_p = R̂_p^{−1} ρ̂_p  (3.5.2)

σ̂² = γ̂(0) − γ̂_p^T Γ̂_p^{−1} γ̂_p = γ̂(0)[1 − ρ̂_p^T R̂_p^{−1} ρ̂_p].  (3.5.3)

Therein, R̂_p = γ̂(0)^{−1} Γ̂_p and ρ̂_p = γ̂(0)^{−1} γ̂_p, with γ̂(h) defined as in (1.2.1). Using γ̂(h) as estimator for the ACVF at lag h, a dependence on the sample size n is obtained in an
implicit way. This dependence is suppressed in the notation used here. The following theorem contains the limit behavior of the
Yule-Walker estimators as n tends to infinity.
Theorem 3.5.1. If (X_t: t ∈ Z) is a causal AR(p) process, then

√n (ϕ̂ − ϕ) →^D N(0, σ²Γ_p^{−1})  and  σ̂² →^P σ²

as n → ∞, where →^P indicates convergence in probability and →^D convergence in distribution.

A proof of this result is given in Section 8.10 of Brockwell and Davis (1991). Since equations (3.5.2) and (3.5.3) have the same
structure as the corresponding equations (3.4.3) and (3.4.4), the Durbin-Levinson algorithm can be used to solve recursively for the
estimators ϕ̂_h = (ϕ̂_{h1}, …, ϕ̂_{hh})^T. Moreover, since ϕ_{hh} is equal to the value of the PACF of (X_t: t ∈ Z) at lag h, the estimator ϕ̂_{hh}
can be used as its proxy. Since it is already known that, in the case of AR(p) processes, ϕ_{hh} = 0 if h > p, Theorem 3.5.1 implies
immediately the following corollary.


Corollary 3.5.1 If (X_t: t ∈ Z) is a causal AR(p) process, then

√n ϕ̂_{hh} →^D Z  (n → ∞)

for all h > p, where Z stands for a standard normal random variable.
Example 3.5.1. (Yule-Walker estimates for AR(2) processes). Suppose that n = 144 values of the autoregressive process
X_t = 1.5X_{t−1} − .75X_{t−2} + Z_t have been observed, where (Z_t: t ∈ Z) is a sequence of independent standard normal variates.
Assume further that γ̂(0) = 8.434, ρ̂(1) = 0.834 and ρ̂(2) = 0.476 have been calculated from the data. The Yule-Walker
estimators for the parameters are then given by

ϕ̂ = (ϕ̂_1, ϕ̂_2)^T = ( 1.000  0.834 )^{−1} ( 0.834 ) = (  1.439 )
                   ( 0.834  1.000 )      ( 0.476 )   ( −0.725 )

and

σ̂² = 8.434 [1 − (0.834, 0.476)(1.439, −0.725)^T] = 1.215.

To construct asymptotic confidence intervals using Theorem 3.5.1, the unknown limiting covariance matrix σ²Γ_p^{−1} needs to be
estimated. This can be done using the estimator

σ̂²Γ̂_p^{−1}/n = (1/144)(1.215/8.434) ( 1.000  0.834 )^{−1} = ( 0.057²  −0.003 )
                                    ( 0.834  1.000 )        ( −0.003   0.057² ).

Then, the 1 − α level confidence intervals for the parameters ϕ_1 and ϕ_2 are computed as

1.439 ± 0.057 z_{1−α/2}  and  −0.725 ± 0.057 z_{1−α/2},

respectively, where z_{1−α/2} is the corresponding normal quantile.
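The matrix computations of this example are easily reproduced in R. The numerical inputs below are the sample quantities quoted in the text; small discrepancies with the reported values are due to rounding of these inputs:

R2 = matrix(c(1, .834, .834, 1), 2, 2)              # sample autocorrelation matrix
rho2 = c(.834, .476)
phi.hat = solve(R2, rho2)                           # close to (1.439, -0.725)
sigma2.hat = 8.434 * (1 - sum(rho2 * phi.hat))      # close to 1.215
sqrt(diag(sigma2.hat * solve(R2) / (144 * 8.434)))  # standard errors, close to 0.057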


Example 3.5.2 (Recruitment Series).
Let us reconsider the recruitment series of Example 3.3.5. There, an AR(2) model was first established as appropriate for the data
and the model parameters were then estimated using an ordinary least squares approach. Here, the coefficients will instead be
estimated with the Yule-Walker procedure. The R command is
> rec.yw = ar.yw(rec, order=2)
The mean estimate can be obtained from rec.yw$x.mean as μ̂ = 62.26, while the autoregressive parameter estimates and their
standard errors are accessed with the commands rec.yw$ar and sqrt(rec.yw$asy.var.coef) as ϕ̂_1 = 1.3316(.0422)
and ϕ̂_2 = −.4445(.0422). Finally, the variance estimate is obtained from rec.yw$var.pred as σ̂² = 94.7991. All values are
close to their counterparts in Example 3.3.5.


Example 3.5.3. Consider the invertible MA(1) process X_t = Z_t + θZ_{t−1}, where |θ| < 1. Using invertibility, each X_t has an
infinite autoregressive representation

X_t = −∑_{j=1}^∞ (−θ)^j X_{t−j} + Z_t

that is nonlinear in the unknown parameter θ to be estimated. The method of moments is here based on solving

ρ̂(1) = γ̂(1)/γ̂(0) = θ̂/(1 + θ̂²)

for θ̂. The foregoing quadratic equation has the two solutions

θ̂ = (1 ± √(1 − 4ρ̂(1)²)) / (2ρ̂(1)),

of which we pick the invertible one. Note moreover, that |ρ̂(1)| is not necessarily less than or equal to 1/2, which is required for the
existence of real solutions. (The theoretical value |ρ(1)|, however, is always less than 1/2 for any MA(1) process, as an easy
computation shows.) Hence, θ cannot always be estimated from given data samples.
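A small R sketch of this moment estimator is given below; the value assigned to rho1 stands for a sample lag-one autocorrelation and is made up purely for illustration:

rho1 = .35                               # hypothetical value of rho-hat(1); requires |rho1| <= 1/2
theta.hat = (1 - sqrt(1 - 4 * rho1^2)) / (2 * rho1)   # the invertible root
theta.hat

Choosing the root with the minus sign guarantees |θ̂| < 1, that is, the invertible solution.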
Method 2 (Maximum Likelihood Estimation) The innovations algorithm of the previous section applied to a causal ARMA(p,q)
process (X_t: t ∈ Z) gives

X̂_{i+1} = ∑_{j=1}^i θ_{ij}(X_{i+1−j} − X̂_{i+1−j}),  1 ≤ i < max{p, q},

X̂_{i+1} = ∑_{j=1}^p ϕ_j X_{i+1−j} + ∑_{j=1}^q θ_{ij}(X_{i+1−j} − X̂_{i+1−j}),  i ≥ max{p, q},

with prediction error

P_{i+1} = σ²R_{i+1}.

In the last expression, σ² has been factored out due to reasons that will become apparent from the form of the likelihood function to
be discussed below. Recall that the sequence (X_{i+1} − X̂_{i+1}: i ∈ Z) consists of uncorrelated random variables if the parameters are
known. Assuming normality for the errors, we moreover obtain even independence. This can be exploited to define the Gaussian
maximum likelihood estimation (MLE) procedure. Throughout, it is assumed that (X_t: t ∈ Z) has zero mean (μ = 0). The
parameters of interest are collected in the vectors β = (ϕ, θ, σ²)^T and β′ = (ϕ, θ)^T, where ϕ = (ϕ_1, …, ϕ_p)^T and
θ = (θ_1, …, θ_q)^T. Assume finally that we have observed the variables X_1, …, X_n. Then, the Gaussian likelihood function for the
innovations is

L(β) = (2πσ²)^{−n/2} (∏_{i=1}^n R_i)^{−1/2} exp(−(1/(2σ²)) ∑_{j=1}^n (X_j − X̂_j)²/R_j).  (3.5.4)

Taking the partial derivative of ln L(β) with respect to the variable σ² reveals that the MLE for σ² can be
calculated from

σ̂² = S(ϕ̂, θ̂)/n,  S(ϕ̂, θ̂) = ∑_{j=1}^n (X_j − X̂_j)²/R_j.

Therein, ϕ̂ and θ̂ denote the MLEs of ϕ and θ obtained from minimizing the profile likelihood or reduced likelihood

ℓ(ϕ, θ) = ln(S(ϕ, θ)/n) + (1/n) ∑_{j=1}^n ln(R_j).

Observe that the profile likelihood ℓ(ϕ, θ) can be computed using the innovations algorithm. The speed of these computations
depends heavily on the quality of initial estimates. These are often provided by the non-optimal Yule-Walker procedure. For
numerical methods, such as the Newton-Raphson and scoring algorithms, see Section 3.6 in Shumway and Stoffer (2006).
The limit distribution of the MLE procedure is given as the following theorem. Its proof can be found in Section 8.8 of Brockwell
and Davis (1991).
Theorem 3.5.2. Let (X_t: t ∈ Z) be a causal and invertible ARMA(p,q) process defined with an iid sequence
(Z_t: t ∈ Z) satisfying E[Z_t] = 0 and E[Z_t²] = σ². Consider the MLE β̂′ of β′ that is initialized with the moment estimators of
Method 1. Then,

√n (β̂′ − β′) →^D N(0, σ²Γ_{p,q}^{−1})  (n → ∞).

The result is optimal. The covariance matrix Γ_{p,q} is in block form and can be evaluated in terms of covariances of various
autoregressive processes.
Example 3.5.4 (Recruitment Series). The MLE estimation procedure for the recruitment series can be applied in R as follows:
>rec.mle = ar.mle(rec, order=2)
The mean estimate can be obtained from rec.mle$x.mean as μ̂ = 62.26, while the autoregressive parameter estimates and
their standard errors are accessed with the commands rec.mle$ar and sqrt(rec.mle$asy.var.coef) as
ϕ̂_1 = 1.3513(.0410) and ϕ̂_2 = −.4099(.0410). Finally, the variance estimate is obtained from rec.mle$var.pred as
σ̂² = 89.3360. All values are very close to their counterparts in Example 3.3.5.

Method 3 (Least Squares Estimation) An alternative to the method of moments and the MLE is provided by the least squares
estimation (LSE). For causal and invertible ARMA(p,q) processes, it is based on minimizing the weighted sum of squares
S(ϕ, θ) = ∑_{j=1}^n (X_j − X̂_j)²/R_j  (3.5.5)

with respect to ϕ and θ, respectively. Assuming that ϕ̃ and θ̃ denote these LSEs, the LSE for σ² is computed as

σ̃² = S(ϕ̃, θ̃)/(n − p − q).

The least squares procedure has the same asymptotics as the MLE.
Theorem 3.5.3. The result of Theorem 3.5.2 also holds if β̂′ is replaced with β̃′.
Example 3.5.5 (Recruitment Series). The least squares estimation has already been discussed in Example 3.3.5, including the R
commands.

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 3.5: Parameter Estimation is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

3.6: Model Selection
In this section, a rough guide for going about the data analysis will be provided. It consists of several parts, most of which have
been discussed previously. The main focus is on the selection of p and q in the likely case that these parameters are unknown.
Step 1. Plot the data and check whether or not the variability remains reasonably stable throughout the observation period. If that is
not the case, use preliminary transformations to stabilize the variance. One popular class is given by the Box-Cox
transformations (Box and Cox, 1964)
f_λ(U_t) = { λ^{−1}(U_t^λ − 1),  U_t ≥ 0, λ > 0,
          { ln U_t,             U_t > 0, λ = 0.

In practice f_0 or f_{1/2} are often adequate choices. (Recall, for instance, the Australian wine sales data of Example 1.4.1.)

Step 2. Remove, if present, trend and seasonal components from the data. Chapter 1 introduced a number of tools to do so, based
on
the classical decomposition of a time series

Yt = mt + st + Xt

into a trend, a seasonality and a residual component. Note that differencing works also without the specific representation in the last
display. If the data appears stationary, move on to the next step. Else apply, for example, another set of difference operations.
Step 3. Suppose now that Steps 1 and 2 have provided us with observations that are well described by a stationary sequence
(X : t ∈ Z) . The goal is then to find the most appropriate ARMA(p, q) model to describe the process. In the unlikely case that p
t

and q can be assumed known, utilize the estimation procedures of Section 3.5 directly. Otherwise, choose them according to one of
the following criteria.
(a) The standard criterion that is typically implemented in software packages is a modification of Akaike's information criterion,
see Akaike (1969), which was given by Hurvich and Tsai (1989). In this paper it is suggested that the ARMA model parameters be
chosen to minimize the objective function
AIC_C(ϕ, θ, p, q) = −2 ln L(ϕ, θ, S(ϕ, θ)/n) + 2(p + q + 1)n/(n − p − q − 2).  (3.6.1)

Here, L(ϕ, θ, σ²) denotes the Gaussian likelihood defined in (3.5.4) and S(ϕ, θ) is the weighted sum of squares in (3.5.5). It can be
seen from the definition that the AIC_C does not attempt to minimize the log-likelihood function directly. The introduction of the
penalty term on the right-hand side of (3.6.1) reduces the risk of overfitting.
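A possible way to carry out such an order search in R is sketched below. The data set rec, the maximal orders and the use of arima with Gaussian maximum likelihood are illustrative choices; the AIC_C value is assembled by hand from the fitted log-likelihood according to (3.6.1):

aicc = matrix(NA, 4, 4, dimnames=list(paste("p =", 0:3), paste("q =", 0:3)))
n = length(rec)
for (p in 0:3) for (q in 0:3) {
  fit = try(arima(rec, order=c(p, 0, q), method="ML"), silent=TRUE)
  if (!inherits(fit, "try-error")) {
    aicc[p + 1, q + 1] = -2 * fit$loglik + 2 * (p + q + 1) * n / (n - p - q - 2)
  }
}
which(aicc == min(aicc, na.rm=TRUE), arr.ind=TRUE)   # orders minimizing AIC_C

Fits that fail to converge are simply skipped; for a real analysis the candidate orders and the treatment of the mean should be chosen with more care.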
(b) For pure autoregressive processes, Akaike (1969) introduced a criterion that is based on a minimization of the final prediction
error. Here, the order p is chosen as the minimizer of the objective function

FPE = σ̂² (n + p)/(n − p),

where σ̂² denotes the MLE of the unknown noise variance σ². For more on this topic and other procedures that help fit a model, we
refer here to Section 9.3 of Brockwell and Davis (1991).


Step 4. The last step in the analysis is concerned with diagnostic checking by applying the goodness of fit tests of Section 1.5.

This page titled 3.6: Model Selection is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

3.7: Summary
The class of autoregressive moving average processes has been introduced to model stationary stochastic processes. Theoretical
properties such as causality and invertibility have been examined, which depend on the zeroes of the autoregressive and moving
average polynomials, respectively.
It has been shown how the causal representation of an ARMA process can be utilized to compute its covariance function which
contains all information about the dependence structure. Assuming known parameter values, several forecasting procedures have
been discussed. The Durbin-Levinson algorithm works well for pure AR processes, while the innovations algorithm is particularly
useful for pure MA processes. Predictions using an infinite past work well for causal and invertible ARMA processes. For practical
purposes, however, a truncated version is more relevant.
Since the exact parameter values are in general unknown, several estimation procedures were introduced. The Yule-Walker
procedure is only optimal in the AR case but provides useful initial estimates that can be used for the numerical derivation of
maximum likelihood or least squares estimates.
Finally, a framework has been provided that may potentially be useful when facing the problem of analyzing a data set in practice.

This page titled 3.7: Summary is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

CHAPTER OVERVIEW

4: Spectral Analysis


In this chapter, a general method is discussed to deal with the periodic components of a time series.
4.1: Introduction to Spectral Analysis
4.2: The Spectral Density and the Periodogram
4.3: Large Sample Properties
4.4: Linear Filtering
4.5: Summary

This page titled 4: Spectral Analysis is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

4.1: Introduction to Spectral Analysis
Many of the time series discussed in the previous chapters displayed strong periodic components: The sunspot numbers of Example
1.1.1, the number of trapped lynx of Example 1.1.2 and the Australian wine sales data of Example 1.4.1. Often, there is an obvious
choice for the period d of this cyclical part such as an annual pattern in the wine sales. Given d , one could then proceed by
removing the seasonal effects as in Section 1.4. In the first two examples it is, however, somewhat harder to determine the precise
value of d . In this chapter, a general method is therefore discussed to deal with the periodic components of a time series. To
complicate matters, it is usually the case that several cyclical patterns are simultaneously present in a time series. As an example
recall the southern oscillation index (SOI) data which exhibits both an annual pattern and a so-called El Niño pattern.
The sine and cosine functions are the prototypes of periodic functions. They are going to be utilized here to describe cyclical
behavior in time series. Before doing so, a cycle is defined to be one complete period of a sine or cosine function over a time
interval of length 2π. Define also the frequency

ω = 1/d

as the number of cycles per observation, where d denotes the period of a time series (that is, the number of observations in a cycle).
For monthly observations with an annual period, d = 12 and hence ω = 1/12 = 0.083 cycles per observation. Now reconsider the
process

Xt = R sin(2πωt + φ)

as introduced in Example 1.2.2, using the convention λ = 2πω. To include randomness in this process, choose the amplitude R and
the phase φ to be random variables. An equivalent representation of this process is given by

Xt = A cos(2πωt) + B sin(2πωt),

with A = R sin(φ) and B = R cos(φ) usually being independent standard normal variates. Then, R² = A² + B² is a χ-squared
random variable with 2 degrees of freedom and φ = tan^{−1}(B/A) is uniformly distributed on (−π, π]. Moreover, R and φ are
independent. Choosing now the value of ω, one particular periodicity can be described. To accommodate more than one, it seems
natural to consider mixtures of these periodic series with multiple frequencies and amplitudes:

X_t = ∑_{j=1}^m [A_j cos(2πω_j t) + B_j sin(2πω_j t)],  t ∈ Z,

where A_1, …, A_m and B_1, …, B_m are independent random variables with zero mean and variances σ_1², …, σ_m², and ω_1, …, ω_m
are distinct frequencies. It can be shown that (X_t: t ∈ Z) is a weakly stationary process with lag-h ACVF

γ(h) = ∑_{j=1}^m σ_j² cos(2πω_j h),  h ∈ Z.

The latter result yields in particular that γ(0) = σ_1² + … + σ_m². The variance of X_t is consequently the sum of the component
variances.
Example 4.1.1. Let m = 2 and choose A_1 = B_1 = 1, A_2 = B_2 = 4 to be constant as well as ω_1 = 1/12 and ω_2 = 1/6. This
means that

X_t = X_t^{(1)} + X_t^{(2)} = [cos(2πt/12) + sin(2πt/12)] + [4 cos(2πt/6) + 4 sin(2πt/6)]

is the sum of two periodic components of which one exhibits an annual cycle and the other a cycle of six months. For all processes
involved, realizations of n = 48 observations (4 years of data) are displayed in Figure 4.1. Also shown is a fourth time series plot
which contains X_t distorted by standard normal independent noise, X̃_t. The corresponding R code is:

>t=1:48
>x1=cos(2*pi*t/12)+sin(2*pi*t/12)
>x2=4*cos(2*pi*t/6)+4*sin(2*pi*t/6)

>x=x1+x2
>tildex=x+rnorm(48)
Note that the squared amplitude of X_t^{(1)} is 1² + 1² = 2. The maximum and minimum values of X_t^{(1)} are therefore ±√2.
Similarly, we obtain ±√32 for the second component.
For a statistician it is now important to develop tools to recover the periodicities from the data. The branch of statistics concerned
with this problem is called spectral analysis. The standard method in this area is based on the periodogram which is introduced now.
Suppose for the moment that the frequency parameter ω_1 = 1/12 in Example 4.1.1 is known. To obtain estimates of A_1 and B_1,
one could try to run a regression using the explanatory variables Y_{t,1} = cos(2πt/12) or Y_{t,2} = sin(2πt/12) to compute the least
squares estimators

Â_1 = ∑_{t=1}^n X_t Y_{t,1} / ∑_{t=1}^n Y_{t,1}² = (2/n) ∑_{t=1}^n X_t cos(2πt/12),

B̂_1 = ∑_{t=1}^n X_t Y_{t,2} / ∑_{t=1}^n Y_{t,2}² = (2/n) ∑_{t=1}^n X_t sin(2πt/12).

Figure 4.1: Time series plots of (X_t^{(1)}), (X_t^{(2)}), (X_t) and (X̃_t).

Since, in general, the frequencies involved will not be known to the statistician prior to the data analysis, the foregoing suggests to
pick a number of potential ω's, say j/n for j = 1, …, n/2, and to run a long regression of the form

X_t = ∑_{j=0}^{n/2} [A_j cos(2πjt/n) + B_j sin(2πjt/n)].  (4.1.1)

This leads to least squares estimates Â_j and B̂_j of which the "significant" ones should be selected. Note that the regression in 4.1.1
is a perfect one because there are as many unknowns as variables! Note also that

P(j/n) = Â_j² + B̂_j²

is essentially (up to a normalization) an estimator for the correlation between the time series X_t and the corresponding sum of the
periodic cosine and sine functions at frequency j/n. The collection of all P(j/n), j = 1, …, n/2, is called the scaled
periodogram. It can be computed quickly via an algorithm known as the fast Fourier transform (FFT) which in turn is based on the
discrete Fourier transform (DFT)

d(j/n) = (1/√n) ∑_{t=1}^n X_t exp(−2πijt/n).

The frequencies j/n are called the Fourier or fundamental frequencies. Since exp(−ix) = cos(x) − i sin(x) and
|z|² = z z̄ = (a + ib)(a − ib) = a² + b² for any complex number z = a + ib, it follows that

I(j/n) = |d(j/n)|² = (1/n)(∑_{t=1}^n X_t cos(2πjt/n))² + (1/n)(∑_{t=1}^n X_t sin(2πjt/n))².

The quantity I(j/n) is referred to as the periodogram. It follows immediately that the periodogram and the scaled periodogram are
related via the identity 4I(j/n) = nP(j/n).

Example 4.1.2. Using the expressions and notations of Example 4.1.1, the periodogram and the scaled periodogram are computed
in R as follows:

>t=1:48
>I=abs(fft(x)/sqrt(48))^2
>P=4*I/48
>f=0:24/48
>plot(f, P[1:25], type="l")
>abline(v=1/12)
>abline(v=1/6)
The corresponding (scaled) periodogram for (X̃_t) can be obtained in a similar fashion. The scaled periodograms are shown in the
left and middle panel of Figure 4.2. The right panel displays the scaled periodogram of another version of (X̃_t) in which the
standard normal noise has been replaced with normal noise with variance 9. From these plots it can be seen that the six months
periodicity is clearly visible in the graphs (see the dashed vertical lines at x = 1/6). The less pronounced annual cycle (vertical line at
x = 1/12) is still visible in the first two scaled periodograms but is lost if the noise variance is increased as in the right plot. Note,
however, that the y-scale is different for all three plots.

Figure 4.2: The scaled periodograms of (X_t), (X̃_t^{(1)}) and (X̃_t^{(2)}).

In the ideal situation that we observe the periodic component without additional contamination by noise, we can furthermore see
why the periodogram may be useful in uncovering the variance decomposition from above. We have shown in the lines preceding
Example 4.1.1 that the squared amplitudes of X_t^{(1)} and X_t^{(2)} are 2 and 32, respectively. These values are readily read from the
scaled periodogram in the left panel of Figure 4.2. The contamination with noise alters these values.
scaled periodogram in the left panel of Figure 4.2. The contamination with noise alters these values.
In the next section, it is established that the time domain approach (based on properties of the ACVF, that is, regression on past
values of the time series) and the frequency domain approach (using a periodic function approach via fundamental frequencies, that
is, regression on sine and cosine functions) are equivalent. Some details are given on the spectral density (the population
counterpart of the periodogram) and on properties of the periodogram itself.

Contributors and Attributions


Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 4.1: Introduction to Spectral Analysis is shared under a not declared license and was authored, remixed, and/or curated by
Alexander Aue.

4.2: The Spectral Density and the Periodogram
The fundamental technical result which is at the core of spectral analysis states that any (weakly) stationary time series can be
viewed (approximately) as a random superposition of sine and cosine functions varying at various frequencies. In other words, the
regression in (4.1.1) is approximately true for all weakly stationary time series. In Chapters 1-3, it is shown how the characteristics
of a stationary stochastic process can be described in terms of its ACVF γ(h). The first goal in this section is to introduce the
quantity corresponding to γ(h) in the frequency domain.
Definition 4.2.1 (Spectral Density)
If the ACVF \(\gamma(h)\) of a stationary time series \((X_t)_{t\in\mathbb{Z}}\) satisfies the condition

\[\sum_{h=-\infty}^{\infty}|\gamma(h)|<\infty,\]

then there exists a function \(f\) defined on \((-1/2,1/2]\) such that

\[\gamma(h)=\int_{-1/2}^{1/2}\exp(2\pi i\omega h)f(\omega)\,d\omega,\qquad h\in\mathbb{Z},\]

and

\[f(\omega)=\sum_{h=-\infty}^{\infty}\gamma(h)\exp(-2\pi i\omega h),\qquad \omega\in(-1/2,1/2].\]

The function \(f\) is called the spectral density of the process \((X_t\colon t\in\mathbb{Z})\).


Definition 4.2.1 (which contains a theorem part as well) establishes that each weakly stationary process can be equivalently described in terms of its ACVF or its spectral density. It also provides the formulas to compute one from the other. Time series analysis can consequently be performed either in the time domain (using \(\gamma(h)\)) or in the frequency domain (using \(f(\omega)\)). Which approach is the more suitable one cannot be decided in a general fashion but has to be reevaluated for every application of interest. In the following, several basic properties of the spectral density are collected and \(f\) is evaluated for several important examples. That the spectral density is analogous to a probability density function is established in the next proposition.
Proposition 4.2.1
If \(f(\omega)\) is the spectral density of a weakly stationary process \((X_t\colon t\in\mathbb{Z})\), then the following statements hold:
a. \(f(\omega)\geq 0\) for all \(\omega\). This follows from the positive definiteness of \(\gamma(h)\).
b. \(f(\omega)=f(-\omega)\) and \(f(\omega+1)=f(\omega)\).
c. The variance of \((X_t\colon t\in\mathbb{Z})\) is given by

\[\gamma(0)=\int_{-1/2}^{1/2}f(\omega)\,d\omega.\]

Part (c) of the proposition states that the variance of a weakly stationary process is equal to the integrated spectral density over all frequencies. This property is revisited below, when a spectral analysis of variance (spectral ANOVA) will be discussed. In the following, three examples are presented.
Example 4.2.1 (White Noise)
If \((Z_t\colon t\in\mathbb{Z})\sim\mathrm{WN}(0,\sigma^2)\), then its ACVF is nonzero only for \(h=0\), in which case \(\gamma_Z(h)=\sigma^2\). Plugging this result into the defining equation in Definition 4.2.1 yields that

\[f_Z(\omega)=\gamma_Z(0)\exp(-2\pi i\omega 0)=\sigma^2.\]

The spectral density of a white noise sequence is therefore constant for all ω ∈ (−1/2, 1/2], which means that every frequency ω

contributes equally to the overall spectrum. This explains the term ``white'' noise (in analogy to ``white'' light).
Example 4.2.2 (Moving Average)
Let \((Z_t\colon t\in\mathbb{Z})\sim\mathrm{WN}(0,\sigma^2)\) and define the time series \((X_t\colon t\in\mathbb{Z})\) by
\[X_t=\frac{1}{2}\left(Z_t+Z_{t-1}\right),\qquad t\in\mathbb{Z}.\]

It can be shown that

\[\gamma_X(h)=\frac{\sigma^2}{4}\left(2-|h|\right),\qquad h=0,\pm 1,\]

and that \(\gamma_X(h)=0\) otherwise.

Figure 4.3: Time series plot of white noise \((Z_t\colon t\in\mathbb{Z})\) (left), two-point moving average \((X_t\colon t\in\mathbb{Z})\) (middle) and spectral density of \((X_t\colon t\in\mathbb{Z})\) (right).

Therefore,

\[\begin{aligned}
f_X(\omega)&=\sum_{h=-1}^{1}\gamma_X(h)\exp(-2\pi i\omega h)\\[.1cm]
&=\frac{\sigma^2}{4}\left[\exp(-2\pi i\omega(-1))+2\exp(-2\pi i\omega 0)+\exp(-2\pi i\omega 1)\right]\\[.1cm]
&=\frac{\sigma^2}{2}\left(1+\cos(2\pi\omega)\right),
\end{aligned}\]

using that \(\exp(ix)=\cos(x)+i\sin(x)\), \(\cos(x)=\cos(-x)\) and \(\sin(x)=-\sin(-x)\). It can be seen from the two time series plots in Figure 4.3 that the application of the two-point moving average to the white noise sequence smoothes the sample path. This is due to an attenuation of the higher frequencies which is visible in the form of the spectral density in the right panel of Figure 4.3. All plots have been obtained using Gaussian white noise with \(\sigma^2=1\).
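As a quick numerical sanity check (this snippet is an addition for illustration, not part of the original example), one can plot this spectral density for \(\sigma^2=1\) and verify Proposition 4.2.1(c), namely that it integrates to \(\gamma_X(0)=\sigma^2/2\):

>f = function(omega) (1 + cos(2*pi*omega))/2    # f_X for sigma^2 = 1
>omega = seq(-.5, .5, by=.001)
>plot(omega, f(omega), type="l")
>integrate(f, -.5, .5)$value                    # approximately 0.5 = gamma_X(0)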

Example 4.2.3 (AR(2) Process).


Let \((X_t\colon t\in\mathbb{Z})\) be an AR(2) process which can be written in the form

\[Z_t=X_t-\phi_1X_{t-1}-\phi_2X_{t-2},\qquad t\in\mathbb{Z}.\]

In this representation, it can be seen that the ACVF \(\gamma_Z\) of the white noise sequence can be obtained as

\[\begin{aligned}
\gamma_Z(h)&=E\left[(X_t-\phi_1X_{t-1}-\phi_2X_{t-2})(X_{t+h}-\phi_1X_{t+h-1}-\phi_2X_{t+h-2})\right]\\[.1cm]
&=(1+\phi_1^2+\phi_2^2)\gamma_X(h)+(\phi_1\phi_2-\phi_1)\left[\gamma_X(h+1)+\gamma_X(h-1)\right]-\phi_2\left[\gamma_X(h+2)+\gamma_X(h-2)\right].
\end{aligned}\]

Now it is known from Definition 4.2.1 that

\[\gamma_X(h)=\int_{-1/2}^{1/2}\exp(2\pi i\omega h)f_X(\omega)\,d\omega\]

and

\[\gamma_Z(h)=\int_{-1/2}^{1/2}\exp(2\pi i\omega h)f_Z(\omega)\,d\omega,\]

where \(f_X(\omega)\) and \(f_Z(\omega)\) denote the respective spectral densities.

Figure 4.4: Time series plot and spectral density of the AR(2) process in Example 4.2.3.

Consequently,

\[\begin{aligned}
\gamma_Z(h)&=\int_{-1/2}^{1/2}\exp(2\pi i\omega h)f_Z(\omega)\,d\omega\\[.1cm]
&=(1+\phi_1^2+\phi_2^2)\gamma_X(h)+(\phi_1\phi_2-\phi_1)\left[\gamma_X(h+1)+\gamma_X(h-1)\right]-\phi_2\left[\gamma_X(h+2)+\gamma_X(h-2)\right]\\[.1cm]
&=\int_{-1/2}^{1/2}\Big[(1+\phi_1^2+\phi_2^2)+(\phi_1\phi_2-\phi_1)\left(\exp(2\pi i\omega)+\exp(-2\pi i\omega)\right)-\phi_2\left(\exp(4\pi i\omega)+\exp(-4\pi i\omega)\right)\Big]\exp(2\pi i\omega h)f_X(\omega)\,d\omega\\[.1cm]
&=\int_{-1/2}^{1/2}\Big[(1+\phi_1^2+\phi_2^2)+2(\phi_1\phi_2-\phi_1)\cos(2\pi\omega)-2\phi_2\cos(4\pi\omega)\Big]\exp(2\pi i\omega h)f_X(\omega)\,d\omega.
\end{aligned}\]

The foregoing implies, together with \(f_Z(\omega)=\sigma^2\), that

\[\sigma^2=\Big[(1+\phi_1^2+\phi_2^2)+2(\phi_1\phi_2-\phi_1)\cos(2\pi\omega)-2\phi_2\cos(4\pi\omega)\Big]f_X(\omega).\]

Hence, the spectral density of an AR(2) process has the form

\[f_X(\omega)=\sigma^2\Big[(1+\phi_1^2+\phi_2^2)+2(\phi_1\phi_2-\phi_1)\cos(2\pi\omega)-2\phi_2\cos(4\pi\omega)\Big]^{-1}.\]

Figure 4.4 displays the time series plot of an AR(2) process with parameters \(\phi_1=1.35\), \(\phi_2=-.41\) and \(\sigma^2=89.34\). These values are very similar to the ones obtained for the recruitment series in Section 3.5. The same figure also shows the corresponding
spectral density using the formula just derived.
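The formula can be evaluated directly in R. The following lines are a sketch added here for illustration (the parameter values are simply taken from the text); they reproduce the qualitative shape of the spectral density in the right panel of Figure 4.4:

>phi1 = 1.35; phi2 = -.41; sigma2 = 89.34
>f.ar2 = function(omega) sigma2/((1+phi1^2+phi2^2) + 2*(phi1*phi2-phi1)*cos(2*pi*omega) - 2*phi2*cos(4*pi*omega))
>omega = seq(0, .5, by=.001)
>plot(omega, f.ar2(omega), type="l")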
With the contents of this section, it has so far been established that the spectral density \(f(\omega)\) is a population quantity describing the impact of the various periodic components. Next, it is verified that the periodogram \(I(\omega_j)\) introduced in Section 4.1 is the sample counterpart of the spectral density.
Proposition 4.2.2.
Let \(\omega_j=j/n\) denote the Fourier frequencies. If \(I(\omega_j)=|d(\omega_j)|^2\) is the periodogram based on observations \(X_1,\ldots,X_n\) of a weakly stationary process \((X_t\colon t\in\mathbb{Z})\), then

\[I(\omega_j)=\sum_{h=-n+1}^{n-1}\hat{\gamma}_n(h)\exp(-2\pi i\omega_jh),\qquad j\neq 0.\]

If \(j=0\), then \(I(\omega_0)=I(0)=n\bar{X}_n^2\).
Proof. Let first \(j\neq 0\). Using that \(\sum_{t=1}^{n}\exp(-2\pi i\omega_jt)=0\), it follows that

\[\begin{aligned}
I(\omega_j)&=\frac{1}{n}\sum_{t=1}^{n}\sum_{s=1}^{n}(X_t-\bar{X}_n)(X_s-\bar{X}_n)\exp(-2\pi i\omega_j(t-s))\\[.1cm]
&=\frac{1}{n}\sum_{h=-n+1}^{n-1}\sum_{t=1}^{n-|h|}(X_{t+|h|}-\bar{X}_n)(X_t-\bar{X}_n)\exp(-2\pi i\omega_jh)\\[.1cm]
&=\sum_{h=-n+1}^{n-1}\hat{\gamma}_n(h)\exp(-2\pi i\omega_jh),
\end{aligned}\]

which proves the first claim of the proposition. If \(j=0\), the relations \(\cos(0)=1\) and \(\sin(0)=0\) imply that \(I(0)=n\bar{X}_n^2\). This completes the proof.
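Proposition 4.2.2 can also be checked numerically. The following sketch (added for illustration; any simulated series will do) compares the periodogram computed via fft with the Fourier transform of the sample ACVF at a nonzero Fourier frequency; note that R's acf with type="covariance" uses the divisor \(n\) and subtracts the sample mean, as required:

>set.seed(1); n=100; x=rnorm(n)
>gamma.hat = as.numeric(acf(x, lag.max=n-1, type="covariance", plot=FALSE)$acf)
>h = -(n-1):(n-1); j = 7
>Re(sum(gamma.hat[abs(h)+1]*exp(-2i*pi*(j/n)*h)))   # right-hand side of the identity
>abs(fft(x)[j+1])^2/n                               # periodogram I(j/n); same value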
More can be said about the periodogram. In fact, one can interpret spectral analysis as a spectral analysis of variance (ANOVA). To see this, let first

\[d_c(\omega_j)=\mathrm{Re}(d(\omega_j))=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_t\cos(2\pi\omega_jt),\]

\[d_s(\omega_j)=\mathrm{Im}(d(\omega_j))=\frac{1}{\sqrt{n}}\sum_{t=1}^{n}X_t\sin(2\pi\omega_jt).\]

Then, \(I(\omega_j)=d_c^2(\omega_j)+d_s^2(\omega_j)\). Let us now go back to the introductory example and study the process
\[X_t=A_0+\sum_{j=1}^{m}\left[A_j\cos(2\pi\omega_jt)+B_j\sin(2\pi\omega_jt)\right],\]

where \(m=(n-1)/2\) and \(n\) is odd. Suppose \(X_1,\ldots,X_n\) have been observed. Then, using regression techniques as before, it can be seen that \(A_0=\bar{X}_n\) and

\[A_j=\frac{2}{n}\sum_{t=1}^{n}X_t\cos(2\pi\omega_jt)=\frac{2}{\sqrt{n}}d_c(\omega_j),\]

\[B_j=\frac{2}{n}\sum_{t=1}^{n}X_t\sin(2\pi\omega_jt)=\frac{2}{\sqrt{n}}d_s(\omega_j).\]

Therefore,

\[\sum_{t=1}^{n}(X_t-\bar{X}_n)^2=2\sum_{j=1}^{m}\left[d_c^2(\omega_j)+d_s^2(\omega_j)\right]=2\sum_{j=1}^{m}I(\omega_j)\]

and the following ANOVA table is obtained (each Fourier frequency contributes two degrees of freedom, one for the cosine and one for the sine term):

Source     Df       Sum of squares
ω_1        2        2 I(ω_1)
⋮          ⋮        ⋮
ω_m        2        2 I(ω_m)
Total      n − 1    Σ_{t=1}^n (X_t − X̄_n)²

If the underlying stochastic process exhibits a strong periodic pattern at a certain frequency, then the periodogram will most likely pick it up.

Example 4.2.4
Consider the \(n=5\) data points \(X_1=2\), \(X_2=4\), \(X_3=6\), \(X_4=4\) and \(X_5=2\), which display a cyclical but nonsinusoidal pattern. This suggests that \(\omega=1/5\) is significant and \(\omega=2/5\) is not. In R, the spectral ANOVA can be produced as follows.
>x = c(2,4,6,4,2); t = 1:5
>cos1 = cos(2*pi*t*1/5)
>sin1 = sin(2*pi*t*1/5)
>cos2 = cos(2*pi*t*2/5)
>sin2 = sin(2*pi*t*2/5)
This generates the data and the independent cosine and sine variables. Now run a regression and check the ANOVA output.
>reg = lm(x ~ cos1+sin1+cos2+sin2)
>anova(reg)
This leads to the following output.
Response: x
Df Sum Sq Mean Sq F value Pr(>F)
cos1 1 7.1777 7.1777
cos2 1 0.0223 0.0223
sin1 1 3.7889 3.7889
sin2 1 0.2111 0.2111
Residuals 0 0.0000
According to previous reasoning (check the last table!), the periodogram at frequency \(\omega_1=1/5\) is given as half the sum of the cos1 and sin1 sums of squares, that is, \(I(1/5)=d_c^2(1/5)+d_s^2(1/5)=(7.1777+3.7889)/2=5.4833\). Similarly, \(I(2/5)=d_c^2(2/5)+d_s^2(2/5)=(0.0223+0.2111)/2=0.1167\).

Note, however, that the mean squared error is computed differently in R. We can compare these values with the periodogram:
> abs(fft(x))^2/5
[1] 64.8000000 5.4832816 0.1167184 0.1167184 5.4832816
The first value here is \(I(0)=n\bar{X}_n^2=5\cdot(18/5)^2=64.8\). The second and third values are \(I(1/5)\) and \(I(2/5)\), respectively, while \(I(3/5)=I(2/5)\) and \(I(4/5)=I(1/5)\) complete the list.

In the next section, some large sample properties of the periodogram are discussed to get a better understanding of spectral analysis.

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 4.2: The Spectral Density and the Periodogram is shared under a not declared license and was authored, remixed, and/or curated
by Alexander Aue.

4.3: Large Sample Properties
Let \((X_t\colon t\in\mathbb{Z})\) be a weakly stationary time series with mean \(\mu\), absolutely summable ACVF \(\gamma(h)\) and spectral density \(f(\omega)\). Proceeding as in the proof of Proposition 4.2.2, one obtains

\[I(\omega_j)=\frac{1}{n}\sum_{h=-n+1}^{n-1}\sum_{t=1}^{n-|h|}(X_{t+|h|}-\mu)(X_t-\mu)\exp(-2\pi i\omega_jh), \tag{4.3.1}\]

provided \(\omega_j\neq 0\). Using this representation, the limiting behavior of the periodogram can be established.
Proposition 4.3.1
Let \(I(\cdot)\) be the periodogram based on observations \(X_1,\ldots,X_n\) of a weakly stationary process \((X_t\colon t\in\mathbb{Z})\). Then, for any \(\omega\neq 0\),

\[E[I(\omega_{j:n})]\to f(\omega)\qquad(n\to\infty),\]

where \(\omega_{j:n}=j_n/n\) with \((j_n)_{n\in\mathbb{N}}\) chosen such that \(\omega_{j:n}\to\omega\) as \(n\to\infty\). If \(\omega=0\), then

\[E[I(0)]-n\mu^2\to f(0)\qquad(n\to\infty).\]

Proof. There are two limits involved in the computation of the periodogram mean. First, take the limit as \(n\to\infty\). This, however, requires secondly that for each \(n\) we have to work with a different set of Fourier frequencies. To adjust for this, we have introduced the notation \(\omega_{j:n}\). If \(\omega_j\neq 0\) is a Fourier frequency (\(n\) fixed!), then

\[E[I(\omega_j)]=\sum_{h=-n+1}^{n-1}\left(\frac{n-|h|}{n}\right)\gamma(h)\exp(-2\pi i\omega_jh).\]

Therefore (\(n\to\infty\)!),

\[E[I(\omega_{j:n})]\to\sum_{h=-\infty}^{\infty}\gamma(h)\exp(-2\pi i\omega h)=f(\omega),\]

thus proving the first claim. The second follows from \(I(0)=n\bar{X}_n^2\) (see Proposition 4.2.2), so that \(E[I(0)]-n\mu^2=n(E[\bar{X}_n^2]-\mu^2)=n\mathrm{Var}(\bar{X}_n)\to f(0)\) as \(n\to\infty\), as in Chapter 2. The proof is complete.
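A small simulation illustrates both the asymptotic unbiasedness established here and the inconsistency discussed next. The sketch below (an illustration, not from the text) computes the periodogram of Gaussian white noise (so that \(f(\omega)=\sigma^2=1\)) at a frequency near 1/4 for many replications; the averages are close to 1 for both sample sizes, while the variability does not decrease with \(n\):

>set.seed(42)
>pgram = function(x, j) abs(fft(x)[j+1])^2/length(x)
>vals100 = replicate(2000, pgram(rnorm(100), 25))
>vals1000 = replicate(2000, pgram(rnorm(1000), 250))
>c(mean(vals100), mean(vals1000))   # both close to f(1/4) = 1
>c(var(vals100), var(vals1000))     # both close to f(1/4)^2 = 1, no decrease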
Proposition 4.3.1 shows that the periodogram \(I(\omega)\) is asymptotically unbiased for \(f(\omega)\). It is, however, inconsistent. This is implied by the following proposition, which is given without proof. It is not surprising considering that each value \(I(\omega_j)\) is the sum of squares of only two random variables irrespective of the sample size.


Proposition 4.3.2.
If \((X_t\colon t\in\mathbb{Z})\) is a (causal or noncausal) weakly stationary time series such that

\[X_t=\sum_{j=-\infty}^{\infty}\psi_jZ_{t-j},\qquad t\in\mathbb{Z},\]

with \(\sum_{j=-\infty}^{\infty}|\psi_j|<\infty\) and \((Z_t)_{t\in\mathbb{Z}}\sim\mathrm{WN}(0,\sigma^2)\), then

\[\left(\frac{2I(\omega_{1:n})}{f(\omega_1)},\ldots,\frac{2I(\omega_{m:n})}{f(\omega_m)}\right)\stackrel{\mathcal{D}}{\to}(\xi_1,\ldots,\xi_m),\]

where \(\omega_1,\ldots,\omega_m\) are \(m\) distinct frequencies with \(\omega_{j:n}\to\omega_j\) and \(f(\omega_j)>0\). The variables \(\xi_1,\ldots,\xi_m\) are independent, identically chi-squared distributed with two degrees of freedom.
The result of this proposition can be used to construct confidence intervals for the value of the spectral density at frequency \(\omega\). To this end, denote by \(\chi_2^2(\alpha)\) the lower-tail \(\alpha\) quantile of the chi-squared variable \(\xi_j\), that is,

\[P(\xi_j\leq\chi_2^2(\alpha))=\alpha.\]

Then, Proposition 4.3.2 implies that an approximate confidence interval with level \(1-\alpha\) is given by

\[\frac{2I(\omega_{j:n})}{\chi_2^2(1-\alpha/2)}\leq f(\omega)\leq\frac{2I(\omega_{j:n})}{\chi_2^2(\alpha/2)}.\]
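As a minimal sketch (the helper function below, including its name, is an illustration added here rather than part of the text), this interval can be wrapped into a few lines of R; it reproduces the recruitment intervals computed further below:

>spec.ci = function(I.omega, alpha=.05) c(2*I.omega/qchisq(1-alpha/2, 2), 2*I.omega/qchisq(alpha/2, 2))
>spec.ci(21332.94)   # matches the interval for f(1/12) reported below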

Proposition 4.3.2 also suggests that confidence intervals can be derived simultaneously for several frequency components. Before confidence intervals are computed for the dominant frequency of the recruitment data, return for a moment to the computation of the FFT, which is the basis for the periodogram usage. To ensure a quick computation time, highly composite integers \(n^\prime\) have to be used. To achieve this in general, the length of the time series is adjusted by padding the original (detrended) data with zeroes. In R, spectral analysis is performed with the function spec.pgram. To find out which \(n^\prime\) is used for your particular data, type nextn(length(x)), assuming that your series is in x.

Figure 4.6: Averaged periodogram of the recruitment data discussed in Example 4.3.1.
Example 4.3.1.
Figure 4.5 displays the periodogram of the recruitment data which has been discussed in Example 3.3.5. It shows a strong annual frequency component at \(\omega=1/12\) as well as several spikes in the neighborhood of the El Niño frequency \(\omega=1/48\). Higher frequency components with \(\omega>.3\) are virtually absent. Even though an AR(2) model was fitted to this data in Chapter 3 to produce future values based on this fit, it is seen that the periodogram here does not validate this fit as the spectral density of an AR(2) process (as computed in Example 4.2.3) is qualitatively different. In R, the following commands can be used (nextn(length(rec)) gives \(n^\prime=480\) here if the recruitment data is stored in rec as before).

>rec.pgram=spec.pgram(rec, taper=0, log="no")


>abline(v=1/12, lty=2)
>abline(v=1/48, lty=2)
The function spec.pgram allows you to fine-tune the spectral analysis. For our purposes, we always use the specifications given above for the raw periodogram (taper controls the proportion of data that is tapered at each end of the series, with taper=0 switching tapering off; log="no" suppresses the logarithmic scale, the log-periodogram being the R default).
To compute the confidence intervals for the two dominating frequencies 1/12 and 1/48, you can use the following R code, noting
that 1/12 = 40/480 and 1/48 = 10/480.
>rec.pgram$spec[40]
[1] 21332.94
>rec.pgram$spec[10]
[1] 14368.42
>u=qchisq(.025, 2); l=qchisq(.975, 2)
>2*rec.pgram$spec[40]/l
>2*rec.pgram$spec[40]/u
>2*rec.pgram$spec[10]/l
>2*rec.pgram$spec[10]/u
Using the numerical values of this analysis, the following confidence intervals are obtained at the level α = .1 :

f (1/12) ∈ (5783.041, 842606.2) and f (1/48) ∈ (3895.065, 567522.5).

These are much too wide and alternatives to the raw periodogram are needed. These are provided, for example, by a smoothing
approach which uses an averaging procedure over a band of neighboring frequencies. This can be done as follows.
>k=kernel("daniell",4)
>rec.ave=spec.pgram(rec, k, taper=0, log="no")
> abline(v=1/12, lty=2)
> abline(v=1/48, lty=2)
> rec.ave$bandwidth
[1] 0.005412659
The resulting smoothed periodogram is shown in Figure 4.6. It is less noisy, as is expected from taking averages. More precisely, a two-sided Daniell filter with \(m=4\) was used here, averaging over the \(L=2m+1\) neighboring frequencies

\[\omega_k=\omega_j+\frac{k}{n},\qquad k=-m,\ldots,m,\]

to compute the periodogram at \(\omega_j=j/n\). The resulting plot in Figure 4.6 shows, on the other hand, that the sharp annual peak has been flattened considerably. The bandwidth reported in R can be computed as \(b=L/(\sqrt{12}\,n)\). To compute confidence intervals one has to adjust the previously derived formula. This is done by changing the degrees of freedom from 2 to \(df=2Ln/n^\prime\) (if zeroes were appended) and leads to


\[\frac{df}{\chi_{df}^2(1-\alpha/2)}\sum_{k=-m}^{m}f\Big(\omega_j+\frac{k}{n}\Big)\leq f(\omega)\leq\frac{df}{\chi_{df}^2(\alpha/2)}\sum_{k=-m}^{m}f\Big(\omega_j+\frac{k}{n}\Big)\]

for \(\omega\approx\omega_j\). For the recruitment data the following R code can be used:
j

>df=ceiling(rec.ave$df)
>u=qchisq(.025,df); l=qchisq(.975,df)
>df*rec.ave$spec[40]/l
>df*rec.ave$spec[40]/u
>df*rec.ave$spec[10]/l
>df*rec.ave$spec[10]/u

Figure 4.7: The modified Daniell periodogram of the recruitment data discussed in Example 4.3.1.
to get the confidence intervals

f (1/12) ∈ (1482.427, 5916.823) and f (1/48) ∈ (4452.583, 17771.64).

The compromise between the noisy raw periodogram and further smoothing as described here (with \(L=9\)) reverses the magnitude of the 1/12 annual frequency and the 1/48 El Niño component. This is due to the fact that the annual peak is a very sharp one, with neighboring frequencies being basically zero, while for the 1/48 component there is a whole band of neighboring frequencies which also contribute (the El Niño phenomenon is irregular and appears only on average every four years). Moreover, the annual cycle is now distributed over a whole range. One way around this issue is provided by the use of other kernels such as the modified Daniell kernel given in R as kernel("modified.daniell", c(3,3)). This leads to the spectral density in Figure 4.7.

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 4.3: Large Sample Properties is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

4.4: Linear Filtering
A linear filter uses specific coefficients \((\psi_s\colon s\in\mathbb{Z})\), called the impulse response function, to transform a weakly stationary input series \((X_t\colon t\in\mathbb{Z})\) into an output series \((Y_t\colon t\in\mathbb{Z})\) via

\[Y_t=\sum_{s=-\infty}^{\infty}\psi_sX_{t-s},\qquad t\in\mathbb{Z},\]

where \(\sum_{s=-\infty}^{\infty}|\psi_s|<\infty\). Then, the frequency response function

\[\Psi(\omega)=\sum_{s=-\infty}^{\infty}\psi_s\exp(-2\pi i\omega s)\]

is well defined. Note that the two-point moving average of Example 4.2.2 and the differenced sequence \(\nabla X_t\) are examples of linear filters. On the other hand, any causal ARMA process can be identified as a linear filter applied to a white noise sequence. Implicitly this concept was already used to compute the spectral densities in Examples 4.2.2 and 4.2.3. To investigate this in further detail, let \(\gamma_X(h)\) and \(\gamma_Y(h)\) denote the ACVF of the input process \((X_t\colon t\in\mathbb{Z})\) and the output process \((Y_t\colon t\in\mathbb{Z})\), respectively, and denote by \(f_X(\omega)\) and \(f_Y(\omega)\) the corresponding spectral densities. The following is the main result in this section.

Theorem 4.4.1.
Under the assumptions made in this section, it holds that \(f_Y(\omega)=|\Psi(\omega)|^2f_X(\omega)\).
Proof. First note that

\[\begin{aligned}
\gamma_Y(h)&=E[(Y_{t+h}-\mu_Y)(Y_t-\mu_Y)]\\[.1cm]
&=\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\psi_r\psi_s\gamma(h-r+s)\\[.1cm]
&=\sum_{r=-\infty}^{\infty}\sum_{s=-\infty}^{\infty}\psi_r\psi_s\int_{-1/2}^{1/2}\exp(2\pi i\omega(h-r+s))f_X(\omega)\,d\omega\\[.1cm]
&=\int_{-1/2}^{1/2}\Big(\sum_{r=-\infty}^{\infty}\psi_r\exp(-2\pi i\omega r)\Big)\Big(\sum_{s=-\infty}^{\infty}\psi_s\exp(2\pi i\omega s)\Big)\exp(2\pi i\omega h)f_X(\omega)\,d\omega\\[.1cm]
&=\int_{-1/2}^{1/2}\exp(2\pi i\omega h)|\Psi(\omega)|^2f_X(\omega)\,d\omega.
\end{aligned}\]

Now identify \(f_Y(\omega)=|\Psi(\omega)|^2f_X(\omega)\), which is the assertion of the theorem.
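To see the theorem in action, consider again the two-point moving average of Example 4.2.2, for which \(\psi_0=\psi_1=1/2\) and \(\Psi(\omega)=(1+e^{-2\pi i\omega})/2\), so that \(|\Psi(\omega)|^2=(1+\cos(2\pi\omega))/2\) and \(f_Y(\omega)=\sigma^2(1+\cos(2\pi\omega))/2\), in agreement with the direct computation there. A short numerical check (a sketch added for illustration):

>psi = c(.5, .5)                                 # impulse response at lags 0 and 1
>omega = seq(-.5, .5, by=.01)
>Psi = sapply(omega, function(w) sum(psi*exp(-2i*pi*w*(0:1))))
>max(abs(Mod(Psi)^2 - (1+cos(2*pi*omega))/2))    # numerically zero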
Theorem 4.4.1 suggests a way to compute the spectral density of a causal ARMA process. To this end, let \((Y_t\colon t\in\mathbb{Z})\) be such a causal ARMA(p,q) process satisfying \(Y_t=\psi(B)Z_t\), where \((Z_t\colon t\in\mathbb{Z})\sim\mathrm{WN}(0,\sigma^2)\) and

\[\psi(z)=\frac{\theta(z)}{\phi(z)}=\sum_{s=0}^{\infty}\psi_sz^s,\qquad|z|\leq 1,\]

with \(\theta(z)\) and \(\phi(z)\) being the moving average and autoregressive polynomial, respectively. Note that \((\psi_s\colon s\in\mathbb{N}_0)\) can be viewed as a special impulse response function.
Corollary 4.4.1.
If \((Y_t\colon t\in\mathbb{Z})\) is a causal ARMA(p,q) process, then its spectral density is given by

\[f_Y(\omega)=\sigma^2\frac{|\theta(e^{-2\pi i\omega})|^2}{|\phi(e^{-2\pi i\omega})|^2}.\]

Proof. Apply Theorem 4.4.1 with input sequence \((Z_t\colon t\in\mathbb{Z})\). Then \(f_Z(\omega)=\sigma^2\), and moreover the frequency response function is

\[\Psi(\omega)=\sum_{s=0}^{\infty}\psi_s\exp(-2\pi i\omega s)=\psi(e^{-2\pi i\omega})=\frac{\theta(e^{-2\pi i\omega})}{\phi(e^{-2\pi i\omega})}.\]

Since \(f_Y(\omega)=|\Psi(\omega)|^2f_Z(\omega)\), the proof is complete.

Corollary 4.4.1 gives an easy approach to define parametric spectral density estimates for causal ARMA(p,q) processes by simply replacing the population quantities by appropriate sample counterparts. This gives the spectral density estimator

\[\hat{f}_n(\omega)=\hat{\sigma}^2\frac{|\hat{\theta}(e^{-2\pi i\omega})|^2}{|\hat{\phi}(e^{-2\pi i\omega})|^2}.\]

Now any of the estimation techniques discussed in Section 3.5 may be applied when computing \(\hat{f}_n(\omega)\).
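As a sketch of how such a parametric estimate could be obtained in practice (the code below is an illustration under the assumption that the series of interest is stored in x; it fits a pure AR(2) model with arima, for which the moving average polynomial is \(\theta(z)=1\), and plugs the estimates into the formula of Corollary 4.4.1):

>fit = arima(x, order=c(2,0,0))                  # yields phi.hat and sigma2.hat
>phi = fit$coef[c("ar1","ar2")]; sig2 = fit$sigma2
>f.hat = function(omega) sig2/Mod(1 - phi[1]*exp(-2i*pi*omega) - phi[2]*exp(-2i*pi*omega*2))^2
>omega = seq(0, .5, by=.001)
>plot(omega, f.hat(omega), type="l")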

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 4.4: Linear Filtering is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

4.5: Summary
In this chapter, the basic methods for frequency domain time series analysis were introduced. These are based on a regression of the
given data on cosine and sine functions varying at the Fourier frequencies. On the population side, spectral densities were identified
as the frequency domain counterparts of absolutely summable autocovariance functions. These are obtained from one another by
the application of (inverse) Fourier transforms. On the sample side, the periodogram has been shown to be an estimator for the
unknown spectral density. Since it is an inconsistent estimator, various techniques have been discussed to overcome this fact.
Finally, linear filters were introduced which can, for example, be used to compute spectral densities of causal ARMA processes and
to derive parametric spectral density estimators other than the periodogram.

Contributors
Alexander Aue (Department of Statistics, University of California, Davis)
Integrated by Brett Nakano (statistics, UC Davis)

This page titled 4.5: Summary is shared under a not declared license and was authored, remixed, and/or curated by Alexander Aue.

Index

A
ARMA
  3.1: Introduction to Autoregressive Moving Average (ARMA) Processes
ARMA Processes
  3: ARMA Processes
asymptotic unbiasedness
  2.2: Estimation of the Autocovariance Function
autocovariance function
  2.2: Estimation of the Autocovariance Function
autoregressive moving average time series
  3.1: Introduction to Autoregressive Moving Average (ARMA) Processes
autoregressive polynomial
  3.1: Introduction to Autoregressive Moving Average (ARMA) Processes

B
BEST LINEAR PREDICTION
  3.4: Forecasting

C
Causality
  3.2: Causality and Invertibility

F
Forecasting
  3.4: Forecasting

I
Invertibility
  3.2: Causality and Invertibility

L
linear filtering
  4.4: Linear Filtering

M
method of moments
  3.5: Parameter Estimation
moving average polynomial
  3.1: Introduction to Autoregressive Moving Average (ARMA) Processes

P
partial autocorrelation function
  3.3: The PACF of a Causal ARMA Process
periodogram
  4.2: The Spectral Density and the Periodogram

S
spectral density
  4.2: The Spectral Density and the Periodogram
stochastic process
  1.1: Introduction and Examples
Detailed Licensing
Overview
Title: Time Series Analysis (Aue)
Webpages: 34
All licenses found:
Undeclared: 100% (34 pages)

By Page
Time Series Analysis (Aue) - Undeclared
  Front Matter - Undeclared
    TitlePage - Undeclared
    InfoPage - Undeclared
    Table of Contents - Undeclared
    Licensing - Undeclared
  1: Basic Concepts in Time Series - Undeclared
    1.1: Introduction and Examples - Undeclared
    1.2: Stationary Time Series - Undeclared
    1.3: Eliminating Trend Components - Undeclared
    1.4: Eliminating Trend and Seasonal Components - Undeclared
    1.5: Assessing the Residuals - Undeclared
    1.6: Summary - Undeclared
  2: The Estimation of Mean and Covariances - Undeclared
    2.1: Estimation of the Mean - Undeclared
    2.2: Estimation of the Autocovariance Function - Undeclared
  3: ARMA Processes - Undeclared
    3.1: Introduction to Autoregressive Moving Average (ARMA) Processes - Undeclared
    3.2: Causality and Invertibility - Undeclared
    3.3: The PACF of a Causal ARMA Process - Undeclared
    3.4: Forecasting - Undeclared
    3.5: Parameter Estimation - Undeclared
    3.6: Model Selection - Undeclared
    3.7: Summary - Undeclared
  4: Spectral Analysis - Undeclared
    4.1: Introduction to Spectral Analysis - Undeclared
    4.2: The Spectral Density and the Periodogram - Undeclared
    4.3: Large Sample Properties - Undeclared
    4.4: Linear Filtering - Undeclared
    4.5: Summary - Undeclared
  Back Matter - Undeclared
    Index - Undeclared
    Glossary - Undeclared
    Detailed Licensing - Undeclared

