Practical activities
Exercise 1
Consider the following series:
To create it in R:
y <- ts(c(123,39,78,52,110), start=2012)
When there are several observations per year, add the frequency argument:
z<-c(120, 122, 115, 102, 130, 135, 120, 115, 140, 142, 130, 120, 160, 170,
140, 135, 180, 185, 150, 145, 185, 190, 165, 160, 188, 195, 165, 168, 188,
198, 170, 173, 192, 196, 170, 160)
> z<-ts(z, start = 2012, frequency = 12)
> z
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 120 122 115 102 130 135 120 115 140 142 130 120
2013 160 170 140 135 180 185 150 145 185 190 165 160
2014 188 195 165 168 188 198 170 173 192 196 170 160
In general, frequency takes one of the following values in R: 1 (annual), 4 (quarterly), 12 (monthly), 52 (weekly).
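A minimal sketch of a non-monthly case, using base R only. The quarterly figures in q are made up purely for illustration and are not part of the exercise data.

```r
# Hypothetical quarterly data: 8 observations starting in Q1 2012
q <- ts(c(10, 12, 9, 11, 13, 15, 12, 14), start = c(2012, 1), frequency = 4)

frequency(q)   # number of observations per year
start(q)       # first period, as c(year, quarter)
end(q)         # last period, as c(year, quarter)

# window() extracts a sub-period, here the four quarters of 2013
q2013 <- window(q, start = c(2013, 1), end = c(2013, 4))
q2013
```

The same calls (frequency, start, end, window) work unchanged on the monthly series z defined above.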
install.packages("fpp2")
library(fpp2)
> melsyd
Time Series:
Start = c(1987, 26)
End = c(1992, 48)
Frequency = 52
First.Class Business.Class Economy.Class
1987.481 1.912 NA 20.167
1987.500 1.848 NA 20.161
1987.519 1.856 NA 19.993
1987.538 2.142 NA 20.986
…………………………………………………………………………
class(melsyd)
[1] "mts" "ts"
Data set: weekly economy-class passenger numbers (in thousands) on the Melbourne-Sydney route.
autoplot(melsyd[,"Economy.Class"]) +
ggtitle("Economy class passengers: Melbourne-Sydney") +
xlab("Year") + ylab("Thousands")
Exercise 2
Table 6.1: Annual electricity sales to residential customers in South Australia, 1989–2008.
Year  Sales (GWh)  5-MA
1989  2354.34
1990  2379.71
1991  2318.52      2381.53
1992  2468.99      2424.56
1993  2386.09      2463.76
1994  2569.47      2552.60
1995  2575.72      2627.70
1996  2762.72      2750.62
1997  2844.50      2858.35
1998  3000.70      3014.70
1999  3108.10      3077.30
2000  3357.50      3144.52
2001  3075.70      3188.70
2002  3180.60      3202.32
2003  3221.60      3216.94
2004  3176.20      3307.30
2005  3430.60      3398.75
2006  3527.48      3485.43
2007  3637.89
2008  3655.00
Annual electricity sales to residential customers in South Australia. 1989–2008.
Year Sales (GWh) 5-MA
1989 2354.34
1990 2379.71
1991 2318.52
1992 2468.99
1993 2386.09
1994 2569.47
1995 2575.72
1996 2762.72
1997 2844.50
1998 3000.70
1999 3108.10
2000 3357.50
2001 3075.70
2002 3180.60
2003 3221.60
2004 3176.20
2005 3430.60
2006 3527.48
2007 3637.89
2008 3655.00
To compute a moving average:
ma(x, 5): moving average of order 5.
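To see what an order-5 moving average does, the first few 5-MA values of Table 6.1 can be reproduced by hand. This is a minimal sketch that uses base R's stats::filter as a stand-in for ma() (which comes from the forecast package loaded by fpp2). Only the 1989 to 1995 sales figures are typed in.

```r
# Annual sales (GWh) for 1989-1995, copied from Table 6.1
sales <- ts(c(2354.34, 2379.71, 2318.52, 2468.99, 2386.09, 2569.47, 2575.72),
            start = 1989)

# Centred moving average of order 5: equal weights of 1/5 on
# the two previous years, the current year and the two next years
w <- rep(1/5, 5)
ma5 <- stats::filter(sales, w, sides = 2)

# The first and last two values are NA because the window is incomplete there
round(ma5, 2)
```

The values obtained for 1991, 1992 and 1993 should agree with the 5-MA column of the table (2381.53, 2424.56 and 2463.76).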
To plot the corresponding graph:
autoplot(elecsales, series="Data") +
autolayer(ma(elecsales,5), series="5-MA") +
xlab("Year") + ylab("GWh") +
ggtitle("Annual electricity sales: South Australia") +
scale_colour_manual(values=c("Data"="grey50","5-MA"="red"),
breaks=c("Data","5-MA"))
Comment on the result.
Exercise 3
library(fpp2)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
-- Attaching packages ------------------------ fpp2 2.4 --
v ggplot2 3.3.2 v fma 2.4
v forecast 8.13 v expsmooth 2.3
> elecequip
Jan Feb Mar Apr May Jun Jul
1996 79.35 75.78 86.32 72.60 74.86 83.81 79.80
1997 78.64 77.42 89.86 81.27 78.68 89.51 83.67
…………………………………………………………………………….
1) Improved visualisation of the series:
autoplot(elecequip, series="Data") +
autolayer(ma(elecequip, 12), series="12-MA") +
xlab("Year") + ylab("New orders index") +
ggtitle("Electrical equipment manufacturing (Euro area)") +
scale_colour_manual(values=c("Data"="grey","12-MA"="red"),
breaks=c("Data","12-MA"))
Next we use weighted moving averages, through a classical decomposition:
elecequip %>% decompose(type="multiplicative") %>%
autoplot() + xlab("Year") +
ggtitle("Classical multiplicative decomposition
of electrical equipment index")
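For monthly data, the trend-cycle in a classical decomposition is a weighted moving average: a 2x12-MA, with weight 1/24 on the two end points and 1/12 on the eleven points in between. The sketch below checks this in base R. AirPassengers is used only as a stand-in monthly series so that the code runs without fpp2; it is not the elecequip data.

```r
x <- AirPassengers  # stand-in monthly series shipped with base R

# Weights of the 2x12-MA: 1/24 at both ends, 1/12 for the 11 middle terms
w <- c(1/24, rep(1/12, 11), 1/24)
sum(w)  # the weights add up to 1

# Weighted moving average computed by hand
trend_manual <- stats::filter(x, w, sides = 2)

# Trend-cycle component returned by the classical decomposition
trend_decomp <- decompose(x, type = "multiplicative")$trend

# Both estimates should coincide (including the NA values at each end)
all.equal(as.numeric(trend_manual), as.numeric(trend_decomp))
```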
Another, more recent approach (2016) decomposes a series with the seas() function from the seasonal package for R; here it is used to obtain an X11 decomposition:
library(seasonal)  # provides seas(); install it first if needed
elecequip %>% seas(x11="") -> fit
autoplot(fit) +
ggtitle("X11 decomposition of electrical equipment index")
STL Decomposition
STL (Seasonal and Trend decomposition using Loess) is a robust method for
decomposing a series. Loess is a method for estimating nonlinear relationships,
so STL also applies to series whose trend is not linear:
elecequip %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot()
t.window: number of consecutive observations used to estimate the trend-cycle;
s.window: number of consecutive years used to estimate each value in the
seasonal component ("periodic" forces the seasonal pattern to be identical across years).
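The effect of these arguments can be inspected directly on the fitted object. The following is a sketch under stated assumptions: it uses stl() from base R's stats package on co2, a monthly series bundled with R, as a stand-in for elecequip.

```r
x <- co2  # stand-in monthly series from base R (not the elecequip data)

fit <- stl(x, t.window = 13, s.window = "periodic", robust = TRUE)
comp <- fit$time.series   # columns: seasonal, trend, remainder

# STL is additive: the three components add back up to the original series
recomposed <- rowSums(comp)
all.equal(as.numeric(recomposed), as.numeric(x))

# With s.window = "periodic" the seasonal pattern is the same every year
seas_year1 <- comp[1:12, "seasonal"]
seas_year2 <- comp[13:24, "seasonal"]
all.equal(as.numeric(seas_year1), as.numeric(seas_year2))
```

Replacing "periodic" with an odd number (for example 13) lets the seasonal pattern change slowly over time, in which case the second comparison would no longer hold.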
To produce forecasts:
stlf(elecequip, method='naive')
Point Forecast Lo 80 Hi 80 Lo 95
Apr 2012 83.94889 79.78271 88.11506 77.57727
…………………………………………………….
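The idea behind this kind of forecast is to apply a non-seasonal method to the seasonally adjusted series, then add back the last year of the seasonal component. The sketch below illustrates that idea by hand in base R, with the naive method and the co2 series as a stand-in. It is not a reimplementation of stlf(): the defaults differ, so the numbers will not match the output above, and no prediction intervals are computed.

```r
x <- co2   # stand-in monthly series from base R; it ends in December
h <- 24    # forecast horizon in months

fit <- stl(x, s.window = "periodic")
seasonal <- fit$time.series[, "seasonal"]
sa <- x - seasonal                      # seasonally adjusted series

# Naive forecast of the seasonally adjusted series: repeat its last value
sa_fc <- rep(as.numeric(tail(sa, 1)), h)

# Seasonal component: repeat the last full year
# (aligned here because the series ends at the end of a year)
last_year <- as.numeric(tail(seasonal, frequency(x)))
seas_fc <- rep(last_year, length.out = h)

# Re-seasonalised point forecasts, as a ts starting right after the data
fc <- ts(sa_fc + seas_fc,
         start = tsp(x)[2] + 1 / frequency(x),
         frequency = frequency(x))
fc
```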
Exercise 4
The plastics data set consists of the monthly sales (in thousands) of product A for
a plastics manufacturer for five years.
a. Plot the time series of sales of product A. Can you identify seasonal fluctuations
and/or a trend-cycle?
b. Use a classical multiplicative decomposition to calculate the trend-cycle and
seasonal indices.
c. Do the results support the graphical interpretation from part a?
d. Compute and plot the seasonally adjusted data.
e. Change one observation to be an outlier (e.g., add 500 to one observation), and
recompute the seasonally adjusted data. What is the effect of the outlier?
f. Does it make any difference if the outlier is near the end rather than in the middle
of the time series?
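One possible starting point for parts b, d and e is sketched below. It runs on AirPassengers, a stand-in series from base R, so that no package is needed; to work on the exercise data, load fpp2 and replace x with plastics. The outlier position (observation 70) is an arbitrary choice.

```r
x <- AirPassengers  # stand-in; use plastics for the actual exercise

# Part b: classical multiplicative decomposition
dec <- decompose(x, type = "multiplicative")
dec$figure                      # the 12 seasonal indices

# Part d: seasonally adjusted data = data / seasonal component
sa <- x / dec$seasonal

# Part e: turn one observation into an outlier and recompute
x_out <- x
x_out[70] <- x_out[70] + 500    # arbitrary position near the middle
dec_out <- decompose(x_out, type = "multiplicative")
sa_out <- x_out / dec_out$seasonal

# Compare the two seasonally adjusted series
plot(cbind(original = sa, with_outlier = sa_out),
     main = "Seasonally adjusted data")
```

Moving the outlier to one of the last few observations and rerunning the same lines gives the comparison asked for in part f.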
Exercise 5
The data set fancy concerns the monthly sales figures of a shop which opened in
January 1987 and sells gifts, souvenirs, and novelties. The shop is situated on the
wharf at a beach resort town in Queensland, Australia. The sales volume varies
with the seasonal population of tourists. There is a large influx of visitors to the
town at Christmas and for the local surfing festival, held every March since 1988.
Over time, the shop has expanded its premises, range of products, and staff.
a. Produce a time plot of the data and describe the patterns in the graph. Identify any
unusual or unexpected fluctuations in the time series.
b. Explain why it is necessary to take logarithms of these data before fitting a model.
c. Use R to fit a regression model to the logarithms of these sales data with a linear
trend, seasonal dummies and a “surfing festival” dummy variable.
d. Plot the residuals against time and against the fitted values. Do these plots reveal
any problems with the model?
e. Do boxplots of the residuals for each month. Does this reveal any problems with
the model?
f. What do the values of the coefficients tell you about each variable?
g. What does the Breusch-Godfrey test tell you about your model?
h. Regardless of your answers to the above questions, use your regression model to
predict the monthly sales for 1994, 1995, and 1996. Produce prediction intervals
for each of your forecasts.
i. Transform your predictions and intervals to obtain predictions and intervals for the
raw data.
j. How could you improve these predictions by modifying the model?
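Parts c, h and i follow a common pattern: fit the regression on the log scale, predict with intervals, then exponentiate to return to the raw scale. The sketch below shows that pattern with base R's lm() on AirPassengers as a stand-in. For the exercise itself, replace y with fancy and add a 0/1 column for the surfing festival; that column is deliberately left out here because the stand-in series has no such event.

```r
y <- AirPassengers  # stand-in monthly series; it ends in December

# Regression data: log of the series, linear trend, month as a factor
# (the factor produces the seasonal dummies)
df <- data.frame(
  logy  = log(as.numeric(y)),
  trend = seq_along(y),
  month = factor(as.integer(cycle(y)))
)
fit <- lm(logy ~ trend + month, data = df)

# New data for the next 12 months (January to December)
h <- 12
new <- data.frame(
  trend = length(y) + seq_len(h),
  month = factor(rep(1:12, length.out = h), levels = levels(df$month))
)

# Predictions and 95% prediction intervals on the log scale
pred_log <- predict(fit, newdata = new, interval = "prediction", level = 0.95)

# Back-transform to the raw scale
pred_raw <- exp(pred_log)
pred_raw
```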