Econometrics: Univariate Time Series Modelling and Forecasting

Univariate Time Series Modelling (UTM) is the modelling and prediction of an economic variable using only the information contained in its own past values and in the current and past values of the error term. In other words, the values of the y-variable in past periods (along with the current residual) influence its present value. There are neither explanatory x-variables nor other explained y-variables, and the empirical analysis of the observed data is hardly guided by theory (Brooks, 2008). UTM is recommended when we are unable to define the independent x-variables, or unable to observe or measure them; it is also recommended when the x-variables and y-variables are measured at different frequencies (years, months, days, etc.).

Ideally, if you have developed a good econometric model, you can use it to make economic forecasts. Forecasting means predicting the future values that a time series will take. There is a variety of forecasting models:

No change (just use current value)
Long-term arithmetic average
Structural models (with y- and x-variables)
Time series forecasting (AR = based on the series' own past values; MA = based on current and past error terms; ARMA = both combined)
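
To make the difference between these approaches concrete, here is a minimal sketch in base R. It is illustrative only: the series is simulated (it is not the UK data used later), and the names `no_change`, `average`, `ar1` and `arma11` are made up for this example. Structural models are left out because they need x-variables.

```r
# Illustrative only: a simulated ARMA(1,1) series stands in for real data.
set.seed(42)
y <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 200) + 10

h <- 5  # number of periods to forecast

# 1. No change: repeat the last observed value
naive_fc <- rep(as.numeric(tail(y, 1)), h)

# 2. Long-term arithmetic average: repeat the sample mean
mean_fc <- rep(mean(y), h)

# 3. AR(1): forecast from the series' own past values
ar_fit <- arima(y, order = c(1, 0, 0))
ar_fc <- as.numeric(predict(ar_fit, n.ahead = h)$pred)

# 4. ARMA(1,1): past values plus past error terms
arma_fit <- arima(y, order = c(1, 0, 1))
arma_fc <- as.numeric(predict(arma_fit, n.ahead = h)$pred)

benchmarks <- data.frame(step = 1:h, no_change = naive_fc, average = mean_fc,
                         ar1 = ar_fc, arma11 = arma_fc)
print(round(benchmarks, 3))
```

The no-change and average forecasts are flat by construction, while the AR and ARMA forecasts start near the last observation and move towards the estimated mean as the horizon grows.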

In the case study below, I will use time series data to estimate the parameters of an Autoregressive-Moving-Average (ARMA) model, here in its purely autoregressive (AR) form, and then forecast with that model.

According to Brooks (2008), there are also certain decisions to be made before forecasting, namely:

Point forecasts or interval forecasts
In-sample forecasts or out-of-sample forecasts
One-step-ahead (for a single period) or multi-step-ahead (for multiple periods)
Data used
1. Recursive window
2. Rolling window
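
The last choice, recursive versus rolling window, can be sketched as follows. This is a hedged illustration on simulated data, not part of the case study: the window width of 100, the 30 hold-out periods and the helper `one_step()` are assumptions made for the example.

```r
# Illustrative only: simulated AR(1) data, one-step-ahead point forecasts.
set.seed(1)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 150))

n <- length(y)
n_test <- 30    # number of out-of-sample forecasts (assumption)
window <- 100   # width of the rolling window (assumption)

# Hypothetical helper: fit an AR(1) and forecast one period ahead
one_step <- function(train) {
  fit <- arima(train, order = c(1, 0, 0), method = "ML")
  as.numeric(predict(fit, n.ahead = 1)$pred)
}

recursive_fc <- numeric(n_test)
rolling_fc <- numeric(n_test)
for (j in seq_len(n_test)) {
  last_obs <- n - n_test + j - 1   # last observation available for estimation
  # Recursive window: the sample always starts at 1 and grows by one each step
  recursive_fc[j] <- one_step(y[1:last_obs])
  # Rolling window: the sample keeps a fixed width and slides forward
  rolling_fc[j] <- one_step(y[(last_obs - window + 1):last_obs])
}

actual <- y[(n - n_test + 1):n]
rmse <- function(e) sqrt(mean(e^2))
accuracy <- c(recursive = rmse(actual - recursive_fc),
              rolling = rmse(actual - rolling_fc))
print(round(accuracy, 3))
```

Because every forecast is compared with an observation that was not used to estimate the model, this is also an example of an out-of-sample, one-step-ahead point forecast.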

In the case study below, I will focus on the equity market (stock market).

Case study

Problem definition and specific goal

The stock market is a crucial part of the economy in many countries. Billions of dollars are traded on the stock markets every day. Investment in shares is the main source of income for many investors, because they can be paid dividends and the shares themselves can also appreciate. Despite these benefits, such investment can be risky. For instance, the current performance of the stock market is influenced by its performance in the past. So the question is: to what extent do past values influence the stock market's current performance?

Identify the data that needs to be collected

For this case study, I am using part of the data collected by a friend of mine for her master’s thesis (she gave me permission to do this). She did a panel data analysis in her thesis, whereas here I am interested in a univariate time series analysis. I chose the UK, for which most of the data were readily available from the OECD. The UK is a commodity-dependent country and has the second largest economy in Europe after Germany.

The dataset is monthly and contains the UK stock market index for the years 2002 through 2017. Monthly data were chosen because they provide more observations than yearly data. The sample runs from January 2002 to December 2017, which makes it possible to observe changes in the stock market over a considerably long period of more than ten years. In addition, 2017 was the last year before my friend started the analysis for her master’s thesis.

There are two main ways to measure stock market performance: the stock market index (SMI) and the return on the SMI. My friend used the SMI as a proxy for stock market performance because index values are widely published and provide a clear benchmark for evaluating performance. She chose the closing value of the SMI for her analysis, and I did the same.

Organize the data

I load my packages and data, then take the logarithm of the SMI before using the autoregressive (AR) model to estimate the present values of the SMI in the UK and to forecast its future values from the estimated AR model. The R code is adapted from Colonescu (2016).

rm(list=ls()) #Removes all items in Environment!
library(dynlm) #for function `dynlm()`
library(lmtest) #for `coeftest()` and `bptest()`.
library(broom) #for `glance()` and `tidy()`
library(sandwich) #for robust covariance estimators
library(knitr) #for `kable()`
library(forecast) # for AR prediction

ukload <- read.table("ukdata.csv", header=TRUE, 
                     sep=",", na.strings="NA", dec=".", strip.white=TRUE)# load data
ukload$period <- as.Date(ukload$period) # parse the period column as a date
uk.ts <- ts(ukload, start=c(2002,1), # start January 2002
            end=c(2017,12), # end December 2017
            frequency=12)# monthly data
lnSMI<-log(uk.ts[, "SMI"]) # log of smi

Extract features

ARMA Estimation: To what extent does the past SMI explain the present SMI?

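First, we need to decide how many lags to include (how far into the past we need to go). This can be done using either the correlogram or information criteria such as AIC and BIC, where lower values are preferred. I used the latter and compared AR models with 1 to 5 lags. In the table below, both criteria are lowest at one lag (-627.8 and -618.1) and highest at five lags (-609.9 and -587.3), so strictly speaking they favour a shorter model; I nevertheless proceed with 5 lags.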

aics <- rep(0,5)
bics <- rep(0,5)
y <-log(uk.ts[, "SMI"]) # log of smi
for (i in 1:5){
  ari <- dynlm(y~L(y,1:i), start=i)
  aics[i] <- AIC(ari)
  bics[i] <- BIC(ari)
}
tbl <- data.frame(rbind(aics, bics))
names(tbl) <- c("1","2","3","4","5")
row.names(tbl) <- c("AIC","BIC")
kable(tbl, digits=1, align='c',
      caption="Lag order selection for an AR model")

	1	2	3	4	5
AIC	-627.8	-622.6	-617.5	-612.3	-609.9
BIC	-618.1	-609.6	-601.3	-592.9	-587.3

Lag order selection for an AR model
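
One caveat when reading such a table: information criteria are only comparable when every candidate model is fitted to the same observations. The sketch below shows one way to enforce a common sample using base R. It is illustrative only; the data are simulated, and `max_lag`, `lagged` and `ic` are names chosen for this example.

```r
# Illustrative only: simulated data, AR(p) models fitted by OLS on a common sample.
set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.9), n = 192))

max_lag <- 5
# embed() returns a matrix with columns y_t, y_{t-1}, ..., y_{t-max_lag};
# every model below uses these same rows, so AIC/BIC are comparable.
lagged <- embed(y, max_lag + 1)
y_t <- lagged[, 1]

ic <- sapply(1:max_lag, function(p) {
  fit <- lm(y_t ~ lagged[, 2:(p + 1), drop = FALSE])
  c(AIC = AIC(fit), BIC = BIC(fit))
})
colnames(ic) <- 1:max_lag
print(round(ic, 1))

# Lag order with the lowest value of each criterion
best_aic <- unname(which.min(ic["AIC", ]))
best_bic <- unname(which.min(ic["BIC", ]))
```

With a common sample, a change in AIC or BIC between two lag orders reflects only the fit and the number of parameters, not a change in the number of observations.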

Analyze

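There is enough evidence at the 0.05 level of significance to support the claim that last month's value of the SMI exerts a positive influence on its present value: the coefficient on the first lag is 1.036 with a p-value below 0.001, while lags 2 to 5 are not individually significant at that level.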

uk.ar5 <- dynlm(lnSMI~L(lnSMI)+L(lnSMI,2)+L(lnSMI,3)+L(lnSMI,4)+L(lnSMI,5), data=uk.ts)

kable(tidy(uk.ar5), digits=3,
      caption="Summary of the AR (5) model")

term	estimate	std.error	statistic	p.value
(Intercept)	0.246	0.123	2.000	0.047
L(lnSMI)	1.036	0.074	14.053	0.000
L(lnSMI, 2)	-0.118	0.107	-1.105	0.271
L(lnSMI, 3)	0.093	0.106	0.876	0.382
L(lnSMI, 4)	0.101	0.106	0.955	0.341
L(lnSMI, 5)	-0.143	0.073	-1.950	0.053

Summary of the AR (5) model
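
Written out with the rounded estimates from the table above, the fitted AR(5) equation is as follows. The symbols are introduced here for illustration: y_t stands for lnSMI in month t and \hat{y}_t for its fitted value.

```latex
\[
\hat{y}_t = 0.246 + 1.036\,y_{t-1} - 0.118\,y_{t-2} + 0.093\,y_{t-3} + 0.101\,y_{t-4} - 0.143\,y_{t-5}
\]
```

The five lag coefficients sum to about 0.97 (1.036 - 0.118 + 0.093 + 0.101 - 0.143 = 0.969), which points to a highly persistent series.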

Since this is monthly data, I checked whether the included lags removed autocorrelation from the residuals. Autocorrelation (serial correlation) is present when the errors are correlated with each other, whereas the error terms should be independent of one another; its consequence is that the estimators lack efficiency. The Breusch-Godfrey test is a general test for autocorrelation whose null hypothesis is that there is no serial correlation up to the chosen order, so a p-value below 0.05 indicates the presence of autocorrelation. In this case all p-values are above 0.05 (between 0.27 and 0.39), so we fail to reject the null hypothesis, and the lags included appear to be appropriate.

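# Breusch-Godfrey tests: `order` is the number of lagged residuals tested,
# `type` selects the F or Chi-squared version, and `fill` sets the starting
# values used for the lagged residuals (0 or NA)
bg1 <- bgtest(uk.ar5, order=1, type="F", fill=0)
bg2 <- bgtest(uk.ar5, order=1, type="F", fill=NA)
bg3 <- bgtest(uk.ar5, order=5, type="Chisq", fill=0)
bg4 <- bgtest(uk.ar5, order=5, type="Chisq", fill=NA)

# Keep the statistic, degrees of freedom and p-value of each test
dfr <- data.frame(rbind(bg1[c(1,2,4)],
                        bg2[c(1,2,4)],
                        bg3[c(1,2,4)],
                        bg4[c(1,2,4)]
))

# Row labels: order, test type, fill value
dfr <- cbind(c("1, F, 0",
               "1, F,NA",
               "5, Chisq, 0",
               "5, Chisq, NA"), dfr)

names(dfr) <- c("Meth", "Stat", "Par", "p-Val")
dfr

##           Meth     Stat    Par     p-Val
## 1      1, F, 0 1.193718 1, 180  0.276041
## 2      1, F,NA 1.232804 1, 179 0.2683516
## 3  5, Chisq, 0 5.479133      5 0.3602395
## 4 5, Chisq, NA 5.248268      5 0.3863386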
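
For readers who want to see the idea behind the test, here is a hedged sketch of the Chi-squared version computed by hand in base R. It is illustrative only: the data are simulated, an AR(1) is used to keep it short, and `n_lags`, `u_lags` and `lm_stat` are names made up for this example. The implementation in `lmtest` may differ in its details.

```r
# Illustrative only: Breusch-Godfrey LM statistic by hand on simulated data.
set.seed(7)
y <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))

# Step 1: estimate the AR(1) regression by OLS and keep the residuals
X <- embed(y, 2)              # columns: y_t, y_{t-1}
ar1 <- lm(X[, 1] ~ X[, 2])
u <- resid(ar1)
n <- length(u)

# Step 2: auxiliary regression of the residuals on the original regressor
# and n_lags lagged residuals (missing starting values filled with 0)
n_lags <- 5
u_lags <- sapply(1:n_lags, function(k) c(rep(0, k), u[1:(n - k)]))
aux <- lm(u ~ X[, 2] + u_lags)

# Step 3: LM statistic = n * R-squared of the auxiliary regression,
# compared with a Chi-squared distribution with n_lags degrees of freedom
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = n_lags, lower.tail = FALSE)
print(round(c(statistic = lm_stat, p.value = p_value), 4))
```

A large statistic (small p-value) means that the lagged residuals help to explain the current residual, which is evidence of serial correlation.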

ARMA Forecast: How well will the UK SMI perform in the next 5 months?

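The point forecasts of the log SMI drift slightly downward over the next 5 months (from 8.430 in January 2018 to 8.414 in May 2018), while the prediction intervals widen as the horizon increases.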

ar5smi <- ar(y, aic=FALSE, order.max=5, method="ols")
fcst <- data.frame(forecast(ar5smi, 5))
kable(fcst, digits=3,
      caption="Forecasts for the AR(5) model")

	Point.Forecast	Lo.80	Hi.80	Lo.95	Hi.95
Jan 2018	8.430	8.372	8.489	8.341	8.520
Feb 2018	8.424	8.340	8.508	8.295	8.553
Mar 2018	8.417	8.316	8.518	8.263	8.572
Apr 2018	8.418	8.303	8.534	8.241	8.595
May 2018	8.414	8.282	8.546	8.212	8.615

Forecasts for the AR(5) model
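
Since the model was fitted to the logarithm of the SMI, the forecasts above are in logs. The sketch below converts the rounded point forecasts and 95% bounds from the table back to index levels by exponentiating them. The data frame names are made up for this example, and exponentiating a log forecast is only one simple way to back-transform (it gives a median-type rather than a mean forecast).

```r
# Rounded values copied from the forecast table (log scale)
fc_log <- data.frame(
  month = c("Jan 2018", "Feb 2018", "Mar 2018", "Apr 2018", "May 2018"),
  point = c(8.430, 8.424, 8.417, 8.418, 8.414),
  lo95  = c(8.341, 8.295, 8.263, 8.241, 8.212),
  hi95  = c(8.520, 8.553, 8.572, 8.595, 8.615)
)

# Back-transform to index levels
fc_level <- data.frame(
  month = fc_log$month,
  point = exp(fc_log$point),
  lo95  = exp(fc_log$lo95),
  hi95  = exp(fc_log$hi95)
)

# Differences of logs times 100 approximate month-on-month percentage changes
fc_level$pct_change <- c(NA, 100 * diff(fc_log$point))
print(fc_level, digits = 5)
```

On the level scale the intervals are no longer symmetric around the point forecast, because the exponential function stretches the upper bound more than the lower one.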

Reach an insight or recommendation

First, it seems that last month's value of the SMI exerts a positive influence on its present value. Second, supposing we were still in early 2018, when I collected the data, the forecast model predicts slightly decreasing values of the SMI between January 2018 and May 2018. However, one should be cautious about using past data (data from a certain period) to predict the future (values outside that period). Forecast uncertainty grows with the horizon, so projecting too far into the future with a model that relies only on past data could have negative consequences.

Finally, the relevance of the AR model is questionable here. Some macroeconomic variables that affect the SMI should also be included, namely the interest rate, inflation, and the oil price. Let’s do a brief analysis that takes these factors into consideration. For the interest rate I used the short-term treasury bill (TB) rate, because money market interest rates have a negative relationship with stock prices. I selected the Consumer Price Index (CPI) to measure inflation: since investors are also consumers, the CPI allows the investment potential of investors to be observed in the presence of inflation. In addition, CPI data are published monthly, unlike some other indexes. The proxy for the crude oil price was the equally weighted average spot price of Brent, West Texas Intermediate (WTI), and Dubai crude.

Therefore, using an Autoregressive Distributed Lag (ARDL) model, I present a result below in which the SMI depends on its own past values as well as on the current and previous values of the interest rate, the oil price, and the inflation rate.
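
The specification described in this paragraph can be written as the equation below. The Greek letters are notation introduced here for illustration; INTR, OP and INFL are the interest rate, oil price and inflation variables, and one lag of each variable is assumed.

```latex
\[
\ln \mathrm{SMI}_t = \alpha + \phi \ln \mathrm{SMI}_{t-1}
  + \beta_0 \mathrm{INTR}_t + \beta_1 \mathrm{INTR}_{t-1}
  + \gamma_0 \mathrm{OP}_t + \gamma_1 \mathrm{OP}_{t-1}
  + \delta_0 \mathrm{INFL}_t + \delta_1 \mathrm{INFL}_{t-1}
  + \varepsilon_t
\]
```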

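# Note: in dynlm, L(x, k) shifts x back by k periods, so a one-month lag is
# written L(x, 1); with k = -1, as used here, each L() term is a one-month lead.
uk.gen <- dynlm(lnSMI ~ L(lnSMI, -1) + INTR + L(INTR, -1) + 
  OP + L(OP, -1) + INFL + L(INFL, -1), data=uk.ts)
kable(tidy(uk.gen), digits = 3,
      caption="Using dynlm with lag operators")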

term	estimate	std.error	statistic	p.value
(Intercept)	0.329	0.147	2.230	0.027
L(lnSMI, -1)	0.955	0.019	50.908	0.000
INTR	-0.027	0.015	-1.774	0.078
L(INTR, -1)	0.031	0.015	2.021	0.045
OP	0.004	0.001	4.661	0.000
L(OP, -1)	-0.004	0.001	-3.840	0.000
INFL	-0.028	0.014	-1.912	0.057
L(INFL, -1)	0.026	0.015	1.816	0.071

Using dynlm with lag operators

Several of the coefficients are significant at the 0.05 level (the L(lnSMI, -1), L(INTR, -1), OP and L(OP, -1) terms), while INTR, INFL and L(INFL, -1) are significant only at the 0.10 level. This suggests that, in addition to the SMI's own values in adjacent months, it is important to consider the impact of macroeconomic variables on current stock market performance.

References

Brooks, C. (2008). Introductory Econometrics for Finance (2nd ed.). Cambridge University Press.

Colonescu, C. (2016). Principles of Econometrics with R. Retrieved June 11, 2020.