ARIMA.r

### This R script downloads 10 yr/2 yr yield curve data from FRED and computes
### the date the yield curve will invert, first using a traditional least-squares
### method, then using a more accurate ARIMA model.

### Go to https://www.quandl.com and sign up for a free account.   Once done, go
### to "Account Settings", and copy your API key.

### Now open R and run the following commands.  You can either save it as a .R
### script or just enter the commands manually.   Lines starting with "#" are
### comments and can be ignored.

### We need the following packages for this example.
packages <- c("Quandl", "lubridate", "lmtest", "forecast")

### Install (if needed) and load needed packages.
new.packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, quiet=TRUE)
lapply(packages, "library", character.only=TRUE)

### Initialize Quandl with our API key
Quandl.api_key("<INSERT_YOUR_QUANDL_KEY_HERE>")

### Load the 10-2 yield curve data from FRED.   The FRED code for this data
### set is "T10Y2Y".   If you want to analyze a different data set, just
### Google for "FRED GRAPHS <insert data set here>" and it will usually
### return the code you want.   In this example, I'm graphing data since
### December 13, 2016 (two complete years of data).
Data <- Quandl("FRED/T10Y2Y", start_date="2017-01-01")
attach(Data)

### Note that when importing dates, R imports them as t"2018-03-05".  For
### purposes of computing a fit, something like "2018.165" is more useful.
Decimal_Date <- decimal_date(Date)

### Do a linear regression fit of the data
Fit.lm <- lm(Value ~ Decimal_Date)
summary(Fit.lm)

### Calculate the inversion date given the simple LM fit
InversionDate.lm <- - Fit.lm$coefficients[1] / Fit.lm$coefficients[2]
date_decimal(InversionDate.lm)

### According to a simple linear regression, the yield curve will invert
### on March 19, 2019.  Now let's examine the residuals of the fit...
plot(residuals(Fit.lm), type="l")

### The "eyeball test" shows that the residuals aren't random.   There appears
### to be a strong quasi-perioidic component embedded in the residuals which
### renders the data "non-stationary".  So we need to examine the residuals to
### see what's going on.
###
### The Durbin-Watson test attempts to see if there is significant
### autocorrelation in the residuals.  The test is bounded between 0 and 4.
###
### d = 2  -> residuals show no correlation
### d > 2  -> residuals show positive correlation
### d < 2  -> residuals show negative correlation
dwtest(Value ~ Decimal_Date)

### The Durbin-Watson test score is 0.099033, indicating a strong
### positive correlation in the residuals.   This is due to the fact
### that bond prices residuals *aren't* random (which a simple linear
### regression assumes).  Rather, the price of the bond on day N is
### usually strongly correlated with the price on day N-1.   So we need
### to use a more advanced model to capture that autocorrelation.
###
### The ARIMA regression model is designed to do exactly that.  Explaining
### it is well beyond what a basic tutorial can handle, so I'm just going
### to do it, and you can Google more about it if you want.
###
### A generic ARIMA model has three parameters:
###
### p -> the order (number of time lags) of the autoregressive model
### d -> the degree of differencing (number of times past values were subtracted)
### q -> the order of the moving-average model
###
### As I write this, the data is best modeled by a (1,0,0) model.  Sometimes
### the auto.arima function seems to pick an unnecessarily complex model to
### fit because it minimizes AIC by a microscopic amount compared to a
### simple (1,0,0) model.  This is one of those areas where you need a bit
### of experience so you know when to use auto.arima and when to ignore it.
### Try to use both "aic" and "bic" in the "ic=" parameter to see which models
### Fit.arima <- auto.arima(Value, xreg=Decimal_Date, stepwise=FALSE, approximation=FALSE, ic="bic")

Fit.arima <- Arima(Value, xreg=Decimal_Date, order=c(1,0,0))
summary(Fit.arima)

### And the ARIMA model thinks the inversion date is...
InversionDate.arima = - Fit.arima$coef[2] / Fit.arima$coef[3]
date_decimal(InversionDate.arima)

### So the more advanced model predicts the yield curve will invert a few days
### later than the simple linear regression (03/25/19 vs 03/19/19).

### Now, let's check the residuals of the ARIMA fit againt the residuals from
### our original LM fit...
plot(residuals(Fit.lm), type="l", col="red")
lines(residuals(Fit.arima), type="l", col="blue")

### Our residuals are now smaller, and look much more random.