Sunday, 21 June 2015

A Closer Look at the Oil Price - Gasoline Price Relationship in Ontario

In a recent post, Myles Harrison investigated the relationship between Ontario average gasoline prices and US oil prices. He concluded by showing a strong positive relationship between the two. The recent drop in oil prices has led to a drop in gasoline prices, but my question is whether gas prices are as low as they should be given that oil is currently a few cents below $60 per barrel.

In this post, I will conduct further analysis by extending his work in several directions. First, Myles loaded the gasoline price data from the Ontario government website using a call to wget. This is certainly an easy way to pull down data from websites. The only drawback is that two separate programs are being used: one to load the data, and another to analyze the data. I will load the data directly into R. In addition I will load additional data on oil prices, exchange rates, and the Canadian CPI. My approach results in code that is more lengthy than Myles, but has the advantage of being contained in one program. The data set is monthly starting in January 1990 and continuing until March of 2015. Second, I will investigate the gasoline price - oil price relationship using 4 different specifications of the data. From these relationships I will make a forecast of current gasoline prices based off of current oil prices. Third, I will provide some tests for residual stationarity to ensure that the the variables are cointegrated.

The gasoline price data comes from an Ontario government website. Additional data on oil prices, exchange rates, and Canadian consumer price index (CPI) comes from the Federal Reserve of St. Louis database (FRED data).

First. here are some plots of Ontario gasoline prices and US oil prices. As expected, the two series move together closely.



 I estimate four specifications.

S1: gasoline prices ~ oil prices (nominal values not adjusted for exchange rates)
S2: log(gasoline prices) ~ log(oil prices) (nominal values not adjusted for exchange rates)
S3:  gasoline prices ~ oil prices (adjusted for exchange rates and inflation)
S4: log( gasoline prices) ~ log(oil prices) (adjusted for exchange rates and inflation)

Specification S1 is the most basic and does not adjust for inflation or different currencies. Notice the strong positive correlation between gasoline prices and oil prices. The simple correlation between the two is 0.973.

 


Using this specification to forecast gas prices (for oil prices of $50, $55, $60, $65, labelled as points 1 through 4 respectively) results in the following table. For example, at oil prices of $60 (point 3) the forecast of average Ontario gasoline prices is 90.9 cents/liter. The most recent actual value is $1.16 which is beyond the upper 95% confidence level. Based on this analysis, gas prices are too high.


Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
1           82.5  74.3  90.6  70.0  95.0
2           86.7  78.5  94.9  74.2  99.2
3           90.9  82.8  99.1  78.4 103.4
4           95.1  87.0 103.3  82.6 107.7
 

Also notice the increase in variability towards the right of the plot. This can be controlled for by estimating a log - log specification (S2).

Specification 4 is the one that is most consistent because each variable is expressed in real Canadian dollars. The log - log specification accounts for changing variability in the data.




Here is a table comparing the gasoline price forecasts from each of the four specifications.


                      S1     S2     S3     S4
Oil coefficient    0.845  0.477  0.702  0.439
t stat            72.871 81.483 55.576 51.631
R-squared          0.946  0.957  0.911  0.899
Forecast (oil=50) 82.472 86.766 86.606 88.295
Forecast (oil=55) 86.696 90.801 90.116 92.068
Forecast (oil=60) 90.921 94.648 93.626 95.653
Forecast (oil=65) 95.146 98.330 97.136 99.074


The estimated coefficient on the oil price variable is positive and statistically significant in each specification. The dependent variable in each specification is different so it is not possible to compare across models, but clearly for each specification, oil prices have a statistically significant positive impact on Ontario gasoline prices. The gasoline price forecast for oil prices at $60 range between 90.9 cents/liter to 95.7 cents/liter. The interesting point is that the current price of gasoline at $1.16 per liter is higher than the upper 95% confidence bound in all cases.

Lastly, the table below shows the p-values from an ADF test on the residuals from each specification. The residuals from each specification are stationary at 5% which is consistent with a cointegrating relationship between gasoline prices and oil prices.


                            S1       S2       S3     S4
Residuals adf p value 0.000867 0.000264 0.000044 0.0464


The main take-away is that Ontario gasoline prices and oil prices are cointegrated and this is robust to several different regression specifications. Based on this analysis, gasoline prices are higher than the upper 95% confidence band.



Here is the R code.

#########################################################
#  Economic forecasting and analysis
#  Perry Sadorsky
#  June 2015
#  Forecasting Ontario gasoline prices
##########################################################

## basic reference is from this post
## http://www.r-bloggers.com/what-a-gas-the-falling-price-of-oil-and-ontario-gasoline-prices/


# load libraries
library(fpp)
library(quantmod)
library(reshape2)
library(urca)


##########################################################
## collect data from Ontario government
##########################################################

furl = c("http://www.energy.gov.on.ca/fuelupload/ONTREG1991.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1992.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1993.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1994.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1995.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1996.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1997.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1998.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG1999.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2000.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2001.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2002.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2003.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2004.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2005.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2006.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2007.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2008.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2009.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2010.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2011.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2012.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2013.csv",
         "http://www.energy.gov.on.ca/fuelupload/ONTREG2014.csv"        
         )

gasp = c(1,2,3,4,5,6,7,8,9,10,11,12)
xx = read.csv("http://www.energy.gov.on.ca/fuelupload/ONTREG1990.csv",skip=1)
xx = head(xx,-3)
xx = tail(xx,12)
xx = xx[,"ON.Avg"]
gasp = cbind(gasp,xx)


for (i in 1:24) {
  xx = read.csv(furl[i],skip=1)
  xx = head(xx,-3)
  xx = tail(xx,12)
  xx = xx[,"ON.Avg"]
  gasp =cbind(gasp,xx)
}

## output is a nice matrix
View(gasp)


## stack the data into a column
s_gasp = melt(gasp)
s_gasp = s_gasp[-c(1:12),] 
View(s_gasp)


on_avg = s_gasp[,3]
tail(on_avg)
View(on_avg)


## append 2015 data
# "http://www.energy.gov.on.ca/fuelupload/ONTREG2015.csv"

yy = read.csv("http://www.energy.gov.on.ca/fuelupload/ONTREG2015.csv",skip=1)
yy = head(yy,-3)
yy = tail(yy,6)
yy= yy[,"ON.Avg"]
yy

on_avg_a = append(on_avg,yy)
tail(on_avg_a)
View(on_avg_a)

gas = ts(on_avg_a, start=1990, end=c(2015,3), frequency=12)
plot(gas,main="Ontario gas prices (cents per liter)", xlab="",ylab="")


##########################################################
## now load data on oil prices and FX
##########################################################

# load data from FRED into new environment
symbol.vec = c("MCOILWTICO" , "EXCAUS", "CANCPIALLMINMEI")
data <- new.env()
getSymbols(symbol.vec, src="FRED", env = data)


(data$MCOILWTICO["1990-01-01::"])-> oil
(data$EXCAUS["1990-01-01::"])-> fx
(data$CANCPIALLMINMEI["1990-01-01::"])-> cpi

# rebase cpi
tail(cpi,14)
cpi2014 = mean(  cpi[(300-11):300]  )
cpi_2014 = cpi/cpi2014 * 100
mean(  cpi_2014[(300-11):300]  )


par(mfrow = c(3, 1))
plot(oil,main="Oil prices")
plot(fx,main="$C/$US")
plot(cpi_2014,main="CPI")
par(mfrow = c(1, 1))


df_fred = cbind(oil,fx,cpi_2014)
head(df_fred)
tail(df_fred)
## latest cpi value is for March

df_fred_1 =  ts(df_fred, start=1990, end=c(2015,3), frequency=12)
nrow(df_fred_1)
length(gas)

oil_n = df_fred_1[,1]

par(mfrow = c(2, 1))
plot(gas,main="Ontario gas prices (cents per liter)", xlab="",ylab="")
plot(df_fred_1[,1],main="Oil prices ($/bbl)", xlab="",ylab="")
par(mfrow = c(1, 1))

cor(gas,df_fred_1[,1])


##########################################################
## linear regressions
##########################################################

table = matrix( NA, nrow=7, ncol=4)
table2 = matrix( NA, nrow=1, ncol=4)

lm1 = lm(gas ~ oil_n)

lm1s = summary(lm1)
table[1,1] = lm1s$coefficients[2,1]
table[2,1] = lm1s$coefficients[2,3]
table[3,1] = lm1s$r.squared


plot(gas ~ oil_n,
     ylab="Ontario gas prices (cents per liter)", xlab="WTI oil prices ($/bbl)")
abline(lm1)
adf1 =  ur.df(lm1$residuals, type="drift",selectlags="AIC", lags=12)
adf1@testreg$coefficients
table2[1,1] = adf1@testreg$coefficients[2,4]

## some forecasting
fcast <- forecast(lm1, newdata=data.frame(oil_n=c(50, 55, 60, 65)))
fcast
plot(fcast)


table[4,1] = fcast$mean[1]
table[5,1] = fcast$mean[2]
table[6,1] = fcast$mean[3]
table[7,1] = fcast$mean[4]
table



## log- log specification
lm2 = lm(log(gas) ~ log(oil_n))
summary(lm2)

lm2s = summary(lm2)
table[1,2] = lm2s$coefficients[2,1]
table[2,2] = lm2s$coefficients[2,3]
table[3,2] = lm2s$r.squared

plot(log(gas) ~ log(oil_n),
     ylab="Log of Ontario gas prices (cents per liter)", xlab="Log of WTI oil prices ($/bbl)")
abline(lm2)
adf2 =  ur.df(lm2$residuals, type="drift",selectlags="AIC", lags=12)
adf2@testreg$coefficients
table2[1,2] = adf2@testreg$coefficients[2,4]

## some forecasting
fcast <- forecast(lm2, newdata=data.frame(oil_n=c(50, 55, 60, 65)))
fcast
plot(fcast)

exp(fcast$mean)

table[4,2] = exp(fcast$mean[1])
table[5,2] = exp(fcast$mean[2])
table[6,2] = exp(fcast$mean[3])
table[7,2] = exp(fcast$mean[4])



## now do this in real Canadian dollars
gas_r = gas/df_fred_1[,3]*100
oil_r = (df_fred_1[,1]*df_fred_1[,2])/df_fred_1[,3]*100


lm3 = lm(gas_r ~ oil_r)
summary(lm3)

lm3s = summary(lm3)
table[1,3] = lm3s$coefficients[2,1]
table[2,3] = lm3s$coefficients[2,3]
table[3,3] = lm3s$r.squared

plot(gas_r ~ oil_r,
     ylab="Ontario real gas prices (cents per liter)", xlab="Real oil prices ($/bbl)")
abline(lm3)
adf3 =  ur.df(lm3$residuals, type="drift",selectlags="AIC", lags=12)
adf3@testreg$coefficients
table2[1,3] = adf3@testreg$coefficients[2,4]

## some forecasting
fcast <- forecast(lm3, newdata=data.frame(oil_r=c(50, 55, 60, 65)))
fcast
plot(fcast)

table[4,3] = fcast$mean[1]
table[5,3] = fcast$mean[2]
table[6,3] = fcast$mean[3]
table[7,3] = fcast$mean[4]



## log- log specification in real Canadian dollars
lm4 = lm(log(gas_r) ~ log(oil_r))
summary(lm4)

lm4s = summary(lm4)
table[1,4] = lm4s$coefficients[2,1]
table[2,4] = lm4s$coefficients[2,3]
table[3,4] = lm4s$r.squared


plot(log(gas_r) ~ log(oil_r),
     ylab="Log of real Ontario gas prices (cents per liter)", xlab="Log of real oil prices ($/bbl)")
abline(lm4)
adf4 =  ur.df(lm4$residuals, type="drift",selectlags="AIC", lags=12)
adf4@testreg$coefficients
table2[1,4] = adf4@testreg$coefficients[2,4]

## some forecasting
fcast <- forecast(lm4, newdata=data.frame(oil_r=c(50, 55, 60, 65)))
fcast
(fcast$mean)
(fcast$lower)
(fcast$upper)
# plot(fcast)
exp(fcast$mean)
exp(fcast$lower)
exp(fcast$upper)

fcast4 = cbind(exp(fcast$mean),exp(fcast$lower),  exp(fcast$upper))
colnames(fcast4) = c("Forecast", "Lo 80", "Lo 95", "Hi 80", "Hi 95")
fcast4

table[4,4] = exp(fcast$mean[1])
table[5,4] = exp(fcast$mean[2])
table[6,4] = exp(fcast$mean[3])
table[7,4] = exp(fcast$mean[4])


colnames(table) = c("S1", "S2", "S3", "S4")
rownames(table) = c("Oil coefficient", "t stat", "R-squared", "Forecast (oil=50)", "Forecast (oil=55)", "Forecast (oil=60)", "Forecast (oil=65)")

options(digits=3, scipen=10)
table


colnames(table2) = c("S1", "S2", "S3", "S4")
rownames(table2) = c("Residuals adf p value")
table2


Friday, 5 June 2015

Forecasting Ontario Housing Starts


With all of the talk about the housing market in Ontario, I decided to forecast Ontario housing starts using many of the methods that I teach in my Economic Forecasting and Analysis course. These methods are all covered in Rob Hyndmans's textbook Forecasting: Principles and Practice which I use as a textbook for my course.

Forecasting housing starts is important for the real estate market because housing starts are one of the key drivers of the real estate market.

Here are what Ontario housing starts look like over a 65 year period. Notice two peaks in the housing starts data: one in the early 1970s and the other in the late 1980s. The data comes from CANSIM.

 
 
The two peaks in housing starts are unlikely to be repeated in the near future so for forecasting, a shorter time frame is desirable. For the forecasts, the training period was set as 1994 quarter 1 to 2010 quarter 4. The test period was set as 2011 first quarter to 2015 first quarter. Forecasts were produced for each model and their accuracy measures compared.

Here are what the forecast accuracy measures look like.


For model comparison, I like to focus on the out-of-sample or test values. From the table, Seasonal Naive has the lowest MASE among the test value, indicating that based on MASE this is the best forecasting method. The next best methods are STL and ARIMA. Notice, that from the MPE values there may be some opportunity to combine forecasts, but I do not do this here.


The Seasonal naive method produces forecasts for Ontario housing starts of 59,688 units for each of 2015, 2016, and 2017. By comparison, Canada Mortgage and Housing Corporation (CMHC) forecasts Ontario housing starts of 61,700 and 60,600 for 2015 and 2016 respectively. Both sets of forecasts agree that Ontario housing starts are likely to remain flat for the next few years. Flat housing starts combined with strong demand for houses implies upward pressure on house prices. This is especially the case in the GTA. A good time for sellers, but not such a good time to be a buyer. The forecasts are also useful for helping predict future sales for any company that is directly linked to housing starts.


The R code is posted below. The data you have to source yourself from CANSIM (CANSIM data label is v730423).

#########################################################
#  Economic forecasting and analysis
#  Perry Sadorsky
#  Forecasting Ontario housing starts
##########################################################


# load libraries
library(fpp)


# import data
ontario.housing.starts <- read.csv("C:/house prices/ontario housing starts.csv")
View(ontario.housing.starts)

                                             
# tell R that data set is a time series
df = ts(ontario.housing.starts, start=1948, frequency=4)
df

hist(df[,"hs"], breaks=100)


# create a named variable
y = df[,"hs"]
                      
                      
# some graphs
hist(y)
plot(y)
Acf(y)


# add labels to the plot
plot(y, main="Ontario housing starts",  xlab="", ylab="",col = 'blue', lwd=4)



##########################################################
##########################################################
# set training data, test data, out of sample
train <- window(y,start=c(1994, 1),end=c(2010, 4))
train
plot(train)

test <- window(y, start=2011)
both <- window(y,start=c(1994, 1))
h=17

##########################################################
##########################################################


##########################################################
## forecast with benchmarks
##########################################################

fit1 <- meanf(train, h=h)
fit2 <- naive(train, h=h)
fit3 <- snaive(train, h=h)

plot(fit1)
plot(fit2)
plot(fit3)


# make a nice plot showing the forecasts
plot(fit1, plot.conf=FALSE,
main="Forecasts for Ontario housing starts")
lines(fit2$mean,col=2)
lines(fit3$mean,col=3)
legend("topleft",lty=1,col=c(4,2,3),
legend=c("Mean method","Naive method","Seasonal naive method"),bty="n")


# plot with forecasts and actual values
plot(fit1, plot.conf=FALSE,
     main="Forecasts for Ontario housing starts")
lines(fit2$mean,col=2)
lines(fit3$mean,col=3)
lines(hs)
legend("topleft",lty=1,col=c(4,2,3),
       legend=c("Mean method","Naive method","Seasonal naive method"),bty="n")



##########################################################
# exponential smoothing approaches
##########################################################

# simple exponential moving averages
fit4 <- ses(train, h = h)
summary(fit4)
plot(fit4)


# holt's linear trend method
fit5 <- holt(train,  h=h)
summary(fit5)
plot(fit5)


# holt's exponential trend method
fit6 <- holt(train, exponential=TRUE, h=h)
summary(fit6)
plot(fit6)


# holt's damped trend method
fit7 <- holt(train, damped=TRUE, h=h)
summary(fit7)
plot(fit7)


# holt winter's  method
fit8 <- hw(train, seasonal="multiplicative", h=h)
summary(fit8)
plot(fit8)


# ETS  method
y.ets <- ets(train, model="ZZZ")
summary(y.ets)
fit9 <- forecast(y.ets, h=h)
summary(fit9)
plot(fit9)


# STL  method
y.stl <- stl(train, t.window=15, s.window="periodic", robust=TRUE)
summary(y.stl)
fit10 <- forecast(y.stl, method="rwdrift",h=h)
summary(fit10)
plot(fit10)


# trend plus seasonal
tps <- tslm(train ~ trend + season)
summary(tps)
fit11 = forecast(tps,h=h)
plot(fit11)


# arima modelling and forecasting
# test for unit root
adf.test(train, alternative = "stationary")
kpss.test(train)
ndiffs(train)


y.arima <- auto.arima(train)
fit12 <- forecast(y.arima, h=h)
plot(fit12)

# take logs to reduce variability
y.arima.lambda <- auto.arima(train, lambda=0)
fit13 <- forecast(y.arima.lambda, h=h)
plot(fit13)
tsdiag(y.arima.lambda)


## ANN
ann <- nnetar(train, repeats=100,lambda=0)
fit14 = forecast(ann,h=h)
plot(fit14)

## bats
bats_fit <- bats(train)
fit15 = forecast(bats_fit,h=h)
plot(fit15)



# accuracy measures
a1 = accuracy(fit1, test)
a2 = accuracy(fit2, test)
a3 = accuracy(fit3, test)
a4 = accuracy(fit4, test)
a5 = accuracy(fit5, test)
a6 = accuracy(fit6, test)
a7 = accuracy(fit7, test)
a8 = accuracy(fit8, test)
a9 = accuracy(fit9, test)
a10 = accuracy(fit10, test)
a11 = accuracy(fit11, test)
a12 = accuracy(fit12, test)
a13 = accuracy(fit13, test)
a14 = accuracy(fit14, test)
a15 = accuracy(fit15, test)


#Combining forecast summary statistics into a table with row names
a.table<-rbind(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15)

row.names(a.table)<-c('Mean training','Mean test', 'Naive training', 'Naive test', 'S. Naive training', 'S. Naive test' ,
                      'ES training','ES test', 'Holt linear training', 'Holt linear test', 'Holt ES training', 'Holt ES test' ,
                      'Holt dampled training','Holt damped test', 'Holt Winters training', 'Holt Winters test', 'ETS training', 'ETS test' ,     
                  'STL training','STL test','linear trend training','linear trend test',
                  'ARIMA training', 'ARIMA test',
                  'ARIMA training (log)', 'ARIMA test (log)',
                 'ANN training', 'ANN test',
                  'BATS training', 'BATS test'
                )

# order the table according to MASE
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
a.table

# write table to csv file
## write.csv(a.table, "C:/house prices/atable.csv")
##########################################################
##########################################################


## based on MASE to choose model, forecast hs periods into the future
hs = 11

fit.os.1 <- snaive(both, h=hs)
plot(fit.os.1, include=48)
fit.os.1


fit.os.2 = stl(both, t.window=15, s.window="periodic", robust=TRUE)
fit.os.2.f  <- forecast(fit.os.2 , method="rwdrift",h=hs)
plot(fit.os.2.f, include=48)