Friday, 5 June 2015

Forecasting Ontario Housing Starts


With all of the talk about the housing market in Ontario, I decided to forecast Ontario housing starts using many of the methods that I teach in my Economic Forecasting and Analysis course. These methods are all covered in Rob Hyndmans's textbook Forecasting: Principles and Practice which I use as a textbook for my course.

Forecasting housing starts is important for the real estate market because housing starts are one of the key drivers of the real estate market.

Here are what Ontario housing starts look like over a 65 year period. Notice two peaks in the housing starts data: one in the early 1970s and the other in the late 1980s. The data comes from CANSIM.

 
 
The two peaks in housing starts are unlikely to be repeated in the near future so for forecasting, a shorter time frame is desirable. For the forecasts, the training period was set as 1994 quarter 1 to 2010 quarter 4. The test period was set as 2011 first quarter to 2015 first quarter. Forecasts were produced for each model and their accuracy measures compared.

Here are what the forecast accuracy measures look like.


For model comparison, I like to focus on the out-of-sample or test values. From the table, Seasonal Naive has the lowest MASE among the test value, indicating that based on MASE this is the best forecasting method. The next best methods are STL and ARIMA. Notice, that from the MPE values there may be some opportunity to combine forecasts, but I do not do this here.


The Seasonal naive method produces forecasts for Ontario housing starts of 59,688 units for each of 2015, 2016, and 2017. By comparison, Canada Mortgage and Housing Corporation (CMHC) forecasts Ontario housing starts of 61,700 and 60,600 for 2015 and 2016 respectively. Both sets of forecasts agree that Ontario housing starts are likely to remain flat for the next few years. Flat housing starts combined with strong demand for houses implies upward pressure on house prices. This is especially the case in the GTA. A good time for sellers, but not such a good time to be a buyer. The forecasts are also useful for helping predict future sales for any company that is directly linked to housing starts.


The R code is posted below. The data you have to source yourself from CANSIM (CANSIM data label is v730423).

#########################################################
#  Economic forecasting and analysis
#  Perry Sadorsky
#  Forecasting Ontario housing starts
##########################################################


# load libraries
library(fpp)


# import data
ontario.housing.starts <- read.csv("C:/house prices/ontario housing starts.csv")
View(ontario.housing.starts)

                                             
# tell R that data set is a time series
df = ts(ontario.housing.starts, start=1948, frequency=4)
df

hist(df[,"hs"], breaks=100)


# create a named variable
y = df[,"hs"]
                      
                      
# some graphs
hist(y)
plot(y)
Acf(y)


# add labels to the plot
plot(y, main="Ontario housing starts",  xlab="", ylab="",col = 'blue', lwd=4)



##########################################################
##########################################################
# set training data, test data, out of sample
train <- window(y,start=c(1994, 1),end=c(2010, 4))
train
plot(train)

test <- window(y, start=2011)
both <- window(y,start=c(1994, 1))
h=17

##########################################################
##########################################################


##########################################################
## forecast with benchmarks
##########################################################

fit1 <- meanf(train, h=h)
fit2 <- naive(train, h=h)
fit3 <- snaive(train, h=h)

plot(fit1)
plot(fit2)
plot(fit3)


# make a nice plot showing the forecasts
plot(fit1, plot.conf=FALSE,
main="Forecasts for Ontario housing starts")
lines(fit2$mean,col=2)
lines(fit3$mean,col=3)
legend("topleft",lty=1,col=c(4,2,3),
legend=c("Mean method","Naive method","Seasonal naive method"),bty="n")


# plot with forecasts and actual values
plot(fit1, plot.conf=FALSE,
     main="Forecasts for Ontario housing starts")
lines(fit2$mean,col=2)
lines(fit3$mean,col=3)
lines(hs)
legend("topleft",lty=1,col=c(4,2,3),
       legend=c("Mean method","Naive method","Seasonal naive method"),bty="n")



##########################################################
# exponential smoothing approaches
##########################################################

# simple exponential moving averages
fit4 <- ses(train, h = h)
summary(fit4)
plot(fit4)


# holt's linear trend method
fit5 <- holt(train,  h=h)
summary(fit5)
plot(fit5)


# holt's exponential trend method
fit6 <- holt(train, exponential=TRUE, h=h)
summary(fit6)
plot(fit6)


# holt's damped trend method
fit7 <- holt(train, damped=TRUE, h=h)
summary(fit7)
plot(fit7)


# holt winter's  method
fit8 <- hw(train, seasonal="multiplicative", h=h)
summary(fit8)
plot(fit8)


# ETS  method
y.ets <- ets(train, model="ZZZ")
summary(y.ets)
fit9 <- forecast(y.ets, h=h)
summary(fit9)
plot(fit9)


# STL  method
y.stl <- stl(train, t.window=15, s.window="periodic", robust=TRUE)
summary(y.stl)
fit10 <- forecast(y.stl, method="rwdrift",h=h)
summary(fit10)
plot(fit10)


# trend plus seasonal
tps <- tslm(train ~ trend + season)
summary(tps)
fit11 = forecast(tps,h=h)
plot(fit11)


# arima modelling and forecasting
# test for unit root
adf.test(train, alternative = "stationary")
kpss.test(train)
ndiffs(train)


y.arima <- auto.arima(train)
fit12 <- forecast(y.arima, h=h)
plot(fit12)

# take logs to reduce variability
y.arima.lambda <- auto.arima(train, lambda=0)
fit13 <- forecast(y.arima.lambda, h=h)
plot(fit13)
tsdiag(y.arima.lambda)


## ANN
ann <- nnetar(train, repeats=100,lambda=0)
fit14 = forecast(ann,h=h)
plot(fit14)

## bats
bats_fit <- bats(train)
fit15 = forecast(bats_fit,h=h)
plot(fit15)



# accuracy measures
a1 = accuracy(fit1, test)
a2 = accuracy(fit2, test)
a3 = accuracy(fit3, test)
a4 = accuracy(fit4, test)
a5 = accuracy(fit5, test)
a6 = accuracy(fit6, test)
a7 = accuracy(fit7, test)
a8 = accuracy(fit8, test)
a9 = accuracy(fit9, test)
a10 = accuracy(fit10, test)
a11 = accuracy(fit11, test)
a12 = accuracy(fit12, test)
a13 = accuracy(fit13, test)
a14 = accuracy(fit14, test)
a15 = accuracy(fit15, test)


#Combining forecast summary statistics into a table with row names
a.table<-rbind(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15)

row.names(a.table)<-c('Mean training','Mean test', 'Naive training', 'Naive test', 'S. Naive training', 'S. Naive test' ,
                      'ES training','ES test', 'Holt linear training', 'Holt linear test', 'Holt ES training', 'Holt ES test' ,
                      'Holt dampled training','Holt damped test', 'Holt Winters training', 'Holt Winters test', 'ETS training', 'ETS test' ,     
                  'STL training','STL test','linear trend training','linear trend test',
                  'ARIMA training', 'ARIMA test',
                  'ARIMA training (log)', 'ARIMA test (log)',
                 'ANN training', 'ANN test',
                  'BATS training', 'BATS test'
                )

# order the table according to MASE
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
a.table

# write table to csv file
## write.csv(a.table, "C:/house prices/atable.csv")
##########################################################
##########################################################


## based on MASE to choose model, forecast hs periods into the future
hs = 11

fit.os.1 <- snaive(both, h=hs)
plot(fit.os.1, include=48)
fit.os.1


fit.os.2 = stl(both, t.window=15, s.window="periodic", robust=TRUE)
fit.os.2.f  <- forecast(fit.os.2 , method="rwdrift",h=hs)
plot(fit.os.2.f, include=48)



No comments:

Post a Comment