Time Series Modeling Example using ARIMA, Regression, and LSTM

Zihao Zhang
8 min read · May 17, 2022

I tested several common time series models on a simple, easy-to-replicate problem. This article is not a complete guide to time series modeling but a real example of my workflow on this problem, including data preprocessing, trying different models, tuning parameters, and model selection.

I used a dataset from oasis.caiso.com. I selected SYSTEM DEMAND -> CAISO Demand Forecast, using data from 03/01/2022 to 03/31/2022 as the training set and 04/01/2022 as the test set, and I focused on the region called SCE-TAC. (The dataset uses GMT timestamps, so the timestamps deviate slightly from the local dates. The code and dataset are available on my GitHub.)

For a brief intuition about the dataset: the key column is EXECUTION_TYPE, which differentiates actual from forecast values. EXECUTION_TYPE includes:

- ACTUAL: The actual or real demand in the system. (Hourly)

- RTD: Real-time 5-minute demand forecast. (5-min slice)

- RTPD: Real-Time 15 min demand forecast. (15-min slice)

- DAM: One-day ahead demand forecast. (Hourly)

- 2DA: Two-day ahead demand forecast. (Hourly)

- 7DA: Seven-day ahead demand forecast. (Hourly)

As shown above, some features have hourly granularity and some are at the minute level. We do not need minute granularity, so we aggregate the minute-level data to hourly using the median to align them.
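The aggregation step can be sketched as follows (a minimal sketch, assuming the data is in a pandas DataFrame with a parsed datetime column `ts` plus the EXECUTION_TYPE and MW columns from the dataset; `aggregate_to_hourly` is my own name, not from the original code):

```python
import pandas as pd

def aggregate_to_hourly(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse 5-min (RTD) and 15-min (RTPD) records to hourly medians.

    Hourly series (ACTUAL, DAM, 2DA, 7DA) pass through unchanged,
    since the median of a single value is the value itself.
    """
    out = df.copy()
    out["ts"] = out["ts"].dt.floor("h")  # truncate each timestamp to the hour
    return (out.groupby(["ts", "EXECUTION_TYPE"], as_index=False)["MW"]
               .median())
```

Because the median is taken within each (hour, type) group, an hour with even a single surviving 5-min record still produces one aggregated row.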

Our goal is to predict the ACTUAL demand on 04/01/2022 by training only on March data, which is a fairly typical time series forecasting problem.

Dataset and preprocessing

I filtered for rows whose 'TAC_AREA_NAME' is 'SCE-TAC' and kept the only meaningful numerical variable, 'MW'. First, I needed to check for missing and null values.
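A sketch of these checks, assuming the filtered export is in a pandas DataFrame with the columns described above (the function name and the return-instead-of-print design are mine):

```python
import pandas as pd

def check_data(df: pd.DataFrame):
    """Return (null counts per column, record counts per EXECUTION_TYPE).

    For one complete day we expect 24 hourly records per hourly type,
    96 for RTPD (15-min slices), and 288 for RTD (5-min slices).
    """
    nulls = df[["TAC_AREA_NAME", "EXECUTION_TYPE", "MW"]].isnull().sum()
    counts = df["EXECUTION_TYPE"].value_counts()
    return nulls, counts
```

Printing the two returned Series for the April and March frames gives counts like those below.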

TAC_AREA_NAME     0
EXECUTION_TYPE    0
MW                0
dtype: int64
---
RTD       288
RTPD       96
7DA        24
ACTUAL     24
2DA        24
DAM        24
Name: EXECUTION_TYPE, dtype: int64
---
RTD      8858
RTPD     2961
2DA       743
7DA       743
DAM       743
ACTUAL    743
Name: EXECUTION_TYPE, dtype: int64

Overall, the dataset is quite clean, with no null values and only a few missing records. The data for 04/01/2022 is complete, but the March data is not. First, the number of hours is not 31 * 24 = 744: the US lost one hour in March (the switch from standard to daylight saving time), so one day has only 23 hours. I parsed the dates and made the index continuous, so this does not affect the overall time characteristics of the dataset. Also, the number of 15-min slices is not exactly 4 times the hourly count, and the number of 5-min slices is not exactly 3 times the 15-min count, which indicates missing records. Fortunately, the minute data will be aggregated into hourly data, so as long as there is at least one record within an hour, there will be no missing records after aggregation.

Then I made a pivot table to show the dataset, and plotted it to check what it looks like. We can easily see the daily and weekly trends.
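The pivot table described above can be built in one pandas call (a sketch, assuming the hourly-aggregated frame from the preprocessing step with columns `ts`, `EXECUTION_TYPE`, and `MW`):

```python
import pandas as pd

def to_pivot(hourly: pd.DataFrame) -> pd.DataFrame:
    """One row per hour, one column per EXECUTION_TYPE."""
    return hourly.pivot_table(index="ts", columns="EXECUTION_TYPE",
                              values="MW", aggfunc="median")
```

Plotting the resulting columns (e.g. `to_pivot(hourly).plot()`) is what makes the daily and weekly cycles visible.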

For a deeper understanding of the dataset, recall that 'ACTUAL' is the actual demand we want to predict; the other values are existing forecasts provided by the dataset. For time series forecasting problems, common metrics such as RMSE, MSE, MAE, and MAPE evaluate the goodness of a prediction. Here I chose RMSE and MAPE to check how good these existing forecasts are.
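Both metrics are one-liners with NumPy (a sketch; the function names are mine, and MAPE is reported in percent to match the numbers below):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)
```

Evaluating each EXECUTION_TYPE column of the pivot against the ACTUAL column gives the table below (ACTUAL scores 0 against itself, as expected).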

The result shows

2DA RMSE: 434.469
7DA RMSE: 484.985
ACTUAL RMSE: 0.000
DAM RMSE: 371.914
RTD RMSE: 948.978
RTPD RMSE: 957.156
2DA MAPE: 3.359
7DA MAPE: 3.478
ACTUAL MAPE: 0.000
DAM MAPE: 2.774
RTD MAPE: 9.190
RTPD MAPE: 9.249

So DAM, the one-day-ahead forecast, is the best existing prediction. I also printed the MAPE of DAM on 04/01/2022, which can be regarded as a baseline.

Baseline MAPE of DAM on 04/01/2022: 5.2267

It means that, with MAPE as the metric, we need to provide a better prediction than this baseline.

Model

There are several common machine learning models for time series prediction, such as regression, LSTM, Prophet, and ARIMA. Here I tested linear regression, several LSTM variants, and ARIMA. Overall, ARIMA gives the best MAPE. This is quite intuitive, since ARIMA lets us encode prior knowledge in the model, and we have observed a regular, periodic trend.

Linear Regression

Here I used the shift method to match the actual demand value (target) with the existing forecast values (features): I shifted the actual demand value 24 rows (hours) ahead, so each row pairs one hour's forecasts with the actual demand 24 hours later. By training a linear model on this training set, I can use the last 24 hours of March data (the test set) to predict a 24-hour target, which is the predicted actual demand for 04/01.
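A sketch of this setup, assuming the pivot table from the preprocessing step. I use NumPy's least-squares solver rather than a specific regression library, and the function names and the exact feature list are my assumptions:

```python
import numpy as np
import pandas as pd

FEATS = ["DAM", "2DA", "7DA", "RTD", "RTPD"]

def shift_and_fit(pivot: pd.DataFrame, horizon: int = 24):
    """Pair each hour's forecasts (features) with the ACTUAL demand
    `horizon` hours later (target), then fit ordinary least squares."""
    data = pivot.copy()
    data["target"] = data["ACTUAL"].shift(-horizon)  # demand 24h ahead
    train = data.dropna(subset=FEATS + ["target"])
    X = np.column_stack([train[FEATS].to_numpy(),
                         np.ones(len(train))])       # intercept column
    beta, *_ = np.linalg.lstsq(X, train["target"].to_numpy(), rcond=None)
    return beta

def predict(pivot: pd.DataFrame, beta):
    """Apply the fitted coefficients to every row of the pivot."""
    X = np.column_stack([pivot[FEATS].to_numpy(), np.ones(len(pivot))])
    return X @ beta
```

Calling `predict` on the last 24 rows of March then yields the 24 hourly predictions for 04/01.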

The linear regression achieves a good MAPE.

MAPE of linear regression: 3.1629

LSTM

LSTM is a well-known sequence model that can be used for time series prediction. There are several common LSTM architectures.

Vanilla LSTM
A Vanilla LSTM is an LSTM model that has a single hidden layer of LSTM units, and an output layer used to make a prediction.

Stacked LSTM
Multiple hidden LSTM layers can be stacked one on top of another in what is referred to as a Stacked LSTM model.

Bidirectional LSTM
On some sequence prediction problems, it can be beneficial to allow the LSTM model to learn the input sequence both forward and backwards and concatenate both interpretations.

I split the original dataset into a training set and a validation set with a 4:1 ratio and used early stopping on the validation loss. Given the same layer size and split ratio, I tested the Vanilla, Stacked, and Bidirectional LSTMs and compared their MAPE. Finally, I chose the Stacked LSTM.
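Whichever variant is used, the series must first be turned into supervised (window, next value) samples, and the 4:1 split must stay chronological. A minimal NumPy sketch of that data preparation (the 24-step window length is my assumption; the Keras model definition and fitting are omitted):

```python
import numpy as np

def make_windows(series, n_steps: int = 24):
    """Slice a 1-D series into (samples, n_steps, 1) inputs and
    next-step targets, the shape LSTM layers expect."""
    series = np.asarray(series, float)
    X = np.stack([series[i:i + n_steps]
                  for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X[..., None], y  # trailing axis = one feature per time step

def train_val_split(X, y, ratio: float = 0.8):
    """Chronological 4:1 split -- never shuffle a time series."""
    cut = int(len(X) * ratio)
    return X[:cut], y[:cut], X[cut:], y[cut:]
```

The validation half of the split is what the early-stopping callback monitors.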

The result shows

MAPE of LSTM: 3.5701

It is not as good as linear regression but still better than the baseline.

We cannot easily add prior knowledge to a deep learning model; we can only tune the hyperparameters, the compile settings (optimizer and loss), and the model architecture.

ARIMA

ARIMA, with parameters (p, d, q) (and seasonal ARIMA, with parameters (p, d, q)(P, D, Q)[m]), is a good model for time series forecasting. I won't explain what these parameters mean here, but only show how to choose a good model by tuning them.

Before building models, we need to check the ACF and PACF plots to get a rough understanding of the data and its seasonality.
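In practice one would call statsmodels' `plot_acf`/`plot_pacf`; to keep the idea self-contained, here is a sketch that computes the sample ACF directly with NumPy (the function name is mine):

```python
import numpy as np

def acf(series, nlags: int = 48):
    """Sample autocorrelation for lags 0..nlags."""
    x = np.asarray(series, float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])
```

For an hourly demand series with a daily cycle, this shows a strong peak near lag 24; a slowly decaying ACF is the usual hint that differencing (d > 0) is needed.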

From these plots, we can confirm that the true demand series is not stationary. Moreover, both plots show a seasonal pattern in the data, so it is reasonable to apply a seasonal ARIMA model to this prediction task.

Tuning the hyperparameters of ARIMA models can be annoying. There is a package named pmdarima that automatically searches p, d, q and P, D, Q via its auto_arima function, but it sometimes fails on Colab, so it is not easily reproducible. Also, training a seasonal ARIMA model and selecting the best hyperparameters takes a lot of time.

Usually, these packages use AIC or BIC as the criterion to select ARIMA models, since they are more robust than metrics like MAPE, which are sensitive to otherwise unimportant choices such as the train/test split ratio. In addition, the model with minimal MAPE does not necessarily have minimal AIC.

Therefore, I decided to search p, d, q on the regular (non-seasonal) ARIMA model only, using AIC as the selection criterion at first. However, the result was (p, d, q) = (12, 0, 12), which is inconsistent with our prior judgment about stationarity. On the other hand, both (p, d, q) = (12, 0, 12) and (p, d, q) = (12, 1, 12) performed well with the smallest AICs, so I tried both on the test set and compared the results. Fortunately, (p, d, q) = (12, 1, 12) performed better on the test set, which is consistent with our prior judgment.

Then I brought the seasonal parameters into the model by applying seasonal ARIMA. I tried this intuitive approach, but I cannot tell whether it is always the best solution.

I created a grid over p, d, q and searched for the minimal AIC.

It shows

Best model is ARIMA(12, 0, 12), with AIC: 7518.9765
Test size: 148
MAPE of ARIMA: 1.4576
RMSE of ARIMA: 184.5320

It is much better than the previous two methods.

Then I tried to add the seasonal component to the model. However, training seasonal ARIMA models is time-consuming if I keep using the full grid search, so I fixed the seasonal period at 24, since we have observed a daily cycle 24 records (24 hours) long, and tried several values of P and Q, following the Open Machine Learning Course article, to compare the results.

The result shows

MAPE of SARIMA: 1.4485
RMSE of SARIMA: 201.6864

It is not a significant improvement.

Result

Finally, let’s visualize the prediction and compare it with the true value.

Even for this toy dataset, there are still many tricks left to try, such as better hyperparameter tuning and model selection strategies.
