PSF: R Package for Pattern Sequence based Forecasting Algorithm

Neeraj Dhanraj Bokde (neerajdhanraj@gmail.com)

2016-06-18

Introduction

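The Pattern Sequence based Forecasting (PSF) algorithm was first proposed by Martínez-Álvarez et al. (2008) and later modified and improved by Martínez-Álvarez et al. (2011). The technical details are given in the referenced articles. The PSF algorithm combines several statistical operations, including selection of the optimum cluster size (k), selection of the optimum window size (w), and prediction of future values. The functions described below cover each of these steps.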

Examples

This section demonstrates the functions provided by PSF, with examples. The data used is the first column of the iris data set (iris[1], Sepal.Length), which ships with R.

Install library

Download and install the package, then load it with:

library(PSF)

pred_for_w (data_in, w, k, next_val)

This function predicts future values for the given data, window size (w), and cluster size (k). The argument next_val is the number of values to predict.

# Considering `data_in` = iris[1], `w` = 3, `k` = 4 and `next_val` = 5
pred_for_w(iris[1], 3, 4, 5)
## [1] 7.082143 6.271795 5.623684 5.623684 5.623684

This function returns the predicted values and the corresponding plot. In the plot, the original data are drawn in red and the predicted values in blue.
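To make the roles of w and k more concrete, the following is a minimal sketch of a PSF-style forecast in base R. It is not the code used inside pred_for_w(). The function name psf_sketch and its seed argument are hypothetical, and the steps are a simplified reading of the idea in the referenced articles: label each value with k-means, take the last w labels as a pattern, look for earlier occurrences of that pattern, and average the values that followed them. Because the clustering details are assumptions, the numbers need not match the output shown above.

```r
# Simplified, illustrative PSF-style forecaster (NOT the PSF package's code).
# psf_sketch() and its `seed` argument are hypothetical names.
psf_sketch <- function(x, w, k, next_val, seed = 1) {
  set.seed(seed)                                # k-means starts are random
  km <- kmeans(x, centers = k, nstart = 10)     # label every value with a cluster
  centers <- as.vector(km$centers)
  labels <- km$cluster
  preds <- numeric(next_val)

  for (step in seq_len(next_val)) {
    n <- length(x)
    pred <- NA_real_
    # Try the requested window first, then shorter ones if nothing matches.
    for (win in seq(w, 1)) {
      pattern <- labels[(n - win + 1):n]        # the most recent `win` labels
      starts <- seq_len(n - win)                # earlier windows that have a successor
      hit <- vapply(
        starts,
        function(s) all(labels[s:(s + win - 1)] == pattern),
        logical(1)
      )
      if (any(hit)) {
        pred <- mean(x[starts[hit] + win])      # average of the values that followed
        break
      }
    }
    if (is.na(pred)) pred <- mean(x)            # fallback: no match at any window size

    preds[step] <- pred
    x <- c(x, pred)                             # feed the prediction back in
    labels <- c(labels, which.min(abs(centers - pred)))  # label it by nearest center
  }
  preds
}

forecast <- psf_sketch(iris$Sepal.Length, w = 3, k = 4, next_val = 5)
print(forecast)
```

Each predicted value is appended to the series before the next one is computed, so later predictions depend on earlier ones.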

pred_for_w_plot (data_in, w, k, next_val)

This function is similar to pred_for_w(), but it also plots the predicted values.

# Considering `data_in` = iris[1], `w` = 3, `k` = 4 and `next_val` = 5
pred_for_w_plot(iris[1], 3, 4, 5)

## $Predicted_Values
## [1] 7.082143 6.271795 5.623684 5.623684 5.623684
## 
## $Plot
## NULL

optimum_w (data_in, next_val)

This function calculates the optimum window size w for the prediction function, that is, the value of w that minimizes the RMSE and MAE of the predictions.

# Considering `data_in` = iris[1] and `next_val` = 5
optimum_w(iris[1], 5)
## $Optimum_W
## [1] 1
## 
## $RMSE_Values
## [1] 0.3059953
## 
## $Prediction
##   Original_Data W_size.1
## 1           6.7 6.461538
## 2           6.3 6.461538
## 3           6.5 6.461538
## 4           6.2 6.461538
## 5           5.9 6.461538

This function returns the optimum value of w, its corresponding RMSE value, the predicted values, and plots. The predicted values from optimum_w() differ from those obtained with pred_for_w() because optimum_w() removes the last next_val values from the data, predicts that many values, and compares them with the removed original data.
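The comparison against the removed values can be checked by hand. The sketch below recomputes the error measures from the $Prediction table printed above. The variable names are illustrative, and the predicted value is copied from the W_size.1 column as printed (so it is rounded to six decimals).

```r
# Last five values of iris[1] (Sepal.Length): the held-out original data.
original <- tail(iris$Sepal.Length, 5)        # 6.7 6.3 6.5 6.2 5.9

# Predictions copied from the W_size.1 column printed above (rounded).
predicted <- rep(6.461538, 5)

errors <- original - predicted
rmse <- sqrt(mean(errors^2))                  # root mean squared error
mae  <- mean(abs(errors))                     # mean absolute error

print(c(RMSE = rmse, MAE = mae))
```

The RMSE comes out at about 0.306, which agrees with the $RMSE_Values entry reported by optimum_w() up to the rounding of the copied prediction.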

optimum_k (data_in)

This function determines the optimum cluster size (k) using the average silhouette method.

# Considering `data_in` = iris[1] 
optimum_k(iris[1])
## [1] 10

This function returns k, an integer giving the optimum number of clusters.
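For readers unfamiliar with the average silhouette method, a minimal sketch follows. It uses kmeans() from base R and silhouette() from the cluster package. The candidate range 2:10, the seed, and the nstart value are assumptions, since this vignette does not state the settings used inside optimum_k(), so the selected k may differ from the result shown above.

```r
library(cluster)                               # provides silhouette()

x <- iris$Sepal.Length
d <- dist(x)                                   # pairwise distances, computed once
candidates <- 2:10                             # assumed search range for k

set.seed(1)                                    # k-means starts are random
avg_width <- sapply(candidates, function(k) {
  km <- kmeans(x, centers = k, nstart = 10)
  sil <- silhouette(km$cluster, d)             # one silhouette width per observation
  mean(sil[, "sil_width"])                     # average silhouette width for this k
})

# The candidate with the largest average silhouette width is selected.
best_k <- candidates[which.max(avg_width)]
print(data.frame(k = candidates, avg_silhouette = avg_width))
print(best_k)
```

A larger average silhouette width means that observations sit closer to their own cluster than to the nearest neighbouring cluster, which is why the maximum is taken.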

AUTO_PSF (data_in, next_val)

This function takes the input data, automatically determines the optimum window size (w) and cluster size (k), and predicts the next next_val future values.

# Considering `data_in` = iris[1] and `next_val` = 3
AUTO_PSF(iris[1], 3)

## $Predicted_Values
## [1] 5.546154 5.546154 5.772222

This function calculates suitable values of w and k such that the mean error between the original and predicted data is minimized, and then forecasts the series with the PSF algorithm.

References

Martínez-Álvarez, F., Troncoso, A., Riquelme, J.C. and Aguilar-Ruiz, J.S., 2008, December. LBF: A labeled-based forecasting algorithm and its application to electricity price time series. In Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on (pp. 453-461). IEEE.

Martínez-Álvarez, F., Troncoso, A., Riquelme, J.C. and Aguilar-Ruiz, J.S., 2011. Energy time series forecasting based on pattern sequence similarity. Knowledge and Data Engineering, IEEE Transactions on, 23(8), pp.1230-1243.