The Pattern Sequence based Forecasting (PSF) algorithm was first proposed by Martínez-Álvarez et al. (2008) and later modified and improved by Martínez-Álvarez et al. (2011). The technical details are given in the referenced articles. The PSF algorithm consists of several statistical operations, which the functions described below carry out: selecting the cluster size k, selecting the window size w, and forecasting from the resulting pattern sequence.
This section demonstrates the functions used in PSF, with examples. The data used are from the iris data set, provided with R.
Install the package, then load it with:
library(PSF)
pred_for_w (data_in, w, k, next_val)
This function predicts values for the given data, window size (w), and cluster size (k).
data_in: input data, in any format (matrix, data frame, or vector). All variables should be numeric; NA values are removed during execution.
w: window size (can be obtained with the function optimum_w).
k: cluster size for k-means (can be obtained with the function optimum_k).
next_val: an integer giving the number of predicted values to return.
# Considering `data_in` = iris[1], `w` = 3, `k` = 4 and `next_val` = 5
pred_for_w(iris[1], 3, 4, 5)
## [1] 7.082143 6.271795 5.623684 5.623684 5.623684
This function returns the predicted values and the corresponding plot. In the plot, the red region is the original data and the blue region corresponds to the predicted values.
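To make the idea behind this kind of prediction more concrete, here is a minimal sketch of a pattern-sequence forecast in base R. It is not the package's implementation: `psf_sketch` is a hypothetical helper, it assumes a univariate series in which every value is labeled individually with k-means, and it assumes that when no earlier match is found the window is shortened by one. Its numbers should not be expected to match the output of pred_for_w().

```r
# Hypothetical helper (not part of the PSF package): a simplified
# pattern-sequence forecast for a univariate numeric series.
psf_sketch <- function(x, w, k, next_val, seed = 1) {
  x <- as.numeric(x[!is.na(x)])          # drop NA values
  stopifnot(w >= 1, length(x) > w)
  set.seed(seed)                         # kmeans starts from random centers
  for (step in seq_len(next_val)) {
    n <- length(x)
    labels <- kmeans(x, centers = k, nstart = 10)$cluster
    w_cur <- w
    pred <- NA_real_
    while (w_cur >= 1 && is.na(pred)) {
      pattern <- labels[(n - w_cur + 1):n]   # the last w_cur labels
      followers <- numeric(0)
      # Collect the value that followed each earlier occurrence of the pattern.
      for (i in seq_len(n - w_cur)) {
        if (all(labels[i:(i + w_cur - 1)] == pattern)) {
          followers <- c(followers, x[i + w_cur])
        }
      }
      if (length(followers) > 0) pred <- mean(followers)
      w_cur <- w_cur - 1                 # assumption: shrink the window if nothing matched
    }
    # Assumed fallback: mean of the cluster that the last value belongs to.
    if (is.na(pred)) pred <- mean(x[labels == labels[n]])
    x <- c(x, pred)                      # feed the forecast back into the series
  }
  tail(x, next_val)
}

pred <- psf_sketch(iris[[1]], w = 3, k = 4, next_val = 5)
print(pred)
```

Each forecast is an average of values already in the series, so the sketch can never predict outside the observed range.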
pred_for_w_plot (data_in, w, k, next_val)
This function is similar to pred_for_w(), except that it also plots the predicted values.
# Considering `data_in` = iris[1], `w` = 3, `k` = 4 and `next_val` = 5
pred_for_w_plot(iris[1], 3, 4, 5)
## $Predicted_Values
## [1] 7.082143 6.271795 5.623684 5.623684 5.623684
##
## $Plot
## NULL
optimum_w (data_in, next_val)
This function calculates the optimum window size w to be used in the prediction function, that is, the value for which the RMSE and MAE are smallest.
data_in: input data, in any format (matrix, data frame, or vector). All variables should be numeric; NA values are removed during execution.
next_val: an integer. The last next_val values of the data are held out and used to test prediction accuracy.
# Considering `data_in` = iris[1] and `next_val` = 5
optimum_w(iris[1], 5)
## $Optimum_W
## [1] 1
##
## $RMSE_Values
## [1] 0.3059953
##
## $Prediction
## Original_Data W_size.1
## 1 6.7 6.461538
## 2 6.3 6.461538
## 3 6.5 6.461538
## 4 6.2 6.461538
## 5 5.9 6.461538
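As a quick check on how the error measures are computed, the RMSE and MAE can be recalculated by hand from the table printed above. The snippet below uses only base R; the two vectors are copied from the printed output (so the predicted value is rounded), and the variable names are illustrative.

```r
# Values copied from the $Prediction table in the output above.
original  <- c(6.7, 6.3, 6.5, 6.2, 5.9)
predicted <- rep(6.461538, 5)

errors <- original - predicted
rmse <- sqrt(mean(errors^2))   # root mean squared error
mae  <- mean(abs(errors))      # mean absolute error

print(c(RMSE = rmse, MAE = mae))
```

The RMSE computed this way agrees with the reported $RMSE_Values of 0.3059953 up to the rounding of the printed prediction.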
This function returns the optimum value of w, its corresponding RMSE value, the predicted values, and plots. The predicted values from optimum_w() differ from those obtained with pred_for_w() because optimum_w() removes the last next_val values from the data, predicts that many values, and compares them with the original data.
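The hold-out procedure described here can be sketched in a few lines of base R. This is an illustration of the selection scheme only, not the package's code: `choose_w` and `mean_of_last_w` are hypothetical names, the candidate range 1:10 is an assumption, and the forecaster is a deliberately simple stand-in (the mean of the last w training values) rather than PSF.

```r
# Stand-in forecaster (NOT PSF): repeats the mean of the last w training values.
mean_of_last_w <- function(train, w, next_val) {
  rep(mean(tail(train, w)), next_val)
}

# Hypothetical helper: pick the window size with the lowest hold-out RMSE.
choose_w <- function(x, next_val, candidates = 1:10,
                     forecaster = mean_of_last_w) {
  x <- as.numeric(x[!is.na(x)])
  n <- length(x)
  train <- x[seq_len(n - next_val)]      # everything except the last next_val values
  test  <- x[(n - next_val + 1):n]       # held-out values used for scoring
  rmse <- vapply(candidates, function(w) {
    sqrt(mean((test - forecaster(train, w, next_val))^2))
  }, numeric(1))
  list(Optimum_W = candidates[which.min(rmse)], RMSE_Values = rmse)
}

res <- choose_w(iris[[1]], next_val = 5)
print(res$Optimum_W)
```

Passing the forecaster as an argument keeps the selection logic separate from the prediction method, so any function with the same three arguments could be scored the same way.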
optimum_k (data_in)
This function determines the optimum cluster size (k) using the average silhouette method.
data_in: input data, in any format (matrix, data frame, or vector). All variables should be numeric; NA values are removed during execution.
# Considering `data_in` = iris[1]
optimum_k(iris[1])
## [1] 10
This function returns k, an integer giving the optimum number of clusters.
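For readers unfamiliar with the average silhouette method, the following base R sketch shows one way it can be used to pick k. It is a sketch under stated assumptions, not the package's implementation: `avg_silhouette` and `choose_k` are hypothetical helpers, the candidate range 2:10 is assumed, and singleton clusters are given a silhouette of 0 by convention. Because k-means starts from random centers, the selected k may differ from the value returned by optimum_k().

```r
# Hypothetical helper: average silhouette width of a clustering, base R only.
avg_silhouette <- function(x, cluster) {
  d <- as.matrix(dist(x))                    # pairwise Euclidean distances
  ids <- sort(unique(cluster))
  s <- vapply(seq_along(cluster), function(i) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) return(0)             # singleton cluster: 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)     # mean distance within own cluster
    b <- min(vapply(ids[ids != cluster[i]], function(cl) {
      mean(d[i, cluster == cl])              # mean distance to another cluster
    }, numeric(1)))
    if (max(a, b) == 0) return(0)            # guard against 0/0
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Hypothetical helper: choose the k with the largest average silhouette width.
choose_k <- function(x, candidates = 2:10, seed = 1) {
  x <- as.numeric(x[!is.na(x)])
  set.seed(seed)                             # kmeans starts from random centers
  widths <- vapply(candidates, function(k) {
    avg_silhouette(x, kmeans(x, centers = k, nstart = 10)$cluster)
  }, numeric(1))
  list(k = candidates[which.max(widths)], widths = widths)
}

res_k <- choose_k(iris[[1]])
print(res_k$k)
```

A width close to 1 means points sit much nearer to their own cluster than to the next closest one, which is why the largest average is taken as the best k.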
AUTO_PSF (data_in, next_val)
This function takes the input data, automatically determines the optimum window size (w) and cluster size (k), and predicts the next next_val future values.
data_in: input data, in any format (matrix, data frame, or vector). All variables should be numeric; NA values are removed during execution.
next_val: an integer giving the number of predicted values to return.
# Considering `data_in` = iris[1] and `next_val` = 3
AUTO_PSF(iris[1],3)
## $Predicted_Values
## [1] 5.546154 5.546154 5.772222
This function calculates suitable values of w and k such that the mean error between the original and predicted data is minimized, and then forecasts the series with the PSF algorithm.
Martínez-Álvarez, F., Troncoso, A., Riquelme, J.C. and Aguilar-Ruiz, J.S., 2008. LBF: A labeled-based forecasting algorithm and its application to electricity price time series. In Eighth IEEE International Conference on Data Mining (ICDM '08), pp. 453-461. IEEE.
Martínez-Álvarez, F., Troncoso, A., Riquelme, J.C. and Aguilar-Ruiz, J.S., 2011. Energy time series forecasting based on pattern sequence similarity. IEEE Transactions on Knowledge and Data Engineering, 23(8), pp. 1230-1243.