Data Mining Model Performance of Sales Predictive Algorithms Based on RapidMiner Workflow

Massaro A;
2018-01-01

Abstract

A dataset assembled from several data files, containing three years of sales records for a large chain of retail stores, was processed using RapidMiner workflows. A Deep Learning model was then built to run a predictive algorithm suitable for sales forecasting. The model is based on an artificial neural network (ANN) algorithm that learns from pre-processed historical sales data. The best model combines a multilayer neural network with an “optimized operator” that automatically finds the best parameter settings for the implemented algorithm. To identify the best-performing predictive model, other machine learning algorithms were also tested: the comparison covered Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Gradient Boosted Trees, Decision Trees, and Deep Learning algorithms. Comparing the degree of correlation between real and predicted values, the average absolute error, and the relative average error showed that the ANN performed best, with Gradient Boosted Trees as the second-best alternative. The case study was developed within an industry project focused on integrating high-performance data mining models able to predict sales using enterprise resource planning (ERP) and customer relationship management (CRM) tools.
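The three comparison criteria named in the abstract (correlation between real and predicted values, average absolute error, and relative average error) can be illustrated with a minimal Python sketch. This is not the RapidMiner workflow used in the study: the metric definitions are assumed textbook forms, and the model names and sales figures (`model_a`, `model_b`, `actual_sales`) are hypothetical values made up for the illustration.

```python
import math


def pearson_correlation(actual, predicted):
    """Pearson correlation between real and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mean_absolute_error(actual, predicted):
    """Average of |real - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average of |real - predicted| / |real| (assumes no real value is zero)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rank_models(actual, predictions_by_model):
    """Return (name, correlation, abs_error, rel_error) rows, lowest abs_error first."""
    rows = [
        (
            name,
            pearson_correlation(actual, predicted),
            mean_absolute_error(actual, predicted),
            mean_relative_error(actual, predicted),
        )
        for name, predicted in predictions_by_model.items()
    ]
    return sorted(rows, key=lambda row: row[2])


# Hypothetical data, invented for this sketch (not taken from the study).
actual_sales = [100.0, 200.0, 300.0, 400.0]
predictions = {
    "model_b": [150.0, 150.0, 350.0, 350.0],
    "model_a": [110.0, 190.0, 310.0, 390.0],
}

for name, corr, mae, mre in rank_models(actual_sales, predictions):
    print(f"{name}: correlation={corr:.3f}, abs_error={mae:.2f}, rel_error={mre:.3f}")
```

Ranking by a single metric is a simplification; the abstract reports that all three criteria were considered together when comparing the algorithms.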

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12572/18149