Augmented Data and XGBoost Improvement for Sales Forecasting in the Large-Scale Retail Sector
Massaro A.
2021-01-01
Abstract
The organized large-scale retail sector has been gradually establishing itself around the world, and its activity grew sharply during the pandemic. This modern sales system relies on data mining technologies that process valuable information to increase profit. In this context, the extreme gradient boosting (XGBoost) algorithm was applied in an industrial project as a supervised learning algorithm to predict product sales, including promotion conditions and a multiparametric analysis. The XGBoost model was trained and tested using the Augmented Data (AD) technique, which addresses cases where the available data are insufficient to reach the desired accuracy, as happens in many practical artificial intelligence applications for which no large dataset is available. The prediction was applied to a grid of segmented customers, enabling personalized services according to their purchasing behavior. Compared with results obtained from the initial dataset, which contained few records, the AD technique yielded good accuracy: the prediction errors, measured as Root Mean Square Error (RMSE) and Mean Square Error (MSE), decreased by about an order of magnitude. The AD technique formulated for the large-scale retail sector also represents a good way to calibrate the training model.
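To make the two error metrics and the idea of enlarging a small training set concrete, here is a minimal, dependency-free sketch. The functions `mse`, `rmse` and `augment` are hypothetical names introduced for illustration. In particular, `augment` assumes a simple jitter scheme (replicating each record with small Gaussian noise on its numeric features); the abstract does not describe how the paper's AD technique generates records, so this is not the author's method. The XGBoost model itself (for example the `xgboost` package's `XGBRegressor`) is omitted to keep the example runnable without third-party libraries.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean Square Error: average of the squared residuals."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def augment(rows, targets, copies=5, noise=0.05, seed=0):
    """Hypothetical jitter-based augmentation (assumption, not the paper's AD).

    Keeps the original records and appends `copies` perturbed replicas of
    each one, scaling every numeric feature by (1 + N(0, noise)).
    The target value of each replica is left unchanged.
    """
    rng = random.Random(seed)  # fixed seed for reproducible replicas
    aug_x, aug_y = list(rows), list(targets)
    for x, y in zip(rows, targets):
        for _ in range(copies):
            aug_x.append([v * (1 + rng.gauss(0, noise)) for v in x])
            aug_y.append(y)
    return aug_x, aug_y
```

For instance, `mse([10, 20, 30, 40], [12, 18, 32, 38])` gives 4.0 and the corresponding `rmse` gives 2.0. Because RMSE is the square root of MSE, a tenfold reduction in MSE corresponds to roughly a 3.2-fold reduction in RMSE.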