Data wrangling framework on clickstream for enhancing seat sales prediction

Revenue management is one of the essential functions in every airline business, and the seat (ticket) is the main product of an airline. The purpose of revenue management is to maximize the revenue of each airline route based on demand. This demand, defined by seat sales, depends on factors such...

Bibliographic Details
Main Author: Alauddin, Md
Format: Thesis
Published: 2021
Subjects: QA76.75-76.765 Computer software
id my-mmu-ep.11100
record_format uketd_dc
spelling my-mmu-ep.11100 2023-04-17T07:03:16Z Data wrangling framework on clickstream for enhancing seat sales prediction 2021-05 Alauddin, Md QA76.75-76.765 Computer software 2021-05 Thesis http://shdl.mmu.edu.my/11100/ http://erep.mmu.edu.my/ masters Multimedia University Faculty of Computing and Informatics EREP ID: 10277
institution Multimedia University
collection MMU Institutional Repository
topic QA76.75-76.765 Computer software
description Revenue management is one of the essential functions in every airline business, and the seat (ticket) is the main product of an airline. The purpose of revenue management is to maximize the revenue of each airline route based on demand. This demand, defined by seat sales, depends on factors such as historical transaction data, seasonality, ticket pricing based on advance purchase trends, competitors' pricing, and customer behaviour. Predicting passenger seat sales helps to estimate revenue on future flights and allows the airline to generate optimal prices for the corresponding flights. Current prediction models use structured transactional and operational data to predict airline seat sales or passenger demand. As airlines have undergone a digital transformation over the past two decades, large volumes of user activity data have become available to them. This study presents the efficacy of a third data source, namely digital clickstream data, in improving airline seat sales prediction. Digital data has so far been ignored in most research because no proper extraction and processing pipeline existed for this massive volume of available but unstructured data. This study developed a suitable ETL framework for data wrangling and identified 191 features from transactional, operational and digital data to create the analytical dataset. The wrapper-based Boruta algorithm, chosen through experimentation, selected 22 features as input to the prediction models (10, 10 and 2 features from the transactional, digital and operational data sources, respectively).
Ten models, namely, 1) Linear regression, 2) Support Vector Machine (SVM), 3) Generalized Linear Model (GLM), 4) CART, 5) GBRT, 6) Random Forest (RF), 7) Histogram Gradient Boosting Regressor, 8) Extreme Gradient Boosting Regressor (XGBRegressor), 9) Light GBM Regressor (LGBMRegressor), and 10) Category Boosting Regressor (CatBoostRegressor), were studied and evaluated on the analytical dataset. With hyperparameter tuning, the tree-based CatBoostRegressor, LGBMRegressor, and XGBRegressor models were found to be the most effective in predicting airline seat sales 30 and 60 days prior to departure, with 91-94% accuracy. Including the digital data source contributed a 2-6% improvement in MAPE compared with the models built without digital data.
format Thesis
qualification_level Master's degree
author Alauddin, Md
title Data wrangling framework on clickstream for enhancing seat sales prediction
granting_institution Multimedia University
granting_department Faculty of Computing and Informatics
publishDate 2021
_version_ 1776101395246612480
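
The description above reports its results as MAPE (mean absolute percentage error) and as an improvement in MAPE when digital data is included. As a rough illustration of how such a comparison can be computed, here is a minimal sketch in Python. The function name, the sample seat counts, and the two sets of predictions are hypothetical and are not taken from the thesis; the record does not state how the thesis defines its "accuracy" figure, so the sketch reports MAPE only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Raises ValueError on mismatched lengths, empty input, or a zero
    actual value (the percentage error is undefined in that case).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical seats sold on four flights (illustrative numbers only).
actual_seats = [100, 200, 50, 80]

# Hypothetical predictions from a model without and with digital features.
pred_without_digital = [90, 220, 60, 72]
pred_with_digital = [95, 210, 55, 76]

mape_without = mape(actual_seats, pred_without_digital)
mape_with = mape(actual_seats, pred_with_digital)

# Improvement expressed in percentage points of MAPE.
improvement = mape_without - mape_with

print(f"MAPE without digital data: {mape_without:.2f}%")
print(f"MAPE with digital data:    {mape_with:.2f}%")
print(f"Improvement:               {improvement:.2f} points")
```

A lower MAPE means the predictions are closer to the actual seat sales, so a positive difference here indicates that the second model performs better on this toy data.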