BLEVE Project: Predicting Explosion Pressure with Machine Learning
Notes from the COMP3010 Machine Learning assignment.
Purpose and Background
The goal of this assignment was to develop a machine learning model capable of predicting peak overpressure generated by Boiling Liquid Expanding Vapour Explosions (BLEVEs). These explosions pose significant safety risks during the transport of liquefied petroleum gas (LPG), particularly in urban environments. Traditional modeling approaches struggle to handle the complex physics involved, making this a fitting challenge for data-driven methods.
I found it fascinating that something as practical and dangerous as BLEVE prediction could be approached with tools from our COMP3010 lectures and labs. The project provided a unique opportunity to apply machine learning to a real-world safety-critical application.
Thought Process and Approach
I started by deeply analyzing the provided dataset (train.csv and test.csv). Key steps included:
- Data Cleaning: I removed rows with missing values, fixed inconsistent categorical labels (e.g., “Saperheated” → “Superheated”), and eliminated duplicates. This resulted in a clean dataset of 9,890 rows.
- Feature Selection: I initially used correlation heatmaps to understand which features most impacted the target pressure.
- Feature Engineering: I created new variables such as Tank Volume, Sensor Distance, and TankWidthToLengthRatio. While these added physical intuition, I found that too many engineered features often hurt performance. In the end, the best-performing model used mostly raw features with minor refinement.
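The cleaning and feature-engineering steps above can be sketched in pandas. This is a minimal illustration, not the actual assignment code: the column names below are hypothetical stand-ins for whatever schema train.csv actually uses.

```python
import pandas as pd

# Hypothetical column names -- the real train.csv schema may differ.
df = pd.DataFrame({
    "Status": ["Superheated", "Saperheated", "Subcooled", "Superheated"],
    "Tank Length (m)": [4.0, 4.0, 6.0, None],
    "Tank Width (m)": [2.0, 2.0, 2.5, 2.0],
    "Tank Height (m)": [2.0, 2.0, 2.5, 2.0],
    "Pressure (kPa)": [12.1, 12.1, 8.4, 9.9],
})

# 1. Fix inconsistent categorical labels (e.g. "Saperheated").
df["Status"] = df["Status"].replace({"Saperheated": "Superheated"})

# 2. Drop rows with missing values, then exact duplicates.
df = df.dropna().drop_duplicates().reset_index(drop=True)

# 3. Engineered features: tank volume and width-to-length ratio.
df["TankVolume"] = (df["Tank Length (m)"] * df["Tank Width (m)"]
                    * df["Tank Height (m)"])
df["TankWidthToLengthRatio"] = df["Tank Width (m)"] / df["Tank Length (m)"]
```

Note that the order matters: fixing the labels first turns near-duplicate rows into exact duplicates, so `drop_duplicates()` can remove them.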
Models and Evaluation
To ensure diversity, I explored three fundamentally different machine learning models:
- Linear Regression: Used as a baseline. It underperformed due to the non-linear nature of BLEVE dynamics and skewed target distribution.
- XGBoost Regressor: Selected for its robustness with structured/tabular data. I tuned key hyperparameters like max_depth, learning_rate, and subsample, achieving good generalization performance.
- Artificial Neural Network (ANN): This model ultimately yielded the best results. It captured complex non-linear relationships effectively, especially after proper preprocessing and scaling.
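The gap between the linear baseline and the neural network can be reproduced on a toy problem. The sketch below uses scikit-learn's MLPRegressor in place of the actual ANN, and a made-up inverse-square "pressure" target standing in for real BLEVE data; it only demonstrates why a linear model struggles with this kind of relationship.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Toy non-linear target: pressure decays with distance squared.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 5.0, size=(2000, 2))           # e.g. distance, mass
y = 100.0 / (X[:, 0] ** 2) + 5.0 * np.log(X[:, 1])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear baseline vs. a small scaled neural network.
linear = LinearRegression().fit(X_tr, y_tr)
ann = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
).fit(X_tr, y_tr)

r2_linear = r2_score(y_te, linear.predict(X_te))
r2_ann = r2_score(y_te, ann.predict(X_te))
```

Note the StandardScaler inside the pipeline: as mentioned above, the ANN only performs well after proper scaling.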
Hyperparameter Tuning
- XGBoost: Manually tuned using 5-fold cross-validation.
- ANN: Tuned using Optuna. Parameters such as hidden layer size, activation functions, learning rate, and dropout were optimized.
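A 5-fold cross-validated search over the three XGBoost hyperparameters named above can be sketched as follows. Since xgboost (and Optuna) may not be installed everywhere, this sketch substitutes scikit-learn's GradientBoostingRegressor, which happens to expose the same max_depth, learning_rate, and subsample parameters; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (500 rows, 3 features).
rng = np.random.default_rng(1)
X = rng.uniform(1.0, 5.0, size=(500, 3))
y = 100.0 / (X[:, 0] ** 2) + X[:, 1] * X[:, 2]

# Small grid over the same hyperparameters tuned in the assignment.
param_grid = {
    "max_depth": [2, 3, 4],
    "learning_rate": [0.05, 0.1],
    "subsample": [0.8, 1.0],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=5,                                  # 5-fold cross-validation
    scoring="neg_mean_absolute_error",
)
search.fit(X, y)
best = search.best_params_
```

Optuna replaces this exhaustive grid with a suggest-and-prune loop (an objective function calling `trial.suggest_float` / `trial.suggest_int`), which scales much better once the search space includes layer sizes, activations, learning rate, and dropout.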
Metrics
I evaluated models using:
- Mean Absolute Percentage Error (MAPE) – used for Kaggle leaderboard scoring.
- R² Score – provided insight into explained variance.
- MAE (Mean Absolute Error) – used during training as a quick progress check.
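All three metrics are available in scikit-learn. A small worked example on hand-picked numbers (not project results):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)

y_true = np.array([10.0, 20.0, 40.0, 80.0])
y_pred = np.array([11.0, 18.0, 42.0, 76.0])

# Note: scikit-learn's MAPE is a fraction (0.075), not a percentage (7.5%).
mape = mean_absolute_percentage_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

One reason MAPE suits a skewed pressure target: the 4 kPa error on the 80 kPa sample counts the same as a 0.5 kPa error on the 10 kPa sample, whereas MAE would be dominated by errors on the largest explosions.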
The final ANN model achieved a competitive MAPE score on the private Kaggle leaderboard.
Key Learnings
- Feature engineering is powerful, but sometimes less is more. Minimal, meaningful transformations often perform better.
- Data preprocessing—especially label cleaning and normalization—had the most significant impact on results.
- Tools like Optuna and Scikit-learn made hyperparameter optimization and model comparison much easier and more systematic.
- Neural networks require more training time but can outperform simpler models in complex, non-linear domains.
Reflection
In conclusion, this assignment was rewarding and enjoyable, allowing me to apply what I’ve learned from lectures and labs in a practical context.
I focused mainly on developing a neural network model, expecting that feature engineering, such as creating custom features like Tank Volume, would boost performance.
However, I quickly learned that these efforts didn’t always yield better results.
Instead, I found that data cleaning and preprocessing had the biggest impact on model accuracy. Fixing inconsistent labels, handling missing values, and removing duplicates were critical steps.
Most modern modelling and hyperparameter-tuning tasks are now heavily supported by pre-built Python libraries, which helps reduce development time.
This made it easier to iterate and test different architectures and settings without starting from scratch. The two BLEVE-related research papers also helped shape my understanding of the domain and guided my journey.
Both the use of standard scaling and the choice of Optuna for tuning were inspired by these papers.
If I were to approach this again, I would explore model ensembling and apply stronger regularisation techniques to improve generalisation and further reduce Kaggle MAPE scores. Overall, I invested a lot of time into this assignment, but it was truly a worthwhile learning experience.