Portfolio

Built with purpose.
Driven by data.

Featured ● Live App
🎗️
DetectIQ
Breast Cancer Early Detection Dashboard
Survival (early): 99%
Survival (late): 26%
Deployed App: Live
Python · Scikit-learn · Streamlit · ML Classification · EDA · Healthcare AI

Problem Statement

In the UK, one woman is diagnosed with breast cancer every 9 minutes. The survival rate is 99% when the disease is caught early but drops to 26% when it is caught late. The gap between early and late detection often comes down to access to fast, reliable diagnostic support.

Solution

DetectIQ is a machine learning dashboard that classifies tumours as Malignant or Benign using biopsy cell nucleus features. It is not designed to replace doctors, but to support faster, more informed clinical decisions.

Key Contributions

  • Exploratory data analysis on the Wisconsin Breast Cancer Dataset
  • Feature engineering and selection of key cell nucleus measurements
  • Trained and evaluated multiple classification models (Logistic Regression, SVM, Random Forest)
  • Deployed as an interactive Streamlit web application
  • Endorsed by clinical officers and registered nurses on LinkedIn
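The model comparison step can be sketched roughly as follows. This is a minimal illustration, not the DetectIQ code itself: it assumes scikit-learn's bundled copy of the Wisconsin Diagnostic dataset, 5-fold cross-validation, and library-default hyperparameters (apart from `max_iter` and `random_state`). The `candidates` dictionary and `compare_models` helper are hypothetical names introduced for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Wisconsin Diagnostic data
# (569 samples, 30 cell nucleus features; target 0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical candidate set mirroring the three model families listed above.
# Scaling is added for the two models that are sensitive to feature ranges.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(random_state=0),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy for each named model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, score in compare_models(candidates, X, y).items():
        print(f"{name}: {score:.3f}")
```

In a clinical setting, recall on the malignant class would usually matter more than raw accuracy, so swapping the `scoring` argument is a natural next step.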

Impact

The project has attracted attention from NHS-tagged organisations on LinkedIn and received endorsement from healthcare professionals including a Registered Nurse and a Clinical Officer who noted it "makes diagnosis faster."

🧠
MediPublish NLP
Medical Text Classification with BioMedBERT
Accuracy: 82%
Disease Classes: 5
Words Processed: 2M+
NLP · BioMedBERT · PyTorch · SHAP · Transformers · Chi-Square

Problem Statement

MediPublish struggled to efficiently onboard and route medical publications to the correct clinical departments, with significant redundancy, class imbalance, and duplicate abstracts compounding the problem.

Solution

The solution fine-tunes Microsoft BioMedBERT, a transformer pretrained on biomedical text, to auto-classify medical abstracts across 5 disease departments: neoplasms, digestive, nervous system, cardiovascular, and general pathological conditions.

Technical Highlights

  • Chi-Square test (χ²=247.43, p=2.33e-52) to identify duplicate clustering
  • Removed 4,061 duplicate rows after statistical validation
  • Stratified 80/20 train/validation split to preserve class proportions and monitor overfitting
  • Weighted cross-entropy loss to handle class imbalance
  • SHAP explainability showing "osteosarcoma" as the top contributing term for the neoplasms class
  • Weighted model achieves better recall on minority classes
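To show how a weighted cross-entropy loss shifts attention towards minority classes, here is a small pure-Python sketch. It assumes inverse-frequency class weights and a mean normalised by total weight, which is how PyTorch's `CrossEntropyLoss` behaves with a `weight` tensor and mean reduction. The project's actual weighting scheme is not stated, and the helper names and toy labels below are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rarer classes weigh more."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted negative log-likelihood, normalised by the total weight.

    probs  : list of dicts mapping class name -> predicted probability
    labels : list of true class names
    weights: dict mapping class name -> class weight
    """
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


if __name__ == "__main__":
    # Toy, made-up labels: 8 majority-class abstracts and 2 minority-class ones.
    labels = ["neoplasms"] * 8 + ["digestive"] * 2
    weights = inverse_frequency_weights(labels)
    print(weights)  # the minority class receives the larger weight
```

With these weights, a mistake on a minority-class abstract costs more than a mistake on a majority-class one, which is consistent with the recall gain reported for the weighted model.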

Result

The weighted model trades a little overall accuracy (80%, against 82% for the standard model) for better recall on minority classes, which makes it better suited to routing niche publications to specialised departments reliably.

🌊
Subsea Corrosion Detection
Computer Vision for Underwater Infrastructure
Computer Vision · Deep Learning · CNN · OpenCV · NumPy · Python

Problem Statement

Corrosion on metallic surfaces and underwater structures is a major safety and maintenance challenge. Manual inspection is costly, dangerous, and inconsistent.

Solution

A computer vision pipeline that classifies images as corrosion-positive or corrosion-negative using progressively advanced models.

Technical Approach

  • Images resized to 128×128 pixels and converted to NumPy arrays
  • Binary label assignment: 1 (corrosion) / 0 (no corrosion)
  • Progression from traditional ML to basic neural networks to advanced deep learning (CNN)
  • Image preprocessing, feature extraction, and flattening pipeline
  • Model evaluation using precision, recall, and F1-score
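The evaluation step can be illustrated with a short, dependency-free sketch that computes precision, recall, and F1 for the binary labels described above (1 = corrosion, 0 = no corrosion). The `binary_metrics` helper and the sample labels are hypothetical; in practice a library routine such as scikit-learn's `precision_recall_fscore_support` would do the same job.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up labels for illustration only: 1 = corrosion, 0 = no corrosion.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

For inspection work, recall on the corrosion class is often the figure to watch, since a missed defect tends to cost more than a false alarm.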

❤️
Cardiac Ultrasound Classification
Mitral Valve State Detection
Medical Imaging · Computer Vision · CNN · Deep Learning · Scikit-learn

Problem Statement

Accurate and rapid assessment of cardiac valve state from ultrasound imaging is critical in clinical settings but requires expert interpretation.

Solution

Binary classification of cardiac ultrasound images to detect whether the mitral valve is open or closed, using a published clinical dataset from Cervantes-Guzmán et al. (2023).

Technical Approach

  • Images converted to grayscale, resized to 128×128 pixels
  • Progressive model complexity: traditional ML → neural networks → CNN
  • Normalisation and binarisation pipeline for cardiac imaging data
  • Model evaluation with clinically relevant metrics
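The preprocessing steps above (grayscale, resize to 128×128, normalise, binarise) might look something like the following NumPy-only sketch. It rests on several assumptions: 8-bit RGB input frames, standard luma weights for the grayscale conversion, a nearest-neighbour resize standing in for whatever image library the project used, and a fixed threshold of 0.5. All function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = 128  # side length stated in the technical approach


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to (H, W) using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(np.float64) @ weights


def resize_nearest(img, size=TARGET_SIZE):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[np.ix_(rows, cols)]


def preprocess(rgb, threshold=0.5):
    """Return (normalised, binary) 128x128 arrays for one 8-bit RGB frame."""
    gray = to_grayscale(rgb)
    small = resize_nearest(gray)
    normalised = small / 255.0  # assumes 8-bit pixel values in [0, 255]
    binary = (normalised >= threshold).astype(np.uint8)
    return normalised, binary


if __name__ == "__main__":
    # Synthetic frame for illustration: left half black, right half white.
    frame = np.zeros((256, 300, 3), dtype=np.uint8)
    frame[:, 150:] = 255
    normalised, binary = preprocess(frame)
    print(normalised.shape, binary.sum())
```

Either output could then be flattened for a traditional model or kept as a 2-D array for a CNN, in line with the progression described above.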