Projects — Isaac Kuria

Featured ● Live App

🎗️

DetectIQ

Breast Cancer Early Detection Dashboard

Survival (early): 99%

Survival (late): 26%

Deployed App: Live

Python, Scikit-learn, Streamlit, ML Classification, EDA, Healthcare AI

Problem Statement

In the UK, one woman is diagnosed with breast cancer every 9 minutes. The survival rate is 99% when the disease is caught early but drops to 26% when it is caught late. What often separates early from late detection is access to fast, reliable diagnostic support.

Solution

DetectIQ is a machine learning dashboard that classifies tumours as Malignant or Benign using biopsy cell nucleus features. It is not designed to replace doctors, but to support faster, more informed clinical decisions.

Key Contributions

Exploratory data analysis on the Wisconsin Breast Cancer Dataset
Feature engineering and selection of key cell nucleus measurements
Trained and evaluated multiple classification models (Logistic Regression, SVM, Random Forest)
Deployed as an interactive Streamlit web application
Endorsed by clinical officers and registered nurses on LinkedIn
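
A minimal sketch of the model comparison step, assuming scikit-learn's bundled copy of the Wisconsin Diagnostic Breast Cancer data stands in for the project's dataset. The hyperparameters, the scaling step, and the 5-fold cross-validation are illustrative assumptions, not the settings used in DetectIQ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled Wisconsin Diagnostic data: 569 samples, 30 cell nucleus features.
# In this copy the target is 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical settings: the project's real hyperparameters are not stated.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}


def compare_models(models, X, y, folds=5):
    """Return the mean cross-validated accuracy of each model."""
    return {
        name: cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = compare_models(models, X, y)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Scaling is wrapped in a pipeline so that it is refitted inside each fold, which keeps validation data out of the scaler's statistics.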

Impact

The project has attracted attention from NHS-tagged organisations on LinkedIn and received endorsement from healthcare professionals including a Registered Nurse and a Clinical Officer who noted it "makes diagnosis faster."

🧠

MediPublish NLP

Medical Text Classification with BioMedBERT

Accuracy: 82%

Disease Classes: 5

Words Processed: 2M+

NLP, BioMedBERT, PyTorch, SHAP, Transformers, Chi-Square

Problem Statement

MediPublish struggled to efficiently onboard and route medical publications to the correct clinical departments, with significant redundancy, class imbalance, and duplicate abstracts compounding the problem.

Solution

Fine-tuned Microsoft BioMedBERT — a transformer pretrained on biomedical text — to auto-classify medical abstracts across 5 disease departments: neoplasms, digestive, nervous system, cardiovascular, and general pathological conditions.

Technical Highlights

Chi-Square test (χ²=247.43, p=2.33e-52) to identify duplicate clustering
Removed 4,061 duplicate rows after statistical validation
Stratified 80/20 train/validation split to preserve class proportions and make overfitting detectable
Weighted cross-entropy loss to handle class imbalance
SHAP explainability showing "osteosarcoma" as the top-contributing token for the neoplasms class
Weighted model achieves better recall on minority classes
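
The weighted loss mentioned above can be sketched in plain Python. This is an illustration of the idea rather than the project's training code: the inverse-frequency weighting scheme, the function names, and the toy labels are all assumptions. In PyTorch the same effect is usually obtained by passing a per-class `weight` tensor to `torch.nn.CrossEntropyLoss`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits, targets, weights):
    """Sum of -w[y] * log p[y] over the batch, divided by the sum of w[y]."""
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, targets):
        p = softmax(logits)[y]
        num += -weights[y] * math.log(p)
        den += weights[y]
    return num / den


# Toy, invented label distribution: class 0 is common, class 1 is rare.
train_labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(train_labels)

# Two samples with identical logits favouring class 0; the second is truly class 1.
batch_logits = [[2.0, 0.0], [2.0, 0.0]]
targets = [0, 1]

unweighted = weighted_cross_entropy(batch_logits, targets, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(batch_logits, targets, weights)
print(weights, round(unweighted, 4), round(weighted, 4))
```

Because the rare class carries the larger weight, the misclassified rare sample dominates the weighted loss, which pushes training towards better minority recall.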

Result

The weighted model gives up a little overall accuracy (80% versus 82% for the standard model) but achieves better recall on minority classes, making it better suited to routing niche publications reliably to specialised departments.
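
To show how a model can score lower on accuracy yet higher on minority recall, here is a small sketch with invented toy labels. The class names and predictions are hypothetical and are not MediPublish results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each class, the fraction of its true samples that were predicted correctly."""
    result = {}
    for cls in sorted(set(y_true)):
        support = sum(t == cls for t in y_true)
        hits = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        result[cls] = hits / support
    return result


# Toy data: 16 samples of a common class and 4 of a rare one.
y_true = ["common"] * 16 + ["rare"] * 4

# Hypothetical "standard" model: strong on the common class, weak on the rare one.
pred_standard = ["common"] * 16 + ["rare"] + ["common"] * 3

# Hypothetical "weighted" model: some common-class errors, far fewer rare-class misses.
pred_weighted = ["common"] * 13 + ["rare"] * 3 + ["rare"] * 3 + ["common"]

for name, pred in [("standard", pred_standard), ("weighted", pred_weighted)]:
    print(name, accuracy(y_true, pred), recall_per_class(y_true, pred))
```

On this toy data the standard model reaches 0.85 accuracy with 0.25 rare-class recall, while the weighted model reaches 0.80 accuracy with 0.75 rare-class recall.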

🌊

Subsea Corrosion Detection

Computer Vision for Underwater Infrastructure

Computer Vision, Deep Learning, CNN, OpenCV, NumPy, Python

Problem Statement

Corrosion on metallic surfaces and underwater structures is a major safety and maintenance challenge. Manual inspection is costly, dangerous, and inconsistent.

Solution

A computer vision pipeline that classifies images as corrosion-positive or corrosion-negative using progressively advanced models.

Technical Approach

Images resized to 128×128 pixels and converted to NumPy arrays
Binary label assignment: 1 (corrosion) / 0 (no corrosion)
Progression from traditional ML to basic neural networks to advanced deep learning (CNN)
Image preprocessing, feature extraction, and flattening pipeline
Model evaluation using precision, recall, and F1-score
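
The resize, array conversion, and flattening steps could look roughly like the sketch below. It assumes images are already loaded as NumPy arrays and uses a simple nearest-neighbour resize as a stand-in for whatever resize routine the project used (for example OpenCV's). The function names and the random test images are hypothetical.

```python
import numpy as np


def resize_nearest(image, size=128):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return image[rows[:, None], cols]


def build_dataset(images, labels, size=128):
    """Resize each image, scale pixels to [0, 1], and flatten to one feature row."""
    resized = [resize_nearest(img, size).astype(np.float32) / 255.0 for img in images]
    X = np.stack(resized).reshape(len(images), -1)
    y = np.asarray(labels, dtype=np.int64)  # 1 = corrosion, 0 = no corrosion
    return X, y


# Random arrays stand in for real photographs of different sizes.
rng = np.random.default_rng(0)
images = [
    rng.integers(0, 256, size=(200, 300, 3), dtype=np.uint8),
    rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8),
]
X, y = build_dataset(images, labels=[1, 0])
print(X.shape, y)
```

The flattened rows suit traditional models and dense networks; a CNN would instead take the unflattened `(128, 128, 3)` arrays.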

❤️

Cardiac Ultrasound Classification

Mitral Valve State Detection

Medical Imaging, Computer Vision, CNN, Deep Learning, Scikit-learn

Problem Statement

Accurate and rapid assessment of cardiac valve state from ultrasound imaging is critical in clinical settings but requires expert interpretation.

Solution

Binary classification of cardiac ultrasound images to detect whether the mitral valve is open or closed, using a published clinical dataset from Cervantes-Guzmán et al. (2023).

Technical Approach

Images converted to grayscale, resized to 128×128 pixels
Progressive model complexity: traditional ML → neural networks → CNN
Normalisation and binarisation pipeline for cardiac imaging data
Model evaluation with clinically relevant metrics
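
A rough sketch of the grayscale, normalisation, and binarisation steps follows. It assumes that "binarisation" means thresholding pixel intensities, which the project description does not confirm. The 0.5 threshold, the function names, and the synthetic frame are illustrative only.

```python
import numpy as np


def to_grayscale(rgb):
    """Luminance-weighted grayscale using ITU-R BT.601 coefficients."""
    coeffs = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return rgb[..., :3].astype(np.float32) @ coeffs


def normalise(gray):
    """Min-max scale to [0, 1]; a constant image maps to all zeros."""
    lo, hi = gray.min(), gray.max()
    if hi == lo:
        return np.zeros_like(gray, dtype=np.float32)
    return ((gray - lo) / (hi - lo)).astype(np.float32)


def binarise(norm, threshold=0.5):
    """Return a 0/1 mask of pixels at or above the threshold."""
    return (norm >= threshold).astype(np.uint8)


# Synthetic 4x4 frame: bright top half, dark bottom half.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:2] = 255
frame[2:] = 50

mask = binarise(normalise(to_grayscale(frame)))
print(mask)
```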

Built with purpose. Driven by data.
