To teach Kifua AI how to “see” chest X-rays, we need a lot of examples: far more than any single hospital can provide at the start.
That’s why we begin with a large public dataset called CheXpert.
What Is CheXpert?
CheXpert is a large public dataset of over 220,000 chest X-rays, released by Stanford, labelled for a range of common findings, such as:
- Consolidation
- Pleural effusion
- Cardiomegaly
- Atelectasis
- Edema
- And more
For Kifua AI, CheXpert is:
- A starting point to train models to recognise patterns in lung images.
- A way to learn from hundreds of thousands of labelled images before we ever see a single local CXR.
- The foundation on which we’ll later build local adaptation for Kenyan settings.
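As a concrete illustration, here is a minimal sketch of loading the CheXpert training labels with pandas. The column encoding (1.0 positive, 0.0 negative, -1.0 uncertain, blank unmentioned) matches the public release; the file path and the choice of findings are assumptions.

```python
import pandas as pd

# Path assumes the downsampled public release; adjust to your download.
df = pd.read_csv("CheXpert-v1.0-small/train.csv")

FINDINGS = ["Consolidation", "Pleural Effusion", "Cardiomegaly", "Atelectasis"]

# CheXpert encodes each finding as 1.0 (positive), 0.0 (negative),
# -1.0 (uncertain) or blank (not mentioned). One simple, common policy
# ("U-Ones") maps uncertain mentions to positive, which errs on the
# side of sensitivity and so suits a triage setting.
labels = df[FINDINGS].fillna(0.0).replace(-1.0, 1.0)

print(labels.mean())  # rough prevalence of each finding in the training set
```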
Why Start with a Large Public Dataset?
If we tried to train Kifua AI from scratch using only local data, we would hit several challenges:
- We might not have enough labelled images for every pathology.
- Label quality might vary.
- Training from scratch would be slow and data-hungry.
CheXpert gives us a strong baseline:
- Models learn generic features: edges, textures, anatomical regions.
- They become familiar with a broad range of pathologies.
- We can then fine-tune or recalibrate on local data to account for differences in population and imaging practices.
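As a sketch of what “fine-tune” means in practice, the snippet below keeps a pretrained backbone’s generic features and retrains only a new classification head. The torchvision model and weight names are real; the number of findings and the freezing policy are illustrative assumptions.

```python
import torch.nn as nn
from torchvision import models

NUM_FINDINGS = 5  # illustrative: one output per finding we triage on

# ImageNet-pretrained weights stand in here for a CheXpert-trained model.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_FINDINGS)

# Freeze the generic feature extractor; train only the new head on the
# smaller target dataset. This is far less data-hungry than training
# from scratch.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc.")
```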
Multiple Models, Not Just One: The Power of Ensembles
Kifua AI doesn’t rely on a single model. Instead, we train an ensemble of architectures, including:
- EfficientNet – optimised for high performance with fewer parameters.
- ResNet – a reliable, well-understood baseline in medical imaging.
- Three additional models (e.g. DenseNet, ConvNeXt, and a lightweight model) to add diversity.
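To make this concrete, here is one way such an ensemble could be assembled from torchvision. The constructors and head locations follow torchvision; MobileNetV3 as the lightweight member and the five-finding head are assumptions.

```python
import torch.nn as nn
from torchvision import models

NUM_FINDINGS = 5  # illustrative: one sigmoid output per finding

# Five architecturally diverse members, each pretrained on ImageNet.
ensemble = {
    "efficientnet": models.efficientnet_b0(weights="DEFAULT"),
    "resnet": models.resnet50(weights="DEFAULT"),
    "densenet": models.densenet121(weights="DEFAULT"),
    "convnext": models.convnext_tiny(weights="DEFAULT"),
    "mobilenet": models.mobilenet_v3_small(weights="DEFAULT"),  # lightweight member
}

# Replace each ImageNet head with a multi-label head for our findings
# (in_features values are those of the default torchvision models).
ensemble["efficientnet"].classifier[1] = nn.Linear(1280, NUM_FINDINGS)
ensemble["resnet"].fc = nn.Linear(2048, NUM_FINDINGS)
ensemble["densenet"].classifier = nn.Linear(1024, NUM_FINDINGS)
ensemble["convnext"].classifier[2] = nn.Linear(768, NUM_FINDINGS)
ensemble["mobilenet"].classifier[3] = nn.Linear(1024, NUM_FINDINGS)
```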
Why?
Because different architectures:
- Detect different kinds of patterns.
- Have different strengths and weaknesses.
- May fail in different ways on edge cases.
By combining them, we hope to get:
- More robust predictions across a wide range of images.
- Fewer cases where the system is confidently wrong.
Ensembling can be as simple as averaging probabilities or as sophisticated as using another model (a meta-learner) to learn how best to combine the outputs.
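Here is a minimal sketch of the simple end of that spectrum, probability averaging. The function name and input shapes are illustrative assumptions.

```python
import torch

def ensemble_predict(members, image_batch):
    """Average per-finding probabilities across ensemble members.

    A stacking meta-learner would instead be trained on the
    concatenated member outputs; averaging is the simple baseline.
    """
    probs = []
    with torch.no_grad():
        for model in members:
            model.eval()  # disable dropout and batch-norm updates
            probs.append(torch.sigmoid(model(image_batch)))
    return torch.stack(probs).mean(dim=0)  # shape: (batch, num_findings)
```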
What Metrics Do We Care About?
In a triage setting, we’re especially interested in:
- Sensitivity for serious abnormalities:
  - We’d rather over-call some cases than miss a large effusion or consolidation.
- Specificity sufficient to avoid overwhelming clinicians with false alarms.
- Calibration:
  - When Kifua AI says “90% probability of abnormality,” that should roughly correspond to reality.
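To ground these terms, here is a toy sketch of computing them with scikit-learn. The functions (roc_auc_score, confusion_matrix, calibration_curve) are real; the numbers are invented purely for illustration.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import confusion_matrix, roc_auc_score

# Toy ground truth (1 = abnormal) and model probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.8, 0.6])

threshold = 0.35  # a deliberately low cutoff, favouring sensitivity
y_pred = (y_prob >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # fraction of true abnormals we catch
specificity = tn / (tn + fp)  # fraction of normals we leave alone

print(f"AUROC={roc_auc_score(y_true, y_prob):.2f}",
      f"sensitivity={sensitivity:.2f}", f"specificity={specificity:.2f}")

# Calibration: mean predicted probability per bin vs. observed frequency.
# In a well-calibrated model the two arrays track each other.
frac_positive, mean_predicted = calibration_curve(y_true, y_prob, n_bins=2)
print(frac_positive, mean_predicted)
```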
We will still evaluate standard metrics (AUROC, F1 scores, etc.), but the central question remains:
“Does this help clinicians pick out the worrying CXRs earlier and more reliably?”
From CheXpert to Kenya
CheXpert is the beginning, not the end.
The roadmap looks like this:
1. Train and validate our ensemble on CheXpert.
2. Freeze or partially freeze the models.
3. Adapt them to local data as we obtain de-identified CXRs and radiologist labels from Kenyan sites.
4. Recalibrate thresholds and triage rules for local disease patterns and workflows.
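As one small example of step 4, the sketch below re-picks the operating threshold on a local validation set so that a target sensitivity is still met. Everything here (the names, the 0.95 target) is an illustrative assumption, not the final recalibration procedure.

```python
import numpy as np

def recalibrate_threshold(y_true, y_prob, target_sensitivity=0.95):
    """Pick the highest cutoff that still meets a target sensitivity
    on local validation data. Higher cutoffs raise fewer alarms, so we
    take the largest one that qualifies."""
    positives = y_true == 1
    for t in np.unique(y_prob)[::-1]:  # candidate cutoffs, descending
        sensitivity = np.mean(y_prob[positives] >= t)
        if sensitivity >= target_sensitivity:
            return t
    return y_prob.min()  # fall back to the most permissive cutoff
```

On real data this would be run per finding, with the triage rules layered on top.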
That final step—local validation and adaptation—is crucial, and we’ll dig into it more in Blog 4. For now, the key idea is:
CheXpert teaches Kifua AI to see lungs.
Kenyan data will teach it to see our lungs.