Inside Afya-Yangu AI: How MedGemma, RAG, and FAISS Turn Guidelines into Answers | Sapashe Insights

In the first post in this series, we discussed why offline clinical AI matters for Kenyan primary care. This post explains how Afya-Yangu AI actually works, without drowning in jargon.

At a high level, Afya-Yangu AI does something deceptively simple:

A clinician asks a question in plain language →
The system looks up relevant Kenyan guidelines →
It generates a short, structured answer with citations.

Under the hood, that workflow is powered by three core components: MedGemma, RAG, and FAISS.

Meet MedGemma: The Brain of the System

MedGemma is a specialised medical language model: an open model from Google, built on Gemma, that has been trained to understand medical text and generate coherent, clinically styled responses.

Compared with a general-purpose chatbot, it is:

Better at handling medical jargon
More familiar with clinical patterns and terminology
Able to structure reasoning in a way that makes sense to clinicians

But MedGemma, by itself, is not allowed to improvise guidelines. We don’t want it inventing Kenyan protocols based on global averages. That’s where the next ingredient comes in.

What Is RAG and Why Does It Matter?

Retrieval-Augmented Generation (RAG) is a fancy name for a simple idea:

Don’t let the model answer from memory alone.
Look up the relevant documents first, and make it answer from those.

In Afya-Yangu AI, that means:

You ask a question.
The system searches the Kenyan guideline knowledge base for the most relevant sections.
Those sections are passed into MedGemma as context.
MedGemma then writes an answer that summarises and explains those sections for your specific case.
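
The four steps above can be sketched as a small retrieve-then-generate function. This is a minimal illustration under stated assumptions, not Afya-Yangu AI's actual code: `answer_with_rag`, `keyword_retriever`, `echo_generator`, and the two knowledge-base entries are hypothetical. The retriever is a naive word-overlap stand-in for the real search, and the generator is a placeholder marking where a call to MedGemma would go.

```python
import re
from typing import Callable, List

# Hypothetical mini knowledge base; real entries would be guideline sections.
KNOWLEDGE_BASE = [
    "Placeholder section on severe pneumonia: oxygen and antibiotics.",
    "Placeholder section on diarrhoea: oral rehydration solution.",
]


def _words(text: str) -> set:
    """Lower-case word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_retriever(question: str, k: int = 1) -> List[str]:
    """Stand-in retriever: rank sections by words shared with the question."""
    q_words = _words(question)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda section: len(q_words & _words(section)),
        reverse=True,
    )
    return ranked[:k]


def echo_generator(prompt: str) -> str:
    """Placeholder for the language model: returns the prompt it was given."""
    return prompt


def answer_with_rag(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 1,
) -> str:
    context = retrieve(question, k)              # search the knowledge base
    prompt = (
        "Context:\n" + "\n".join(context)        # pass sections in as context
        + "\n\nQuestion: " + question
    )
    return generate(prompt)                      # the model writes the answer


print(answer_with_rag("How do I treat severe pneumonia?",
                      keyword_retriever, echo_generator))
```

Because the retriever and generator are passed in as plain functions, either one can be swapped out (for example, for a vector search or a real model call) without changing the overall flow.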

This approach matters because it:

Reduces hallucinations – the model is grounded in text we trust.
Makes answers auditable – we can see which guideline paragraph it used.
Keeps the system up-to-date – when guidelines change, we update the knowledge base, not the entire model.

RAG is the bridge between “a smart model” and “a clinically safe assistant tied to Kenyan policy.”

FAISS: The Search Engine Behind the Scenes

To make RAG work, we need a way to quickly find the most relevant guideline snippets for any clinician query.

That’s where FAISS comes in.

FAISS is an open-source library for fast similarity search over vectors. We use it to build a searchable vector index of guideline chunks. In simpler terms:

We break guidelines into small pieces (e.g. a paragraph on pneumonia management).
Each piece is converted into a numerical “fingerprint” (an embedding) that captures its meaning.
When you ask a question, your query is also converted into a fingerprint.
FAISS then finds the guideline chunks whose fingerprints are closest to your query.
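
To make the "fingerprint" idea concrete, here is a toy sketch of the same steps in plain Python. It rests on several assumptions: the guideline text is made-up placeholder wording, `chunk`, `embed`, and `cosine` are hypothetical helpers, and the embedding is a simple word-count vector rather than the learned embedding model a real system would use.

```python
import math
import re
from collections import Counter

# Made-up placeholder text, not actual guideline wording.
GUIDELINE = """Pneumonia in children: assess fast breathing and chest indrawing.

Malaria: confirm with a test before treatment.

Referral: refer children with danger signs."""


def chunk(text):
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def embed(text, vocab):
    """Toy embedding: unit-length word-count vector over a fixed vocabulary."""
    counts = Counter(tokens(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity; inputs are unit length, so this is the dot product."""
    return sum(x * y for x, y in zip(a, b))


chunks = chunk(GUIDELINE)
vocab = sorted({w for c in chunks for w in tokens(c)})
vectors = [embed(c, vocab) for c in chunks]          # one fingerprint per chunk

query = embed("child with fast breathing and pneumonia", vocab)
scores = [cosine(query, v) for v in vectors]
best = max(range(len(chunks)), key=lambda i: scores[i])
print(chunks[best])
```

A word-count vector only matches exact words ("child" does not match "children" here), which is one reason real systems use learned embeddings that capture meaning rather than spelling.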

The result?

A fast, efficient search that works even on modest hardware.
A set of highly relevant excerpts we can feed into MedGemma.
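
For readers who want to see what the search call looks like, the sketch below uses FAISS's exact inner-product index. `faiss.IndexFlatIP`, `index.add`, and `index.search` are real FAISS calls, but the choice of a flat (exact) index is an assumption, since this post does not say which index type Afya-Yangu AI uses. `top_k` and the tiny vectors are hypothetical, and the pure-Python fallback exists only so the example runs where FAISS is not installed.

```python
try:
    import faiss          # optional dependency
    import numpy as np
except ImportError:
    faiss = None


def top_k(vectors, query, k):
    """Return (score, index) pairs for the k vectors closest to the query.

    Closeness is the inner product, which equals cosine similarity when
    all vectors are unit length.
    """
    if faiss is not None:
        xb = np.asarray(vectors, dtype="float32")   # FAISS expects float32
        xq = np.asarray([query], dtype="float32")
        index = faiss.IndexFlatIP(xb.shape[1])      # exact inner-product index
        index.add(xb)
        scores, ids = index.search(xq, k)           # best matches first
        return [(float(s), int(i)) for s, i in zip(scores[0], ids[0])]

    # Fallback: brute-force search in plain Python (illustration only).
    scored = [
        (sum(a * b for a, b in zip(query, v)), i)
        for i, v in enumerate(vectors)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


# Hypothetical unit-length "fingerprints" for three guideline chunks.
chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
print(top_k(chunk_vectors, [1.0, 0.0], k=2))
```

A flat index compares the query against every stored vector, which is usually affordable for a corpus the size of a guideline set. FAISS also offers approximate indexes for much larger collections.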

A Real Example: From Question to Answer

Let’s walk through a simplified example.

Clinician query:

“How should I manage severe pneumonia in a 3-year-old at a level 3 facility, and when do I refer?”

Step 1 – Retrieval (FAISS):

The query is embedded as a vector.
FAISS finds chunks such as:
IMNCI section on classification of pneumonia
Treatment protocols for severe pneumonia at level 2/3
Referral criteria and oxygen saturation thresholds

Step 2 – Generation (MedGemma with RAG):

We pass those chunks into MedGemma with a system instruction like:

“Using only the text provided, summarise the recommended assessment, treatment, and referral criteria according to Kenyan guidelines. Cite your sources.”
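
One way to assemble that prompt is sketched below. The instruction text is the one quoted above; everything else is an assumption made for illustration, including the `build_prompt` helper, the numbered-excerpt layout, and the idea that each retrieved chunk arrives with a source label attached.

```python
SYSTEM_INSTRUCTION = (
    "Using only the text provided, summarise the recommended assessment, "
    "treatment, and referral criteria according to Kenyan guidelines. "
    "Cite your sources."
)


def build_prompt(question, chunks):
    """Combine the instruction, retrieved excerpts, and the clinician's question.

    chunks: list of (source_label, text) pairs returned by retrieval.
    """
    numbered = [
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(chunks, start=1)
    ]
    return "\n\n".join([
        SYSTEM_INSTRUCTION,
        "Guideline excerpts:\n" + "\n".join(numbered),
        "Question: " + question,
    ])


prompt = build_prompt(
    "How should I manage severe pneumonia in a 3-year-old at a level 3 "
    "facility, and when do I refer?",
    [("IMNCI Kenya, severe pneumonia section", "<retrieved excerpt text>")],
)
print(prompt)
```

Keeping the source label next to each excerpt gives the model something concrete to cite, and gives a reviewer something concrete to check.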

MedGemma responds with something like:

Classification: “Severe pneumonia based on chest in-drawing and fast breathing.”
Immediate actions: oxygen, antibiotics, monitoring.
Referral criteria: specific danger signs and SpO₂ thresholds.
Citations: “Source: IMNCI Kenya, severe pneumonia section.”
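
A structured answer like this is also easy to check in code. The sketch below is a hypothetical illustration, not the system's real schema: the `StructuredAnswer` fields simply mirror the four parts listed above, and `unknown_citations` flags any citation that does not match a source that was actually retrieved.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredAnswer:
    classification: str
    immediate_actions: List[str]
    referral_criteria: List[str]
    citations: List[str] = field(default_factory=list)


def unknown_citations(answer: StructuredAnswer,
                      retrieved_sources: List[str]) -> List[str]:
    """Return citations that do not match any retrieved source label."""
    known = set(retrieved_sources)
    return [c for c in answer.citations if c not in known]


answer = StructuredAnswer(
    classification="Severe pneumonia based on chest in-drawing and fast breathing.",
    immediate_actions=["oxygen", "antibiotics", "monitoring"],
    referral_criteria=["specific danger signs", "SpO2 thresholds"],
    citations=["IMNCI Kenya, severe pneumonia section"],
)

retrieved = ["IMNCI Kenya, severe pneumonia section"]
print(unknown_citations(answer, retrieved))
```

An empty result means every citation traces back to a retrieved chunk. A non-empty result would be a signal to withhold or flag the answer.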

Why This Architecture Is Safer

By design, Afya-Yangu AI:

Leans on documents, not guesswork.
Always has a trail back to source guidelines.
Can be improved incrementally as we refine the knowledge base and prompts.

If new Kenyan guidance comes out for, say, childhood pneumonia:

We update the guideline corpus.
We re-embed the changed chunks and rebuild the FAISS index.
The system then answers from the updated recommendations, without retraining the MedGemma model.
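
The update cycle can be illustrated with a toy index. Everything here is a stand-in: `GuidelineIndex` is a hypothetical class that uses a simple word lookup instead of FAISS, and the section text is placeholder wording. The point it shows is that changing the corpus and rebuilding the index changes what gets retrieved, with no change to the model.

```python
import re
from collections import Counter


def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class GuidelineIndex:
    """Toy index: maps section IDs to text and rebuilds a word lookup."""

    def __init__(self, sections):
        self.sections = dict(sections)
        self.rebuild()

    def rebuild(self):
        """Recreate the lookup from scratch from the current corpus."""
        self.lookup = {}
        for section_id, text in self.sections.items():
            for word in set(words(text)):
                self.lookup.setdefault(word, set()).add(section_id)

    def update_section(self, section_id, text):
        """Replace one section's text, then rebuild the index."""
        self.sections[section_id] = text
        self.rebuild()

    def retrieve(self, query):
        """Return the section sharing the most words with the query, if any."""
        hits = Counter()
        for word in set(words(query)):
            for section_id in self.lookup.get(word, ()):
                hits[section_id] += 1
        if not hits:
            return None
        best_id, _ = hits.most_common(1)[0]
        return self.sections[best_id]


index = GuidelineIndex(
    {"pneumonia": "Childhood pneumonia: placeholder text, version 1."}
)
before = index.retrieve("childhood pneumonia")

index.update_section(
    "pneumonia", "Childhood pneumonia: placeholder text, version 2."
)
after = index.retrieve("childhood pneumonia")
print(before, "->", after)
```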

In future posts, we’ll look at how we compress all of this into a form that can run offline in a clinic, and at the guardrails we’re building to keep it safe.