Bringing Afya-Yangu AI to the Edge: Designing for Patchy Networks and Low-Power Devices

    By Fred Mutisya · July 1, 2025
    Tags: AI, GPU, MedGemma, FAISS

    If you build a beautiful AI system that only works on a high-speed fibre connection and a big GPU, you haven’t built for Kenyan primary care.

    Designing Afya-Yangu AI means designing for the edge: low-power devices, intermittent connectivity, and busy clinics where every second counts.

    The Constraints We Have to Respect

    In many level 2 and 3 facilities, the technical reality looks like this:

    • Power cuts are common; backup options are limited.
    • Connectivity is via 3G/4G bundles or shared Wi-Fi, not dedicated fibre.
    • Available hardware may be:
      • A modest desktop with 4–8 GB RAM
      • A low-end server donated years ago
      • A rugged tablet shared across several rooms

    If our AI assistant depends on a large cloud model and stable internet, clinicians will use it once, get frustrated, and never open it again.

    So we made two key design decisions:

    1. Use a small but capable model (a MedGemma-based small language model, or SLM).
    2. Perform retrieval and inference locally using FAISS and on-device compute.

    Why Small Models Matter

    Large language models are powerful, but they come with trade-offs:

    • They need more compute → slower responses on small machines.
    • They’re harder to run offline → more dependence on cloud.

    A small model, carefully chosen and fine-tuned, gives us:

    • Lower latency – answers in seconds, not minutes.
    • Feasibility on local hardware – no expensive GPUs required.
    • Better control – easier to package, ship, and update.

    We’re not chasing flashy benchmarks. We’re optimising for “Does this work reliably in a busy clinic on Tuesday morning?”
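As a rough sanity check, you can estimate whether a model’s weights even fit in a clinic machine’s RAM. The parameter count, quantisation levels, and 1.2× runtime-overhead factor below are illustrative assumptions, not measurements:

```python
def model_memory_gb(params_billion, bits_per_weight, overhead=1.2):
    """Back-of-envelope RAM estimate for model weights.

    The 1.2x overhead (KV cache, runtime buffers) and the parameter
    counts used below are illustrative assumptions, not measurements.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A model in the low billions of parameters, at common quantisation levels:
# 16-bit weights overflow a 4-8 GB desktop, while 4-bit fits comfortably.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(4, bits):.1f} GB")
```

This kind of arithmetic, not benchmark leaderboards, is what decides whether the assistant runs on the hardware a facility actually has.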

    FAISS for Fast Local Search

    FAISS lets us store vector embeddings of our guideline knowledge base in a way that’s:

    • Compact enough for local disks.
    • Fast enough for real-time search.

    Because we only retrieve a handful of relevant chunks for each query, we keep memory and compute usage low—which is exactly what we need on the edge.
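The retrieval pattern is simple enough to sketch without FAISS itself: the brute-force NumPy version below does what FAISS’s exact IndexFlatL2 index does (FAISS adds optimised and approximate index types that scale far better). The embeddings are random stand-ins for real guideline-chunk embeddings:

```python
import numpy as np

def build_index(chunk_embeddings):
    # FAISS's IndexFlatL2 stores vectors as-is; here, just a float32 matrix.
    return np.asarray(chunk_embeddings, dtype=np.float32)

def search(index, query, k=3):
    # Squared L2 distance from the query to every stored chunk,
    # then the k closest -- the same contract as FAISS's index.search().
    dists = ((index - query) ** 2).sum(axis=1)
    top = np.argsort(dists)[:k]
    return top, dists[top]

# Random stand-ins for 1000 guideline chunks with 384-dim embeddings.
rng = np.random.default_rng(0)
index = build_index(rng.standard_normal((1000, 384)))
query = index[42] + 0.01 * rng.standard_normal(384)  # query near chunk 42
ids, _ = search(index, query, k=3)  # ids[0] is chunk 42
```

Because k stays small, only a handful of chunks ever reach the model, which is what keeps memory and latency bounded on modest hardware.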

    Trade-offs: Latency vs Accuracy, Size vs Coverage

    Every design choice involves compromise. Some of the trade-offs we’re navigating:

    • Smaller vs larger model:
      • Smaller → faster, cheaper, easier to deploy.
      • Larger → potentially more nuanced language understanding.
      Our bias is towards “small enough to run everywhere, good enough to be safe and useful.”
    • Latency vs complexity:
      • More retrieval steps and checks could improve answer quality.
      • But each extra step adds time.
      We aim for answers in under 5 seconds at normal load.
    • On-device vs cloud hybrid:
      • Full offline mode is essential for many sites.
      • When connectivity exists, we may allow optional cloud enhancements (e.g. syncing logs, model updates).
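One way to express that offline-first, cloud-optional stance in code is a local log queue that answers never depend on, and that only flushes when connectivity happens to exist. A minimal sketch, with a hypothetical upload callable standing in for the real cloud client:

```python
import queue

class SyncQueue:
    """Offline-first logging: events queue up locally and flush to the
    cloud only opportunistically. Inference never waits on this."""

    def __init__(self, upload):
        self.pending = queue.Queue()
        self.upload = upload  # hypothetical cloud client callable

    def record(self, event):
        # Always succeeds locally, regardless of connectivity.
        self.pending.put(event)

    def flush_if_online(self, is_online):
        # Push queued events upstream; on any network error, keep the
        # event for next time and stop -- the clinic keeps working.
        sent = 0
        while is_online and not self.pending.empty():
            event = self.pending.get()
            try:
                self.upload(event)
                sent += 1
            except OSError:
                self.pending.put(event)
                break
        return sent
```

The key property is that `record()` can never fail for network reasons; connectivity only ever adds value, never blocks the clinician.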

    Possible Edge Architectures

    We’re exploring a few deployment patterns:

    1. Local Server in the Facility
      • A small box in the records or IT room.
      • Multiple devices on the local network can connect to it via a web interface.
    2. Rugged Tablet or “Clinic Box”
      • An all-in-one device with the model, FAISS index, and UI.
      • Ideal for facilities without any other computer infrastructure.
    3. Hybrid Mode
      • Primary inference on-device.
      • Occasional sync with the cloud for updates, analytics, and backup.

    The goal is to avoid a brittle system that dies when the internet drops. Afya-Yangu AI should feel like part of the clinic, not a remote service.

    Keeping the System Up-to-Date

    Offline doesn’t mean frozen.

    We’re designing an update pathway where:

    • New or revised guidelines are packaged into update bundles.
    • These can be:
      • Downloaded when connectivity is available, or
      • Physically distributed on USB drives, if needed.
    • The local system then:
      • Updates the guideline corpus.
      • Rebuilds the FAISS index.
      • Logs what changed, so we can trace behaviour.

    That way, clinicians get the benefits of offline reliability and can stay aligned with evolving national guidance.

    Afya-Yangu AI at the edge is a work in progress—but the principle is clear:

    If it can’t run where patients are seen, it doesn’t count as “real” clinical AI.