
LLMs for African Languages: What Actually Works

After a year of building Twi, Yoruba, and Swahili copilots for our enterprise clients, here is the honest field report on data, evaluation, and deployment.

Chiamaka Okonkwo

AI Engineering Lead

AI & Data

April 22, 2026 · 9 min read


Every quarter, a new client asks us the same question: can we build a Twi-speaking customer support agent? Or a Yoruba document classifier? Or a Swahili medical triage assistant? The answer is almost always yes, but the path to yes is far less obvious than vendor demos would suggest. Off-the-shelf models, even the largest frontier systems, range from passable to embarrassing on African languages. The gap between English performance and Yoruba performance on the same task is rarely smaller than fifteen points on any benchmark we trust.

The data problem is real, but not what you think

The conventional wisdom is that low-resource languages lack data. That is partially true. What is more true is that the data that does exist is mostly the wrong kind. Bible translations, parliamentary proceedings, and news scrapes dominate the public corpora. None of that prepares a model for a customer asking about a failed mobile money transfer using a code-switched mix of Twi, English, and Pidgin in a single sentence. Real African conversation lives in WhatsApp threads, USSD logs, and call center transcripts. Almost none of it is in any pretraining corpus.

Our practice now is to start every engagement with a two-week data audit. We collect the client's actual conversational data, anonymize it, and use it to build an evaluation set before we write a single line of model code. The eval set is more valuable than the model. It outlives every model migration.
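A minimal sketch of the audit's last step, turning logged conversations into an anonymized evaluation set. The `RawTurn` and `EvalCase` shapes, the field names, and the two redaction patterns are illustrative assumptions, not a client schema or a complete anonymization policy. Real data also needs name detection and human review.

```typescript
// Hypothetical shapes: field names are illustrative only.
interface RawTurn {
  text: string;   // user message as logged
  intent: string; // label assigned by a human annotator
}

interface EvalCase {
  input: string;
  expectedIntent: string;
}

// Assumed redaction rules: emails and long digit runs (phone or account numbers).
const EMAIL = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;
const PHONE_OR_ACCOUNT = /\+?\d[\d\s-]{7,}\d/g;

function anonymize(text: string): string {
  return text.replace(EMAIL, "<EMAIL>").replace(PHONE_OR_ACCOUNT, "<NUMBER>");
}

function buildEvalSet(turns: RawTurn[]): EvalCase[] {
  const seen = new Set<string>();
  const cases: EvalCase[] = [];
  for (const turn of turns) {
    const input = anonymize(turn.text).trim();
    // Drop empty messages and exact duplicates after redaction.
    if (input.length === 0 || seen.has(input)) continue;
    seen.add(input);
    cases.push({ input, expectedIntent: turn.intent });
  }
  return cases;
}
```

Redacting before deduplication means two messages that differ only in a phone number collapse into one case, which keeps the set small without losing linguistic variety.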

Fine-tuning versus retrieval versus prompting

For most enterprise use cases, the right answer is not what the AI Twitter discourse suggests. We have shipped production systems with all three approaches, and the pattern is clear. Prompting alone works for English-heavy tasks, even in African contexts. Retrieval-augmented generation works when the knowledge is structured and the language is a thin wrapper. Fine-tuning is only worth the operational overhead when the linguistic gap is wide and the volume justifies the cost.
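The pattern above can be written down as a rough decision rule. This is a sketch only: the `UseCase` fields and the order of the checks are assumptions made for illustration, and a real engagement would weigh these factors with data rather than booleans.

```typescript
type Approach = "prompting" | "retrieval" | "fine-tuning";

// Hypothetical summary of a use case; every field is an illustrative assumption.
interface UseCase {
  structuredKnowledge: boolean; // answers come from structured, retrievable sources
  wideLinguisticGap: boolean;   // the base model is weak in the target language
  volumeJustifiesCost: boolean; // traffic is high enough to pay for training and ops
}

function chooseApproach(useCase: UseCase): Approach {
  // Fine-tuning needs both conditions: a wide gap and enough volume.
  if (useCase.wideLinguisticGap && useCase.volumeJustifiesCost) {
    return "fine-tuning";
  }
  // Structured knowledge with the language as a thin wrapper suits retrieval.
  if (useCase.structuredKnowledge) {
    return "retrieval";
  }
  // Otherwise start with the cheapest option.
  return "prompting";
}
```

Falling back to prompting keeps the cheapest option as the default, so the more expensive approaches have to earn their place.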

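// Our default evaluation harness for a new language deployment.
// Quality metrics are floors: the score must be at least the target.
const minimums = {
  intent_accuracy: 0.85,       // Did we understand the user?
  faithfulness: 0.90,          // Did we avoid hallucinating?
  code_switch_tolerance: 0.80, // Did mixed-language inputs survive?
};

// Risk and cost metrics are ceilings: the score must not exceed the target.
const maximums = {
  toxicity_rate: 0.001,        // Did we stay safe?
  latency_p95_ms: 1200,
};

// Model, EvalCase, and runEval are provided by the surrounding harness.
async function gate(model: Model, evalSet: EvalCase[]): Promise<boolean> {
  const scores: Record<string, number> = await runEval(model, evalSet);
  // A missing score compares as false either way, so the gate fails closed.
  return (
    Object.entries(minimums).every(([k, floor]) => scores[k] >= floor) &&
    Object.entries(maximums).every(([k, ceiling]) => scores[k] <= ceiling)
  );
}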
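A single pass/fail answer hides which metric blocked a release. The sketch below reports the failing metrics by name. It assumes that toxicity and latency are upper bounds while the other metrics are lower bounds, and the `Threshold` type and `failures` function are hypothetical names introduced for this example.

```typescript
type Direction = "min" | "max";

interface Threshold {
  target: number;
  direction: Direction; // "min": score >= target passes; "max": score <= target passes
}

// Targets copied from the rubric; the direction of each metric is an assumption.
const thresholds: Record<string, Threshold> = {
  intent_accuracy: { target: 0.85, direction: "min" },
  faithfulness: { target: 0.9, direction: "min" },
  code_switch_tolerance: { target: 0.8, direction: "min" },
  toxicity_rate: { target: 0.001, direction: "max" },
  latency_p95_ms: { target: 1200, direction: "max" },
};

// Returns the names of metrics that miss their target or are absent from scores.
function failures(scores: Record<string, number>): string[] {
  const failed: string[] = [];
  for (const [metric, { target, direction }] of Object.entries(thresholds)) {
    const score = scores[metric];
    const ok =
      score !== undefined &&
      (direction === "min" ? score >= target : score <= target);
    if (!ok) failed.push(metric);
  }
  return failed;
}
```

An empty array means the model clears every threshold. Treating a missing score as a failure keeps the gate from passing a model that was never measured on a metric.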

“The hardest part of African-language AI is not the model. It is admitting that the eval you inherited from English benchmarks does not measure anything that matters here.”
— Dr. Tunde Adebayo, External Reviewer

Deployment realities

Latency budgets in Lagos look different than latency budgets in Frankfurt. Plan for regional inference.
USSD remains the dominant channel for many users. Your LLM output must compress into 160-character chunks gracefully.
Voice is closing the gap fast. Twi and Yoruba speech-to-text are within a year of being production-viable for support workflows.
Always ship with a human handoff path. The cost of a wrong answer in a fintech context is too high for full automation.
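The USSD constraint above can be handled with a word-boundary splitter. This is a minimal sketch, assuming a flat 160-character limit counted in UTF-16 code units. Real gateways and encodings may count differently, especially for characters with diacritics, and `chunkForUssd` is a hypothetical helper name.

```typescript
// Assumed per-screen limit; confirm the real value with the gateway provider.
const USSD_LIMIT = 160;

// Splits text into chunks of at most `limit` characters, breaking on whitespace.
function chunkForUssd(text: string, limit: number = USSD_LIMIT): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (let word of words) {
    // A single word longer than the limit has to be hard-split.
    while (word.length > limit) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(word.slice(0, limit));
      word = word.slice(limit);
    }
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length <= limit) {
      current = candidate;
    } else {
      chunks.push(current);
      current = word;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Breaking on whitespace keeps words intact, which matters more in tonal languages where a split word can be hard to read. Runs of whitespace collapse to a single space, so the output is normalized rather than byte-identical to the input.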

We are cautiously optimistic. The frontier is moving in our favor, and the data partnerships emerging across the continent will accelerate the next eighteen months. If you are evaluating a vendor or planning your own build, demand to see their evaluation set. If they cannot show you one, they do not have a product. They have a demo.

#llm #nlp #twi #yoruba #swahili #evaluation

Written by

Chiamaka Okonkwo

AI Engineering Lead

Chiamaka leads applied AI at Spalce, focused on multilingual systems for telcos, banks, and public sector clients across sub-Saharan Africa.

