AI & Data • 60 min • Jan 29, 2026
Three production LLM deployments — at a tier-1 bank, a public-sector agency, and an energy utility — and the patterns that worked across all of them.
Retrieval-augmented generation is the safe entry point. Pure fine-tuning is rarely the right first move for an enterprise deployment.
Hallucination handling has to be product, not just engineering. Build the UX around the assumption that the model will sometimes be wrong.
Vendor lock-in is a real risk. Architect for model portability from day one — the model that performs best today may not be the cheapest in eighteen months.
Auto-generated and lightly edited. Let us know about errors.
Dr Akua Sarpong: Welcome. Today we're going to talk about three production LLM deployments we've done in the last twelve months: at a tier-1 Nigerian bank, a West African public-sector agency, and a South African energy utility. The use cases are different. The patterns are surprisingly similar.
Adaobi Eze: The first lesson: start with RAG, not fine-tuning. We've been asked many times whether we recommend fine-tuning a base model on the customer's data. The honest answer is almost never, for a first deployment. RAG is faster to build, easier to debug, and gives you a clearer audit trail because you can show the customer which source documents the model used.
Dr Akua Sarpong: And the audit trail matters a lot in regulated industries. When a banker asks the assistant a compliance question and gets an answer, the bank needs to be able to explain which policy document that answer came from. RAG makes that trivial. Fine-tuning makes it impossible.
Adaobi Eze: The bank deployment was a customer-service assistant for internal staff. The model is a frontier model called through an API, with retrieval over the bank's internal policy library, about 4,000 documents. The query goes through a retrieval layer that ranks the top ten relevant documents, then the model generates a response grounded in those.
Dr Akua Sarpong: The second lesson: hallucinations are inevitable. The model will sometimes generate content that sounds confident but is wrong. The engineering question is: how do you reduce the rate? The product question is: how do you handle it when it happens?
Adaobi Eze: We use three engineering techniques. One: structured retrieval, so the model has to ground its answer in source documents. Two: confidence calibration, so we show the user how certain the model is. Three: explicit decline patterns, where we train the model to say "I don't know" when retrieval doesn't return strong matches.
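As a rough illustration of the retrieval and decline pattern described above (not the speakers' actual implementation), the Python sketch below ranks the top ten documents for a query, keeps only the strong matches, and declines when none remain. Everything named here is an assumption made for the example: the `Document` type, the token-overlap `score` function (a toy stand-in for a real retriever), the `min_score` threshold of 0.5, and the sample policy texts. The call to the model itself is left out; the sketch only builds the grounded prompt and the source list that would be shown to the user.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical decline text; the talk does not give the exact wording.
DECLINE_MESSAGE = "I don't know: no policy document matched this question strongly enough."


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str


def _tokens(text: str) -> set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {t.strip(".,;:?!()").lower() for t in text.split()} - {""}


def score(query: str, doc: Document) -> float:
    """Toy relevance: fraction of query tokens that also appear in the document."""
    q = _tokens(query)
    if not q:
        return 0.0
    return len(q & _tokens(doc.text)) / len(q)


def retrieve(query: str, library: list[Document], top_k: int = 10) -> list[tuple[float, Document]]:
    """Return the top_k documents, highest score first."""
    ranked = sorted(((score(query, d), d) for d in library), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]


def build_request(query: str, library: list[Document], top_k: int = 10, min_score: float = 0.5) -> dict:
    """Build a grounded prompt, or decline when retrieval finds no strong match."""
    hits = [(s, d) for s, d in retrieve(query, library, top_k) if s >= min_score]
    if not hits:
        return {"declined": True, "answer": DECLINE_MESSAGE, "sources": []}
    context = "\n\n".join(f"[{d.doc_id}] {d.text}" for _, d in hits)
    prompt = (
        "Answer using only the sources below and cite their ids. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    # The source ids travel with the prompt so the UI can show them next to the answer.
    return {"declined": False, "prompt": prompt, "sources": [d.doc_id for _, d in hits]}


# Made-up policy snippets, for illustration only.
LIBRARY = [
    Document("POL-001", "Wire transfers above the daily limit require approval from a branch manager."),
    Document("POL-002", "Staff must complete annual anti-money-laundering training."),
]

if __name__ == "__main__":
    print(build_request("Who must approve wire transfers above the daily limit?", LIBRARY)["sources"])
    print(build_request("What is the canteen menu on Friday?", LIBRARY)["declined"])
```

Returning the source ids together with the prompt is what keeps the audit trail intact: whatever the model answers, the application already knows which documents it was grounded in. In a real deployment the threshold would presumably be tuned against an evaluation set rather than fixed by hand.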
Dr Akua Sarpong: And on the product side, we always show the source documents alongside the answer. The user can click through, verify, and challenge. We've found that users develop a healthy skepticism quickly when they have that affordance. They check the source. They don't take the model on faith.
Adaobi Eze: The third lesson is about vendor lock-in. The LLM market is moving fast. The model that's best today may not be the cheapest in eighteen months. Your architecture should let you swap the model provider without rewriting the application.
Dr Akua Sarpong: We use a model abstraction layer. The application talks to that layer with a stable interface. Behind it, we can route to any of the major model providers, or to an open-source model we host ourselves. For one customer we've already swapped providers once based on cost-per-token changes, with no application code changes.
Adaobi Eze: The energy utility deployment is interesting because it's not a chat use case. It's an extraction use case. The utility receives thousands of supplier contracts in PDF. The model reads them and extracts key terms: pricing, term length, renewal clauses, dispute mechanisms. The output goes into a structured database that humans review.
Dr Akua Sarpong: That's a sweet spot for current LLM capability. The model isn't generating advice. It's structuring unstructured content. The human is still in the loop. The productivity gain has been substantial: about 40 hours per week of contract review work compressed to about 8.
Adaobi Eze: One last point. Don't deploy without an evaluation dataset. Every customer engagement starts with us building a ground-truth set of at least 500 queries with known correct answers. We run the model against that set on every change. Without it, you're flying blind.
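To make the last point concrete, here is a minimal Python sketch of an evaluation run against a ground-truth set, written against a provider-neutral interface in the spirit of the abstraction layer mentioned earlier. It is an illustration under stated assumptions, not the speakers' tooling: `ModelClient`, `EvalCase`, `run_eval`, and `CannedModel` are hypothetical names, the two sample cases are made up, and the pass criterion (the expected text appears in the answer, ignoring case and spacing) is a deliberately crude placeholder for whatever grading a real engagement uses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class ModelClient(Protocol):
    """Stable interface the application and the evaluation both depend on (hypothetical)."""

    def complete(self, prompt: str) -> str: ...


@dataclass(frozen=True)
class EvalCase:
    query: str
    expected: str  # text a correct answer must contain


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not fail a case."""
    return " ".join(text.lower().split())


def run_eval(model: ModelClient, cases: list[EvalCase]) -> dict:
    """Run every case through the model and report accuracy plus the failing queries."""
    failures = []
    for case in cases:
        answer = model.complete(case.query)
        if normalize(case.expected) not in normalize(answer):
            failures.append(case.query)
    passed = len(cases) - len(failures)
    accuracy = passed / len(cases) if cases else 0.0
    return {"total": len(cases), "passed": passed, "accuracy": accuracy, "failures": failures}


class CannedModel:
    """Stand-in provider for the example; a real adapter would call an API or a hosted model."""

    def __init__(self, answers: dict[str, str]) -> None:
        self._answers = answers

    def complete(self, prompt: str) -> str:
        return self._answers.get(prompt, "I don't know.")


if __name__ == "__main__":
    # Made-up cases; a real set would hold 500 or more reviewed queries.
    cases = [
        EvalCase("Who approves transfers above the daily limit?", "branch manager"),
        EvalCase("How often is AML training required?", "annual"),
    ]
    model = CannedModel(
        {"Who approves transfers above the daily limit?": "A Branch Manager must approve them."}
    )
    print(run_eval(model, cases))
```

Because `run_eval` only depends on the `complete` method, the same ground-truth set can be rerun unchanged after a provider swap, which is one way to check that a cheaper model has not regressed before switching.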