Engineering practices
Day-to-day mechanics of how Spalce squads write, review, and ship code. These are the defaults we apply on day one of every engagement — we will only deviate when a customer constraint genuinely demands it, and we will document the deviation.
Code review
Every change reaches main through a pull request reviewed by at least one engineer who did not author it; changes touching auth, payments, data migrations, or infrastructure need a second reviewer. Reviews are scoped: we expect PRs under 400 lines of diff, with anything larger split or paired with a written walkthrough. We use GitHub CODEOWNERS to route changes automatically, with review SLAs of 4 working hours for blockers and one business day for everything else. Comments are labelled (nit, question, blocker) so the author knows what is required versus optional.
- PRs link to a Linear ticket and reference the ADR that frames the decision, when one exists.
- We require green CI, a passing preview deploy, and an updated changelog entry before merge.
- Authors squash-merge with a Conventional Commits message that becomes the changelog line.
- Security-sensitive changes (auth, crypto, IAM, secret handling) require a second reviewer drawn from the security guild.
- Stale PRs older than seven days get a triage ping; if the author cannot land it, we close and re-open with a fresh plan.
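The review rules above can be expressed as a small pre-merge check. This is a minimal sketch rather than our actual tooling: the path prefixes in `SENSITIVE_PREFIXES` and the name `review_requirements` are hypothetical, and only the 400-line limit and the second-reviewer rule come from the policy itself.

```python
from dataclasses import dataclass

# Hypothetical path prefixes standing in for "auth, payments, data migrations, or infrastructure".
SENSITIVE_PREFIXES = ("auth/", "payments/", "migrations/", "infra/")
MAX_DIFF_LINES = 400  # PRs are expected to stay under this size


@dataclass
class ReviewRequirements:
    reviewers_needed: int
    needs_walkthrough: bool  # True when the PR should be split or paired with a walkthrough


def review_requirements(changed_paths: list[str], diff_lines: int) -> ReviewRequirements:
    """Return how many reviewers a PR needs and whether it is too large to merge as-is."""
    sensitive = any(path.startswith(SENSITIVE_PREFIXES) for path in changed_paths)
    return ReviewRequirements(
        reviewers_needed=2 if sensitive else 1,
        needs_walkthrough=diff_lines >= MAX_DIFF_LINES,
    )
```

In practice the routing itself would stay in CODEOWNERS; a check like this only makes the thresholds explicit and testable.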
Pairing and mob programming
We pair on hard problems by default — new joiners spend their first two weeks pairing exclusively, and senior engineers pair when a problem crosses systems or touches a hot path. Mob programming (three or more engineers on one screen) is reserved for incident debriefs, gnarly migrations, and onboarding ceremonies. Pairing sessions are time-boxed to 90 minutes with a 15-minute break, and we rotate the driver every 25 minutes. We do not pair on routine work — async PR review is faster and we respect each other's time.
Trunk-based development
We work on main. Short-lived feature branches (under 48 hours) merge directly; longer pieces of work hide behind feature flags managed through LaunchDarkly or our open-source fallback. We do not use long-running develop or release branches — they hide integration pain and reward big-bang merges. Releases are continuous to staging on every merge to main, and to production on a per-team cadence (usually daily) with a manual gate for regulated workloads. Hotfixes follow the same path: branch from main, fix, flag if risky, merge, deploy.
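To illustrate how unfinished work can merge to main and stay dark, here is a minimal sketch of a percentage-rollout flag check. `FlagStore` is a hypothetical in-memory stand-in, not the LaunchDarkly SDK; the hashing scheme is an assumption chosen so a given user always lands in the same bucket.

```python
import hashlib


class FlagStore:
    """In-memory stand-in for a flag service; not the LaunchDarkly SDK."""

    def __init__(self) -> None:
        self._rollout: dict[str, int] = {}  # flag name -> percentage of users enabled

    def set_rollout(self, flag: str, percent: int) -> None:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout[flag] = percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        """Unknown flags are off, so unfinished code merged to main stays dark."""
        percent = self._rollout.get(flag, 0)
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
        return bucket < percent
```

Defaulting unknown flags to off is the design choice that matters here: a missing or misspelled flag fails closed instead of exposing half-built work.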
Test discipline
We follow the testing trophy, not the testing pyramid — most of our investment is in integration tests that exercise real adapters with containerised dependencies. Unit tests guard pure logic and tricky algorithms. Browser tests use Playwright against ephemeral preview environments, and load tests use k6 against the staging cluster on a nightly schedule. Coverage is a heuristic, not a gate — we look at mutation testing scores (Stryker) on critical modules to know whether the tests actually mean anything.
- Every bug fix lands with a failing test that reproduces it before the fix is written.
- Flaky tests are quarantined within 24 hours of detection and fixed or deleted within the sprint.
- Test data is generated, not committed — we use builders and factories, never fixtures coupled to a snapshot.
- Contract tests (Pact) cover every service boundary we own.
- Accessibility tests (axe-core) run on every preview deploy; a11y regressions block merge.
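The "generated, not committed" rule for test data can be sketched with a small builder. The `Customer` type and the `a_customer` helper are hypothetical, chosen only to show the shape: each test gets a fresh, valid object and overrides just the fields it cares about.

```python
from dataclasses import dataclass, replace
from itertools import count

_ids = count(1)  # fresh id per generated object, so tests never share records


@dataclass(frozen=True)
class Customer:
    id: int
    email: str
    country: str
    active: bool


def a_customer(**overrides) -> Customer:
    """Build a valid Customer with fresh defaults; tests override only what they care about."""
    customer_id = next(_ids)
    base = Customer(
        id=customer_id,
        email=f"customer{customer_id}@example.test",
        country="GH",
        active=True,
    )
    return replace(base, **overrides)
```

A test for deactivated Kenyan customers would call `a_customer(country="KE", active=False)` and stay readable, because the irrelevant fields are not spelled out.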
Architecture decisions (ADRs)
Architecture Decision Records are how we leave a paper trail your team can read in two years when the original engineers are gone. ADRs live in the customer repo under docs/adr/ and are written by the engineer making the decision, reviewed by a Principal Engineer, and discussed in the monthly architecture review.
We write ADRs in plain markdown, numbered sequentially, never deleted — superseded ADRs are marked with a status update and a link to the replacement. The goal is not bureaucracy. The goal is that any engineer who joins the codebase can read a chronological list of ADRs and understand why the system looks the way it does, what alternatives were considered, and which tradeoffs were accepted. ADRs sit alongside the code so they version with it and travel with the customer at handover.
When to write an ADR
- Choosing or replacing a database, queue, cache, or messaging system.
- Adopting or dropping a framework, runtime, or programming language for a service.
- Designing an authentication, authorization, or tenancy model.
- Picking a deployment topology (single region, multi-region, edge, hybrid).
- Introducing a new service boundary or merging two existing services.
- Selecting a vendor for a load-bearing capability (identity, payments, observability).
- Accepting a tradeoff that future engineers might disagree with.
ADR template
# ADR 001: Title
## Status
Accepted | Superseded by ADR-012
## Context
What is the problem, what forces are at play, what constraints
apply (regulatory, commercial, technical), and what evidence
shapes the decision.
## Decision
The choice we made, in plain language. Active voice.
## Alternatives considered
Each alternative with one paragraph on why it was rejected.
## Consequences
- Good: what this unlocks.
- Bad: what we are paying for this decision.
- Neutral: what changes operationally but is neither a win nor loss.
## Review trigger
When we will revisit this decision (metric, date, or event).
Common patterns we adopt
- Hexagonal architecture with adapters at the edges and a pure domain core, so swapping infrastructure is a one-file change.
- Outbox pattern for reliable event publication when a database write must trigger a downstream side effect.
- Idempotent APIs with client-provided keys for any endpoint that triggers money movement, notifications, or state changes.
- Database-per-service for production services we own end to end; shared schemas are flagged as technical debt.
- Event-carried state transfer when consumers need to reduce their dependency on synchronous calls into the producer.
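As an illustration of the outbox pattern from the list above, here is a minimal sketch using SQLite in place of the production database. The table names, the `order_placed` event shape, and the `publish` callback are all hypothetical; the point is only that the business write and the event row commit in the same transaction.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT NOT NULL, "
        "published INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def place_order(conn: sqlite3.Connection, amount: int) -> int:
    """Write the order and its event in one transaction: both commit or neither does."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute("INSERT INTO orders (amount) VALUES (?)", (amount,))
        order_id = cur.lastrowid
        event = json.dumps({"type": "order_placed", "order_id": order_id, "amount": amount})
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))
    return order_id


def relay(conn: sqlite3.Connection, publish) -> int:
    """Publish pending events in order; mark each as published only after publish() returns."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # may raise; the row then stays pending and is retried
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a crash between `publish` and the `UPDATE` causes a re-send, delivery is at-least-once, which is one reason the consumers behind it need idempotent handling.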
Delivery cadence
We work in two-week sprints with a fixed rhythm of rituals. Cadence is deliberate: every meeting has a defined outcome, and if a meeting cannot produce that outcome, it gets cancelled. We treat your calendar as a finite resource.
Sprint shape
Sprints start on Monday and end on Friday of the following week. We commit to a sprint goal (one sentence, written on the planning board), then a set of stories sized in t-shirt complexity (S, M, L, XL). We do not estimate in hours or days: those estimates are wrong, and they invite the wrong arguments. We aim to finish 80 to 90 percent of committed scope; consistently hitting 100 percent means we are sandbagging, and consistently falling short of 80 percent means we are over-committing. Either way, we adjust.
Operating rhythm
| Ritual | Cadence | Outcome |
|---|---|---|
| Standup | Daily 15min | Blocker surface |
| Sprint planning | Bi-weekly 90min | Scope agreement |
| Sprint review | Bi-weekly 60min | Demo + retro |
| Engineering Health | Monthly 60min | DORA metric review |
| Steering committee | Monthly 60min | Risk + budget alignment |
| Architecture review | Monthly 90min | ADRs + tech debt |
| Quarterly business review | Quarterly half-day | Outcome vs goals |
Deployment platform
The default Spalce platform is a small, opinionated set of building blocks we have run in production across half a dozen engagements. We will adapt to your stack — but if you have no preference, this is what you will get, because it works.
Standard platform
Workloads run on Kubernetes (EKS, GKE, or AKS depending on customer cloud) provisioned with Terraform and a thin Helm layer for application charts. Deployments are continuous via Argo CD using the app-of-apps pattern, with progressive rollouts (canary or blue/green) managed by Argo Rollouts. We use Crossplane or Pulumi for control-plane resources that do not belong in raw Terraform. Container images are built with Buildpacks or Docker, signed with cosign, and stored in a registry inside the customer's cloud account.
- Terraform state stored in customer-owned S3/GCS with state locking via DynamoDB or equivalent.
- GitHub Actions or GitLab CI/CD for the build pipeline; Argo CD owns deployment from a separate config repo.
- Vault or AWS Secrets Manager for secret material; nothing sensitive ever lands in environment variables at rest.
- Ephemeral preview environments per pull request, torn down on merge or after 72 hours.
- Tagging policy enforced at provisioning time — every resource carries owner, environment, cost-center, and data-class tags.
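The tagging policy in the last bullet reduces to a simple check. This sketch is illustrative only: in a real pipeline the rule would more likely live in Terraform validation or a policy engine, and `missing_tags` is a hypothetical helper. The four tag keys are the ones named above.

```python
REQUIRED_TAGS = ("owner", "environment", "cost-center", "data-class")


def missing_tags(resource_tags: dict[str, str]) -> list[str]:
    """Return required tag keys that are absent or blank, in policy order."""
    return [key for key in REQUIRED_TAGS if not resource_tags.get(key, "").strip()]
```

A provisioning step would reject any resource for which `missing_tags(...)` returns a non-empty list.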
Multi-region setup
For T0 and T1 workloads we default to an active-active topology across two regions in the same cloud provider, with a third region for disaster recovery. Traffic is steered by a global load balancer (Cloudflare or cloud-native) using latency-based routing with health checks. State is replicated synchronously where the SLO demands it (auth, payments) and asynchronously elsewhere. We run quarterly chaos drills that kill an entire region during business hours — if we cannot survive that, we fix it before we ship.
Database approach
PostgreSQL is the default for transactional data. We pick managed offerings (RDS, Cloud SQL, Aurora) over self-hosted unless a strict residency or cost constraint forces otherwise. Migrations run through a versioned tool (Flyway, Atlas, or Prisma Migrate depending on stack) with explicit forward and backward scripts; we never run autogenerated migrations in production. For high-throughput append-only workloads we use ClickHouse; for graph-shaped problems we use Neo4j only when we have measured a query that PostgreSQL cannot serve. Schema changes that lock or rewrite large tables run via online tools such as pg_repack (or gh-ost where a customer stack is on MySQL, since gh-ost does not support PostgreSQL).
Observability stack
OpenTelemetry SDKs are baked into every service from day one — traces, metrics, and logs share a single instrumentation layer. We forward to Datadog or Grafana Cloud (depending on customer preference and budget) for storage and visualisation, with PagerDuty wired to SLO burn-rate alerts rather than raw thresholds. Every service ships with a default dashboard, a runbook link, and at least one synthetic check; the on-call engineer should never land on a service without those.
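To make "burn-rate alerts rather than raw thresholds" concrete: burn rate is the observed error rate divided by the error rate the SLO allows. The sketch below is an assumption-laden illustration, not our alerting config; the two-window rule and any threshold passed to `should_page` are choices a team would tune per service.

```python
def burn_rate(bad_requests: int, total_requests: int, slo: float) -> float:
    """How fast the error budget is being spent; 1.0 means exactly on budget."""
    if total_requests == 0:
        return 0.0
    error_rate = bad_requests / total_requests
    return error_rate / (1.0 - slo)


def should_page(short_window: float, long_window: float, threshold: float) -> bool:
    """Page only when both windows burn fast, which filters out brief spikes."""
    return short_window >= threshold and long_window >= threshold
```

For a 99.9 percent SLO, 10 failed requests out of 10,000 is a burn rate of about 1.0: the service is spending its budget exactly as fast as the SLO permits.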
Security baseline
Security is not a phase; it is a set of defaults that ship with every workload. We map our controls to OWASP ASVS Level 2 for customer-facing applications and the NIST Cybersecurity Framework for organisational alignment. This section describes what every Spalce-built platform gets out of the box.
Default security controls
- TLS 1.2+ everywhere (1.3 where the stack supports it); HSTS preload; certificate rotation automated via cert-manager or ACM.
- MFA enforced for every human identity touching the platform; SSO through the customer's IdP (Okta, Azure AD, Google Workspace) where available.
- Least-privilege IAM with role-based access; service accounts scoped per workload; no long-lived static credentials in CI.
- Encryption at rest using customer-managed keys (CMK) where the cloud allows; KMS rotation enabled with 90-day cadence.
- Audit logging on every privileged action, shipped to an immutable store with a minimum 365-day retention.
- Dependency pinning and software bill of materials (SBOM) generated per build using Syft; vulnerability scanning via Snyk or Trivy.
Threat modeling
Every new service and every significant feature change goes through a STRIDE-based threat modeling session — usually 45 minutes with one engineer, the Principal Engineer, and a security guild member. The output is a markdown file that lives next to the ADRs, listing assets, trust boundaries, threats, and mitigations. We re-threat-model when the deployment topology changes, when a new third party joins the data path, and at least annually for production systems.
Security in CI/CD
Every pull request runs static analysis (Semgrep for custom rules, CodeQL for language-level coverage), dependency scanning (Snyk or Dependabot), and IaC scanning (Checkov, tfsec) before reviewers see it. Secret scanning (Gitleaks) runs on every commit and as a pre-receive hook on the customer's git server when supported. Production deploys add dynamic application security testing (OWASP ZAP baseline scan) against the staging environment and a container image scan (Trivy) gated to high-severity issues only — we tune signal-to-noise aggressively.
Penetration testing cadence
Production systems get an external penetration test at least once per quarter, scoped to the application layer and the cloud control plane. We use rotating third-party firms — we do not let any single firm get comfortable with our patterns. Findings are tracked in Linear with named owners, severity-based SLAs (critical 7 days, high 30 days, medium 90 days), and a public-facing summary in the customer's security center. For payments and health workloads we layer on annual red-team exercises that include social engineering.
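The severity-based SLAs translate directly into due dates on each finding. A minimal sketch, assuming calendar days and a hypothetical `remediation_due` helper; the day counts are the ones stated above.

```python
from datetime import date, timedelta

REMEDIATION_DAYS = {"critical": 7, "high": 30, "medium": 90}


def remediation_due(severity: str, reported_on: date) -> date:
    """Due date for a finding; severities without a stated SLA raise KeyError."""
    return reported_on + timedelta(days=REMEDIATION_DAYS[severity.lower()])
```

Raising on an unknown severity is deliberate in this sketch: a finding without an SLA should be triaged by a person, not silently given a default.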
Reliability & SLOs
We run platforms against Service Level Objectives, not vague uptime promises. SLOs are negotiated with the business, codified in monitoring, and used as the explicit input to prioritisation decisions. When the error budget is intact, we ship features; when it is exhausted, we fix the platform.
Why SLOs not uptime targets
An uptime target hides what actually matters: the experience of the end user. A request that takes 9 seconds is not down, but no one would call it up. SLOs let us define availability as a percentage of good user-facing requests, where good is measured in terms of correctness, latency, and freshness. They give us a shared vocabulary with the business: if we promise 99.9 percent, we have about 43 minutes of error budget in a 30-day month, and we will spend that budget rather than hoard it. Hoarded budgets mean we are over-engineering; depleted budgets mean we are under-investing.
Default SLO tiers
| Tier | Availability | p95 latency | Use case |
|---|---|---|---|
| T0 Critical | 99.99% | <300ms | Payments, auth |
| T1 Important | 99.9% | <1s | Customer-facing APIs |
| T2 Standard | 99.5% | <3s | Internal tools, batch |
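The availability column converts to an error budget with one line of arithmetic. This sketch uses the time-based simplification (minutes of full unavailability in a 30-day window); a request-based SLO would count bad requests instead, and `error_budget_minutes` is a hypothetical helper name.

```python
def error_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of allowed unavailability in a window of `days` days."""
    return (1.0 - slo) * days * 24 * 60
```

Over 30 days this gives roughly 4.32 minutes for T0 (99.99%), 43.2 minutes for T1 (99.9%), and 216 minutes for T2 (99.5%).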
Error budget policy
Every service has a documented error budget policy signed by the engineering lead and the business owner. While the budget is healthy, we ship features at full speed. When 50 percent is consumed, we discuss in the next standup. When 75 percent is consumed, the team freezes optional feature work and focuses on reliability investments until the budget recovers. When the budget is exhausted, all non-critical changes pause, an incident review is mandatory, and we publish what we learned. This is not punishment — it is a circuit breaker against the natural tendency to over-promise.
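The policy steps above form a simple threshold ladder. A minimal sketch, with a hypothetical function name and the action strings paraphrased from the policy:

```python
def budget_action(consumed_fraction: float) -> str:
    """Map the consumed share of the error budget to the policy step described above."""
    if consumed_fraction >= 1.0:
        return "pause non-critical changes; mandatory incident review"
    if consumed_fraction >= 0.75:
        return "freeze optional feature work; focus on reliability"
    if consumed_fraction >= 0.50:
        return "discuss at next standup"
    return "ship features at full speed"
```

Checking the highest threshold first keeps the ladder unambiguous: a service at 80 percent consumption gets the freeze, not the standup discussion.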
Data engineering
Where the platform meets analytics. We treat the warehouse as a product with its own SLAs, schemas as contracts that change through deprecation cycles, and data quality as a first-class concern that fails builds when violated.
Warehouse defaults
We default to BigQuery on Google Cloud and Snowflake elsewhere — both have managed scaling, predictable cost models, and good ergonomics for the African operating reality (variable connectivity, fluctuating workloads). For self-hosted needs we use ClickHouse or DuckDB depending on the workload shape. Raw ingestion uses Airbyte or Fivetran for SaaS sources and Debezium-driven CDC for our own transactional databases. Storage uses the medallion pattern: bronze (raw), silver (modelled), gold (business-ready), with each layer's freshness and quality contracts published.
dbt & semantic layer
All warehouse modelling happens in dbt. Models are tested (uniqueness, not-null, accepted-values, referential integrity) and documented inline; CI fails the build if test coverage on a new model is missing. We expose business logic through dbt's semantic layer or a Cube.js deployment so dashboards, notebooks, and reverse-ETL all consume the same definition of revenue, active user, or default rate. Metric definitions are versioned and require a written rationale when they change.
Data quality
- Great Expectations or Soda runs as a separate CI job that breaks the pipeline when expectations are violated.
- Each gold-layer table has an explicit SLA for freshness, completeness, and consumer subscriptions; misses trigger PagerDuty.
- PII columns are tagged at the schema level and access is gated through column-level policies.
- Backfills run through a separate, idempotent code path and are signed off in writing by the table owner.
- Lineage is captured automatically via OpenLineage so a downstream consumer can trace every column back to source.
ML & LLM practices
We are pragmatic about machine learning. Most problems do not need it; the ones that do, often need less of it than the demo suggests. This section sets the bar we hold ourselves to when we ship learned systems into production.
When we use LLMs
We reach for LLMs when the problem has fuzzy inputs (natural language, semi-structured documents, varied phrasing) and forgiving outputs (summarisation, routing, drafting). We do not use LLMs to do arithmetic, enforce business rules, or make irreversible decisions without human review. Every LLM-backed feature has a deterministic baseline we compare against — if the baseline is within a few points of the LLM's quality, we ship the baseline and save the cost, latency, and operational complexity.
RAG before fine-tuning
Retrieval-augmented generation answers most knowledge-grounded problems at a fraction of the cost and risk of fine-tuning. We invest first in clean ingestion, semantic chunking with overlap, hybrid retrieval (BM25 plus dense vectors via pgvector or Weaviate), and a re-ranker before we even consider model customisation. Fine-tuning enters the conversation when we have a high-volume, stable task with a clear evaluation set and the latency or cost of prompting has become untenable — and even then, we prefer LoRA-style adaptation over full fine-tunes.
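Hybrid retrieval needs some way to merge the BM25 ranking with the dense-vector ranking. The text does not say which method we use; the sketch below assumes reciprocal rank fusion, a common choice because it needs only ranks, not comparable scores. The constant `k = 60` is a conventional default, not a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Documents that appear near the top of both lists rise above documents that are first in only one, and the fused list is what would be handed to the re-ranker.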
Evaluation discipline
We hold ML and LLM systems to the same engineering bar as the rest of the platform: versioned datasets, reproducible builds, regression tests that run in CI. Every change to a prompt, retriever, or model lands with a side-by-side evaluation on a held-out set, scored both automatically (Ragas, custom rubrics) and by a human reviewer. We track win rate, hallucination rate, latency p95, and cost per request as first-class SLIs. A change that wins on quality but loses on cost or latency goes through the same trade-off discussion we have for any other architectural choice.
Hallucination guardrails
- Every user-facing answer is grounded in a citation; if we cannot cite, we refuse to answer.
- Structured outputs (JSON schema, function calling) wherever the downstream consumer expects structure.
- Output validators (Guardrails, Instructor, or hand-rolled Zod schemas) on the response path.
- Confidence calibration via self-consistency or verifier models on high-stakes flows.
- Human-in-the-loop review queues for any output that triggers money movement, customer communication, or compliance reporting.
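The first three guardrails can be combined into one validator on the response path. This is a hand-rolled sketch, assuming the model is asked to return JSON with hypothetical `answer` and `citations` fields and that `known_sources` holds the ids of the documents actually retrieved for the request.

```python
import json


def _refuse() -> dict:
    """Fresh refusal payload, so callers never share a mutable list."""
    return {"answer": None, "citations": [], "refused": True}


def validate_answer(raw: str, known_sources: set[str]) -> dict:
    """Accept a model response only if it is well-formed and cites known sources."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return _refuse()
    if not isinstance(data, dict):
        return _refuse()
    answer = data.get("answer")
    citations = data.get("citations")
    if not isinstance(answer, str) or not answer.strip():
        return _refuse()
    if not isinstance(citations, list) or not citations:
        return _refuse()  # no citation, no answer
    if any(not isinstance(c, str) or c not in known_sources for c in citations):
        return _refuse()  # cites something we did not retrieve
    return {"answer": answer, "citations": citations, "refused": False}
```

Note that this only checks that citations point at retrieved documents; whether the cited passage actually supports the answer still needs a verifier or a human reviewer.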
Africa-specific defaults
Building software for African enterprises requires assumptions that Silicon Valley playbooks do not make. Bandwidth fluctuates, devices skew older, payments do not run on cards, and regulators are still finding their feet. These are the defaults we apply unless told otherwise.
Low-bandwidth design
Performance budgets target a Moto G-class Android device on a 3G connection: under 200KB of JavaScript on the critical path, under 100KB of CSS, total page weight under 1MB. We use route-based code splitting, modern image formats (AVIF and WebP with JPEG fallbacks), and aggressive caching at the edge. Server-rendered HTML is the default; client-side hydration is opt-in per route. We measure real-user metrics from Lagos, Nairobi, Accra, and Johannesburg via SpeedCurve or a self-hosted equivalent — synthetic numbers from us-east-1 are a lie.
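A performance budget only holds if CI enforces it. The sketch below checks measured asset sizes against the limits above; it assumes 1KB means 1024 bytes and leaves open whether sizes are measured compressed, since the text does not say. The category keys are hypothetical names.

```python
BUDGET_BYTES = {
    "js_critical": 200 * 1024,   # JavaScript on the critical path
    "css": 100 * 1024,
    "page_total": 1024 * 1024,   # total page weight
}


def over_budget(measured_bytes: dict[str, int]) -> list[str]:
    """Return the categories that are not strictly under budget."""
    return [
        name
        for name, limit in BUDGET_BYTES.items()
        if measured_bytes.get(name, 0) >= limit
    ]
```

A build step would fail whenever `over_budget(...)` returns a non-empty list for any route.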
Offline-first mobile
Mobile experiences assume the network is intermittent. We use a sync engine (Replicache or a hand-rolled SQLite-backed queue) so users can compose, edit, and submit work without a connection, with eventual reconciliation on the server. Conflict resolution is explicit and visible to the user; we do not silently overwrite their work. Background sync respects battery and data limits; we measure data consumption per session and publish it to the user. PWAs ship before native unless a hardware capability genuinely demands native.
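One way to avoid silently overwriting offline work is optimistic versioning: an edit is accepted only if it was made against the current server version, and a stale edit is handed back intact. This is a minimal sketch with hypothetical types, not the behaviour of Replicache or any specific sync engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Document:
    version: int
    text: str


@dataclass
class SyncResult:
    accepted: bool
    server: Document              # current server copy after the attempt
    rejected_text: Optional[str]  # the user's edit, preserved when a conflict occurs


def apply_edit(server: Document, base_version: int, new_text: str) -> SyncResult:
    """Accept an offline edit only if it was made against the current server version."""
    if base_version != server.version:
        # Conflict: keep both copies so the client can show them to the user.
        return SyncResult(accepted=False, server=server, rejected_text=new_text)
    updated = Document(version=server.version + 1, text=new_text)
    return SyncResult(accepted=True, server=updated, rejected_text=None)
```

On a conflict the client has both the server copy and the rejected edit, so it can present a visible choice instead of picking a winner.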
Mobile money rail integration
Cards are not the dominant payment instrument in our markets — mobile money is. We integrate directly with MTN MoMo, Airtel Money, M-Pesa, and Vodafone Cash through their official APIs, and route through aggregators (Flutterwave, Paystack, MFS Africa) only where direct integration is unavailable or commercially worse. We treat each rail's idempotency semantics, callback reliability, and reconciliation cadence as first-class engineering concerns. Settlement reconciliation runs daily as an automated job with discrepancies above a configurable threshold going to a human queue within 24 hours.
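The daily reconciliation job boils down to comparing our ledger with each rail's settlement file. A minimal sketch, assuming both sides can be keyed by a shared transaction reference; sending records that are missing on either side to the human queue is an assumption of this example, not something stated above.

```python
from decimal import Decimal


def reconcile(ledger: dict[str, Decimal], settlement: dict[str, Decimal], threshold: Decimal):
    """Compare our ledger with a rail's settlement file, keyed by transaction reference.

    Returns (matched, review_queue). A record missing on either side always goes to review.
    """
    matched, review_queue = [], []
    for ref in sorted(ledger.keys() | settlement.keys()):
        ours, theirs = ledger.get(ref), settlement.get(ref)
        if ours is None or theirs is None or abs(ours - theirs) > threshold:
            review_queue.append((ref, ours, theirs))
        else:
            matched.append(ref)
    return matched, review_queue
```

Amounts are `Decimal` rather than floats so that small discrepancies are real discrepancies and not rounding noise.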
Regulator engagement
We engage early with the Bank of Ghana, the Central Bank of Nigeria, the Central Bank of Kenya, and equivalent bodies in the markets we serve — not at submission time, but during design. We produce regulator-ready documentation (data residency, encryption practices, business continuity, third-party risk) as part of every regulated engagement, in the format the regulator actually uses. We have walked auditors through our architectures in person; we know what they ask, and we build to make those answers easy.
Documentation & handover
Documentation is part of done. It is not a phase at the end of the engagement, it is a continuous obligation that ships with every feature. The test of good documentation is whether your team can run the system without us — and we test that explicitly before we step back.
What you receive at end of engagement
- All source code in your git remote, with full commit history, ADRs, and architecture diagrams under docs/.
- Infrastructure as code (Terraform, Helm, Argo CD config) in a second repo, also yours, with state stored in your cloud account.
- An operations runbook per service: SLO definitions, on-call rotation, alert response, common failure modes, recovery procedures.
- Threat models, penetration test reports, SOC 2 / ISO 27001 evidence (where applicable), data flow diagrams.
- A six-week handover plan with named recipients on your side, weekly knowledge-transfer sessions, and a final exit interview.
Runbook standards
Every runbook is tested. Tested means we deliberately trigger the failure mode in staging (or production with permission) and walk a junior engineer through the recovery from cold start. If the engineer cannot recover by following the runbook alone, the runbook is broken. Runbooks live in the same repo as the service they document, every alert links to the runbook line that addresses it, and each runbook carries a last-tested timestamp that decays into a warning after 90 days.
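The last-tested decay rule is easy to automate. A minimal sketch, assuming the timestamp is stored as a date and that "after 90 days" means strictly more than 90 days; `runbook_status` is a hypothetical helper.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)


def runbook_status(last_tested: date, today: date) -> str:
    """Return "ok" within 90 days of the last test, otherwise "warning"."""
    return "ok" if today - last_tested <= STALE_AFTER else "warning"
```

A scheduled job could run this over every runbook in the repo and open a ticket for each one that returns a warning.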
Knowledge transfer process
Handover is not a meeting. It is a six-week structured programme that starts roughly six weeks before our scheduled exit. Weeks one and two: shadow rotations, where your engineers sit on every call. Weeks three and four: reverse rotations, where your engineers lead and ours observe. Weeks five and six: independent operation by your team, with us on call as backup. We measure success by your team's MTTR (mean time to recovery) on real incidents — if it has not converged to a level you are comfortable with by week four, we extend the handover at no charge.
What we don't do
Being explicit about what's out of scope matters as much as what's in scope.
- We don't take on engagements without an empowered executive sponsor on the customer side.
- We don't skip threat modeling on regulated workloads, even when timelines are tight.
- We don't ship without observability — every production service has SLIs from day one.
- We don't deploy without rollback. Forward-only deploys are a code smell.
- We don't deliver code without ADRs for the load-bearing decisions.
- We don't use blockchain when a database will do (which is most of the time).
- We don't accept engagements where we'd be a single individual; we engage as squads.
