Spalce ships a self-hosted distribution for customers with strict residency, regulatory, or operational requirements. The self-hosted edition is functionally identical to our managed offering — same APIs, same dashboards, same SDKs — but you own the operational responsibility. This guide is the entry point for sizing, installing, and operating a self-hosted cluster.
Self-hosted is a serious commitment. If you do not have a 24/7 platform team, talk to us about a managed-in-region option before you start.
Reference topology
A minimum production deployment has three tiers: a Kubernetes cluster for stateless services, a managed Postgres for transactional state, and a managed object store for blobs. We support AWS, GCP, and Azure as first-class targets, plus VMware vSphere for on-premises deployments. The same Helm chart works across all of them; only the values files differ.
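As a rough sanity check on the minimum sizing listed in this section, the aggregate figures can be worked out directly. The sketch below is illustrative only: the 10.0.0.0 base address for the pod CIDR is an assumption (the guide specifies only the /16 prefix length), and the per-worker request rate assumes traffic spreads evenly, which real clusters rarely achieve.

```python
import ipaddress

# Figures taken from the minimum recommended sizing in this guide.
WORKERS = 6
VCPU_PER_WORKER = 8
RAM_GIB_PER_WORKER = 32
SUSTAINED_REQ_PER_MIN = 10_000

# Assumption: only the /16 prefix length comes from the guide;
# the 10.0.0.0 base address is a placeholder.
POD_CIDR = ipaddress.ip_network("10.0.0.0/16")

total_vcpu = WORKERS * VCPU_PER_WORKER
total_ram_gib = WORKERS * RAM_GIB_PER_WORKER
req_per_sec = SUSTAINED_REQ_PER_MIN / 60
req_per_sec_per_worker = req_per_sec / WORKERS  # assumes an even spread
pod_addresses = POD_CIDR.num_addresses

print(f"cluster capacity : {total_vcpu} vCPU, {total_ram_gib} GiB RAM")
print(f"sustained load   : {req_per_sec:.1f} req/s "
      f"({req_per_sec_per_worker:.1f} req/s per worker)")
print(f"pod address space: {pod_addresses} addresses")
```

With these inputs the worker tier totals 48 vCPU and 192 GiB of RAM, and a /16 pod CIDR provides 65,536 addresses.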
# Minimum recommended sizing for 10k req/min sustained
# Kubernetes cluster
workers: 6 x (8 vCPU, 32 GiB RAM)
pod CIDR: /16
# Postgres
primary: 8 vCPU, 32 GiB RAM, 500 GiB io2
replicas: 2 (same shape)
# Object store
class: standard, lifecycle policy enabled
Installing with Helm
We publish a signed Helm chart at oci://charts.spalce.dev/spalce. The chart manages every Spalce-owned workload, leaves your platform-team-owned resources alone, and supports both blue-green and rolling upgrades. We recommend installing into a dedicated namespace so RBAC and network policies are easy to scope.
kubectl create namespace spalce
helm install spalce oci://charts.spalce.dev/spalce \
  --namespace spalce \
  --version 4.12.0 \
  --values prod.values.yaml
Operating responsibilities
On a self-hosted deployment, you own four operational responsibilities that we own on the managed platform: backups, capacity planning, security patching, and upgrade scheduling. We provide playbooks, dashboards, and a quarterly health review with our SRE team, but you decide when a change ships.
- Backups — Postgres point-in-time recovery plus daily snapshots, retained 30 days.
- Capacity — review request volume, queue depth, and database IOPS weekly.
- Patching — apply security fixes within 14 days for high severity, 30 for medium.
- Upgrades — minor versions monthly, majors quarterly, with a maintenance window.
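The patching windows above are easy to track mechanically. The following is a minimal sketch, not part of any Spalce tooling: the function names and the example dates are hypothetical, and only the 14-day (high) and 30-day (medium) windows come from this guide. Other severities are rejected rather than guessed.

```python
from datetime import date, timedelta

# Patch windows stated in this guide; other severities are not defined here.
PATCH_SLA_DAYS = {"high": 14, "medium": 30}


def patch_deadline(published: date, severity: str) -> date:
    """Return the last day on which a fix still meets the patch window."""
    try:
        window = PATCH_SLA_DAYS[severity]
    except KeyError:
        raise ValueError(f"no patch window defined for severity {severity!r}")
    return published + timedelta(days=window)


def is_overdue(published: date, severity: str, today: date) -> bool:
    """True once today is past the deadline for an unapplied fix."""
    return today > patch_deadline(published, severity)


if __name__ == "__main__":
    advisory_date = date(2024, 1, 1)  # hypothetical advisory date
    print(patch_deadline(advisory_date, "high"))
    print(is_overdue(advisory_date, "medium", today=date(2024, 2, 1)))
```

A check like this could run in a scheduled job and feed the weekly capacity review, so overdue fixes surface alongside the other operational signals.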
Observability
Every service emits Prometheus metrics on /metrics, OpenTelemetry traces over OTLP, and structured JSON logs to stdout. We ship Grafana dashboards as code in the chart, plus a curated set of alerts that map to severity definitions. Hook them into your existing on-call rotation rather than building a parallel one.
# Drop-in alert rule for slow authorization path (Prometheus)
# File: alerts/authorization-slow.yaml
groups:
  - name: spalce.authorization
    rules:
      - alert: AuthorizationP99Slow
        # P99 latency over a 5-minute rate window, aggregated across instances
        expr: |
          histogram_quantile(0.99,
            sum(rate(http_request_duration_seconds_bucket{job="auth"}[5m]))
            by (le)) > 0.2
        for: 10m
        labels: { severity: page }
        annotations:
          summary: "Authorization P99 above 200ms for 10m"
Run our preflight CLI in CI for every infra PR. It catches 80% of misconfigurations before they reach apply.
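To make the alert expression above less of a black box, here is a simplified Python sketch of how a quantile is estimated from cumulative histogram buckets, using linear interpolation inside the bucket that contains the target rank. It is an illustration of the technique, not Prometheus's actual implementation: it skips several edge cases that Prometheus handles, and the bucket boundaries and counts are made-up sample data.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    with math.inf as the final bound (the "le=+Inf" bucket).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1] for this sketch")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need at least two buckets, the last one +Inf")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window

    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == math.inf:
                # The rank falls in the open-ended bucket: report the
                # highest finite bound, since nothing finer is known.
                return lower_bound
            # Linear interpolation within the matching bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count


# Hypothetical sample: cumulative request counts per latency bound (seconds).
sample = [(0.05, 900), (0.1, 960), (0.2, 985), (0.5, 998), (math.inf, 1000)]
p99 = histogram_quantile(0.99, sample)
print(f"estimated P99: {p99:.3f}s, breaches 200ms threshold: {p99 > 0.2}")
```

One practical consequence: the estimate can only be as precise as the bucket layout, so a 200 ms threshold is most meaningful when a bucket boundary sits at or near 0.2.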
Upgrades and rollback
Helm upgrades are reversible for at least one prior version. Every release publishes a downgrade script and a migration plan — most upgrades are forward-compatible at the database layer, but some require a brief read-only window. We post a release calendar 30 days in advance and never gate critical security fixes behind a feature flag.
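At the Helm level, a rollback typically means inspecting the release history and then reverting to an earlier revision. The sketch below only assembles the standard `helm history` and `helm rollback` commands for the release and namespace used in the install step; the revision number is a placeholder, and the helper names are hypothetical. It covers Helm release state only and is not a substitute for the downgrade script and migration plan published with each release.

```python
import shlex

RELEASE = "spalce"    # release name from the install step
NAMESPACE = "spalce"  # namespace from the install step


def history_command(release=RELEASE, namespace=NAMESPACE):
    """Command that lists past revisions of the release."""
    return ["helm", "history", release, "--namespace", namespace]


def rollback_command(revision, release=RELEASE, namespace=NAMESPACE):
    """Command that reverts the release to a given revision and waits
    for the rolled-back resources to become ready."""
    if revision < 1:
        raise ValueError("Helm revisions start at 1")
    return ["helm", "rollback", release, str(revision),
            "--namespace", namespace, "--wait"]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    print(shlex.join(history_command()))
    print(shlex.join(rollback_command(7)))  # 7 is a placeholder revision
```

Printing the commands rather than running them keeps the sketch safe to try; in practice the revision would be chosen from the `helm history` output after checking the release's migration plan.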
