Multi-DC
This document describes the multi-datacenter design for GoAkt and how developers can use it. Each datacenter runs an independent GoAkt cluster. There is no global cluster state shared between datacenters. Cross-DC communication is DC-transparent to callers and relies on a DC control plane (registry) for routing.
Goals and Non-Goals
Goals
Isolation: Each datacenter remains a standalone GoAkt cluster.
DC-transparent messaging: Callers address actors/grains by logical identity; the runtime discovers which DC contains the target.
Control plane as source of truth: The DC control plane registry drives routing and placement.
DC-aware placement:
SpawnOnwithWithDataCenterspawns actors in a specific datacenter.Liveness: Periodic heartbeats keep routing tables accurate.
Non-Goals
Global cluster membership or consensus across datacenters.
Transparent global actor identity with shared state.
Strong consistency across datacenters.
Quick Start
1. Choose a Control Plane
Use etcd or NATS JetStream as the DC registry. One backend can serve both cluster discovery and the DC control plane (with different namespaces/keys per use).
2. Configure the Actor System
3. Start and Check Readiness
4. Cross-DC Messaging
Send messages to actors/grains without specifying a DC; the runtime discovers the target:
5. Spawn in a Specific Datacenter
Architecture
DC leader is the sole writer for its
DataCenterRecord; followers do not run the controller.Control plane stores records, endpoints, and liveness TTLs/leases.
Routing uses the control plane cache; local DC is tried first, then remote DCs in parallel.
Developer Guide
Endpoints Configuration
datacenter.Config.Endpoints lists remoting addresses (host:port) advertised to other DCs. Other DCs use these to discover and talk to actors/grains in this DC.
Single endpoint (e.g. leader only): All cross-DC traffic goes to one node.
Multiple endpoints: Cross-DC spawns and lookups are spread across nodes.
Example:
Endpoints must be reachable from other DCs.
Readiness and Cache Refresh
DataCenterReady(): Returns true when the DC controller is operational and the cache has been refreshed at least once. Use for readiness probes before cross-DC operations.DataCenterLastRefresh(): Returns the time of the last successful cache refresh. Zero when multi-DC is disabled or the controller has never refreshed.
FailOnStaleCache
true(default): Cross-DC operations (e.g.SpawnOnwithWithDataCenter) fail withErrDataCenterStaleRecordsif the cache is stale. Favours consistency.false: Operations proceed with best-effort routing using a potentially stale cache. Favours availability.
DC-Aware Spawn (SpawnOn + WithDataCenter)
To spawn an actor in a different datacenter:
The runtime looks up the target DC by DataCenter.ID() (Zone+Region+Name) in the active records, then sends RemoteSpawn to one of that DC's Endpoints chosen at random.
Errors: ErrDataCenterNotReady, ErrDataCenterStaleRecords, ErrDataCenterRecordNotFound.
Control Plane Providers
Etcd
datacenter/controlplane/etcd
Supports TLS, auth, watch
NATS JetStream
datacenter/controlplane/nats
Requires NATS with -js (JetStream)
Control Plane
Interface
Register: Create a DC record; returns ID and version.
Heartbeat: Renew lease; extends liveness.
SetState: Transition record state (e.g. ACTIVE β DRAINING on shutdown).
ListActive: Return ACTIVE records with valid leases for routing.
Watch: Stream updates (optional; return
ErrWatchNotSupportedif unsupported).Deregister: Remove record on shutdown for faster cleanup than lease expiry.
Data Model
DataCenterRecord:
ID
Stable identifier (Zone+Region+Name or Name)
DataCenter
Name, Region, Zone, Labels
Endpoints
Advertised remoting addresses
State
REGISTERED, ACTIVE, DRAINING, INACTIVE
LeaseExpiry
Liveness lease expiry
Version
Monotonic revision for conflict-free updates
State transitions:
REGISTERED β ACTIVE: First heartbeat after start.
ACTIVE β DRAINING: On actor system shutdown.
ACTIVE/DRAINING β INACTIVE: TTL expiry or explicit disable.
Built-in Implementations
etcd
NATS JetStream
NATS must be started with JetStream: nats-server -js.
Contract Guarantees
Register, SetState, and Heartbeat are linearisable per record (CAS by Version).
Heartbeat extends the lease only when Version is current.
ListActive returns only ACTIVE records with unexpired leases.
Watch emits ordered updates (clients de-dupe by Version).
Deregister removes the record immediately for clean shutdown.
Configuration Reference
DataCenter
required
Name, Region, Zone, Labels
Endpoints
required
Remoting addresses (host:port)
HeartbeatInterval
10s
Heartbeat cadence
CacheRefreshInterval
10s
Cache refresh interval
MaxCacheStaleness
30s
Max age before cache is considered stale
LeaderCheckInterval
5s
Leadership recheck interval
RequestTimeout
5s
Control plane operation timeout
WatchEnabled
true
Use Watch when supported
FailOnStaleCache
true
Fail cross-DC ops when cache is stale
JitterRatio
0.1
Jitter for intervals
MaxBackoff
30s
Max backoff on errors
Cluster Integration
Multi-DC is enabled by adding datacenter config to the cluster config:
Failure Handling
Leader failover: The new leader takes over heartbeats.
Registry outages: Local DC operation continues; cross-DC routing may use cached data.
Stale cache: Configurable via
FailOnStaleCache(strict vs best-effort).Control plane unavailable: Does not block local DC; only affects cross-DC operations.
Routing Flow (DC-Transparent Messaging)
Runtime checks local datacenter via
ActorOf.If not found locally, fetches active DCs from the control plane cache.
Queries all active DC endpoints in parallel via
RemoteLookup.Uses the first DC that returns a valid actor address.
Cancels remaining lookups.
Sends the message via remote messaging to the chosen endpoint.
Sequence: DC Registration
DC leader starts and registers
DataCenterRecord(metadata + endpoints).Leader sends heartbeats at
HeartbeatInterval(less than TTL).Registry expires the record if heartbeats stop.
On shutdown, leader transitions ACTIVE β DRAINING β INACTIVE and calls
Deregister.
Last updated