Health & graceful shutdown

GGCommons ships a tiny, dependency-free HTTP/1.1 health server that exposes Kubernetes-style probes, plus library-owned signal handling that drains your subsystems cleanly on SIGTERM. It is on by default only on the KUBERNETES platform and off everywhere else, so you get correct probe behavior in a cluster without writing any HTTP code, and zero overhead on the edge.

This guide is for component developers using the SDK. You almost never build the health server yourself — the builder wires it for you. Your job is to drive the readiness gate from your code and let the library handle the rest.

The server binds 0.0.0.0 (default port 8081) and answers three paths:

| Probe | Default path | Returns 200 when | Returns 503 when |
| --- | --- | --- | --- |
| Liveness | /livez | the process is alive (always; the running handler is the proof) | never (a broker outage must not fail liveness) |
| Readiness | /readyz | connected && ready && !shuttingDown | starting up, gated off, disconnected, or shutting down |
| Startup | /startupz | same predicate as readiness | same as readiness (use it to give slow connects more time) |

Key semantics, identical across all four SDKs:

  • /livez never consults the broker. The fact that the handler can answer is itself the liveness proof. Kubernetes should restart a hung process, not a temporarily disconnected one — so liveness stays decoupled from connectivity.
  • /readyz evaluates the full predicate: messaging connected && readyFlag && !shuttingDown. It returns 200 only when all three hold. A 200 therefore means more than “messaging connected”: if your code called setReady(false), /readyz stays 503 even while connected.
  • An unknown path returns 404 “not found”. Bodies are tiny text/plain: "ok" or "not ready".
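
The semantics above can be sketched as a pure function from request path and state to an HTTP status. This is an illustration only, not the library's source: the class name ProbeSemanticsSketch and its methods are hypothetical, and the paths are the defaults from the table.

```java
/** Hypothetical sketch of the probe semantics described above; not GGCommons source. */
final class ProbeSemanticsSketch {

    /** Readiness holds only when all three parts of the predicate hold. */
    static boolean isReady(boolean connected, boolean readyFlag, boolean shuttingDown) {
        return connected && readyFlag && !shuttingDown;
    }

    /** Maps a request path (default paths assumed) to an HTTP status code. */
    static int statusFor(String path, boolean connected, boolean readyFlag, boolean shuttingDown) {
        switch (path) {
            case "/livez":
                // Liveness never consults the broker: being able to answer is the proof.
                return 200;
            case "/readyz":
            case "/startupz":
                // Startup uses the same predicate as readiness.
                return isReady(connected, readyFlag, shuttingDown) ? 200 : 503;
            default:
                return 404;
        }
    }

    /** Tiny text/plain body that goes with each status. */
    static String bodyFor(int status) {
        switch (status) {
            case 200: return "ok";
            case 503: return "not ready";
            default:  return "not found";
        }
    }

    public static void main(String[] args) {
        // Connected, but the component called setReady(false): readiness stays closed.
        int status = statusFor("/readyz", true, false, false);
        System.out.println(status + " " + bodyFor(status));
    }
}
```

Keeping liveness out of the predicate is the point of the first branch: a broker outage changes the readiness result but never the liveness one.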

Enablement follows one rule in every language: explicit health.enabled wins; otherwise the platform default applies. The platform default is on only for KUBERNETES.

| Platform | Default | Override |
| --- | --- | --- |
| KUBERNETES | on | set health.enabled: false to turn it off |
| HOST | off | set health.enabled: true to turn it on |
| GREENGRASS | off | set health.enabled: true to turn it on |
| auto-detected as none of the above | off | set health.enabled: true to turn it on |

The enabled key is tri-state: absent means “use the platform default”, and an explicit value overrides in either direction (explicit false turns it off even on KUBERNETES). A bind failure is logged and swallowed in all four SDKs — a port clash never crashes your component.
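
The tri-state rule can be sketched with a nullable Boolean, where null stands for an absent key. The names here (HealthEnablementSketch, Platform, UNKNOWN, resolveEnabled) are hypothetical and chosen for illustration; the SDK's own types may differ.

```java
/** Hypothetical sketch of the enablement rule; not GGCommons source. */
final class HealthEnablementSketch {

    /** UNKNOWN stands in for "auto-detected as none of the above". */
    enum Platform { KUBERNETES, HOST, GREENGRASS, UNKNOWN }

    /**
     * @param explicit value of health.enabled, or null when the key is absent
     * @param platform the detected platform
     */
    static boolean resolveEnabled(Boolean explicit, Platform platform) {
        if (explicit != null) {
            return explicit;                      // explicit value wins in either direction
        }
        return platform == Platform.KUBERNETES;   // platform default: on only for KUBERNETES
    }

    public static void main(String[] args) {
        System.out.println(resolveEnabled(null, Platform.KUBERNETES));
        System.out.println(resolveEnabled(Boolean.FALSE, Platform.KUBERNETES));
        System.out.println(resolveEnabled(Boolean.TRUE, Platform.GREENGRASS));
    }
}
```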

Health lives under the health section of your component config (the single schema in schema/ggcommons-config-schema.json, shared by all four SDKs). Every key is optional:

{
  "component": { "name": "com.example.MyComponent" },
  "health": {
    "enabled": true,
    "port": 8081,
    "livenessPath": "/livez",
    "readinessPath": "/readyz",
    "startupPath": "/startupz"
  }
}

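The port and path values above are the defaults; enabled is shown as true only for illustration, since an absent key falls back to the platform default. You need a health block only if you want to change the port, rename a path, or override the platform default (for example, to force-enable the server off-cluster). On KUBERNETES, omitting the section entirely still starts the server on 8081 with the default paths.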

The readiness flag (readyFlag) defaults to true, so if you do nothing, /readyz flips to 200 as soon as messaging connects. To hold the gate closed while you do startup work — warming a cache, confirming required subscriptions, loading parameters — call setReady(false) early, then setReady(true) once you are genuinely ready to serve traffic.

You cannot force-ready while disconnected or shutting down: setReady(true) only lifts your gate; the connectivity and shutdown parts of the predicate still apply.

// gg is your built GGCommons instance (see the Components guide for construction).
gg.setReady(false); // hold /readyz at 503 during startup
gg.getMessaging().subscribe("commands/+", handler);
warmCaches();
gg.setReady(true); // /readyz can now return 200 (once connected)

When the library receives SIGTERM (and, in some SDKs, SIGINT), it runs a fixed drain sequence:

  1. Readiness flips to 503 first. The shuttingDown flag is set before anything is torn down, so /readyz immediately reports “not ready”. Kubernetes stops routing new traffic to the pod while in-flight work finishes.

  2. Subsystems drain. Messaging unsubscribes and disconnects, metrics/heartbeat stop, and the other subsystems shut down in order.

  3. The health server stops last. It keeps answering /readyz with 503 throughout the drain, so the orchestrator sees a clean “not ready → gone” transition rather than a connection refused mid-drain.

The library installs the signal handling for you — you do not register SIGTERM yourself. You can also trigger the same drain from your own code (for example after a fatal error), and it is idempotent, so the signal path and an explicit call cannot double-drain.

// A JVM shutdown hook catches SIGTERM/SIGINT and runs the drain; the JVM then exits 0.
// You can also drive it explicitly — this deregisters the hook and is idempotent:
gg.shutdown();
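
To make the ordering and idempotency concrete, here is a minimal sketch of a drain guarded by an atomic flag. It is not the library's implementation: DrainSketch and its methods are hypothetical, and the steps are recorded as strings rather than performed. Only Runtime.addShutdownHook and AtomicBoolean.compareAndSet are real JDK calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of an idempotent, ordered drain; not GGCommons source. */
final class DrainSketch {
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    // Records the order of steps so the sequence is observable in this sketch.
    private final List<String> steps = Collections.synchronizedList(new ArrayList<>());

    /** The shuttingDown part of the readiness predicate. */
    boolean isShuttingDown() {
        return shuttingDown.get();
    }

    /** Runs the drain once; returns false if a drain was already started. */
    boolean shutdown() {
        // compareAndSet lets exactly one caller through, so the signal path
        // and an explicit call cannot both run the sequence.
        if (!shuttingDown.compareAndSet(false, true)) {
            return false;
        }
        steps.add("readiness reports 503");   // 1. flag is set before any teardown
        steps.add("subsystems drained");      // 2. messaging, metrics/heartbeat, others
        steps.add("health server stopped");   // 3. stops last
        return true;
    }

    /** Snapshot of the steps taken so far. */
    List<String> steps() {
        synchronized (steps) {
            return new ArrayList<>(steps);
        }
    }

    public static void main(String[] args) {
        DrainSketch sketch = new DrainSketch();
        // A JVM shutdown hook runs on SIGTERM/SIGINT and on normal exit.
        Runtime.getRuntime().addShutdownHook(new Thread(sketch::shutdown));
        sketch.shutdown();   // explicit call; the hook's later call is a no-op
        System.out.println(sketch.steps());
    }
}
```

Setting the flag before any teardown is what lets /readyz report 503 for the whole drain; the second caller returns immediately without repeating any step.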

Point each probe at the matching path on the health port. The startup probe guards a slow first connect; once it passes, liveness and readiness take over.

# Deployment > spec.template.spec.containers[]
ports:
  - name: health
    containerPort: 8081
livenessProbe:
  httpGet:
    path: /livez
    port: health
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readyz
    port: health
  periodSeconds: 5
startupProbe:
  httpGet:
    path: /startupz
    port: health
  failureThreshold: 30
  periodSeconds: 2

Because /readyz reflects the full predicate, a rollout will wait on genuine readiness — not merely on the process starting:

kubectl rollout status deploy/my-component
# blocks until enough pods report /readyz == 200 (connected, ready, not shutting down)