Metrics

The metrics subsystem lets you define a metric once — its measures and dimensions — and then emit measure values by name. Where those values go (a log file, the messaging transport, CloudWatch, or a Prometheus scrape endpoint) is chosen by configuration, not by your code. The same business logic emits the same metric regardless of platform.

The push targets serialize to CloudWatch EMF (Embedded Metric Format); the pull target serves a Prometheus/OpenMetrics exposition.

You obtain the service from your GGCommons instance. The accessor name differs per language; the Java form is:

MetricEmitter metrics = gg.getMetrics();

Build a metric with MetricBuilder: give it a name, add one or more measures (each with a unit and a storage resolution), and optionally add dimensions. A measure’s storage resolution is either 1 (high resolution, 1-second granularity) or 60 (standard, 1-minute granularity).

metrics.defineMetric(
    MetricBuilder.create("performance")
        .withConfig(config) // fills thingName, componentName, namespace
        .addMeasure("replyLatency", "Milliseconds", 1)
        .addDimension("instance", "main")
        .build());

Dimensions are key/value labels attached to every datum of the metric. Three are injected automatically when you build:

Dimension    Value                 Source
category     the metric’s name     injected by the builder on build()
coreName     the IoT Thing name    from withThingName / config
component    the component name    from withComponentName / config

The SDKs cap a metric at 10 dimensions (MAX_DIMENSIONS = 10), counting the three injected ones. They enforce the cap differently when you exceed it:

  • Java, Python, and TypeScript throw (Java at addDimension and build; Python and TypeScript at build).
  • Rust is infallible: build() trims the excess custom dimensions (keeping the three injected) and logs a tracing::warn! instead of erroring.
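The two enforcement styles can be sketched side by side. This is a minimal illustration, not the SDK source: the class DimensionCapSketch and its methods are hypothetical, and it assumes that trimming keeps the earliest-added custom dimensions, which the description above does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DimensionCapSketch {
    static final int MAX_DIMENSIONS = 10;
    static final int INJECTED = 3; // category, coreName, component
    static final int MAX_CUSTOM = MAX_DIMENSIONS - INJECTED;

    // Throwing style (Java, Python, TypeScript): reject the metric outright.
    static void requireWithinCap(Map<String, String> custom) {
        if (custom.size() > MAX_CUSTOM) {
            throw new IllegalArgumentException(
                "too many dimensions: " + (custom.size() + INJECTED) + " > " + MAX_DIMENSIONS);
        }
    }

    // Trimming style (Rust): keep at most MAX_CUSTOM custom dimensions and warn.
    // Assumption: the earliest-added entries are the ones kept.
    static Map<String, String> trimToCap(Map<String, String> custom) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : custom.entrySet()) {
            if (kept.size() == MAX_CUSTOM) {
                System.err.println("WARN dropping excess dimensions beyond " + MAX_DIMENSIONS);
                break;
            }
            kept.put(e.getKey(), e.getValue());
        }
        return kept;
    }
}
```

Either way, the practical budget is seven custom dimensions per metric, because the three injected ones always count toward the cap.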

After a metric is defined, emit values by passing the metric name and a map of measureName -> value. There are two emission paths:

  • emitMetric (batched) — records the value into the target’s buffer; it is delivered on the target’s normal interval / batching policy.
  • emitMetricNow (immediate) — bypasses the buffer and delivers right away.

flushMetrics force-drains the buffer on demand. The measure-value container type is language-specific: Map<String, Float> (Java — note Float, not double), a plain dict (Python), HashMap<String, f64> (Rust), and Record<string, number> (TypeScript).

Map<String, Float> values = new HashMap<>();
values.put("replyLatency", 12.5f); // Map<String, Float>
metrics.emitMetric("performance", values); // buffered — drained on the emit interval
metrics.emitMetricNow("performance", values); // immediate — bypasses the buffer
metrics.flushMetrics(); // force-drain now
// metrics.close(); // flush + release the target on shutdown
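To make the difference between the two emission paths concrete, here is a toy model of a buffered target. BufferedEmitterSketch is hypothetical and stands in for a real target; it only illustrates the ordering semantics described above (buffered values wait for a flush, immediate values do not), not the SDK's actual implementation or its interval timer.

```java
import java.util.ArrayList;
import java.util.List;

final class BufferedEmitterSketch {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> delivered = new ArrayList<>(); // stands in for the real sink

    // Batched path: the value waits in the buffer until the next drain.
    void emit(String datum) { buffer.add(datum); }

    // Immediate path: the value skips the buffer entirely.
    void emitNow(String datum) { delivered.add(datum); }

    // Force-drain: everything buffered is delivered, in insertion order.
    void flush() {
        delivered.addAll(buffer);
        buffer.clear();
    }

    // Shutdown: flush whatever is still pending.
    void close() { flush(); }

    List<String> delivered() { return List.copyOf(delivered); }

    int pending() { return buffer.size(); }
}
```

One consequence worth noting: a value sent on the immediate path can arrive before an earlier value that is still sitting in the buffer.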

A target decides where emitted metrics actually go. You select it with metricEmission.target in your component config:

target                 What it does
log                    Writes EMF JSON lines to the metric log file (the library default).
messaging              Publishes EMF over the active messaging transport.
cloudwatch             Sends EMF to CloudWatch through a durable store-and-forward buffer (see below).
cloudwatchcomponent    Hands metrics to the AWS-managed CloudWatch metrics Greengrass component.
prometheus             Exposes metrics on an HTTP scrape endpoint (the default on Kubernetes).

The effective target is resolved by a three-tier precedence, identical across all four SDKs:

  1. The explicit metricEmission.target from config, if set.
  2. Otherwise the platform-profile default: prometheus on Kubernetes.
  3. Otherwise the library default, log.

An unknown target name logs a warning and falls back to log.
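The precedence and fallback rules can be written as a small pure function. This is a sketch under stated assumptions: TargetResolver is a hypothetical name, null stands for "not set", and target names are matched exactly (case-sensitively), none of which is confirmed by the SDK.

```java
import java.util.Set;

final class TargetResolver {
    static final Set<String> KNOWN =
        Set.of("log", "messaging", "cloudwatch", "cloudwatchcomponent", "prometheus");
    static final String LIBRARY_DEFAULT = "log";

    // configured:      metricEmission.target from config, or null if absent
    // platformDefault: the platform-profile default, or null if the profile has none
    static String resolve(String configured, String platformDefault) {
        String candidate = configured != null ? configured
            : platformDefault != null ? platformDefault
            : LIBRARY_DEFAULT;
        if (!KNOWN.contains(candidate)) {
            // Unknown names are not fatal: warn and fall back to the library default.
            System.err.println("WARN unknown metric target '" + candidate + "', using log");
            return LIBRARY_DEFAULT;
        }
        return candidate;
    }
}
```

For example, resolve(null, "prometheus") yields prometheus (tier 2), while resolve("cloudwatch", "prometheus") yields cloudwatch because explicit config always wins.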

// component config — metricEmission section
{
  "metricEmission": {
    "target": "cloudwatch",
    "namespace": "MyApp/Metrics"
  }
}

The prometheus target inverts the usual lifecycle. Instead of pushing, emit/emitNow only update an in-process latest-value gauge registry; flush is a no-op (a scrape pulls the data); shutdown/close stops the HTTP listener. A Prometheus server (or any scraper) pulls the current values on its own schedule.

  • The HTTP server binds on 0.0.0.0, default port 9090, default path /metrics.
  • GET <path> returns 200 with the OpenMetrics text exposition; any other path returns 404 (and, on Java, a non-GET on the metrics path returns 405).
  • A bind/start failure is logged and swallowed — the component keeps running and emits keep updating the registry; only the scrape endpoint is unavailable.
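The request routing and the response body can be sketched as two pure functions. ScrapeSketch and its methods are hypothetical, and the exact exposition the SDKs produce may differ; the sketch assumes a single gauge per call, sorts labels for stable output, and ends the body with the `# EOF` marker that the OpenMetrics text format requires.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

final class ScrapeSketch {
    // Status code for a request, following the Java behavior described above.
    static int statusFor(String method, String path, String metricsPath) {
        if (!metricsPath.equals(path)) {
            return 404; // any other path
        }
        return "GET".equals(method) ? 200 : 405; // non-GET on the metrics path
    }

    // Render one gauge sample as OpenMetrics text.
    static String render(String gauge, Map<String, String> labels, double value) {
        StringJoiner pairs = new StringJoiner(",", "{", "}");
        pairs.setEmptyValue(""); // no braces at all when there are no labels
        new TreeMap<>(labels).forEach((k, v) -> pairs.add(k + "=\"" + escape(v) + "\""));
        return "# TYPE " + gauge + " gauge\n"
            + gauge + pairs + " " + value + "\n"
            + "# EOF\n";
    }

    // Escape backslash, double quote, and newline inside a label value.
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```

Because the registry holds only the latest value per gauge, a scraper sees whatever was emitted most recently, not every intermediate value between scrapes.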

Gauge naming: the gauge name is sanitize(lowercase("{namespace}_{measureName}")) (restricted to [a-z0-9_], prefixed with _ if it would start with a digit). Dimensions become labels, with each label name sanitized to [a-zA-Z_][a-zA-Z0-9_]*. The namespace used is the configured metricEmission.namespace (not the per-metric namespace), defaulting to ggcommons.
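The naming rule can be sketched as follows. GaugeNames is a hypothetical helper, and it assumes that characters outside the allowed set are replaced with `_` (the rule above says only that names are restricted to that set, not how offending characters are handled).

```java
import java.util.Locale;

final class GaugeNames {
    // Gauge name: lowercase "{namespace}_{measureName}", restricted to [a-z0-9_].
    static String gaugeName(String namespace, String measureName) {
        String raw = (namespace + "_" + measureName).toLowerCase(Locale.ROOT);
        String cleaned = raw.replaceAll("[^a-z0-9_]", "_"); // assumption: replace, not drop
        // Prefix with '_' when the name would otherwise start with a digit.
        if (!cleaned.isEmpty() && Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }

    // Label name: must match [a-zA-Z_][a-zA-Z0-9_]*; case is preserved.
    static String labelName(String dimensionName) {
        String cleaned = dimensionName.replaceAll("[^a-zA-Z0-9_]", "_");
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            cleaned = "_" + cleaned;
        }
        return cleaned;
    }
}
```

Under these assumptions, the namespace MyApp/Metrics and the measure replyLatency produce the gauge myapp_metrics_replylatency. Two measures whose names differ only in case or punctuation would collapse to the same gauge, so distinct measure names are worth keeping distinct after sanitization too.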

The cloudwatch target is backed by a durable, disk-backed store-and-forward buffer by default (buffer.type defaults to durable). Datums are written to local disk first and drained to CloudWatch in batches, so metrics survive process restarts and cloud disconnects — the edge-first default. Set buffer.type to memory to opt back into in-memory batching only.

{
  "metricEmission": {
    "target": "cloudwatch",
    "namespace": "MyApp/Metrics",
    "buffer": {
      "type": "durable",
      "maxDiskBytes": 134217728,
      "path": "/var/lib/ggcommons/metrics/{ComponentName}/cw",
      "onFull": "dropOldest",
      "fsync": "perBatch"
    }
  }
}

The defaults shown above (128 MiB cap, the per-component cw path, dropOldest when full, fsync per batch) apply when the buffer block is omitted.

Behavior to be aware of:

  • Durable mode fails fast if the bundled ggstreamlog native core is missing — that is a deployment error, since the core ships with the SDK. If the core is present but the buffer cannot open (bad path / IO error), it degrades to in-memory batching.
  • The drain batches up to 1000 datums (≈900 KB) per CloudWatch request, drops datums outside CloudWatch’s accept window (≈14 days past / ≈2 hours future), and groups by namespace.
  • close() on a durable buffer flushes to disk and stops the engine — it does not drain to the cloud, so any backlog persists for the next restart.
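The drain's filtering and batching can be approximated as below. DrainSketch is hypothetical and deliberately partial: it uses exact 14-day and 2-hour bounds where the text gives approximate ones, splits by datum count only, and leaves out the byte-size limit and the grouping by namespace.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class DrainSketch {
    static final Duration MAX_PAST = Duration.ofDays(14);
    static final Duration MAX_FUTURE = Duration.ofHours(2);
    static final int MAX_BATCH = 1000;

    // True if a datum's timestamp falls inside the accept window around "now".
    static boolean inAcceptWindow(Instant timestamp, Instant now) {
        return !timestamp.isBefore(now.minus(MAX_PAST))
            && !timestamp.isAfter(now.plus(MAX_FUTURE));
    }

    // Split the backlog into request-sized batches of at most MAX_BATCH datums.
    static <T> List<List<T>> batches(List<T> datums) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < datums.size(); i += MAX_BATCH) {
            int end = Math.min(i + MAX_BATCH, datums.size());
            out.add(new ArrayList<>(datums.subList(i, end)));
        }
        return out;
    }
}
```

The window check matters most after a long disconnect: a backlog older than the accept window is dropped at drain time rather than retried indefinitely.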

For the push (EMF) targets, the _aws.Timestamp field is in epoch milliseconds. The cloudwatchcomponent message carries its own timestamp in epoch seconds (a deliberately different unit), and its JSON location differs by language — Java places it at request.metricData.timestamp, while Python places it at request.timestamp (a sibling of metricData).