storage: add timely step duration metric #34769

Open
antiguru wants to merge 3 commits into MaterializeInc:main from antiguru:storage-timely-step-duration-metric

antiguru commented Jan 20, 2026

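Adds the `mz_timely_step_duration_seconds` histogram to the storage server, matching the existing metric in the compute server. Also moves the `Metrics` helper class from the cluster mzcompose test (`test/cluster/mzcompose.py`) to `misc/python/materialize/mzcompose/helpers/metrics.py` so other tests can reuse it.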

🤖 Generated with Claude Code

antiguru and others added 2 commits January 20, 2026 11:25
Add `mz_timely_step_duration_seconds` histogram to the storage server,
matching the existing metric in the compute server. This measures the
time spent in each `step_or_park`/`step` call in the storage worker loop.

The metric uses a `cluster => "storage"` label to distinguish it from
the compute metric which uses `cluster => "compute"`.

Introduces `StorageWorkerMetrics` to cache per-worker metrics, avoiding
repeated `with_label_values` lookups on each loop iteration.
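
The timing and caching pattern described above can be illustrated with a minimal, stdlib-only Python sketch. It is not the Rust code from this PR: `Histogram`, `HistogramChild`, `WorkerMetrics`, `run_worker`, and the bucket boundaries are all hypothetical stand-ins, and the bucket counts are kept non-cumulative for brevity. The point is only that the label lookup happens once per worker, while each loop iteration just records an observation.

```python
import time


class HistogramChild:
    """One labeled series of the stand-in histogram (hypothetical)."""

    def __init__(self, buckets):
        self.buckets = buckets
        # One slot per bucket plus a final slot for values above all bounds.
        self.counts = [0] * (len(buckets) + 1)
        self.total = 0.0

    def observe(self, value):
        for i, bound in enumerate(self.buckets):
            if value <= bound:
                self.counts[i] += 1
                break
        else:
            self.counts[-1] += 1
        self.total += value


class Histogram:
    """Stand-in for a labeled histogram vector (hypothetical)."""

    def __init__(self, buckets):
        self.buckets = sorted(buckets)
        self.children = {}
        self.lookups = 0  # counts how often the label lookup ran

    def with_label_values(self, *labels):
        self.lookups += 1
        return self.children.setdefault(labels, HistogramChild(self.buckets))


class WorkerMetrics:
    """Resolves the labeled histogram once, when the worker starts."""

    def __init__(self, histogram, cluster, worker_id):
        self.step_duration = histogram.with_label_values(cluster, str(worker_id))


def run_worker(metrics, step, iterations):
    """Time every call to `step` and record it in the cached histogram."""
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        metrics.step_duration.observe(time.perf_counter() - start)


# Assumed bucket boundaries in seconds; the real ones are not given here.
hist = Histogram([0.001, 0.01, 0.1, 1.0])
metrics = WorkerMetrics(hist, "storage", 0)
run_worker(metrics, lambda: None, 100)
```

After 100 iterations `hist.lookups` is still 1, which is the saving the cached per-worker struct is meant to provide.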

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a test to verify that both compute and storage report the
`mz_timely_step_duration_seconds` metric with proper `cluster` and
`worker_id` labels.

Also:
- Move the `Metrics` helper class from `test/cluster/mzcompose.py` to
  `misc/python/materialize/mzcompose/helpers/metrics.py` for reuse
- Use a generic help string for the metric to avoid Prometheus
  registration conflicts between compute and storage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
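
As a rough illustration of the check this commit describes, the sketch below parses Prometheus text-format output and collects which `cluster` values report the metric, and for which `worker_id`s. It does not use the `Metrics` helper from this PR, whose interface is not shown here. `SAMPLE`, its help string and values, `parse_samples`, and `clusters_reporting` are made up for the example, and the parser is simplified: it assumes label values contain no `}` or escaped quotes.

```python
import re

# Invented scrape output; only the metric and label names come from the PR.
SAMPLE = '''\
# HELP mz_timely_step_duration_seconds The time spent in each timely step
# TYPE mz_timely_step_duration_seconds histogram
mz_timely_step_duration_seconds_bucket{cluster="compute",worker_id="0",le="0.001"} 5
mz_timely_step_duration_seconds_bucket{cluster="compute",worker_id="0",le="+Inf"} 7
mz_timely_step_duration_seconds_count{cluster="compute",worker_id="0"} 7
mz_timely_step_duration_seconds_bucket{cluster="storage",worker_id="0",le="0.001"} 2
mz_timely_step_duration_seconds_bucket{cluster="storage",worker_id="0",le="+Inf"} 3
mz_timely_step_duration_seconds_count{cluster="storage",worker_id="0"} 3
'''

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples for every labeled sample line."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            samples.append((match["name"], labels, float(match["value"])))
    return samples


def clusters_reporting(text, metric):
    """Map each `cluster` label value to the set of `worker_id`s seen."""
    found = {}
    for name, labels, _value in parse_samples(text):
        if name == f"{metric}_count" and "cluster" in labels and "worker_id" in labels:
            found.setdefault(labels["cluster"], set()).add(labels["worker_id"])
    return found


reporting = clusters_reporting(SAMPLE, "mz_timely_step_duration_seconds")
assert set(reporting) == {"compute", "storage"}
```

A real test would fetch the text from each process's metrics endpoint instead of a literal string.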
…rage metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
antiguru marked this pull request as ready for review January 20, 2026 14:11
antiguru requested a review from a team as a code owner January 20, 2026 14:11
antiguru requested review from petrosagg and teskje January 20, 2026 14:19

teskje left a comment

Makes sense to me!

Though this might need signoff from a cloud person. Last time I wanted to introduce new histogram buckets, that was blocked by concerns about the number of time series (thread).
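
For a sense of scale, here is a back-of-the-envelope sketch of how a labeled histogram multiplies time series. A Prometheus histogram exposes one `_bucket` series per configured bucket plus the `+Inf` bucket, plus `_sum` and `_count`, for every label combination. The function name and the numbers below (10 buckets, 2 clusters, 4 workers) are assumptions for illustration and are not taken from this PR.

```python
def histogram_series(num_buckets, label_combinations):
    """Series exposed by one histogram: buckets, +Inf, _sum and _count per label set."""
    per_label_set = (num_buckets + 1) + 2
    return per_label_set * label_combinations


# Hypothetical deployment: 2 `cluster` values x 4 `worker_id` values.
total = histogram_series(num_buckets=10, label_combinations=2 * 4)
print(total)  # 104
```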
