Add metrics for network message sizes and aggregated signature sizes#314

Open
pablodeymo wants to merge 5 commits into main from metrics-network-signature-sizes

Conversation

@pablodeymo
Collaborator

@pablodeymo pablodeymo commented Apr 21, 2026

Motivation

We want visibility into how large the messages we send on the wire are,
and how large individual aggregated signatures get. The existing gossip
size histograms only covered the receive path and measured a single
(uncompressed) number, so we couldn't answer questions like "how much
bandwidth does snappy save?" or "how do publish sizes compare to what we
receive from the mesh?" — and we had no signal at all on req/resp sizes
or on aggregated proof sizes.

Description

New and changed metrics

| Metric | Type | Labels | Buckets (bytes) |
| --- | --- | --- | --- |
| `lean_gossip_block_size_bytes` | Histogram | `compression=raw,snappy` | 10k, 50k, 100k, 250k, 500k, 1M, 2M, 5M |
| `lean_gossip_attestation_size_bytes` | Histogram | `compression=raw,snappy` | 512, 1024, 2048, 4096, 8192, 16384 |
| `lean_gossip_aggregation_size_bytes` | Histogram | `compression=raw,snappy` | 1024, 4k, 16k, 64k, 128k, 256k, 512k, 1M |
| `lean_reqresp_request_size_bytes` | Histogram | `protocol=status,blocks_by_root`; `compression=raw,snappy` | 64, 128, 256, 512, 1k, 4k, 16k, 64k |
| `lean_reqresp_response_chunk_size_bytes` | Histogram | `protocol=status,blocks_by_root`; `compression=raw,snappy` | 128, 1k, 10k, 100k, 500k, 1M, 5M, 10M |
| `lean_aggregated_proof_size_bytes` | Histogram | (none) | 1024, 4k, 16k, 64k, 128k, 256k, 512k, 1M |

The three existing lean_gossip_*_size_bytes histograms gained a
compression label (raw = SSZ-encoded, snappy = on-wire), and are
now observed on both publish and receive paths instead of only receive.

Implementation

  • crates/net/p2p/src/metrics.rs: converts the three gossip size
    histograms to HistogramVec and adds the two new req/resp histograms.
    Each observe_*_size(raw, snappy) function records both points in one
    call.
  • crates/net/p2p/src/gossipsub/handler.rs: on receive, capture
    message.data.len() as the compressed size before decompressing; on
    publish (publish_block, publish_attestation,
    publish_aggregated_attestation), observe both ssz_bytes.len() and
    compressed.len() right after compression.
  • crates/net/p2p/src/req_resp/encoding.rs: write_payload now
    returns the compressed byte count; decode_payload returns a small
    DecodedPayload { uncompressed, compressed_size } struct. This lets
    the codec record both sizes without re-encoding.
  • crates/net/p2p/src/req_resp/codec.rs: observes request and
    per-chunk response sizes for Status and BlocksByRoot at every
    read/write site, using a short protocol_label() helper to map the
    full libp2p protocol id to the metric label.
  • crates/blockchain/src/metrics.rs: registers
    lean_aggregated_proof_size_bytes and its
    observe_aggregated_proof_size() helper; both included in the
    init() eager-registration list.
  • crates/blockchain/src/aggregation.rs: observes the proof size
    right after AggregatedSignatureProof::new() in aggregate_job,
    using proof.proof_data.len() — the variable-length portion of the
    post-quantum aggregated signature.
  • crates/common/metrics/src/lib.rs: re-exports HistogramVec and
    register_histogram_vec! so labeled histograms can be used
    workspace-wide.

Devnet-compat fix (bundled, separate commit)

While validating these metrics in a local devnet I found that ethlambda
cannot launch against the current lean-quickstart generator output,
blocking make run-devnet on main. The generator still produces the
single-key layout (one XMSS key per validator, flat-string
GENESIS_VALIDATORS, single validator_N_sk.ssz), which other clients
(zeam, lantern) already accept by using the key for both roles.

Commit Accept single-key genesis + annotated-validator layouts
extends two loaders to accept either layout:

  • GenesisValidatorEntry deserializes from either a map
    (attestation_pubkey + proposal_pubkey) or a flat hex string (used
    for both roles).
  • classify_role / read_validator_keys gain a Shared role for
    files without attester/proposer in the name, populating both role
    slots from the same key file. Mixing Shared with role-specific
    entries for the same validator is still an error.

Round-trip test added for the flat-string genesis variant.

Documentation

  • docs/metrics.md: adds the new lean_aggregated_proof_size_bytes
    to the PQ Signature Metrics table and documents the five network size
    histograms under Network Metrics.

How to Test

  1. make fmt — passes

  2. make lint — passes (clippy clean, -D warnings)

  3. cargo test -p ethlambda-types --lib genesis — 4 passed (includes the
    flat-pubkey round-trip)

  4. cargo test -p ethlambda-metrics -p ethlambda-p2p -p ethlambda-blockchain --lib --bins — passes

  5. cargo test -p ethlambda-state-transition --release — passes

  6. Live devnet scrape (4-node local devnet, 4 min uptime, 2 zeam +
    2 ethlambda with ethlambda_1 as aggregator). Metrics pulled from
    ethlambda_1 on the aggregator:

    lean_gossip_attestation_size_bytes{compression=raw}    n=124  mean=2.7 kB  p50/p99 ∈ (2.0 kB, 4.1 kB]
    lean_gossip_attestation_size_bytes{compression=snappy} n=124  mean=2.6 kB  p50/p99 ∈ (2.0 kB, 4.1 kB]
    lean_reqresp_request_size_bytes{protocol=status,raw}    n=7  mean=80 B    p50 ∈ (64 B, 128 B]
    lean_reqresp_request_size_bytes{protocol=status,snappy} n=7  mean=58 B    p50 ∈ (0 B, 64 B]
    lean_reqresp_response_chunk_size_bytes{protocol=status,raw}    n=7  mean=80 B
    lean_reqresp_response_chunk_size_bytes{protocol=status,snappy} n=7  mean=58 B
    

    Attestation payloads are already tight, so snappy only shaves ~3%.
    Status frames (tiny structs) compress ~28% (80 B → 58 B).
    The block, aggregation, and BlocksByRoot histograms had zero
    observations — ethlambda could not produce blocks in this run
    (pre-existing cross-client signature scheme mismatch with the pinned
    zeam image; not related to this PR).

Notes

The forkchoice_spectests and signature_spectests fixture tests fail
with the same errors on main (XmssSignature length != 2536 and
ProofError(InvalidProof) — pre-existing devnet-3/devnet-4 fixture
mismatches). This PR does not touch signature or SSZ types.

Extend the three `lean_gossip_{block,attestation,aggregation}_size_bytes`
histograms with a `compression` label (raw SSZ vs snappy on-wire) and
observe them on both publish and receive paths instead of only receive.

Add new `lean_reqresp_request_size_bytes` and
`lean_reqresp_response_chunk_size_bytes` histograms for the Status and
BlocksByRoot protocols, labeled by `{protocol, compression}`. The
req/resp encoding helpers (`write_payload`, `decode_payload`) now return
the compressed byte count so the codec can record both sizes without
re-encoding.

Add `lean_aggregated_proof_size_bytes` histogram observed in
`aggregate_job` after each `AggregatedSignatureProof` is built,
recording the size of `proof_data` — the variable-length portion of
post-quantum aggregated signatures.

Re-export `HistogramVec` and `register_histogram_vec!` from
`ethlambda-metrics` so labeled histograms can be used workspace-wide.
@github-actions

🤖 Kimi Code Review

This is a solid observability PR that adds comprehensive metrics for P2P message sizes. The implementation is clean, secure, and follows Rust best practices.

Correctness & Security

  • No security issues introduced. The PR only adds instrumentation around existing encoding/decoding logic.
  • Proper handling of size limits remains intact in encoding.rs (MAX_PAYLOAD_SIZE checks).
  • No panic paths introduced in metric hot paths.

Code Quality

  • Good use of HistogramVec with low-cardinality labels (protocol, compression) to avoid cardinality explosion.
  • The DecodedPayload struct in encoding.rs (lines 11-14) elegantly returns both uncompressed data and compressed size without requiring the caller to re-measure.
  • Consistent error handling: metrics are only recorded on successful encode/decode operations.

Minor Suggestion
In crates/net/p2p/src/req_resp/codec.rs, the error response path in write_response (lines 165-169) does not record metrics, while success paths do. For consistency, consider observing the error payload size:

Response::Error { message } => {
    io.write_all(&[ResponseCode::ERROR.into()]).await?;
    let encoded = message.to_ssz();
    let compressed_size = write_payload(io, &encoded).await?;
    metrics::observe_reqresp_response_chunk_size(label, encoded.len(), compressed_size);
    Ok(())
}

Documentation

  • The docs/metrics.md update correctly documents all new histograms and their label values.

Performance

  • Metric observations are lock-free histogram updates (atomic increments) and won't block the async runtime.
  • No unnecessary allocations; the protocol_label function returns &'static str.

Nitpick
In crates/blockchain/src/aggregation.rs (line 293), the variable proof is created separately before being moved into the struct. This is fine for readability, though you could inline it if preferred.

Overall: LGTM. The PR significantly improves network observability without touching consensus-critical logic.


Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Codex Code Review

  1. Response::Error chunks are not counted in the new req/resp response-size metric, even though the receive side does count them and the docs describe the metric as covering req/resp response chunks generally. The gap is in codec.rs: after encoding the error payload, write_payload is called and the result is discarded, with no observe_reqresp_response_chunk_size(...). This will systematically under-report outbound error traffic and make send/receive comparisons misleading. Add the same observation here that you add for success chunks.

No consensus-layer correctness or security regressions stood out in this diff. The changes are confined to metrics/instrumentation; nothing here touches fork choice, attestation validation, STF, XMSS verification, or SSZ semantics beyond measuring encoded sizes.

I could not complete a cargo check in this environment: the toolchain/cargo cache paths under /home/runner are read-only, and network access is restricted, so dependency resolution failed before compilation.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Add metrics for network message sizes and aggregated signature sizes

Overall: A clean, well-motivated observability PR. The design (dual raw/snappy labels, DecodedPayload struct, protocol_label helper) is solid and follows existing project conventions. A few issues worth addressing before merge.


Bug: compressed_size over-reports for multi-chunk BlocksByRoot responses

encoding.rs:36:

let (size, rest) = decode_varint(&buf)?;
let compressed_size = rest.len();

buf is filled by read_to_end, which consumes all remaining bytes from io (bounded only by MAX_COMPRESSED_PAYLOAD_SIZE). For a BlocksByRoot response stream carrying N blocks:

[varint1][snappy1][result_byte2][varint2][snappy2]…

rest after stripping varint1 includes the compressed bytes of all subsequent chunks. So on the first call to decode_payload inside decode_blocks_by_root_response, decoded.compressed_size is the entire remainder of the stream, not just the current chunk. The raw-vs-snappy ratio this metric is supposed to answer becomes meaningless for multi-block responses.

A correct approach would be to have write_payload return the compressed size (already done) and compare against size as usize on the decode side, or to at least cap the reported compressed_size at the actual snappy frame bytes consumed. As a short-term fix, consider adding a note in the DecodedPayload doc-comment that this field is accurate only for single-chunk streams, until the decode path is refactored.


Minor: Error responses on the write path are not measured

codec.rs:163–172 — the Response::Error arm calls write_payload but discards the returned compressed size and records no metric, while the read path (decode_status_response, decode_blocks_by_root_response) does record metrics for error chunks. If the intent is to measure all on-wire traffic, the write side should observe here too. If error sizes are intentionally excluded, the comments should say so.


Minor: Request size observed on both send and receive sides under the same metric name

codec.rs:47 (read_request) and codec.rs:104–106 (write_request) both call observe_reqresp_request_size. Each peer records its own view (requester records what it sent, responder records what it received), so there is no double-counting within one process. This is correct — but worth a brief comment so a future reader doesn't assume it's a bug.


Nit: HistogramVec label cardinality protection

metrics.rs:20–26: protocol_label maps unknown protocols to "unknown". Good defensive practice that prevents metric cardinality explosion if new protocols are added without updating this function. No change needed, just worth calling out as a good pattern.


Nit: Gossip metrics observed before publish returns

handler.rs:144,179,207 — metrics are observed before the swarm_handle.publish call. If publish is fire-and-forget (it is here), this is harmless, but it means a failed publish still increments the histogram. Acceptable for this codebase since publish doesn't return a Result, but worth knowing.


Documentation

docs/metrics.md additions are accurate and complete. Bucket choices are reasonable for XMSS-heavy payloads (blocks can be large; individual attestations are small). The comment block on compression label values ("raw" / "snappy") in metrics.rs:73–75 is helpful.


What's good

  • DecodedPayload struct cleanly keeps encoding details co-located rather than returning multiple values or adding extra parameters.
  • Re-exporting HistogramVec + register_histogram_vec! from ethlambda-metrics follows the workspace re-export convention.
  • All new metrics use the lean_ prefix and are added to init() in blockchain/src/metrics.rs (consistent with existing eager registration pattern).
  • The p2p-side histograms (no init()) are consistent with existing p2p metrics that use lazy initialization.
  • observe_aggregated_proof_size correctly uses proof.proof_data.len() — measuring just the variable-length post-quantum proof bytes, not the full AggregatedSignatureProof struct size.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@greptile-apps
Contributor

greptile-apps Bot commented Apr 21, 2026

Greptile Summary

This PR adds raw/snappy size histograms for gossip messages (block, attestation, aggregation) on both send and receive paths, two new req/resp size histograms (request and per-chunk response), and a histogram for aggregated proof sizes. The implementation is well-structured: DecodedPayload cleanly carries both the uncompressed bytes and compressed size from decode_payload, write_payload now returns the compressed byte count, and protocol_label() keeps codec callsites tidy. The only minor gap is that the Network Metrics table in docs/metrics.md is missing a Buckets column present in other sections.

Confidence Score: 5/5

Safe to merge — changes are purely additive observability instrumentation with no behavior changes on the critical path.

All findings are P2 (documentation style). No logic errors, data loss, or correctness issues were found. The metric measurements are accurate for the cases that occur in practice, and the implementation follows existing patterns consistently.

docs/metrics.md — missing Buckets column in the Network Metrics table.

Important Files Changed

Filename Overview
crates/net/p2p/src/req_resp/encoding.rs Adds DecodedPayload struct returning (uncompressed, compressed_size) from decode_payload, and makes write_payload return the compressed byte count. Logic is correct: compressed_size is derived from rest.len() (bytes after the varint), accurately capturing the on-wire snappy bytes for single-chunk responses.
crates/net/p2p/src/req_resp/codec.rs Adds protocol_label() helper and observes request/response sizes at every read/write site. Error-response payloads are also counted in the response-chunk histogram — reasonable since they're still on-wire traffic.
crates/net/p2p/src/gossipsub/handler.rs Records raw+snappy sizes on both receive and publish paths for blocks, attestations, and aggregated attestations. Compressed size is captured before decompression on receive; SSZ and compressed lengths measured after compression on publish.
crates/net/p2p/src/metrics.rs Converts three gossip histograms from Histogram to HistogramVec with a compression label, and adds two new req/resp HistogramVec metrics. Helper functions observe_*_size(raw, snappy) cleanly record both label values per call.
crates/blockchain/src/metrics.rs Adds LEAN_AGGREGATED_PROOF_SIZE_BYTES histogram with appropriate byte-range buckets (1 KiB–1 MiB) and a clean observe_aggregated_proof_size() helper. Correctly included in init() for eager Prometheus registration.
crates/blockchain/src/aggregation.rs Observes proof.proof_data.len() right after AggregatedSignatureProof::new(). Observation is placed correctly — after the proof is built successfully, before Some(output) is returned.
crates/common/metrics/src/lib.rs Re-exports HistogramVec and register_histogram_vec! so they are available workspace-wide. Minimal, correct change.
docs/metrics.md Documents the new network size histograms and the aggregated proof size histogram. The Network Metrics table is missing a Buckets column present in other sections.

Comments Outside Diff (1)

  1. docs/metrics.md, line 76-88 (link)

    P2 Network Metrics table missing Buckets column

    The Network Metrics table header is missing a "Buckets" column, while the PQ Signature Metrics and Fork-Choice Metrics tables both include one. The five new size histograms all have non-trivial bucket choices that are worth documenting here for quick reference.

    Suggested header:

    ```suggestion
    | Name   | Type  | Usage | Sample collection event | Labels | Buckets | Supported |
    ```


MegaRedHand and others added 4 commits April 21, 2026 17:45
The `lean-quickstart` genesis generator still emits a single XMSS key per
validator: `config.yaml`'s `GENESIS_VALIDATORS` is a flat hex-string list,
and `annotated_validators.yaml` references one `validator_N_sk.ssz` file
per validator with no `attester`/`proposer` split. zeam and lantern
already treat that shared key as both the attestation and proposal key
for the same validator; ethlambda rejected both files outright, which
prevented launching the client against the generator's output.

Extend the two loaders to accept the single-key layout in addition to
the canonical dual-key one:

- `GenesisValidatorEntry` (`crates/common/types/src/genesis.rs`) now
  deserializes either a map with `attestation_pubkey` + `proposal_pubkey`
  or a plain hex string; the flat string is used for both roles.
- `classify_role` / `read_validator_keys` in `bin/ethlambda/src/main.rs`
  add a `Shared` role that populates both attester and proposer slots
  from the same private-key file. Files with the existing
  `attester` / `proposer` substrings keep their prior routing.
  A `Shared` entry cannot be mixed with role-specific entries for the
  same validator.

Added a round-trip test for the flat-string genesis variant. Unblocks
running ethlambda against `make run-devnet` / lean-quickstart's current
generator output.