Add metrics for network message sizes and aggregated signature sizes#314
pablodeymo wants to merge 5 commits into main from
Conversation
Extend the three `lean_gossip_{block,attestation,aggregation}_size_bytes`
histograms with a `compression` label (raw SSZ vs snappy on-wire) and
observe them on both publish and receive paths instead of only receive.
Add new `lean_reqresp_request_size_bytes` and
`lean_reqresp_response_chunk_size_bytes` histograms for the Status and
BlocksByRoot protocols, labeled by `{protocol, compression}`. The
req/resp encoding helpers (`write_payload`, `decode_payload`) now return
the compressed byte count so the codec can record both sizes without
re-encoding.
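The shape described above — the codec learning both sizes from one decode — can be sketched in plain Rust. This is a hypothetical stand-in for the crate's `encoding.rs`, not its actual API: the unsigned-varint prefix declares the uncompressed length, and everything after the prefix is counted as the on-wire snappy body.

```rust
/// Stand-in for the `DecodedPayload` struct described in the PR.
struct DecodedPayload {
    uncompressed_len: usize, // declared by the varint length prefix
    compressed_size: usize,  // on-wire bytes after the prefix
}

/// Split a length-prefixed frame: decode an unsigned LEB128 varint,
/// then treat the remaining bytes as the compressed body.
fn split_payload(frame: &[u8]) -> Option<DecodedPayload> {
    let mut len: usize = 0;
    let mut shift = 0;
    let mut idx = 0;
    for &b in frame {
        idx += 1;
        len |= ((b & 0x7f) as usize) << shift;
        if b & 0x80 == 0 {
            // Varint terminated: the rest of the frame is the snappy body,
            // matching "compressed_size is derived from rest.len()".
            return Some(DecodedPayload {
                uncompressed_len: len,
                compressed_size: frame.len() - idx,
            });
        }
        shift += 7;
    }
    None // prefix never terminated; frame is malformed
}
```

Returning both numbers from one pass is what lets the caller observe raw and compressed sizes without re-encoding.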
Add `lean_aggregated_proof_size_bytes` histogram observed in
`aggregate_job` after each `AggregatedSignatureProof` is built,
recording the size of `proof_data` — the variable-length portion of
post-quantum aggregated signatures.
Re-export `HistogramVec` and `register_histogram_vec!` from
`ethlambda-metrics` so labeled histograms can be used workspace-wide.
🤖 Kimi Code Review

This is a solid observability PR that adds comprehensive metrics for P2P message sizes. The implementation is clean, secure, and follows Rust best practices.

Correctness & Security
Code Quality
Minor Suggestion

```rust
Response::Error { message } => {
    io.write_all(&[ResponseCode::ERROR.into()]).await?;
    let encoded = message.to_ssz();
    let compressed_size = write_payload(io, &encoded).await?;
    metrics::observe_reqresp_response_chunk_size(label, encoded.len(), compressed_size);
    Ok(())
}
```

Documentation
Performance
Nitpick

Overall: LGTM. The PR significantly improves network observability without touching consensus-critical logic.

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
🤖 Codex Code Review
No consensus-layer correctness or security regressions stood out in this diff. The changes are confined to metrics/instrumentation; nothing here touches fork choice, attestation validation, STF, XMSS verification, or SSZ semantics beyond measuring encoded sizes. I could not complete a

Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Claude Code Review

PR Review: Add metrics for network message sizes and aggregated signature sizes

Overall: A clean, well-motivated observability PR. The design (dual raw/snappy labels,

Bug:
Greptile Summary

This PR adds raw/snappy size histograms for gossip messages (block, attestation, aggregation) on both send and receive paths, two new req/resp size histograms (request and per-chunk response), and a histogram for aggregated proof sizes. The implementation is well-structured:

Confidence Score: 5/5

Safe to merge — changes are purely additive observability instrumentation with no behavior changes on the critical path. All findings are P2 (documentation style). No logic errors, data loss, or correctness issues were found. The metric measurements are accurate for the cases that occur in practice, and the implementation follows existing patterns consistently.

docs/metrics.md — missing Buckets column in the Network Metrics table.
| Filename | Overview |
|---|---|
| crates/net/p2p/src/req_resp/encoding.rs | Adds DecodedPayload struct returning (uncompressed, compressed_size) from decode_payload, and makes write_payload return the compressed byte count. Logic is correct: compressed_size is derived from rest.len() (bytes after the varint), accurately capturing the on-wire snappy bytes for single-chunk responses. |
| crates/net/p2p/src/req_resp/codec.rs | Adds protocol_label() helper and observes request/response sizes at every read/write site. Error-response payloads are also counted in the response-chunk histogram — reasonable since they're still on-wire traffic. |
| crates/net/p2p/src/gossipsub/handler.rs | Records raw+snappy sizes on both receive and publish paths for blocks, attestations, and aggregated attestations. Compressed size is captured before decompression on receive; SSZ and compressed lengths measured after compression on publish. |
| crates/net/p2p/src/metrics.rs | Converts three gossip histograms from Histogram to HistogramVec with a compression label, and adds two new req/resp HistogramVec metrics. Helper functions observe_*_size(raw, snappy) cleanly record both label values per call. |
| crates/blockchain/src/metrics.rs | Adds LEAN_AGGREGATED_PROOF_SIZE_BYTES histogram with appropriate byte-range buckets (1 KiB–1 MiB) and a clean observe_aggregated_proof_size() helper. Correctly included in init() for eager Prometheus registration. |
| crates/blockchain/src/aggregation.rs | Observes proof.proof_data.len() right after AggregatedSignatureProof::new(). Observation is placed correctly — after the proof is built successfully, before Some(output) is returned. |
| crates/common/metrics/src/lib.rs | Re-exports HistogramVec and register_histogram_vec! so they are available workspace-wide. Minimal, correct change. |
| docs/metrics.md | Documents the new network size histograms and the aggregated proof size histogram. The Network Metrics table is missing a Buckets column present in other sections. |
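The `observe_*_size(raw, snappy)` pattern noted in the metrics.rs row above can be modeled with a stand-in histogram. The real code uses the prometheus crate's `HistogramVec`; this stdlib-only sketch only shows the one-call, two-label-points shape, with all names invented for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for a labeled histogram: maps a `compression` label value
/// to the list of observed sizes.
#[derive(Default)]
struct SizeHistogram {
    observations: HashMap<&'static str, Vec<f64>>,
}

impl SizeHistogram {
    fn observe(&mut self, label: &'static str, bytes: usize) {
        self.observations.entry(label).or_default().push(bytes as f64);
    }

    /// Mirrors the helpers described in the PR: one call records both
    /// the raw (SSZ) and snappy (on-wire) data points.
    fn observe_sizes(&mut self, raw: usize, snappy: usize) {
        self.observe("raw", raw);
        self.observe("snappy", snappy);
    }
}
```

Recording both label values per call keeps every publish/receive site a single line.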
Comments Outside Diff (1)

docs/metrics.md, line 76-88 (link): Network Metrics table missing Buckets column

The Network Metrics table header is missing a "Buckets" column, while the PQ Signature Metrics and Fork-Choice Metrics tables both include one. The five new size histograms all have non-trivial bucket choices that are worth documenting here for quick reference.
```suggestion
| Name | Type | Usage | Sample collection event | Labels | Buckets | Supported |
```
Reviews (1): Last reviewed commit: "Add metrics for network message sizes an..."
The `lean-quickstart` genesis generator still emits a single XMSS key per validator: `config.yaml`'s `GENESIS_VALIDATORS` is a flat hex-string list, and `annotated_validators.yaml` references one `validator_N_sk.ssz` file per validator with no `attester`/`proposer` split. zeam and lantern already treat that shared key as both the attestation and proposal key for the same validator; ethlambda rejected both files outright, which prevented launching the client against the generator's output.

Extend the two loaders to accept the single-key layout in addition to the canonical dual-key one:

- `GenesisValidatorEntry` (`crates/common/types/src/genesis.rs`) now deserializes either a map with `attestation_pubkey` + `proposal_pubkey` or a plain hex string; the flat string is used for both roles.
- `classify_role` / `read_validator_keys` in `bin/ethlambda/src/main.rs` add a `Shared` role that populates both attester and proposer slots from the same private-key file. Files with the existing `attester` / `proposer` substrings keep their prior routing. A `Shared` entry cannot be mixed with role-specific entries for the same validator.

Added a round-trip test for the flat-string genesis variant. Unblocks running ethlambda against `make run-devnet` / lean-quickstart's current generator output.
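The either-format acceptance can be sketched in plain Rust. The real loader deserializes YAML via serde; here the "map" form is modeled as a `key=value` pair string purely for illustration, and all names are stand-ins.

```rust
/// Stand-in for the parsed genesis entry: both role keys, resolved.
struct ValidatorKeys {
    attestation_pubkey: String,
    proposal_pubkey: String,
}

/// Accept either a dual-key entry (modeled here as
/// "attestation=<hex>,proposal=<hex>") or a flat hex string,
/// which is used for both roles — the single-key layout.
fn parse_entry(entry: &str) -> ValidatorKeys {
    if let Some((att, prop)) = entry.split_once(',').and_then(|(a, p)| {
        Some((a.strip_prefix("attestation=")?, p.strip_prefix("proposal=")?))
    }) {
        ValidatorKeys {
            attestation_pubkey: att.to_string(),
            proposal_pubkey: prop.to_string(),
        }
    } else {
        // Flat string: shared key populates both role slots.
        ValidatorKeys {
            attestation_pubkey: entry.to_string(),
            proposal_pubkey: entry.to_string(),
        }
    }
}
```

In serde terms the same effect is usually achieved with an untagged enum over the map and string variants, then normalized into the dual-key struct.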
This reverts commit 56ce3a2.
Motivation
We want visibility into how large the messages we send on the wire are,
and how large individual aggregated signatures get. The existing gossip
size histograms only covered the receive path and measured a single
(uncompressed) number, so we couldn't answer questions like "how much
bandwidth does snappy save?" or "how do publish sizes compare to what we
receive from the mesh?" — and we had no signal at all on req/resp sizes
or on aggregated proof sizes.
Description
New and changed metrics
| Metric | Labels |
|---|---|
| `lean_gossip_block_size_bytes` | `compression=raw,snappy` |
| `lean_gossip_attestation_size_bytes` | `compression=raw,snappy` |
| `lean_gossip_aggregation_size_bytes` | `compression=raw,snappy` |
| `lean_reqresp_request_size_bytes` | `protocol=status,blocks_by_root`, `compression=raw,snappy` |
| `lean_reqresp_response_chunk_size_bytes` | `protocol=status,blocks_by_root`, `compression=raw,snappy` |
| `lean_aggregated_proof_size_bytes` | (none) |

The three existing `lean_gossip_*_size_bytes` histograms gained a `compression` label (`raw` = SSZ-encoded, `snappy` = on-wire), and are now observed on both publish and receive paths instead of only receive.
Implementation

- `crates/net/p2p/src/metrics.rs`: converts the three gossip size histograms to `HistogramVec` and adds the two new req/resp histograms. Each `observe_*_size(raw, snappy)` function records both points in one call.
- `crates/net/p2p/src/gossipsub/handler.rs`: on receive, capture `message.data.len()` as the compressed size before decompressing; on publish (`publish_block`, `publish_attestation`, `publish_aggregated_attestation`), observe both `ssz_bytes.len()` and `compressed.len()` right after compression.
- `crates/net/p2p/src/req_resp/encoding.rs`: `write_payload` now returns the compressed byte count; `decode_payload` returns a small `DecodedPayload { uncompressed, compressed_size }` struct. This lets the codec record both sizes without re-encoding.
- `crates/net/p2p/src/req_resp/codec.rs`: observes request and per-chunk response sizes for Status and BlocksByRoot at every read/write site, using a short `protocol_label()` helper to map the full libp2p protocol id to the metric label.
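A `protocol_label()` helper of the kind described might look like the following sketch. The actual protocol id strings in the crate are not shown in this PR, so the matching here is an assumption.

```rust
/// Map a full libp2p protocol id (e.g. "/.../status/1/ssz_snappy")
/// to the short label used on the req/resp histograms.
/// Illustrative only: the real id format may differ.
fn protocol_label(protocol_id: &str) -> &'static str {
    if protocol_id.contains("status") {
        "status"
    } else if protocol_id.contains("blocks_by_root") {
        "blocks_by_root"
    } else {
        "unknown"
    }
}
```

Returning `&'static str` keeps the label usable directly as a Prometheus label value without allocation.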
- `crates/blockchain/src/metrics.rs`: registers `lean_aggregated_proof_size_bytes` and its `observe_aggregated_proof_size()` helper; both included in the `init()` eager-registration list.
- `crates/blockchain/src/aggregation.rs`: observes the proof size right after `AggregatedSignatureProof::new()` in `aggregate_job`, using `proof.proof_data.len()` — the variable-length portion of the post-quantum aggregated signature.
- `crates/common/metrics/src/lib.rs`: re-exports `HistogramVec` and `register_histogram_vec!` so labeled histograms can be used workspace-wide.
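The 1 KiB–1 MiB byte-range buckets mentioned for `lean_aggregated_proof_size_bytes` are presumably exponential (the prometheus crate ships an `exponential_buckets` helper for exactly this). A stdlib-only sketch of the generator, to show the shape rather than the crate's exact bucket list:

```rust
/// Generate `count` exponentially spaced histogram bucket bounds,
/// starting at `start` and multiplying by `factor` each step.
/// With start = 1024, factor = 2, count = 11 this spans 1 KiB..=1 MiB.
fn exponential_buckets(start: f64, factor: f64, count: usize) -> Vec<f64> {
    (0..count).map(|i| start * factor.powi(i as i32)).collect()
}
```

Doubling buckets across three orders of magnitude is the usual choice for size histograms: coarse enough to stay cheap, fine enough to see a distribution shift.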
Devnet-compat fix (bundled, separate commit)

While validating these metrics in a local devnet I found that ethlambda cannot launch against the current `lean-quickstart` generator output, blocking `make run-devnet` on `main`. The generator still produces the single-key layout (one XMSS key per validator, flat-string `GENESIS_VALIDATORS`, single `validator_N_sk.ssz`), which other clients (zeam, lantern) already accept by using the key for both roles.

Commit "Accept single-key genesis + annotated-validator layouts" extends two loaders to accept either layout:

- `GenesisValidatorEntry` deserializes from either a map (`attestation_pubkey` + `proposal_pubkey`) or a flat hex string (used for both roles).
- `classify_role` / `read_validator_keys` gain a `Shared` role for files without `attester`/`proposer` in the name, populating both role slots from the same key file. Mixing `Shared` with role-specific entries for the same validator is still an error.

Round-trip test added for the flat-string genesis variant.
Documentation
- `docs/metrics.md`: adds the new `lean_aggregated_proof_size_bytes` to the PQ Signature Metrics table and documents the five network size histograms under Network Metrics.
How to Test

- `make fmt` — passes
- `make lint` — passes (clippy clean, `-D warnings`)
- `cargo test -p ethlambda-types --lib genesis` — 4 passed (includes the flat-pubkey round-trip)
- `cargo test -p ethlambda-metrics -p ethlambda-p2p -p ethlambda-blockchain --lib --bins` — passes
- `cargo test -p ethlambda-state-transition --release` — passes

Live devnet scrape (4-node local devnet, 4 min uptime, 2 zeam + 2 ethlambda with ethlambda_1 as aggregator). Metrics pulled from ethlambda_1 on the aggregator:
- Attestation payloads are already tight, so snappy only shaves ~3 %.
- Status frames (tiny structs) compress ~28 % (80 → 58 B).
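Sanity-checking the quoted savings figures is simple arithmetic; a small helper (names invented for this note):

```rust
/// Percentage of bandwidth saved by compressing `raw` bytes
/// down to `wire` bytes.
fn savings_pct(raw: usize, wire: usize) -> f64 {
    100.0 * (raw - wire) as f64 / raw as f64
}
// 80 B -> 58 B gives 27.5 %, matching the ~28 % quoted for Status frames.
```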
- The block, aggregation, and BlocksByRoot histograms had zero observations — ethlambda could not produce blocks in this run (pre-existing cross-client signature scheme mismatch with the pinned zeam image; not related to this PR).
Notes

The `forkchoice_spectests` and `signature_spectests` fixture tests fail with the same errors on `main` (`XmssSignature length != 2536` and `ProofError(InvalidProof)` — pre-existing devnet-3/devnet-4 fixture mismatches). This PR does not touch signature or SSZ types.