feat: add duration and traceId to degraded events#8455
feat: add duration and traceId to degraded events#8455cryptodev-2s wants to merge 19 commits intomainfrom
duration and traceId to degraded events#8455Conversation
e3eb1a1 to
c4cf4d5
Compare
|
@metamaskbot publish-preview |
|
Preview builds have been published. Learn how to use preview builds in other projects. Expand for full list of packages and versions. |
b9ab587 to
e8262db
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4fd9aed. Configure here.
|
@metamaskbot publish-preview |
1 similar comment
|
@metamaskbot publish-preview |
|
Preview builds have been published. Learn how to use preview builds in other projects. Expand for full list of packages and versions. |
e61cfa5 to
10f285d
Compare
When a request succeeds but takes longer than degradedThreshold,
the onDegraded event now emits { duration } instead of void, so
higher layers can include the actual request duration in their
event payloads.
…stable onDegraded type
…ugh RpcServiceChain
10f285d to
d6985be
Compare
|
@metamaskbot publish-preview |
|
Preview builds have been published. Learn how to use preview builds in other projects. Expand for full list of packages and versions. |
|
|
||
| ### Changed | ||
|
|
||
| - **BREAKING:** The `ServicePolicy` type's `onDegraded` event now emits `{ duration: number }` instead of `void` when the service succeeds but takes longer than the `degradedThreshold` ([#8455](https://github.com/MetaMask/core/pull/8455)) |
Remove internal metrics terminology from comments, replace as-cast with typeof narrowing, update finally block comment to mention both properties, add test for stale traceId prevention, and fix test description wording.
|
@metamaskbot publish-preview |
| const commonFields = { | ||
| endpointUrl: this.endpointUrl.toString(), | ||
| rpcMethodName: this.#currentRpcMethodName, | ||
| traceId: this.#currentTraceId, | ||
| }; | ||
| if (hasProperty(data, 'duration') && typeof data.duration === 'number') { | ||
| listener({ | ||
| endpointUrl: this.endpointUrl.toString(), | ||
| rpcMethodName: this.#currentRpcMethodName, | ||
| ...commonFields, | ||
| duration: data.duration, | ||
| }); | ||
| } else { | ||
| listener({ | ||
| ...data, | ||
| endpointUrl: this.endpointUrl.toString(), | ||
| rpcMethodName: this.#currentRpcMethodName, | ||
| ...commonFields, | ||
| duration: undefined, | ||
| }); | ||
| } |
There was a problem hiding this comment.
There still seem to be some type errors here. I think this can just be:
| const commonFields = { | |
| endpointUrl: this.endpointUrl.toString(), | |
| rpcMethodName: this.#currentRpcMethodName, | |
| traceId: this.#currentTraceId, | |
| }; | |
| if (hasProperty(data, 'duration') && typeof data.duration === 'number') { | |
| listener({ | |
| endpointUrl: this.endpointUrl.toString(), | |
| rpcMethodName: this.#currentRpcMethodName, | |
| ...commonFields, | |
| duration: data.duration, | |
| }); | |
| } else { | |
| listener({ | |
| ...data, | |
| endpointUrl: this.endpointUrl.toString(), | |
| rpcMethodName: this.#currentRpcMethodName, | |
| ...commonFields, | |
| duration: undefined, | |
| }); | |
| } | |
| listener({ | |
| ...data, | |
| endpointUrl: this.endpointUrl.toString(), | |
| rpcMethodName: this.#currentRpcMethodName, | |
| traceId: this.#currentTraceId, | |
| }); |
|
|
||
| ### Changed | ||
|
|
||
| - **BREAKING:** The `RpcServiceRequestable` type's `onDegraded` listener now receives `duration?: number` and `traceId?: string` in its data parameter ([#8455](https://github.com/MetaMask/core/pull/8455)) |
There was a problem hiding this comment.
I've been meaning to get rid of this type as we don't really need it anymore. If we change RpcService not to implement this interface anymore then we don't need to change this type at all, and we can avoid the breaking change here. May be worth doing that before merging this.
There was a problem hiding this comment.
I've removed the changes from RpcServiceRequestable as well as the changelog here: 0e695d0
| payload: [ | ||
| { | ||
| chainId: Hex; | ||
| duration?: number; |
There was a problem hiding this comment.
I have to check to make sure that this also isn't a breaking change. I thought at first it wasn't, but then given that event payloads show up as arguments to callbacks, I am having second doubts.

Explanation
RPC endpoint degraded events (
NetworkController:rpcEndpointDegradedandNetworkController:rpcEndpointChainDegraded) now include two new optional properties in their payloads:duration(number | undefined): The policy execution time in milliseconds when the request succeeded but was slow (i.e., exceeded thedegradedThreshold). This isundefinedwhen the degraded event was caused by retries being exhausted.traceId(string | undefined): The value of thex-trace-idresponse header from the last request attempt. This enables correlating degraded events with backend traces for debugging RPC health issues. It isundefinedwhen no response was received or the header was not present.How it works
The
durationvalue originates from Cockatiel'sretryPolicy.onSuccesscallback increateServicePolicy. Previously, it was only used for the threshold comparison — now it is also emitted as part of the degraded event data.The
traceIdis captured inRpcServicefromresponse.headers.get('x-trace-id')in the samefinallyblock that tracksrpcMethodName, ensuring it reflects the last completed request attempt (handling concurrent request race conditions correctly).Both values are threaded through the event chain:
createServicePolicy→RpcService→RpcServiceChain→createNetworkClient→ messenger events.Breaking change
The
RpcServiceRequestabletype'sonDegradedlistener signature now includesduration?: numberandtraceId?: stringin its data parameter. Implementors of this interface will need to accept the new fields in theironDegradedcallback signature.References
Checklist
Note
Medium Risk
Introduces a breaking change to
ServicePolicy.onDegradedpayload typing and propagates new fields through RPC degraded event publishing; consumers may need updates to listener signatures and payload handling.Overview
Degraded RPC events now carry more diagnostics.
ServicePolicy.onDegradedno longer emitsvoidon slow-success; it emits{ duration: number }, andRpcServiceadditionally captures the latestx-trace-idresponse header.These values are threaded through
RpcService/RpcServiceChainintoNetworkController:rpcEndpointDegradedandNetworkController:rpcEndpointChainDegradedpayloads as optionaldurationandtraceId, with tests and changelogs updated accordingly.Reviewed by Cursor Bugbot for commit 0e695d0. Bugbot is set up for automated code reviews on this repo. Configure here.