feat: Refine tracing telemetry for client-side attributes#4148
feat: Refine tracing telemetry for client-side attributes#4148diegomarquezp wants to merge 17 commits intomainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the observability capabilities by introducing a new Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new observability attribute error.type to provide a more specific, low-cardinality classification of errors. The core logic is implemented in ObservabilityUtils.extractErrorType, which determines the error type based on a clear priority: ErrorInfo.reason, server error codes, specific client-side exceptions, and then fallbacks. The changes are well-supported by comprehensive unit tests in SpanTracerTest and integration tests in ITOtelTracing for both gRPC and HTTP/JSON transports, which is great to see.
My review includes one suggestion for a minor code simplification. Overall, this is a solid implementation.
gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityUtils.java
Outdated
Show resolved
Hide resolved
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new observability attribute, error.type, to provide a low-cardinality classification of errors. The core logic is implemented in the new ErrorTypeUtil class, which extracts the error type based on a priority list, including ErrorInfo.reason, server status codes, and various client-side error categories. The changes are well-tested with both unit and integration tests.
My main feedback is regarding the implementation of client-side error detection in ErrorTypeUtil. Several checks rely on string matching on exception class names or messages (e.g., using String.contains()), which is fragile and can lead to incorrect error classification. I've left specific comments with suggestions to improve the robustness of these checks.
| private static boolean isClientRedirectError(Throwable e) { | ||
| return e.getMessage() != null && e.getMessage().contains("redirect"); | ||
| } |
There was a problem hiding this comment.
Relying on the content of an exception message with e.getMessage().contains("redirect") is highly fragile. Exception messages are not a stable API; they can be changed, might not be present, or could be localized, which would break this logic.
It would be much more robust to check for a specific exception type that indicates a redirect error. If no such standard exception exists, this approach has a high risk of being unreliable and should be documented as such.
There was a problem hiding this comment.
There are no redirect-specific exceptions in gax. This is only a heuristic to comply with the higher level effort.
| private static boolean isClientUnknownError(Throwable e) { | ||
| return e.getClass().getName().toLowerCase().contains("unknown"); | ||
| } |
There was a problem hiding this comment.
Matching unknown in a class name is very broad and risky. It could lead to misclassifying exceptions. For example, an exception from another library with unknown in its name would be incorrectly categorized as CLIENT_UNKNOWN_ERROR.
Given that UnknownHostException is already handled by isClientConnectionError, this check seems too generic. Please consider making this check more specific to the intended exception types or removing it if it's a speculative catch-all.
There was a problem hiding this comment.
Agreed we don't have a reliable heuristic that also prevents external code from being caught here. In favor of the current approach, this is the last case to be handled.
One alternative: do not handle CLIENT_UNKNOWN_ERROR.
gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
error.type'error.type
error.typeerror.type
|
|
gax-java/gax/src/main/java/com/google/api/gax/tracing/ErrorTypeUtil.java
Outdated
Show resolved
Hide resolved
ldetmer
left a comment
There was a problem hiding this comment.
looks good, just one question
- Removed heuristic exception mappings for CLIENT_REDIRECT_ERROR and CLIENT_UNKNOWN_ERROR to fall back to simple class name - Refined mapping for CLIENT_AUTHENTICATION_ERROR by removing FileNotFoundException - Added unmapped placeholders to ErrorType enum - Updated unit and integration tests to verify new fallback mechanisms
…ry spans Added recursive logic to trace the final underlying exception message and updated relevant telemetry tests to test iteration logic as requested by client.
error.type…-attr/error.type # Conflicts: # gax-java/gax/src/main/java/com/google/api/gax/rpc/ResourceNameExtractor.java # gax-java/gax/src/main/java/com/google/api/gax/tracing/ObservabilityAttributes.java # gax-java/gax/src/main/java/com/google/api/gax/tracing/OpenTelemetryTraceManager.java # gax-java/gax/src/main/java/com/google/api/gax/tracing/SpanTracer.java # gax-java/gax/src/test/java/com/google/api/gax/tracing/SpanTracerTest.java





Description
This PR enhances GAX's distributed tracing telemetry to align with the latest OpenTelemetry conventions by tracking precise client-side exception metrics on spans.
Key Features
status.message: Introduces a new tracing attribute that captures human-readable error messages from failed spans. Utilizes recursive causal chain traversal to identify the exact cause and message dynamically.exception.type: Introduces exception class tracking using the fully qualified class name in line with standard diagnostic practices.error.typeRefinement: Adjusted client-side error type mapping strategy. We've introduced a clean fallback heuristic utilizing the exception class name for unresolved internal failures without masking errors under arbitrary fallback categories.mainfollowing transport attribute and tracing telemetry modifications.Testing
SpanTracerTest.java.gapic-showcaseintegration suite tests insideITOtelErrorTypeto seamlessly run with the updated attribution.