[PECOBLR-1383] Add statement execution hooks for telemetry collection#321
Conversation
db32fa3 to
f388244
Compare
f1c4641 to
a5ed499
Compare
a5ed499 to
36e8f99
Compare
ef71fd9 to
3b9923d
Compare
This commit completes the telemetry implementation by adding hooks to QueryContext and ExecContext methods to collect actual metrics. Changes: - Export BeforeExecute(), AfterExecute(), CompleteStatement() methods in telemetry.Interceptor for use by driver package - Add telemetry hooks to connection.QueryContext(): - Call BeforeExecute() with statement ID from operation handle GUID - Use defer to call AfterExecute() and CompleteStatement() - Add telemetry hooks to connection.ExecContext(): - Call BeforeExecute() with statement ID from operation handle GUID - Use defer to call AfterExecute() and CompleteStatement() - Handle both err and stagingErr for proper error reporting - Update DESIGN.md: - Mark Phase 6 as completed (all checklist items) - Add statement execution hooks to Phase 7 checklist Testing: - All 99 telemetry tests passing - All driver tests passing (58.576s) - No breaking changes to existing functionality This enables end-to-end telemetry collection from statement execution through aggregation and export to the Databricks telemetry service. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
3b9923d to
8fe174d
Compare
These functions/types are now used by the exported BeforeExecute, AfterExecute, and CompleteStatement methods wired into connection.go, so the unused suppression directives are no longer needed. Co-authored-by: samikshya-chand_data
vikrantpuppala
left a comment
There was a problem hiding this comment.
let's add some tests, thanks
connection.go
Outdated
| var statementID string | ||
| if c.telemetry != nil && exStmtResp != nil && exStmtResp.OperationHandle != nil && exStmtResp.OperationHandle.OperationId != nil { | ||
| statementID = client.SprintGuid(exStmtResp.OperationHandle.OperationId.GUID) | ||
| ctx = c.telemetry.BeforeExecute(ctx, statementID) |
There was a problem hiding this comment.
shouldn't this be before runQuery?
There was a problem hiding this comment.
we should write some correctness tests. how do we know the values telemetry is sending are even correct (so that we avoid bugs like these)
There was a problem hiding this comment.
Thanks for the review @vikrantpuppala
The follow up PR #322 already covers a lot of mock based tests on telemetry. And, it also addresses the runQuery issue in QueryContext, I will add more of these correctness tests in #322 PR.
There was a problem hiding this comment.
can we fix the fact that ctx = c.telemetry.BeforeExecute(ctx, statementID) needs to be called before runQuery?
There was a problem hiding this comment.
Accidentally deleted my previous comment inplace of editing it :
Both ExecContext and QueryContext now capture the complete query execution time in telemetry
for both PRs.
| } | ||
| c.telemetry.AfterExecute(ctx, finalErr) | ||
| c.telemetry.CompleteStatement(ctx, statementID, finalErr != nil) | ||
| }() |
There was a problem hiding this comment.
if the CloseOperation call fails below, that error is logged but never reflected in telemetry so we're missing capturing some errors in telemetry. Not sure if that's intended?
There was a problem hiding this comment.
Good point, fixing this.
|
|
||
| case "error": | ||
| // Check if terminal error | ||
| if metric.errorType != "" && isTerminalError(&simpleError{msg: metric.errorType}) { |
There was a problem hiding this comment.
why are we using simpleError here?
There was a problem hiding this comment.
This is the most light-weight string mapping of the error in this (potentially) hot-path. Let me know if you have any specific suggestion though.
… CloseOperation error tracking This commit addresses two key review comments from @vikrantpuppala on PR #321: 1. **ExecContext timing fix**: Capture execution start time BEFORE running query - Now captures `executeStart := time.Now()` before `runQuery()` call - Uses `BeforeExecuteWithTime()` with pre-captured timestamp - Matches the pattern already implemented in QueryContext - Ensures telemetry accurately measures actual query execution time 2. **CloseOperation error tracking**: Capture cleanup errors in telemetry - Added `closeOpErr` variable to track CloseOperation failures - Includes CloseOperation errors in telemetry's deferred function - Provides observability for resource cleanup issues - Operation still returns success to caller (cleanup is best-effort) These changes ensure telemetry captures the complete statement lifecycle, including both execution timing and cleanup operations, without impacting the caller's error handling semantics. Co-authored-by: Isaac
This addresses @vikrantpuppala's review comment: "if the CloseOperation call fails below, that error is logged but never reflected in telemetry so we're missing capturing some errors in telemetry" Changes: - Added closeOpErr variable to capture CloseOperation failures - Include CloseOperation errors in telemetry's deferred function - Provides observability for resource cleanup issues - Operation still returns success to caller (cleanup is best-effort) Note: The timing fix ("shouldn't this be before runQuery?") will be addressed in the follow-up PR once BeforeExecuteWithTime infrastructure is available.
… CloseOperation error tracking This commit addresses two key review comments from @vikrantpuppala on PR #321: 1. **ExecContext timing fix**: Capture execution start time BEFORE running query - Now captures `executeStart := time.Now()` before `runQuery()` call - Uses `BeforeExecuteWithTime()` with pre-captured timestamp - Matches the pattern already implemented in QueryContext - Ensures telemetry accurately measures actual query execution time 2. **CloseOperation error tracking**: Capture cleanup errors in telemetry - Added `closeOpErr` variable to track CloseOperation failures - Includes CloseOperation errors in telemetry's deferred function - Provides observability for resource cleanup issues - Operation still returns success to caller (cleanup is best-effort) These changes ensure telemetry captures the complete statement lifecycle, including both execution timing and cleanup operations, without impacting the caller's error handling semantics. Co-authored-by: Isaac
…cution This addresses the second part of @vikrantpuppala's review comment: "shouldn't this be before runQuery?" Changes: - Capture executeStart = time.Now() BEFORE calling runQuery() - Use BeforeExecuteWithTime() with the pre-captured timestamp - Ensures telemetry measures actual query execution time accurately Without this fix, telemetry would miss ~100-1000μs of execution time (the time between query start and getting the operation handle). Now ExecContext matches the pattern already implemented in QueryContext.
- Add sessionID field to metricContext struct - Update BeforeExecute to accept sessionID parameter - Add BeforeExecuteWithTime method for custom start times - Update connection.go to pass sessionID in BeforeExecute call This enables proper session tracking in telemetry and allows capturing accurate execution times by providing a custom start time.
Capture execution start time before runQuery and use BeforeExecuteWithTime to ensure telemetry accurately reflects actual query execution time. This completes the timing fix for both ExecContext and QueryContext.
Summary
This stacked PR builds on #320 and adds statement execution hooks to complete end-to-end telemetry collection.
Stack: Part 3 of 3
Changes
Exported Methods for Driver Integration
telemetry/interceptor.goBeforeExecute()- starts metric tracking for a statementAfterExecute()- records metric with timing and error infoAddTag()- adds tags to current metric contextCompleteStatement()- marks statement complete and flushesStatement Execution Hooks
connection.go✅ Added hooks to
QueryContext():BeforeExecute()with statement ID from operation handle GUIDAfterExecute()andCompleteStatement()✅ Added hooks to
ExecContext():BeforeExecute()with statement IDAfterExecute()andCompleteStatement()Documentation
telemetry/DESIGN.mdIntegration Flow
Testing
All tests passing ✅
End-to-End Telemetry
With this PR, the telemetry system is fully functional end-to-end:
Related Issues
Checklist