Fix DD_APM_TRACING_ENABLED to work with LLMObs #10989
Open
matsumo-and wants to merge 6 commits into DataDog:master from matsumo-and:fix/LLMOBS-10051
Commits
29d3854 Fix NullPointerException in LLMObsSystem when tracing is disabled (matsumo-and)
0f7a8dd Support DD_APM_TRACING_ENABLED=false with DD_LLMOBS_ENABLED=true (matsumo-and)
0029438 Improve LLMObs trace-disabled smoke test (matsumo-and)
175ebff format: spotless apply (matsumo-and)
b73eb94 Merge branch 'master' into fix/LLMOBS-10051 (matsumo-and)
69107cb Handle both LLMObs and ASM enabled in standalone sampler (matsumo-and)
53 changes: 53 additions & 0 deletions
dd-java-agent/agent-llmobs/src/test/groovy/datadog/trace/llmobs/LLMObsSystemTest.groovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,53 @@ |
| | | package datadog.trace.llmobs |
| | | |
| | | import datadog.communication.ddagent.SharedCommunicationObjects |
| | | import datadog.trace.test.util.DDSpecification |
| | | import okhttp3.HttpUrl |
| | | |
| | | class LLMObsSystemTest extends DDSpecification { |
| | | |
| | | void 'start disabled when llmobs is disabled'() { |
| | | setup: |
| | | injectSysConfig('llmobs.enabled', 'false') |
| | | rebuildConfig() |
| | | final inst = Mock(java.lang.instrument.Instrumentation) |
| | | final sco = Mock(SharedCommunicationObjects) |
| | | |
| | | when: |
| | | LLMObsSystem.start(inst, sco) |
| | | |
| | | then: |
| | | 0 * sco._ |
| | | } |
| | | |
| | | void 'start disabled when trace is disabled'() { |
| | | setup: |
| | | injectSysConfig('llmobs.enabled', 'true') |
| | | injectSysConfig('trace.enabled', 'false') |
| | | rebuildConfig() |
| | | final inst = Mock(java.lang.instrument.Instrumentation) |
| | | final sco = Mock(SharedCommunicationObjects) |
| | | |
| | | when: |
| | | LLMObsSystem.start(inst, sco) |
| | | |
| | | then: |
| | | 0 * sco._ |
| | | } |
| | | |
| | | void 'start enabled when apm tracing disabled but llmobs enabled'() { |
| | | setup: |
| | | injectSysConfig('llmobs.enabled', 'true') |
| | | injectSysConfig('apm.tracing.enabled', 'false') |
| | | rebuildConfig() |
| | | final inst = Mock(java.lang.instrument.Instrumentation) |
| | | final sco = Mock(SharedCommunicationObjects) |
| | | sco.agentUrl = HttpUrl.parse('http://localhost:8126') |
| | | |
| | | when: |
| | | LLMObsSystem.start(inst, sco) |
| | | |
| | | then: |
| | | 1 * sco.createRemaining(_) |
| | | } |
| | | } |
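
The three cases in this spec can be read as a small decision table. The block below is a minimal sketch of that table, not the actual `LLMObsSystem.start` logic, which is not part of this excerpt. The class name, method name, and parameter names are hypothetical; the assumption, taken from the test names, is that `trace.enabled=false` turns LLMObs off while `apm.tracing.enabled=false` does not.

```java
// Hypothetical sketch: mirrors the expectations in LLMObsSystemTest,
// not the real LLMObsSystem implementation.
final class LlmObsStartDecisionSketch {

  /**
   * Returns true when LLM Observability would be expected to start.
   *
   * @param llmObsEnabled value of llmobs.enabled
   * @param traceEnabled value of trace.enabled (the tracer as a whole)
   * @param apmTracingEnabled value of apm.tracing.enabled (the APM product only)
   */
  static boolean shouldStart(
      boolean llmObsEnabled, boolean traceEnabled, boolean apmTracingEnabled) {
    if (!llmObsEnabled) {
      return false; // case 1: LLMObs explicitly disabled
    }
    if (!traceEnabled) {
      return false; // case 2: no tracer at all, so LLMObs cannot emit spans
    }
    // case 3: apmTracingEnabled is deliberately ignored. LLMObs runs in
    // standalone mode and leaves dropping APM-only traces to the sampler.
    return true;
  }

  public static void main(String[] args) {
    System.out.println(shouldStart(false, true, true));
    System.out.println(shouldStart(true, false, true));
    System.out.println(shouldStart(true, true, false));
  }
}
```

The third parameter is kept in the signature only to make explicit that it does not influence the result in this sketch.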
92 changes: 92 additions & 0 deletions
...ed/src/test/groovy/datadog/smoketest/apmtracingdisabled/LlmObsApmDisabledSmokeTest.groovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,92 @@ |
| | | package datadog.smoketest.apmtracingdisabled |
| | | |
| | | import datadog.trace.api.sampling.PrioritySampling |
| | | import okhttp3.Request |
| | | |
| | | class LlmObsApmDisabledSmokeTest extends AbstractApmTracingDisabledSmokeTest { |
| | | |
| | | static final String LLMOBS_SERVICE_NAME = "llmobs-apm-disabled-test" |
| | | |
| | | static final String[] LLMOBS_APM_DISABLED_PROPERTIES = [ |
| | | "-Ddd.apm.tracing.enabled=false", |
| | | "-Ddd.llmobs.enabled=true", |
| | | "-Ddd.llmobs.ml-app=test-app", |
| | | "-Ddd.service.name=${LLMOBS_SERVICE_NAME}", |
| | | ] |
| | | |
| | | @Override |
| | | ProcessBuilder createProcessBuilder() { |
| | | return createProcess(LLMOBS_APM_DISABLED_PROPERTIES) |
| | | } |
| | | |
| | | void 'When APM disabled and LLMObs enabled, LLMObs spans should be kept and APM spans should be dropped'() { |
| | | setup: |
| | | final llmobsUrl = "http://localhost:${httpPort}/rest-api/llmobs/test" |
| | | final llmobsRequest = new Request.Builder().url(llmobsUrl).get().build() |
| | | |
| | | final apmUrl = "http://localhost:${httpPort}/rest-api/greetings" |
| | | final apmRequest = new Request.Builder().url(apmUrl).get().build() |
| | | |
| | | when: "Create LLMObs span" |
| | | final llmobsResponse = client.newCall(llmobsRequest).execute() |
| | | |
| | | then: "LLMObs request should succeed" |
| | | llmobsResponse.successful |
| | | |
| | | when: "Create regular APM span" |
| | | final apmResponse = client.newCall(apmRequest).execute() |
| | | |
| | | then: "APM request should succeed" |
| | | apmResponse.successful |
| | | |
| | | and: "Wait for traces" |
| | | waitForTraceCount(2) |
| | | |
| | | and: "LLMObs trace should be kept (SAMPLER_KEEP)" |
| | | def llmobsTrace = traces.find { trace -> |
| | | trace.spans.find { span -> |
| | | span.meta["http.url"] == llmobsUrl |
| | | } |
| | | } |
| | | assert llmobsTrace != null |
| | | // The LLMObs child span should have LLMObs tags |
| | | def llmobsChildSpan = llmobsTrace.spans.find { span -> |
| | | span.meta["_ml_obs_tag.model_name"] == "gpt-4" |
| | | } |
| | | assert llmobsChildSpan != null : "LLMObs child span with model_name=gpt-4 should exist" |
| | | |
| | | and: "Regular APM trace should be dropped (SAMPLER_DROP)" |
| | | def apmTrace = traces.find { trace -> |
| | | trace.spans.find { span -> |
| | | span.meta["http.url"] == apmUrl |
| | | } |
| | | } |
| | | assert apmTrace != null |
| | | checkRootSpanPrioritySampling(apmTrace, PrioritySampling.SAMPLER_DROP) |
| | | |
| | | and: "No NPE or errors in logs" |
| | | !isLogPresent { it.contains("NullPointerException") } |
| | | !isLogPresent { it.contains("ERROR") } |
| | | } |
| | | |
| | | void 'LLMObs spans should have PROPAGATED_TRACE_SOURCE tag set'() { |
| | | setup: |
| | | final llmobsUrl = "http://localhost:${httpPort}/rest-api/llmobs/test" |
| | | final llmobsRequest = new Request.Builder().url(llmobsUrl).get().build() |
| | | |
| | | when: |
| | | final response = client.newCall(llmobsRequest).execute() |
| | | |
| | | then: |
| | | response.successful |
| | | waitForTraceCount(1) |
| | | |
| | | and: "LLMObs span should be created successfully" |
| | | def trace = traces[0] |
| | | assert trace != null |
| | | def llmobsSpan = trace.spans.find { span -> |
| | | span.meta["_ml_obs_tag.model_name"] == "gpt-4" |
| | | } |
| | | assert llmobsSpan != null : "LLMObs span with model_name should exist" |
| | | } |
| | | } |
34 changes: 34 additions & 0 deletions
.../src/test/groovy/datadog/smoketest/apmtracingdisabled/LlmObsTraceDisabledSmokeTest.groovy
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,34 @@ |
| | | package datadog.smoketest.apmtracingdisabled |
| | | |
| | | import okhttp3.Request |
| | | |
| | | class LlmObsTraceDisabledSmokeTest extends AbstractApmTracingDisabledSmokeTest { |
| | | |
| | | static final String[] LLMOBS_TRACE_DISABLED_PROPERTIES = [ |
| | | "-Ddd.trace.enabled=false", |
| | | "-Ddd.llmobs.enabled=true", |
| | | "-Ddd.llmobs.ml-app=test-app", |
| | | "-Ddd.service.name=llmobs-trace-disabled-test", |
| | | ] |
| | | |
| | | @Override |
| | | ProcessBuilder createProcessBuilder() { |
| | | return createProcess(LLMOBS_TRACE_DISABLED_PROPERTIES) |
| | | } |
| | | |
| | | void 'DD_TRACE_ENABLED=false with DD_LLMOBS_ENABLED=true should disable LLMObs gracefully'() { |
| | | setup: |
| | | final llmobsUrl = "http://localhost:${httpPort}/rest-api/llmobs/test" |
| | | final llmobsRequest = new Request.Builder().url(llmobsUrl).get().build() |
| | | |
| | | when: "Call LLMObs endpoint" |
| | | final response = client.newCall(llmobsRequest).execute() |
| | | |
| | | then: "Request should succeed" |
| | | response.successful |
| | | response.code() == 200 |
| | | |
| | | and: "LLMObs disabled message in logs" |
| | | isLogPresent { it.contains("LLM Observability is disabled: tracing is disabled") } |
| | | } |
| | | } |
73 changes: 73 additions & 0 deletions
dd-trace-core/src/main/java/datadog/trace/common/sampling/LlmObsAndAsmStandaloneSampler.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,73 @@ |
| | | package datadog.trace.common.sampling; |
| | | |
| | | import static datadog.trace.api.sampling.PrioritySampling.SAMPLER_DROP; |
| | | import static datadog.trace.api.sampling.PrioritySampling.SAMPLER_KEEP; |
| | | |
| | | import datadog.trace.api.ProductTraceSource; |
| | | import datadog.trace.api.sampling.SamplingMechanism; |
| | | import datadog.trace.core.CoreSpan; |
| | | import datadog.trace.core.DDSpan; |
| | | import java.time.Clock; |
| | | import java.util.concurrent.atomic.AtomicLong; |
| | | import org.slf4j.Logger; |
| | | import org.slf4j.LoggerFactory; |
| | | |
| | | /** |
| | | * This sampler is used when APM tracing is disabled but both LLM Observability and ASM are enabled. |
| | | * It keeps all LLMObs and ASM traces, and allows 1 APM trace per minute for billing/service catalog |
| | | * purposes. |
| | | */ |
| | | public class LlmObsAndAsmStandaloneSampler implements Sampler, PrioritySampler { |
| | | |
| | | private static final Logger log = LoggerFactory.getLogger(LlmObsAndAsmStandaloneSampler.class); |
| | | private static final int RATE_IN_MILLISECONDS = 60000; // 1 minute |
| | | |
| | | private final AtomicLong lastSampleTime; |
| | | private final Clock clock; |
| | | |
| | | public LlmObsAndAsmStandaloneSampler(final Clock clock) { |
| | | this.clock = clock; |
| | | this.lastSampleTime = new AtomicLong(clock.millis() - RATE_IN_MILLISECONDS); |
| | | } |
| | | |
| | | @Override |
| | | public <T extends CoreSpan<T>> boolean sample(final T span) { |
| | | // Priority sampling sends all traces to the core agent, including traces marked dropped. |
| | | // This allows the core agent to collect stats on all traces. |
| | | return true; |
| | | } |
| | | |
| | | @Override |
| | | public <T extends CoreSpan<T>> void setSamplingPriority(final T span) { |
| | | T rootSpan = span.getLocalRootSpan(); |
| | | if (rootSpan instanceof DDSpan) { |
| | | DDSpan ddRootSpan = (DDSpan) rootSpan; |
| | | int traceSource = ddRootSpan.context().getPropagationTags().getTraceSource(); |
| | | if (ProductTraceSource.isProductMarked(traceSource, ProductTraceSource.LLMOBS)) { |
| | | log.debug("Set SAMPLER_KEEP for LLMObs span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_KEEP, SamplingMechanism.DEFAULT); |
| | | return; |
| | | } |
| | | if (ProductTraceSource.isProductMarked(traceSource, ProductTraceSource.ASM)) { |
| | | log.debug("Set SAMPLER_KEEP for ASM span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_KEEP, SamplingMechanism.APPSEC); |
| | | return; |
| | | } |
| | | } |
| | | // For APM-only traces, allow 1 per minute for billing/catalog purposes |
| | | if (shouldSample()) { |
| | | log.debug("Set SAMPLER_KEEP for APM span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_KEEP, SamplingMechanism.APPSEC); |
| | | } else { |
| | | log.debug("Set SAMPLER_DROP for APM span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_DROP, SamplingMechanism.APPSEC); |
| | | } |
| | | } |
| | | |
| | | private boolean shouldSample() { |
| | | long now = clock.millis(); |
| | | return lastSampleTime.updateAndGet( |
| | | lastTime -> now - lastTime >= RATE_IN_MILLISECONDS ? now : lastTime) |
| | | == now; |
| | | } |
| | | } |
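
To make the one-trace-per-minute gate easier to reason about, here is a minimal sketch that isolates the `shouldSample()` logic from the diff above and drives it with a manually advanced clock. `OnePerMinuteGateSketch` and `ManualClockSketch` are hypothetical names introduced only for illustration; nothing here is part of the PR.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: the same gate as shouldSample() in the diff, on its own.
final class OnePerMinuteGateSketch {
  private static final long RATE_IN_MILLISECONDS = 60_000;

  private final Clock clock;
  private final AtomicLong lastSampleTime;

  OnePerMinuteGateSketch(Clock clock) {
    this.clock = clock;
    // Start one full interval in the past so the very first call passes.
    this.lastSampleTime = new AtomicLong(clock.millis() - RATE_IN_MILLISECONDS);
  }

  boolean shouldSample() {
    long now = clock.millis();
    // Atomically advance the timestamp only if a full interval has elapsed.
    // The call passes when the stored value ends up equal to `now`.
    return lastSampleTime.updateAndGet(
            last -> now - last >= RATE_IN_MILLISECONDS ? now : last)
        == now;
  }

  public static void main(String[] args) {
    ManualClockSketch clock = new ManualClockSketch(1_000_000L);
    OnePerMinuteGateSketch gate = new OnePerMinuteGateSketch(clock);
    System.out.println(gate.shouldSample()); // first call
    clock.advance(30_000);
    System.out.println(gate.shouldSample()); // 30 s later
    clock.advance(30_000);
    System.out.println(gate.shouldSample()); // 60 s after the first call
  }
}

// Hypothetical test helper: a Clock whose time only moves when told to.
final class ManualClockSketch extends Clock {
  private long millis;

  ManualClockSketch(long startMillis) {
    this.millis = startMillis;
  }

  void advance(long deltaMillis) {
    millis += deltaMillis;
  }

  @Override
  public long millis() {
    return millis;
  }

  @Override
  public Instant instant() {
    return Instant.ofEpochMilli(millis);
  }

  @Override
  public ZoneId getZone() {
    return ZoneOffset.UTC;
  }

  @Override
  public Clock withZone(ZoneId zone) {
    return this;
  }
}
```

One property worth noting when reviewing: because the check compares the stored value with `now`, a second call that lands in the same millisecond as a successful update also observes `now` and passes. Whether that matters at one trace per minute is a judgment call for the reviewers.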
47 changes: 47 additions & 0 deletions
dd-trace-core/src/main/java/datadog/trace/common/sampling/LlmObsStandaloneSampler.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,47 @@ |
| | | package datadog.trace.common.sampling; |
| | | |
| | | import static datadog.trace.api.sampling.PrioritySampling.SAMPLER_DROP; |
| | | import static datadog.trace.api.sampling.PrioritySampling.SAMPLER_KEEP; |
| | | |
| | | import datadog.trace.api.ProductTraceSource; |
| | | import datadog.trace.api.sampling.SamplingMechanism; |
| | | import datadog.trace.core.CoreSpan; |
| | | import org.slf4j.Logger; |
| | | import org.slf4j.LoggerFactory; |
| | | |
| | | /** |
| | | * This sampler is used when APM tracing is disabled but LLM Observability is enabled. Unlike ASM |
| | | * standalone mode which only needs 1 trace per minute for billing/catalog purposes, LLM |
| | | * Observability needs to capture all LLM interactions to track costs, latency, and quality metrics. |
| | | * Therefore, this sampler keeps all LLMOBS traces and drops all APM-only traces. |
| | | */ |
| | | public class LlmObsStandaloneSampler implements Sampler, PrioritySampler { |
| | | |
| | | private static final Logger log = LoggerFactory.getLogger(LlmObsStandaloneSampler.class); |
| | | |
| | | @Override |
| | | public <T extends CoreSpan<T>> boolean sample(final T span) { |
| | | // Priority sampling sends all traces to the core agent, including traces marked dropped. |
| | | // This allows the core agent to collect stats on all traces. |
| | | return true; |
| | | } |
| | | |
| | | @Override |
| | | public <T extends CoreSpan<T>> void setSamplingPriority(final T span) { |
| | | // Only keep traces that have the LLMOBS product flag |
| | | // Drop regular APM traces when APM tracing is disabled |
| | | T rootSpan = span.getLocalRootSpan(); |
| | | if (rootSpan instanceof datadog.trace.core.DDSpan) { |
| | | datadog.trace.core.DDSpan ddRootSpan = (datadog.trace.core.DDSpan) rootSpan; |
| | | int traceSource = ddRootSpan.context().getPropagationTags().getTraceSource(); |
| | | if (ProductTraceSource.isProductMarked(traceSource, ProductTraceSource.LLMOBS)) { |
| | | log.debug("Set SAMPLER_KEEP for LLMObs span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_KEEP, SamplingMechanism.DEFAULT); |
| | | return; |
| | | } |
| | | } |
| | | // Drop APM-only traces when APM tracing is disabled |
| | | log.debug("Set SAMPLER_DROP for APM-only span {}", span.getSpanId()); |
| | | span.setSamplingPriority(SAMPLER_DROP, SamplingMechanism.DEFAULT); |
| | | } |
| | | } |
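
The keep/drop rule in this sampler depends on whether the trace source carries the LLMObs product mark. The following is a minimal sketch of that idea as a plain bitmask test. The bit values, the `isMarked` helper, and the `Decision` enum are assumptions made for illustration; they are not taken from `ProductTraceSource`, whose actual constants are not shown in this diff.

```java
// Hypothetical sketch: bit values and names are illustrative assumptions,
// not the real ProductTraceSource constants.
final class StandaloneDecisionSketch {
  static final int ASM_BIT = 0x02; // assumed value
  static final int LLMOBS_BIT = 0x20; // assumed value

  enum Decision {
    KEEP,
    DROP
  }

  /** True when the given product bit is set in the trace source bitfield. */
  static boolean isMarked(int traceSource, int productBit) {
    return (traceSource & productBit) != 0;
  }

  /** LLMObs-only mode: keep LLMObs-marked traces, drop everything else. */
  static Decision llmObsStandalone(int traceSource) {
    return isMarked(traceSource, LLMOBS_BIT) ? Decision.KEEP : Decision.DROP;
  }

  public static void main(String[] args) {
    System.out.println(llmObsStandalone(LLMOBS_BIT));
    System.out.println(llmObsStandalone(ASM_BIT));
    System.out.println(llmObsStandalone(0));
  }
}
```

In this sketch an ASM-only trace is dropped, which is the situation the review comment further down objects to when ASM is also enabled.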
It appears this would break ASM. If ASM is enabled, we still need to pass at least 1 APM/ASM trace per minute.
@smola
Thanks for pointing this out!
When both are enabled, `LlmObsStandaloneSampler` was returned and the ASM branch was never reached. Fixed by adding `LlmObsAndAsmStandaloneSampler`, which keeps all LLMObs/ASM traces while still allowing 1 APM trace per minute for billing. 69107cb
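
The ordering problem described in this reply can be sketched as follows. This is an assumption-laden illustration, not the PR's factory code, which is not part of this excerpt: the `select` method, its parameters, and the `Choice` enum are hypothetical, and the fallback when neither product is enabled is a guess.

```java
// Hypothetical sketch of the sampler selection order described in the reply.
final class SamplerSelectionSketch {
  enum Choice {
    DEFAULT_SAMPLER,
    ASM_STANDALONE,
    LLMOBS_STANDALONE,
    LLMOBS_AND_ASM_STANDALONE
  }

  static Choice select(boolean apmTracingEnabled, boolean llmObsEnabled, boolean asmEnabled) {
    if (apmTracingEnabled) {
      return Choice.DEFAULT_SAMPLER; // standalone samplers apply only when APM is off
    }
    // The combined case has to be tested before the single-product cases.
    // Otherwise the LLMObs-only branch wins and the ASM rule is never reached.
    if (llmObsEnabled && asmEnabled) {
      return Choice.LLMOBS_AND_ASM_STANDALONE;
    }
    if (llmObsEnabled) {
      return Choice.LLMOBS_STANDALONE;
    }
    if (asmEnabled) {
      return Choice.ASM_STANDALONE;
    }
    return Choice.DEFAULT_SAMPLER; // assumption: behavior with no product enabled is unspecified here
  }

  public static void main(String[] args) {
    System.out.println(select(false, true, true));
    System.out.println(select(false, true, false));
    System.out.println(select(false, false, true));
  }
}
```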