Description
Tracer Version(s)
1.55.0
Java Version(s)
21.0.9
JVM Vendor
Eclipse Adoptium / Temurin
Bug Report
The dd_oome_notifier.sh (and dd_crash_uploader.sh) scripts spawned via -XX:OnOutOfMemoryError do not unset JDK_JAVA_OPTIONS, JAVA_TOOL_OPTIONS, or _JAVA_OPTIONS before launching a child java process. The child JVM therefore inherits the full application JVM configuration, which causes three distinct problems:
- Port conflicts: flags like JMX remote (-Dcom.sun.management.jmxremote.port=9012) cause a BindException because the parent JVM still holds the port.
- cgroup OOMKill: memory flags like -Xms/-Xmx or -XX:MaxRAMPercentage=90 cause the child JVM to compete with the still-alive parent for container memory, potentially triggering a kernel OOMKill.
- Lost OOM diagnostics: when the script fails for any of the above reasons, no OOME event reaches Datadog, and the original OOM exception details (stack trace, thread name) are also lost because -XX:+ExitOnOutOfMemoryError force-terminates after the handler runs.
Actual output:
# java.lang.OutOfMemoryError: Metaspace
# -XX:OnOutOfMemoryError="/tmp/datadog/java/dd_oome_notifier.sh %p"
# Executing /bin/sh -c "/tmp/datadog/java/dd_oome_notifier.sh 1"...
Agent Jar: /opt/datadog/apm/library/java/dd-java-agent.jar
Tags: host:order-664fc65797-2bclc,...
JAVA_HOME: /opt/java/openjdk
PID: 1
NOTE: Picked up JDK_JAVA_OPTIONS: -XX:MaxGCPauseMillis=4000 -XX:MinRAMPercentage=25 -XX:MaxRAMPercentage=90 -XX:MaxMetaspaceSize=128m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 -Dcom.sun.management.jmxremote.rmi.port=9012 ...
Picked up JAVA_TOOL_OPTIONS: -javaagent:/opt/datadog/apm/library/java/dd-java-agent.jar ...
Caused by: java.rmi.server.ExportException: Port already in use: 9012; nested exception is:
java.net.BindException: Address already in use
...
Error: Failed to generate OOME event
Terminating due to java.lang.OutOfMemoryError: Metaspace
Expected Behavior
The OOME event should be sent to Datadog successfully, regardless of what JDK_JAVA_OPTIONS or JAVA_TOOL_OPTIONS contain. When the script fails, the original OOM exception details (stack trace, thread name) should still be visible in the application logs.
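One possible way to get this behavior is to clear the three option variables just before the child java process is launched. The sketch below is an assumption about how that could look, not the maintainers' actual fix; the helper name run_without_jvm_env is hypothetical and does not exist in the shipped scripts.

```shell
#!/bin/sh
# Hypothetical helper (not part of dd-java-agent): run a command with the
# JVM option variables removed from its environment.
# The unset happens inside a subshell, so the caller's environment is left
# untouched and only the launched command sees the cleaned environment.
run_without_jvm_env() {
    (
        unset JDK_JAVA_OPTIONS JAVA_TOOL_OPTIONS _JAVA_OPTIONS
        "$@"
    )
}

# Assumed usage inside dd_oome_notifier.sh, wrapping the existing command:
#   run_without_jvm_env "$config_java_home/bin/java" \
#       -Ddd.dogstatsd.start-delay=0 -jar "$config_agent" \
#       sendOomeEvent "$config_tags"
```

With the variables cleared, the child JVM would no longer pick up the JMX port flags or the inherited memory flags and would start with JVM defaults. This does not by itself preserve the original OOM details if the script fails for some other reason.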
Reproduction Code
1. Configure a JVM application with JMX remote monitoring via JDK_JAVA_OPTIONS:
   JDK_JAVA_OPTIONS=-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9012 -Dcom.sun.management.jmxremote.rmi.port=9012 ...
2. Have dd-java-agent injected (e.g. via Admission Controller), which sets:
   JAVA_TOOL_OPTIONS=-javaagent:/opt/datadog/apm/library/java/dd-java-agent.jar -XX:OnOutOfMemoryError="/tmp/datadog/java/dd_oome_notifier.sh %p"
3. Trigger an OutOfMemoryError (in our case: java.lang.OutOfMemoryError: Metaspace).
4. The JVM invokes dd_oome_notifier.sh, which spawns a child java process:
   "$config_java_home/bin/java" -Ddd.dogstatsd.start-delay=0 -jar "$config_agent" sendOomeEvent "$config_tags"
5. This child process inherits JDK_JAVA_OPTIONS (including JMX port flags) and JAVA_TOOL_OPTIONS (including the agent jar) from the environment.
6. The child JVM tries to bind JMX to port 9012, which is still held by the dying parent JVM → BindException: Address already in use → "Error: Failed to generate OOME event"
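The inheritance in the steps above can be confirmed without triggering an OOM. The following is a minimal sketch under the assumption that the handler is started through /bin/sh -c, as the log output shows; the function name list_inherited_jvm_env is hypothetical and only serves as a diagnostic.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of dd-java-agent): print the JVM option
# variables that a child process started through "sh -c" would inherit.
# Prints nothing when none of the three variables is exported.
list_inherited_jvm_env() {
    sh -c 'env' \
        | grep -E '^(JDK_JAVA_OPTIONS|JAVA_TOOL_OPTIONS|_JAVA_OPTIONS)=' \
        || true
}
```

Running list_inherited_jvm_env in the application container's environment should list the same JDK_JAVA_OPTIONS and JAVA_TOOL_OPTIONS values that the child JVM reports in its "Picked up" lines.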