Nexus caller timeouts by bergundy · Pull Request #2760 · temporalio/sdk-java

bergundy · 2026-01-14T00:57:17Z

What was changed

Added support for ScheduleToStartTimeout and StartToCloseTimeout for Nexus operations in the Java SDK and test server.
In the test server:
- Cap secondary timeouts to scheduleToClose timeout
- Support all three timeout types in timeoutNexusOperation()
- Calculate OPERATION_TIMEOUT header dynamically in pollNexusTaskQueue() and use the correct header format
- Terminate workflows on GRPC_MESSAGE_TOO_LARGE workflow task failures with FORCE_CLOSE_COMMAND cause, matching real server behavior
- Updated nexus failure wrapping to match real server: nexusFailureToAPIFailure no longer wraps non-temporal failures in ApplicationFailureInfo, and handlerErrorToFailure no longer sets message at the top level
Added tests for the new functionality
Bumped Temporal Server to 1.31.0 (CLI v1.6.1-server-1.31.0-151.0) and proto submodule to v1.61.0
Updated tests for server behavior changes:
- Activity timeout retry state is now RETRY_STATE_TIMEOUT for schedule-to-start and schedule-to-close timeouts (server fix for temporalio/temporal#3667)
- Workflows terminated due to gRPC message too large now fail with TerminatedFailure instead of TimeoutFailure
- Nexus operation error failures no longer include ApplicationFailureInfo wrapper
- Nexus handler error failure messages are now in the nested cause (not the top-level failure)
- Skipped tests that require test-server-specific behavior (time skipping, callback URL validation) when running against real server

Quinn-With-Two-Ns · 2026-01-14T01:54:20Z

Can we add some tests like we have for activity

sdk-java/temporal-sdk/src/test/java/io/temporal/workflow/activityTests/ActivityTimeoutTest.java

Line 51 in 2322bd0

public class ActivityTimeoutTest {

to verify the parameter is properly wired through and respected

bergundy · 2026-01-14T01:57:46Z

Can we add some tests like we have for activity

sdk-java/temporal-sdk/src/test/java/io/temporal/workflow/activityTests/ActivityTimeoutTest.java

Line 51 in 2322bd0

public class ActivityTimeoutTest {

to verify the parameter is properly wired through and respected

We have coverage in NexusWorkflowTest.java.

Quinn-With-Two-Ns · 2026-01-14T01:58:23Z

temporal-test-server/src/test/java/io/temporal/testserver/functional/NexusWorkflowTest.java

+      Assert.assertTrue("OPERATION_TIMEOUT should be positive", operationTimeoutMs > 0);
+
+      // Sleep longer than schedule-to-start timeout to trigger the timeout
+      Thread.sleep(2000);


Should be able to use testWorkflowRule.sleep here to avoid waiting real time in the time skipping test server, but I would prefer the test not use a sleep and just wait for the condition to be true. Should be possible to wait for the operation to fail by looking at the workflow describe output no?

I think the sleep is reasonable here and I don't expect it to be flaky. If you point me to an example of how to use some "wait for condition" or "eventually" in the Java tests, I can adapt that.

Quinn-With-Two-Ns · 2026-01-14T01:59:57Z

We have coverage in NexusWorkflowTest.java.

NexusWorkflowTest.java doesn't use the Java SDK so it is not providing coverage of the SDK

Quinn-With-Two-Ns · 2026-01-14T02:02:38Z

you also need to update constructPendingNexusOperationInfo in the test server I think, unless I missed that.

maciejdudko

I left a few comments about timeout calculation in test server but otherwise looks good.

maciejdudko · 2026-01-14T16:49:48Z

temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java

+        long elapsedSeconds = Timestamps.between(scheduledTime, currentTime).getSeconds();
+        long elapsedMillis = elapsedSeconds * 1000;


This rounds off milliseconds from the difference. At the extreme, a sub-second timeout will be rounded to zero.

Suggested change

long elapsedSeconds = Timestamps.between(scheduledTime, currentTime).getSeconds();

long elapsedMillis = elapsedSeconds * 1000;

long elapsedMillis = com.google.protobuf.util.Durations.toMillis(

Timestamps.between(scheduledTime, currentTime));

maciejdudko · 2026-01-15T15:50:16Z

temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java

+        // Calculate minimum of all applicable timeouts
+        Long remainingMillis = null;
+
+        if (!isStarted && scheduledEvent.hasScheduleToStartTimeout()) {


ScheduleToStartTimeout should not be part of the calculation here.

maciejdudko · 2026-01-15T16:03:17Z

temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java

+              com.google.protobuf.util.Durations.toMillis(
+                  scheduledEvent.getScheduleToCloseTimeout());
+          if (scheduleToCloseMillis > 0) {
+            long remaining = scheduleToCloseMillis - elapsedMillis;


This code should handle the situation where elapsedMillis is larger than scheduleToCloseMillis for some reason, just for extra safety.

Suggested change

long remaining = scheduleToCloseMillis - elapsedMillis;

long remaining = Math.max(1, scheduleToCloseMillis - elapsedMillis);

maciejdudko · 2026-01-15T16:11:35Z

temporal-test-server/src/main/java/io/temporal/internal/testservice/TestWorkflowService.java

+            long remaining = startToCloseMillis - elapsedMillis;
+            remainingMillis =
+                (remainingMillis == null) ? remaining : Math.min(remainingMillis, remaining);


StartToCloseTimeout should not have elapsed time subtracted.

Suggested change

long remaining = startToCloseMillis - elapsedMillis;

remainingMillis =

(remainingMillis == null) ? remaining : Math.min(remainingMillis, remaining);

remainingMillis =

(remainingMillis == null) ? startToCloseMillis : Math.min(remainingMillis, startToCloseMillis);

maciejdudko

There are two tests failing.

io.temporal.workflow.nexus.HeaderTest#testOperationHeaders (test server only): the test expects both request-timeout and operation-timeout headers to be always set. Currently, test server doesn't send operation-timeout if it's not set in options. The real server seems to have a default schedule-to-close timeout of 200 seconds, though I'm not sure where it comes from. The test server should match the real server and have a default timeout too.

io.temporal.testserver.functional.NexusWorkflowTest#testNexusOperationStartToCloseTimeout (real server only): there are two problems. One is that the real server sets error message as "operation timed out" but the test expects "operation timed out after starting". I think we should have a contains-substring check here rather than equality. The second I was unable to reproduce locally but it failed 2/2 times in CI - "OPERATION_TIMEOUT should be <= start-to-close timeout" assertion failed. Might be because of JDK8, or something else.

bergundy · 2026-02-05T02:38:51Z

Finally getting back to this.

There are two tests failing.

io.temporal.workflow.nexus.HeaderTest#testOperationHeaders (test server only): the test expects both request-timeout and operation-timeout headers to be always set. Currently, test server doesn't send operation-timeout if it's not set in options. The real server seems to have a default schedule-to-close timeout of 200 seconds, though I'm not sure where it comes from. The test server should match the real server and have a default timeout too.

Yeah, I noticed this, this PR fixes the bug in the Java test server where the operation-timeout would be set when it shouldn't have but looks like it's still not quite mimicking the real server. The "real" server defaults schedule to close timeout to the workflow run timeout, we need to fix this in the Java test server too.

io.temporal.testserver.functional.NexusWorkflowTest#testNexusOperationStartToCloseTimeout (real server only): there are two problems. One is that the real server sets error message as "operation timed out" but the test expects "operation timed out after starting". I think we should have a contains-substring check here rather than equality. The second I was unable to reproduce locally but it failed 2/2 times in CI - "OPERATION_TIMEOUT should be <= start-to-close timeout" assertion failed. Might be because of JDK8, or something else.

I'm changing the error messages everywhere to "operation timed out". We need to upgrade the CLI to include a server that supports the new timeouts before merging this.

The test server now caps the scheduleToCloseTimeout for Nexus operations to the workflow run timeout, matching the behavior of the real Temporal server. This ensures the operation-timeout header is properly set in tests. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

The ABANDON cancel-before-sent sub-test can start a handler workflow that sets the shared opStarted/handlerFinished signals, causing the subsequent after-start sub-test to proceed with stale signal state and fail. Clear both signals between the two sub-tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous commit (c2d4bcf "Use tagged protos") moved the proto submodule to 9daa310 on release/v1.60.x which doesn't include the schedule_to_start_timeout and start_to_close_timeout fields in ScheduleNexusOperationCommandAttributes. This updates it to v1.61.0 (831177c) which includes commit d2c3e7c "Add nexus caller timeouts (temporalio#695)".

Activity timeout tests now expect RETRY_STATE_TIMEOUT unconditionally, matching the server fix for temporalio/temporal#3667. GrpcMessageTooLarge tests expect TerminatedFailure with FORCE_CLOSE_COMMAND cause instead of TimeoutFailure. Updated test server to terminate workflows on GRPC_MESSAGE_TOO_LARGE failures, aligning with real server behavior.

… 1.31.0 - Update nexusFailureToAPIFailure to not wrap non-temporal failures in ApplicationFailureInfo (matching real server behavior) - Update handlerErrorToFailure to not set message at top level (message is only in the nested cause on real server) - Update NexusWorkflowTest assertions to check cause.getCause() for handler error messages - Remove applicationFailureInfo assertion from testNexusOperationError - Skip testNexusOperationTimeout_AfterCancel with real server (timeout behavior after cancel differs) - Skip testNexusOperationScheduleToStartTimeout with real server (requires time skipping) - Skip WorkflowIdConflictPolicyTest.conflictPolicyUseExisting with real server (callback URL validation rejects test URLs)

…r server 1.31.0" This reverts commit d2a6ac4. Claude mistakenly commited this.

tconley1428 · 2026-02-19T17:03:20Z

.github/workflows/ci.yml

            --dynamic-config-value frontend.workerVersioningDataAPIs=true \
-            --dynamic-config-value component.nexusoperations.recordCancelRequestCompletionEvents=true \
-            --dynamic-config-value component.callbacks.allowedAddresses='[{"Pattern":"*","AllowInsecure":true}]' \
+            --dynamic-config-value history.MaxBufferedQueryCount=100000 \


Why are we adding a bunch of (unrelated?) dynamic configs? Is this a merge conflict with us cleaning them up or are they actually needed?

Looks like a merge conflict that wasn't properly resolved. Thanks.

Quinn-With-Two-Ns · 2026-02-19T17:07:10Z

temporal-test-server/src/test/java/io/temporal/testserver/functional/NexusWorkflowTest.java


+  @Test
+  public void testNexusOperationScheduleToStartTimeout() {
+    assumeTrue(


Why are we skipping for the real server?

This is a mistake. Let me remove this.

Quinn-With-Two-Ns · 2026-02-19T17:08:56Z

temporal-sdk/src/main/java/io/temporal/internal/sync/SyncWorkflowContext.java

    attributes.setScheduleToCloseTimeout(
        ProtobufTimeUtils.toProtoDuration(input.getOptions().getScheduleToCloseTimeout()));
+    attributes.setScheduleToStartTimeout(
+        ProtobufTimeUtils.toProtoDuration(input.getOptions().getScheduleToStartTimeout()));


note: ProtobufTimeUtils.toProtoDuration will return a zero duration here not a null duration, sometimes the server treats a zero duration different then a empty duration so we need to be careful that is not the case here.

Server treats zero and null the same here.

Quinn-With-Two-Ns · 2026-02-19T17:13:40Z

temporal-test-server/src/test/java/io/temporal/testserver/functional/NexusWorkflowTest.java

      Assert.assertEquals("nexus operation completed unsuccessfully", failure.getMessage());
      io.temporal.api.failure.v1.Failure cause = failure.getCause();
      Assert.assertEquals("deliberate test failure", cause.getMessage());
-      Assert.assertTrue(cause.hasApplicationFailureInfo());


What type of failure info does the cause have now?

bergundy · 2026-02-19T17:20:56Z

...est-server/src/test/java/io/temporal/testserver/functional/WorkflowIdConflictPolicyTest.java


  @Test
  public void conflictPolicyUseExisting() {
+    org.junit.Assume.assumeTrue(


This also shouldn't have been committed.

- Update test server nexus failure wrapping to match real server behavior: remove ApplicationFailureInfo wrapping for non-temporal failures and remove redundant message from handler error top-level failure - Fix NexusWorkflowTest assertions for updated failure structure - Fix testNexusOperationTimeout_AfterCancel to handle intermediate WFTs caused by NEXUS_OPERATION_CANCEL_REQUEST_FAILED events - Fix testNexusOperationScheduleToStartTimeout timing for real server - Use unique workflow IDs in WorkflowIdConflictPolicyTest for real server - Add component.callbacks.allowedAddresses config to CI server startup

bergundy requested a review from a team as a code owner January 14, 2026 00:57

Quinn-With-Two-Ns reviewed Jan 14, 2026

View reviewed changes

maciejdudko requested changes Jan 15, 2026

View reviewed changes

maciejdudko requested changes Jan 16, 2026

View reviewed changes

bergundy requested review from Quinn-With-Two-Ns and maciejdudko January 16, 2026 18:08

bergundy force-pushed the nexus-caller-timeouts branch 3 times, most recently from fa5185d to 1bcbc59 Compare February 4, 2026 22:40

bergundy and others added 11 commits February 18, 2026 10:59

Nexus caller timeouts

72f0cab

Suppress more deprecation warnings

05aeb49

Fix timeout calculation for operation-timeout header

8433f78

Remove unneccessary sleep

18c3135

Address review comments

f946ca5

Bump server and remove unnecessary dynamic config overrides

6b40182

Bump the nexus schedule to close timeout in tests

996ba32

Use tagged protos

c2d4bcf

Bump to sever 161

fbbb14a

bergundy force-pushed the nexus-caller-timeouts branch from 0db9f72 to fbbb14a Compare February 18, 2026 18:59

bergundy added 2 commits February 18, 2026 11:27

bergundy force-pushed the nexus-caller-timeouts branch from 333a15e to f568ffd Compare February 18, 2026 21:10

Revert "Fix test server nexus failure wrapping and test assertions fo…

f828d68

…r server 1.31.0" This reverts commit d2a6ac4. Claude mistakenly commited this.

tconley1428 reviewed Feb 19, 2026

View reviewed changes

Quinn-With-Two-Ns reviewed Feb 19, 2026

View reviewed changes

bergundy commented Feb 19, 2026

View reviewed changes

bergundy added 2 commits February 19, 2026 09:24

Fix bad merge for dev server DC

1a5005e

		long elapsedSeconds = Timestamps.between(scheduledTime, currentTime).getSeconds();
		long elapsedMillis = elapsedSeconds * 1000;

	long remaining = scheduleToCloseMillis - elapsedMillis;
	long remaining = Math.max(1, scheduleToCloseMillis - elapsedMillis);

Conversation

bergundy commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What was changed

Uh oh!

Quinn-With-Two-Ns commented Jan 14, 2026

Uh oh!

bergundy commented Jan 14, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Quinn-With-Two-Ns commented Jan 14, 2026

Uh oh!

Quinn-With-Two-Ns commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maciejdudko left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

maciejdudko left a comment

Choose a reason for hiding this comment

Uh oh!

bergundy commented Feb 5, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Comments

bergundy commented Jan 14, 2026 •

edited

Loading

Quinn-With-Two-Ns commented Jan 14, 2026 •

edited

Loading