Skip to content

feat(cli): auto-cancel dev runs on CLI exit via detached watchdog#3191

Merged
ericallam merged 6 commits intomainfrom
feature/tri-7779-we-need-to-cancel-runs-when-the-cli-exists-again-to-prevent
Mar 9, 2026
Merged

feat(cli): auto-cancel dev runs on CLI exit via detached watchdog#3191
ericallam merged 6 commits intomainfrom
feature/tri-7779-we-need-to-cancel-runs-when-the-cli-exists-again-to-prevent

Conversation

@ericallam
Copy link
Member

When the dev CLI exits (e.g. ctrl+c via pnpm), runs that were mid-execution
previously stayed stuck in EXECUTING status for up to 5 minutes until the
heartbeat timeout fired. Now they are cancelled within seconds.

The dev CLI spawns a lightweight detached watchdog process at startup. The
watchdog monitors the CLI process ID and, when it detects the CLI has exited,
calls a new POST /engine/v1/dev/disconnect endpoint to cancel all in-flight
runs immediately (skipping PENDING_CANCEL since the worker is known to be dead).

Watchdog design:

  • Fully detached (detached: true, stdio: ignore, unref()) so it survives
    even when pnpm sends SIGKILL to the process tree
  • Active run IDs maintained via atomic file write (.trigger/active-runs.json)
  • Single-instance guarantee via PID file (.trigger/watchdog.pid)
  • Safety timeout: exits after 24 hours to prevent zombie processes
  • On clean shutdown, the watchdog is killed (no disconnect needed)

Disconnect endpoint:

  • Rate-limited: 5 calls/min per environment
  • Capped at 500 runs per call
  • Small counts (<= 25): cancelled inline with pMap concurrency 10
  • Large counts: delegated to the bulk action system
  • Uses finalizeRun: true to skip PENDING_CANCEL and go straight to FINISHED

Run engine change:

  • cancelRun() now respects finalizeRun when the run is in EXECUTING status,
    skipping the PENDING_CANCEL waiting state and going directly to FINISHED

When the dev CLI exits (e.g. ctrl+c via pnpm), runs that were mid-execution
previously stayed stuck in EXECUTING status for up to 5 minutes until the
heartbeat timeout fired. Now they are cancelled within seconds.

The dev CLI spawns a lightweight detached watchdog process at startup. The
watchdog monitors the CLI process ID and, when it detects the CLI has exited,
calls a new POST /engine/v1/dev/disconnect endpoint to cancel all in-flight
runs immediately (skipping PENDING_CANCEL since the worker is known to be dead).

Watchdog design:
- Fully detached (detached: true, stdio: ignore, unref()) so it survives
  even when pnpm sends SIGKILL to the process tree
- Active run IDs maintained via atomic file write (.trigger/active-runs.json)
- Single-instance guarantee via PID file (.trigger/watchdog.pid)
- Safety timeout: exits after 24 hours to prevent zombie processes
- On clean shutdown, the watchdog is killed (no disconnect needed)

Disconnect endpoint:
- Rate-limited: 5 calls/min per environment
- Capped at 500 runs per call
- Small counts (<= 25): cancelled inline with pMap concurrency 10
- Large counts: delegated to the bulk action system
- Uses finalizeRun: true to skip PENDING_CANCEL and go straight to FINISHED

Run engine change:
- cancelRun() now respects finalizeRun when the run is in EXECUTING status,
  skipping the PENDING_CANCEL waiting state and going directly to FINISHED
@changeset-bot
Copy link

changeset-bot bot commented Mar 7, 2026

🦋 Changeset detected

Latest commit: 245c6ad

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 28 packages
Name Type
trigger.dev Patch
@trigger.dev/core Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@trigger.dev/build Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 7, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds auto-cancellation of in-flight dev runs when the CLI exits. Implements POST /engine/v1/dev/disconnect (accepts runFriendlyIds, caps at 500; ≤25 handled inline, >25 enqueued as a BulkActionGroup) and request/response schemas (DevDisconnectRequestBody, DevDisconnectResponseBody). CLI client exposes dev.disconnect. CLI runtime spawns a detached watchdog (devWatchdog) that maintains an active-runs JSON and PID file, detects parent exit, and calls the disconnect endpoint. Adds a finalizeRun flag propagated through cancellation logic and forwarded in bulk cancellation handling.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description provides detailed technical context about the implementation, watchdog design, disconnect endpoint behavior, and engine changes, but does not follow the required repository template structure. Format the description according to the repository template: include Closes #, Testing section, Changelog section, and complete the checklist items.
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and specifically describes the main change: adding auto-cancellation of dev runs when the CLI exits via a detached watchdog process.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/tri-7779-we-need-to-cancel-runs-when-the-cli-exists-again-to-prevent

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]

This comment was marked as resolved.

- Reject requests exceeding 500 runs with 400 instead of silently truncating
- Propagate finalizeRun through bulk action params so large batches also skip PENDING_CANCEL
- Inherit parent process.env when spawning watchdog (TLS/proxy settings)
- Add PID file prefix (trigger-watchdog:) to prevent killing unrelated processes on PID reuse
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
packages/cli-v3/src/dev/devSupervisor.ts (1)

245-263: ⚠️ Potential issue | 🔴 Critical

Don't signal a watchdog PID unless you can prove it still belongs to this instance.

Both shutdown kill paths treat a bare PID as identity. If the detached watchdog died earlier and the OS reused that PID, clean CLI shutdown can SIGTERM an unrelated local process. Persist an instance token alongside the PID and verify it before signaling here, and apply the same ownership check to packages/cli-v3/src/dev/devWatchdog.ts:48-60.

In Node.js, does `process.kill(pid, signal)` verify that the PID still belongs to the original spawned child process, or does it signal whatever process currently owns that PID?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli-v3/src/dev/devSupervisor.ts` around lines 245 - 263, The
shutdown code (`#killWatchdog`) signals a raw PID which can be reused by the OS;
change the PID file scheme to persist a short instance token alongside the PID
(e.g. "trigger-watchdog:<pid>:<token>") when spawning the watchdog and store the
same token in the supervisor instance. Before calling process.kill in
`#killWatchdog` or the equivalent kill path in devWatchdog (the code that reads
watchdogPidPath and calls process.kill), read and validate the token from the
PID file against the supervisor's stored token and only send SIGTERM if the
tokens match; also update watchdog spawn logic to write the token and ensure
both places handle missing/invalid token gracefully without killing a PID that
may belong to another process.
packages/cli-v3/src/dev/devWatchdog.ts (1)

96-120: ⚠️ Potential issue | 🟠 Major

Don't treat every resolved fetch() as a successful disconnect.

fetch() only rejects on transport/abort failures, so 400/429/500 responses currently fall through, onParentDied() deletes the active-runs file, and the session loses immediate cancellation. This path also sends the full runFriendlyIds array at once, but /engine/v1/dev/disconnect rejects requests over 500 IDs. Check response.ok, split into <=500 batches, and retry transient failures before cleanup.

In the Fetch API, does `fetch()` reject on HTTP 4xx/5xx responses or only on network/abort failures?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli-v3/src/dev/devWatchdog.ts` around lines 96 - 120, The
callDisconnect function currently assumes any resolved fetch equals success;
modify callDisconnect to check the fetch Response.ok and throw on non-ok so
callers can detect failure, and implement batching of the runFriendlyIds into
chunks of <=500 (split the array and POST each batch separately to
/engine/v1/dev/disconnect). Add a small retry loop for transient failures (e.g.,
5xx or 429 with backoff) per batch before giving up. In onParentDied, use
readActiveRuns() to get IDs, call the updated callDisconnect and only proceed to
cleanup() and process.exit(0) if all batches succeeded (or after exhausting
retries), ensuring failed disconnects are retried instead of being treated as
success.
🧹 Nitpick comments (1)
apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts (1)

141-147: Parse persisted bulk params with zod instead of as any.

group.params is untyped JSON from the database, and this branch now uses a loose object check plus an any cast to control finalizeRun. A small zod parse here would keep malformed stored params from silently changing cancellation behavior and stays consistent with the repo's validation pattern.

As per coding guidelines, "Use zod for validation in packages/core and apps/webapp".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts` around lines 141 -
147, Replace the loose object check and the `as any` cast by parsing
`group.params` with a zod schema to validate the persisted bulk params before
using them: create or reuse a Zod schema (e.g., RunListInputOptionsSchema or a
small schema that includes `finalizeRun?: boolean`) to parse `group.params`,
assign the parsed result to `rawParams`, and derive `finalizeRun` from the
strongly-typed parsed object instead of `(rawParams as any).finalizeRun`; keep
the call to `parseRunListInputOptions` using the validated `rawParams` so
malformed stored params are rejected or defaulted consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/cli-v3/src/dev/devSupervisor.ts`:
- Around line 245-263: The shutdown code (`#killWatchdog`) signals a raw PID which
can be reused by the OS; change the PID file scheme to persist a short instance
token alongside the PID (e.g. "trigger-watchdog:<pid>:<token>") when spawning
the watchdog and store the same token in the supervisor instance. Before calling
process.kill in `#killWatchdog` or the equivalent kill path in devWatchdog (the
code that reads watchdogPidPath and calls process.kill), read and validate the
token from the PID file against the supervisor's stored token and only send
SIGTERM if the tokens match; also update watchdog spawn logic to write the token
and ensure both places handle missing/invalid token gracefully without killing a
PID that may belong to another process.

In `@packages/cli-v3/src/dev/devWatchdog.ts`:
- Around line 96-120: The callDisconnect function currently assumes any resolved
fetch equals success; modify callDisconnect to check the fetch Response.ok and
throw on non-ok so callers can detect failure, and implement batching of the
runFriendlyIds into chunks of <=500 (split the array and POST each batch
separately to /engine/v1/dev/disconnect). Add a small retry loop for transient
failures (e.g., 5xx or 429 with backoff) per batch before giving up. In
onParentDied, use readActiveRuns() to get IDs, call the updated callDisconnect
and only proceed to cleanup() and process.exit(0) if all batches succeeded (or
after exhausting retries), ensuring failed disconnects are retried instead of
being treated as success.

---

Nitpick comments:
In `@apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts`:
- Around line 141-147: Replace the loose object check and the `as any` cast by
parsing `group.params` with a zod schema to validate the persisted bulk params
before using them: create or reuse a Zod schema (e.g., RunListInputOptionsSchema
or a small schema that includes `finalizeRun?: boolean`) to parse
`group.params`, assign the parsed result to `rawParams`, and derive
`finalizeRun` from the strongly-typed parsed object instead of `(rawParams as
any).finalizeRun`; keep the call to `parseRunListInputOptions` using the
validated `rawParams` so malformed stored params are rejected or defaulted
consistently.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e1428899-bda8-4065-a299-788c1fe8eecd

📥 Commits

Reviewing files that changed from the base of the PR and between 562e88a and 0721012.

📒 Files selected for processing (4)
  • apps/webapp/app/routes/engine.v1.dev.disconnect.ts
  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
  • packages/cli-v3/src/dev/devWatchdog.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/webapp/app/routes/engine.v1.dev.disconnect.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (26)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
🧰 Additional context used
📓 Path-based instructions (13)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: In TypeScript SDK usage, always import from @trigger.dev/sdk, never from @trigger.dev/sdk/v3 or use deprecated client.defineJob
Import from @trigger.dev/core subpaths only, never from the root
Use the Run Engine 2.0 (@internal/run-engine) and redis-worker for all new work, not legacy V1 MarQS queue or deprecated V1 functions

Files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
packages/cli-v3/src/dev/**/*

📄 CodeRabbit inference engine (packages/cli-v3/CLAUDE.md)

Dev mode code should be located in src/dev/ and runs tasks locally in the user's Node.js process without containers

Files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/app/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Access all environment variables through the env export of env.server.ts instead of directly accessing process.env in the Trigger.dev webapp

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: When importing from @trigger.dev/core in the webapp, use subpath exports from the package.json instead of importing from the root path
Follow the Remix 2.1.0 and Express server conventions when updating the main trigger.dev webapp

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/app/v3/services/**/*.server.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

Organize services in the webapp following the pattern app/v3/services/*/*.server.ts

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (CLAUDE.md)

When modifying only server components (apps/webapp/, apps/supervisor/, etc.) with no package changes, add a .server-changes/ file instead of a changeset

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/app/v3/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

In apps/webapp/app/v3/ directory, only modify V2 code paths - V1/V2 branching should only have V2 modified, as V1 code is legacy

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

Access environment variables via the env export from app/env.server.ts, never use process.env directly

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
apps/webapp/app/v3/services/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

When editing services that branch on RunEngineVersion to support both V1 and V2 (e.g., cancelTaskRun.server.ts, batchTriggerV3.server.ts), only modify V2 code paths

Files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:56.114Z
Learning: All run lifecycle operations (triggering, completing, cancelling, etc.) should be performed through the singleton run engine instance in `app/v3/runEngine.server.ts` via service calls
📚 Learning: 2026-03-02T12:43:34.140Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/cli-v3/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:34.140Z
Learning: Applies to packages/cli-v3/src/commands/dev.ts : Implement `dev.ts` command in `src/commands/` for local development mode

Applied to files:

  • packages/cli-v3/src/dev/devWatchdog.ts
  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-10-08T11:48:12.327Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 2593
File: packages/core/src/v3/workers/warmStartClient.ts:168-170
Timestamp: 2025-10-08T11:48:12.327Z
Learning: The trigger.dev runners execute only in Node 21 and 22 environments, so modern Node.js APIs like AbortSignal.any (introduced in v20.3.0) are supported.

Applied to files:

  • packages/cli-v3/src/dev/devWatchdog.ts
📚 Learning: 2026-03-02T12:42:56.114Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:56.114Z
Learning: Applies to apps/webapp/app/v3/services/**/*.server.ts : When editing services that branch on `RunEngineVersion` to support both V1 and V2 (e.g., `cancelTaskRun.server.ts`, `batchTriggerV3.server.ts`), only modify V2 code paths

Applied to files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
📚 Learning: 2025-07-12T18:06:04.133Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 2264
File: apps/webapp/app/services/runsRepository.server.ts:172-174
Timestamp: 2025-07-12T18:06:04.133Z
Learning: In apps/webapp/app/services/runsRepository.server.ts, the in-memory status filtering after fetching runs from Prisma is intentionally used as a workaround for ClickHouse data delays. This approach is acceptable because the result set is limited to a maximum of 100 runs due to pagination, making the performance impact negligible.

Applied to files:

  • apps/webapp/app/v3/services/bulk/BulkActionV2.server.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Subscribe to run updates using `runs.subscribeToRun()` for realtime monitoring of task execution

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2026-03-02T12:42:41.110Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:41.110Z
Learning: Applies to **/*.{ts,tsx} : Use the Run Engine 2.0 (internal/run-engine) and redis-worker for all new work, not legacy V1 MarQS queue or deprecated V1 functions

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-08-19T09:49:07.011Z
Learnt from: julienvanbeveren
Repo: triggerdotdev/trigger.dev PR: 2417
File: apps/webapp/app/routes/api.v1.projects.$projectRef.envvars.$slug.import.ts:56-61
Timestamp: 2025-08-19T09:49:07.011Z
Learning: In the Trigger.dev codebase, environment variables should default to `isSecret: false` when not explicitly marked as secrets in the syncEnvVars functionality. This is the intended behavior for both regular variables and parent variables.

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2026-03-02T12:43:34.140Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/cli-v3/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:34.140Z
Learning: Applies to packages/cli-v3/src/build/**/* : Bundle worker code using the build system in `src/build/` based on configuration from `trigger.config.ts`

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2026-03-02T12:42:56.114Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:56.114Z
Learning: Applies to apps/webapp/app/v3/*Worker.server.ts : New background job workers should use `trigger.dev/redis-worker` (e.g., `commonWorker.server.ts`, `alertsWorker.server.ts`, `batchTriggerWorker.server.ts`), not zodworker or graphile-worker

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts

@ericallam ericallam marked this pull request as ready for review March 8, 2026 21:52
devin-ai-integration[bot]

This comment was marked as resolved.

@0ski
Copy link
Collaborator

0ski commented Mar 9, 2026

Some places could use the trycatch helper function, and use early returns

coderabbitai[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
packages/cli-v3/src/dev/devSupervisor.ts (1)

261-270: ⚠️ Potential issue | 🟠 Major

The PID-file fallback still trusts stale PIDs when watchdogProcess is missing.

Line 269 allows !knownPid, so a leftover trigger-watchdog:<pid> file can still SIGTERM an unrelated process after PID reuse. Only use the PID file once you can validate a per-spawn identity, or skip the fallback when there is no in-memory PID.

Safer interim guard
-          if (pid && (!knownPid || pid === knownPid)) {
+          if (knownPid && pid === knownPid) {
             process.kill(pid, "SIGTERM");
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli-v3/src/dev/devSupervisor.ts` around lines 261 - 270, The
PID-file fallback in devSupervisor.ts currently allows killing a PID when
knownPid is falsy, which can SIGTERM an unrelated reused PID; update the logic
in the block that reads this.watchdogPidPath so you only attempt process.kill
when you can validate the spawn identity—e.g., require knownPid to be present
(remove the `!knownPid` branch) or compare a stored per-spawn token before
calling process.kill; adjust the condition around parseInt(...) and the call to
process.kill(pid, "SIGTERM") accordingly in the watchdogPidPath handling code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@packages/cli-v3/src/dev/devSupervisor.ts`:
- Around line 261-270: The PID-file fallback in devSupervisor.ts currently
allows killing a PID when knownPid is falsy, which can SIGTERM an unrelated
reused PID; update the logic in the block that reads this.watchdogPidPath so you
only attempt process.kill when you can validate the spawn identity—e.g., require
knownPid to be present (remove the `!knownPid` branch) or compare a stored
per-spawn token before calling process.kill; adjust the condition around
parseInt(...) and the call to process.kill(pid, "SIGTERM") accordingly in the
watchdogPidPath handling code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3337bba1-d3fd-4554-b342-f0faa3fadba6

📥 Commits

Reviewing files that changed from the base of the PR and between cb1cc61 and 09ac03a.

📒 Files selected for processing (1)
  • packages/cli-v3/src/dev/devSupervisor.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (23)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: sdk-compat / Deno Runtime
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

**/*.{ts,tsx}: In TypeScript SDK usage, always import from @trigger.dev/sdk, never from @trigger.dev/sdk/v3 or use deprecated client.defineJob
Import from @trigger.dev/core subpaths only, never from the root
Use the Run Engine 2.0 (@internal/run-engine) and redis-worker for all new work, not legacy V1 MarQS queue or deprecated V1 functions

Files:

  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • packages/cli-v3/src/dev/devSupervisor.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • packages/cli-v3/src/dev/devSupervisor.ts
packages/cli-v3/src/dev/**/*

📄 CodeRabbit inference engine (packages/cli-v3/CLAUDE.md)

Dev mode code should be located in src/dev/ and runs tasks locally in the user's Node.js process without containers

Files:

  • packages/cli-v3/src/dev/devSupervisor.ts
🧠 Learnings (8)
📚 Learning: 2026-03-02T12:43:34.140Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/cli-v3/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:34.140Z
Learning: Applies to packages/cli-v3/src/commands/dev.ts : Implement `dev.ts` command in `src/commands/` for local development mode

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-08-19T09:49:07.011Z
Learnt from: julienvanbeveren
Repo: triggerdotdev/trigger.dev PR: 2417
File: apps/webapp/app/routes/api.v1.projects.$projectRef.envvars.$slug.import.ts:56-61
Timestamp: 2025-08-19T09:49:07.011Z
Learning: In the Trigger.dev codebase, environment variables should default to `isSecret: false` when not explicitly marked as secrets in the syncEnvVars functionality. This is the intended behavior for both regular variables and parent variables.

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-11-27T16:26:58.661Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/webapp.mdc:0-0
Timestamp: 2025-11-27T16:26:58.661Z
Learning: Applies to apps/webapp/app/**/*.{ts,tsx} : Access all environment variables through the `env` export of `env.server.ts` instead of directly accessing `process.env` in the Trigger.dev webapp

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2024-10-18T15:41:52.352Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 1418
File: packages/core/src/v3/errors.ts:364-371
Timestamp: 2024-10-18T15:41:52.352Z
Learning: In `packages/core/src/v3/errors.ts`, within the `taskRunErrorEnhancer` function, `error.message` is always defined, so it's safe to directly call `error.message.includes("SIGTERM")` without additional checks.

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-11-27T16:26:37.432Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-27T16:26:37.432Z
Learning: Applies to packages/trigger-sdk/**/*.{ts,tsx} : In the Trigger.dev SDK (packages/trigger-sdk), prefer isomorphic code like fetch and ReadableStream instead of Node.js-specific code

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2026-03-02T12:43:34.140Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: packages/cli-v3/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:43:34.140Z
Learning: Applies to packages/cli-v3/src/build/**/* : Bundle worker code using the build system in `src/build/` based on configuration from `trigger.config.ts`

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2025-11-27T16:27:35.304Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: .cursor/rules/writing-tasks.mdc:0-0
Timestamp: 2025-11-27T16:27:35.304Z
Learning: Applies to **/trigger/**/*.{ts,tsx,js,jsx} : Subscribe to run updates using `runs.subscribeToRun()` for realtime monitoring of task execution

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
📚 Learning: 2026-03-02T12:42:56.114Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-03-02T12:42:56.114Z
Learning: Applies to apps/webapp/app/v3/*Worker.server.ts : New background job workers should use `trigger.dev/redis-worker` (e.g., `commonWorker.server.ts`, `alertsWorker.server.ts`, `batchTriggerWorker.server.ts`), not zodworker or graphile-worker

Applied to files:

  • packages/cli-v3/src/dev/devSupervisor.ts
🔇 Additional comments (1)
packages/cli-v3/src/dev/devSupervisor.ts (1)

175-182: Persist active-runs.json explicitly before killing the watchdog.

Lines 175-182 assume controller.stop() leaves the file current, but shutdown() never calls #updateActiveRunsFile() itself. If DevRunController.stop() resolves without triggering onFinished, the watchdog can still read stale run IDs during shutdown.

Run this to confirm the stop() contract:

#!/bin/bash
set -euo pipefail

fd '^dev-run-controller\.ts$' packages/cli-v3/src/entryPoints -x sh -c '
  file="$1"
  echo "== $file =="
  rg -n -C4 "class DevRunController|async stop\\(|\\bstop\\(|onFinished" "$file"
' sh {}

Expected result: stop() should synchronously invoke the onFinished path, or otherwise guarantee DevSupervisor.runControllers is drained before shutdown continues.

devin-ai-integration[bot]

This comment was marked as resolved.

@ericallam ericallam merged commit 436f20e into main Mar 9, 2026
40 checks passed
@ericallam ericallam deleted the feature/tri-7779-we-need-to-cancel-runs-when-the-cli-exists-again-to-prevent branch March 9, 2026 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants