test: validate graceful Claude API skip (dotCMS/core#35328) #460
sfreudenthaler wants to merge 2 commits into main
…esting Test dotCMS/ai-workflows@feat/35328-graceful-api-unavailability which adds pre-flight API availability check to skip Claude step gracefully when service is down. Ref: dotCMS#35328 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
❌ Issue Linking Required

This PR could not be linked to an issue. All PRs must be linked to an issue for tracking purposes.

How to fix this:

Option 1: Add keyword to PR body (Recommended - auto-removes this comment)

Why is this required?

Issue linking ensures proper tracking and documentation, and helps maintain project history. It connects your code changes to the problem they solve.

This comment was automatically generated by the issue linking workflow.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94e811290e
  )
  )
- uses: dotCMS/ai-workflows/.github/workflows/claude-orchestrator.yml@v2.0.0
+ uses: dotCMS/ai-workflows/.github/workflows/claude-orchestrator.yml@feat/35328-graceful-api-unavailability
Pin orchestrator workflow to an immutable ref
Switching uses: from @v2.0.0 to the mutable feature branch makes these jobs depend on a temporary ref that can be force-pushed or deleted; if feat/35328-graceful-api-unavailability is removed, the reusable workflow reference cannot be resolved and the Claude jobs will stop starting. Use a released tag or commit SHA for stability (the same issue is repeated at lines 111 and 139).
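A check like the following could catch mutable refs before they land. It is a minimal sketch, not part of this PR: `ref_kind` is a hypothetical helper name, and the version-tag pattern is an assumption about how releases are named. Note that tags can also be moved, so only a full commit SHA is strictly immutable.

```shell
#!/bin/sh
# Hypothetical lint helper: classify the ref that follows "@" in a `uses:` value.
# Prints "sha" for a full 40-character commit SHA, "tag" for a vN[.N...] style
# version tag (assumed naming scheme), and "other" for anything else
# (for example a feature branch).
ref_kind() {
  ref="${1##*@}"   # keep only the text after the last "@"
  if printf '%s' "$ref" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "sha"
  elif printf '%s' "$ref" | grep -Eq '^v[0-9]+(\.[0-9]+)*$'; then
    echo "tag"
  else
    echo "other"
  fi
}
```

Applied to the two lines in the diff above, the `@v2.0.0` reference is reported as `tag` and the `@feat/35328-graceful-api-unavailability` reference as `other`.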
## Summary

- Add pre-flight API availability check before running `claude-code-action`
- Skip the Claude step gracefully (warning, not failure) when the API returns 5xx or is unreachable
- Belt-and-suspenders: `continue-on-error: true` + post-execution re-check distinguishes service outages from legitimate errors

## Problem

When the Anthropic API is down, the Claude step fails with a hard error, blocking the entire CI pipeline. Example: [dotCMS/core run 24461196854](https://github.com/dotCMS/core/actions/runs/24461196854)

```
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}} · check status.claude.com
```

## Solution

Two layers of protection in `claude-executor.yml`:

**Layer 1: Pre-flight check** (catches most outages):

- `curl` the `/v1/models` endpoint with a 15s timeout before running Claude
- 5xx / network failures → `available=false` → skip Claude step → warn and succeed
- Auth errors (401/403), rate limits (429) → `available=true` → proceed so action can surface the specific error

**Layer 2: Runtime protection** (catches mid-execution degradation):

- `continue-on-error: true` on the Claude step
- Post-execution step checks if Claude failed
- If failed AND API is now returning 500 → skip gracefully (service issue)
- If failed AND API is now returning 200 → re-fail with "legitimate error" message

## Test

Validated in `dotCMS/core-workflow-test#460`:

- Pre-flight check correctly passes when API is available
- `Handle Claude execution result` correctly re-fails for non-service errors (workflow validation failure in test PR)
- The skip path is code-correct (would activate when API returns 5xx)

## Consumer repos to update after merge

- `dotCMS/core`: update `@v2.0.0` → new tag
- `dotCMS/core-workflow-test`: update `@v2.0.0` → new tag

Fixes: dotCMS/core#35328

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
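The Layer 1 decision described above can be sketched as a small shell helper. This is an illustration under stated assumptions, not the code in `claude-executor.yml`: the function names `classify_status` and `probe_api` are hypothetical, and the host `api.anthropic.com`, the `x-api-key` / `anthropic-version` headers, and the `ANTHROPIC_API_KEY` variable are assumptions about how the endpoint is called.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers mirroring the Layer 1 rules.

# Map an HTTP status code to the `available` flag.
# curl's %{http_code} reports "000" when no HTTP response was received.
classify_status() {
  case "$1" in
    5[0-9][0-9]|000|"") echo "false" ;;  # 5xx or network failure: skip Claude
    *)                  echo "true"  ;;  # 2xx, 401/403, 429, ...: let the action run
  esac
}

# Probe /v1/models with a 15s timeout and print only the status code.
# Assumption: ANTHROPIC_API_KEY is provided by the workflow environment.
probe_api() {
  curl --silent --output /dev/null --max-time 15 \
    --write-out '%{http_code}' \
    -H "x-api-key: ${ANTHROPIC_API_KEY:-}" \
    -H "anthropic-version: 2023-06-01" \
    "https://api.anthropic.com/v1/models" || true
}
```

In a workflow step this might be wired up as `echo "available=$(classify_status "$(probe_api)")" >> "$GITHUB_OUTPUT"`, so that a later step can be gated on the `available` output.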
Update ai-workflows reference from v2.0.0 to v2.1.0 which adds graceful handling when the Anthropic API is unavailable. Ref: dotCMS#35328 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ty skip (#35336)

## Summary

- Bump `ai-workflows` reference from `v2.0.0` → `v2.1.0` in `ai_claude-orchestrator.yml`
- `v2.1.0` adds pre-flight Anthropic API availability check + runtime error triage

## Problem

When the Claude service has an outage, all PR pipelines fail with:

```
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}} · check status.claude.com
```

Example blocked run: https://github.com/dotCMS/core/actions/runs/24461196854

## What changed in `v2.1.0` (`ai-workflows`)

**Pre-flight check (Layer 1):**

- Calls `GET /v1/models` before running claude-code-action (15s timeout, 2 retries)
- 5xx or network failure → `available=false` → skip Claude step gracefully with `::warning::`
- 401/403/429 or other codes → proceed so the action surfaces the specific error

**Runtime protection (Layer 2):**

- `continue-on-error: true` on the Claude step
- Post-execution step re-checks the API if Claude failed
- API available after failure → re-fail the job ("legitimate error")
- API unavailable after failure → skip gracefully ("service degradation")

## Test

Validated in dotCMS#460 (also updated to `v2.1.0`):

- Pre-flight check correctly identifies API availability
- Legitimate errors still surface and fail the job (correct)
- Service outage path is code-correct (pre-flight would skip before Claude runs)

Fixes #35328

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
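The Layer 2 triage rules can be written as a single decision function. The sketch below is an assumption about the shape of that logic, not the released `v2.1.0` code: `triage_result` and `report_result` are hypothetical names, the first argument is assumed to carry the Claude step's outcome (`success` or `failure`), and the second the `true`/`false` result of the post-execution API re-check.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers mirroring the Layer 2 rules.

# Decide what the job should do once the Claude step has finished.
#   $1: outcome of the Claude step ("success" or "failure")
#   $2: "true" if the API answered the post-execution re-check, else "false"
# Prints "pass", "fail" (legitimate error) or "skip" (service degradation).
triage_result() {
  if [ "$1" = "success" ]; then
    echo "pass"
  elif [ "$2" = "true" ]; then
    echo "fail"
  else
    echo "skip"
  fi
}

# Turn the decision into GitHub Actions workflow commands and an exit status.
report_result() {
  case "$(triage_result "$1" "$2")" in
    fail) echo "::error::Claude failed while the API was available (legitimate error)"
          return 1 ;;
    skip) echo "::warning::Claude skipped: API unavailable (service degradation)" ;;
    pass) : ;;  # nothing to report
  esac
}
```

Because `continue-on-error: true` keeps the job running after a failed Claude step, a final step along these lines is what re-fails the job for legitimate errors while letting outages pass with a warning.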
Summary
This PR tests the fix from dotCMS/ai-workflows@feat/35328-graceful-api-unavailability.
The Claude orchestrator now:
What to verify
- `claude-automatic-review` and `claude-rollback-safety-check` jobs should run successfully
- `✅ Claude API is available` in the logs

Ref: dotCMS#35328