feat(kiloclaw): Gmail push notifications via Google Cloud Pub/Sub#1065
Open
feat(kiloclaw): Gmail push notifications via Google Cloud Pub/Sub#1065
Conversation
Validates the bearer JWT that Google attaches to Pub/Sub push deliveries: checks issuer, audience, and that the email claim matches the Google system service account. Also adds cloudflare-gmail-push to the pnpm workspace and bumps its vitest to 3.2.4 (required for vite 7 compat).
Adds a setGmailNotifications mutation hook to useKiloClawMutations and renders a Gmail Notifications toggle in GoogleAccountSection (visible only when Google is connected). Also fixes pre-existing lint/typecheck issues in cloudflare-gmail-push and kiloclaw test files introduced by earlier tasks in this branch.
…nfig to subscription - Replace fragile URL string-replace with GMAIL_PUSH_WORKER_URL env var; skip Pub/Sub setup with warning if not set - Wrap JWT payload parsing in try/catch and guard subscription creation on pushUserId being present - Add --push-auth-service-account and --push-auth-token-audience to both create and update commands so Google sends OIDC tokens in push requests
…c routing Without this header, requests to https://<app>.fly.dev can land on any machine in the app. Include the fly-force-instance-id header (already confirmed non-null by the guard above) to ensure the push notification reaches the correct machine.
…nGateway - Add 'Cannot enable Gmail' to SAFE_ERROR_PREFIXES so the helpful error message reaches the UI instead of a generic failure - Add email_verified claim check in OIDC validation (defense-in-depth) - Rename internal spawnGateway → spawnProcess since supervisor is generic - Add OIDC_AUDIENCE ↔ setup.mjs cross-reference comments
The gog CLI needs GOG_KEYRING_BACKEND and GOG_KEYRING_PASSWORD env vars to find credentials. Every other gog invocation in setup.mjs passes gogEnv except this one. Also strengthen push test to assert auth header.
…e restart failure - Remove --push-auth-service-account from Pub/Sub subscription: the gmail-api-push SA is Google-managed and can't be impersonated by the user's Pub/Sub service agent - Make OIDC validation in push worker warn-only when no auth header (still rejects invalid tokens) - Surface restartFailed in GmailNotificationsResponse so UI can inform user if the toggle was saved but machine restart failed
…ents - Add restartFailed to updateGmailNotifications return type declaration - Update wrangler.jsonc OIDC_AUDIENCE comments (push auth SA was removed)
Contributor
Code Review SummaryStatus: 1 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)WARNING
Fix these issues in Kilo Cloud Other Observations (not in diff)None. Files Reviewed (49 files)
Reviewed by gpt-5.4-20260305 · 1,787,419 tokens |
Derive a per-user token via HMAC-SHA256(INTERNAL_API_SECRET, userId) and embed it in the Pub/Sub push subscription URL. The push worker verifies this token before processing, preventing anyone who guesses a userId from triggering fake pushes. Route changes: /push/user/:userId → /push/user/:userId/:token
# Conflicts: # kiloclaw/src/routes/platform.ts
…otency DO - Use BigInt instead of parseInt for historyId comparison in kiloclaw DO - Replace fixed-delay message.retry() with manual exponential backoff (6 retries, 2x factor, base 60s, ~1h total) - Add GmailPushIdempotency DO to deduplicate Pub/Sub redeliveries (keyed by userId, 24h TTL with batched cleanup) - Extract messageId from Pub/Sub envelope via zod schema in push route - Add unit tests for updateGmailHistoryId (5 cases) and idempotency DO
…enant forgery Use per-user audience (OIDC_AUDIENCE/push/user/<userId>) so the OIDC token is cryptographically bound to the specific user. A token minted for one user's subscription cannot be replayed against another user's push endpoint.
…worker Was kiloclaw-gmail-push.kiloapps.ai (wrong subdomain + TLD), should be kiloclaw-gmail.kiloapps.io matching the wrangler.jsonc route.
retryWithBackoff used void to discard the queue send promise, silently dropping messages if the re-enqueue failed. Now awaits the send and logs errors so failures are observable.
BigInt() throws SyntaxError on non-numeric strings. The DO's updateGmailHistoryId wraps this in try/catch but patchGogHistoryId did not, violating its documented "never throws" contract.
…ropping Only 400 and 422 are truly permanent client errors. 401 (gateway token drift during restart/rotation) and 404 (old image without the route) are transient and should be retried with backoff.
If startWatchRenewal is called twice, the first timer set would leak since only the last reference was stored. Now cleans up any existing timers before creating new ones.
…flows - Topic creation: distinguish "already exists" from real errors - Publisher role: use actual stderr instead of generic err.message - SA creation + token creator: escalate from warning to error with clear "Gmail push notifications will not work" message - userId extraction: skip Pub/Sub instead of killing entire setup - Gmail watch failure: drop incorrect manual retry command - Subscription failures: track pushSetupOk flag - Final summary: show conditional next steps based on push setup result; replace stale "Redeploy to activate" with correct instructions
…idation Move the email_verified guard above the allowedEmail check so unverified tokens are always rejected, regardless of whether allowedEmail is passed. Add test covering the production code path (no allowedEmail).
The audience was hardcoded to the production URL while pushEndpoint respected the GMAIL_PUSH_WORKER_URL override, causing audience mismatch in dev/tunnel flows.
…tions Without this, human-readable errors like "Cannot enable Gmail notifications without a connected Google account" were swallowed and surfaced as opaque KiloClawApiError messages in the UI toast.
Without a timeout, a hung gog process would block the entire controller event loop indefinitely, stalling inbound push handling.
The worker implements its own retry-via-reenqueue with exponential backoff and always calls message.ack(). Disable platform-level retries to prevent them from bypassing the custom backoff logic on unhandled crashes.
HTTP error responses were silently swallowed since only exceptions hit the catch block. A persistent kiloclaw-side failure would be invisible in logs.
Guard token-creator grant, subscription creation, and Gmail watch registration behind pushSetupOk checks so dependent steps are skipped when a prerequisite fails, avoiding confusing cascading errors. Also remove redundant oidcAudience variable (identical to pushEndpoint).
Move startWatchRenewal() to after gmailWatchSupervisor.start() succeeds so the renewal timer doesn't fire against a process that failed to spawn.
gog can return both 200 (hook delivered) and 202 (no new messages). The type annotation now reflects both possible values.
The env var is a base URL, not the full audience — the worker appends /push/user/<userId> at runtime. The old name was misleading and could cause a double-path audience if someone set it to the full per-user URL.
Move gcpProject extraction inside the pushSetupOk-guarded block so it doesn't run after earlier failures, preventing confusing cascading errors from (unset) project IDs propagating into SA emails and topic paths.
If the child exits between kill('SIGKILL') and the await expression,
handleChildExit could null out childExitPromise, making the await
resolve immediately via the fallback instead of waiting for exit cleanup.
…and fix stale toast - Let retryWithBackoff propagate queue send errors so message.ack() is skipped and Cloudflare redelivers the original message - Use mutation response data for toast message instead of stale closure - Fix README env var name: OIDC_AUDIENCE → OIDC_AUDIENCE_BASE
…g event loop execFileSync blocks the Node.js event loop for up to 30s during Gmail watch renewal, stalling all HTTP requests (health checks, push routes). Switch to async execFile so the controller remains responsive.
| batch: MessageBatch<GmailPushQueueMessage>, | ||
| env: AppEnv | ||
| ): Promise<void> { | ||
| await Promise.allSettled(batch.messages.map(message => processMessage(message, env))); |
Contributor
There was a problem hiding this comment.
WARNING: Promise.allSettled suppresses queue redelivery
If processMessage() rejects after a transient re-enqueue failure, this batch handler still resolves successfully because allSettled() swallows the rejection. Cloudflare then treats the batch as handled, so the original message is acknowledged instead of being redelivered, which reintroduces the notification-loss path this retry logic is trying to avoid.
The glob `**/src/**/*.test.ts` in testMatch was picking up cloudflare-gmail-push vitest tests, causing Jest to fail with "Vitest cannot be imported in a CommonJS module using require()".
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
cloudflare-gmail-pushCloudflare Worker validates OIDC JWTs and forwards push messages to the bot's Fly machine controllergog gmail watch servesupervisor process alongside the existing gateway supervisorArchitecture
Security model
/push/user/{userId}) prevents cross-user notification hijacking/_kilo/gmail-pubsubResilience
max_retries: 0)gmailLastHistoryIduses monotonic-only writes (BigInt comparison); controller patches gog's state file on startup to prevent notification gaps across restartsclearGoogleCredentialsauto-disables gmail notifications; supervisoronSignalstops watch renewal timersChanges
New:
cloudflare-gmail-push/— Cloudflare Worker with OIDC JWT validation, service binding to kiloclaw, Fly machine routingController (
kiloclaw/controller/)/_kilo/gmail-pubsubproxy route to forward pushes to local gog processDO & platform (
kiloclaw/src/)gmailNotificationsEnabledpersisted state + auto-disable on Google disconnectgmailLastHistoryIdmonotonic state for notification orderingGMAIL_NOTIFICATIONS_ENABLEDenv var plumbed to machine buildsGoogle setup (
kiloclaw/google-setup/)gog gmail watch startCloud app (
src/)setGmailNotificationsmutationTest plan
pnpm run typecheck)pnpm run lint)