Skip to content

feat(kiloclaw): Gmail push notifications via Google Cloud Pub/Sub#1065

Open
iscekic wants to merge 99 commits intomainfrom
feature/gmail-pub-sub
Open

feat(kiloclaw): Gmail push notifications via Google Cloud Pub/Sub#1065
iscekic wants to merge 99 commits intomainfrom
feature/gmail-pub-sub

Conversation

@iscekic
Copy link
Contributor

@iscekic iscekic commented Mar 12, 2026

Summary

  • Add Gmail push notification support for kiloclaw bots via Google Cloud Pub/Sub
  • New cloudflare-gmail-push Cloudflare Worker validates OIDC JWTs and forwards push messages to the bot's Fly machine controller
  • Controller runs a gog gmail watch serve supervisor process alongside the existing gateway supervisor
  • Google setup flow extended with Pub/Sub topic/subscription creation and Gmail watch registration
  • Settings UI toggle to enable/disable notifications (requires connected Google account)

Architecture

Gmail API → Pub/Sub topic → Push subscription → cloudflare-gmail-push worker
  → service binding (status + gateway token) → kiloclaw DO
  → fly-force-instance-id → controller /_kilo/gmail-pubsub → gog gmail watch serve (localhost:3002)

Security model

  • Edge: OIDC JWT validation (issuer + per-user audience /push/user/{userId}) prevents cross-user notification hijacking
  • DO: Internal API key for service binding access
  • Controller: Timing-safe gateway token comparison on /_kilo/gmail-pubsub
  • Local: gog manages its own OAuth tokens for Gmail API access

Resilience

  • Idempotency: DO-based dedup with 24h TTL alarm cleanup; distinguishes Pub/Sub redeliveries (attempt 0, blocked) from intentional retries (attempt > 0, allowed)
  • Retry: Custom exponential backoff via queue re-enqueue (max 6 retries, 60s base delay) — worker manages its own retry instead of relying on CF Queues built-in retry (max_retries: 0)
  • State: gmailLastHistoryId uses monotonic-only writes (BigInt comparison); controller patches gog's state file on startup to prevent notification gaps across restarts
  • Cleanup: clearGoogleCredentials auto-disables gmail notifications; supervisor onSignal stops watch renewal timers

Changes

New: cloudflare-gmail-push/ — Cloudflare Worker with OIDC JWT validation, service binding to kiloclaw, Fly machine routing

Controller (kiloclaw/controller/)

  • Generalize supervisor from gateway-only to support arbitrary processes
  • Add gmail watch supervisor (conditional on env vars)
  • Add /_kilo/gmail-pubsub proxy route to forward pushes to local gog process
  • Gmail watch renewal with exponential backoff initial delay
  • Gog credentials management (write tarball, patch historyId)

DO & platform (kiloclaw/src/)

  • gmailNotificationsEnabled persisted state + auto-disable on Google disconnect
  • gmailLastHistoryId monotonic state for notification ordering
  • GMAIL_NOTIFICATIONS_ENABLED env var plumbed to machine builds
  • Platform routes for enable/disable with machine restart

Google setup (kiloclaw/google-setup/)

  • Pub/Sub topic creation + publisher role grant
  • Push subscription pointing to worker with OIDC auth
  • Gmail watch registration via gog gmail watch start

Cloud app (src/)

  • tRPC setGmailNotifications mutation
  • Internal client enable/disable methods
  • Settings UI toggle (visible when Google account connected)

Test plan

  • 570 kiloclaw tests pass (including DO state, env var, supervisor, controller route tests)
  • 13 gmail-push worker tests pass (OIDC validation, push routing, idempotency, consumer backoff)
  • Typecheck clean (pnpm run typecheck)
  • Lint clean (pnpm run lint)
  • Manual: connect Google account, enable notifications, verify push delivery wakes bot
  • Manual: disable notifications, verify pushes stop
  • Manual: disconnect Google account, verify notifications auto-disabled

iscekic added 20 commits March 12, 2026 17:07
Validates the bearer JWT that Google attaches to Pub/Sub push deliveries:
checks issuer, audience, and that the email claim matches the Google
system service account. Also adds cloudflare-gmail-push to the pnpm
workspace and bumps its vitest to 3.2.4 (required for vite 7 compat).
Adds a setGmailNotifications mutation hook to useKiloClawMutations and
renders a Gmail Notifications toggle in GoogleAccountSection (visible
only when Google is connected). Also fixes pre-existing lint/typecheck
issues in cloudflare-gmail-push and kiloclaw test files introduced by
earlier tasks in this branch.
…nfig to subscription

- Replace fragile URL string-replace with GMAIL_PUSH_WORKER_URL env var; skip Pub/Sub setup with warning if not set
- Wrap JWT payload parsing in try/catch and guard subscription creation on pushUserId being present
- Add --push-auth-service-account and --push-auth-token-audience to both create and update commands so Google sends OIDC tokens in push requests
…c routing

Without this header, requests to https://<app>.fly.dev can land on any
machine in the app. Include the fly-force-instance-id header (already
confirmed non-null by the guard above) to ensure the push notification
reaches the correct machine.
…nGateway

- Add 'Cannot enable Gmail' to SAFE_ERROR_PREFIXES so the helpful error
  message reaches the UI instead of a generic failure
- Add email_verified claim check in OIDC validation (defense-in-depth)
- Rename internal spawnGateway → spawnProcess since supervisor is generic
- Add OIDC_AUDIENCE ↔ setup.mjs cross-reference comments
The gog CLI needs GOG_KEYRING_BACKEND and GOG_KEYRING_PASSWORD env vars
to find credentials. Every other gog invocation in setup.mjs passes
gogEnv except this one. Also strengthen push test to assert auth header.
…e restart failure

- Remove --push-auth-service-account from Pub/Sub subscription: the
  gmail-api-push SA is Google-managed and can't be impersonated by
  the user's Pub/Sub service agent
- Make OIDC validation in push worker warn-only when no auth header
  (still rejects invalid tokens)
- Surface restartFailed in GmailNotificationsResponse so UI can
  inform user if the toggle was saved but machine restart failed
…ents

- Add restartFailed to updateGmailNotifications return type declaration
- Update wrangler.jsonc OIDC_AUDIENCE comments (push auth SA was removed)
@iscekic iscekic self-assigned this Mar 12, 2026
@kilo-code-bot
Copy link
Contributor

kilo-code-bot bot commented Mar 12, 2026

Code Review Summary

Status: 1 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
cloudflare-gmail-push/src/consumer.ts 73 Promise.allSettled suppresses queue redelivery and can still drop notifications after re-enqueue failures

Fix these issues in Kilo Cloud

Other Observations (not in diff)

None.

Files Reviewed (49 files)
  • cloudflare-gmail-push/.gitignore
  • cloudflare-gmail-push/README.md
  • cloudflare-gmail-push/eslint.config.mjs
  • cloudflare-gmail-push/package.json
  • cloudflare-gmail-push/src/auth/oidc.test.ts
  • cloudflare-gmail-push/src/auth/oidc.ts
  • cloudflare-gmail-push/src/consumer.test.ts
  • cloudflare-gmail-push/src/consumer.ts - 1 issue
  • cloudflare-gmail-push/src/idempotency.test.ts
  • cloudflare-gmail-push/src/idempotency.ts
  • cloudflare-gmail-push/src/index.ts
  • cloudflare-gmail-push/src/routes/push.test.ts
  • cloudflare-gmail-push/src/routes/push.ts
  • cloudflare-gmail-push/src/types.ts
  • cloudflare-gmail-push/tsconfig.json
  • cloudflare-gmail-push/vitest.config.ts
  • cloudflare-gmail-push/worker-configuration.d.ts
  • cloudflare-gmail-push/wrangler.jsonc
  • kiloclaw/controller/src/gmail-watch-renewal.test.ts
  • kiloclaw/controller/src/gmail-watch-renewal.ts
  • kiloclaw/controller/src/gog-credentials.test.ts
  • kiloclaw/controller/src/gog-credentials.ts
  • kiloclaw/controller/src/index.ts
  • kiloclaw/controller/src/routes/gmail-push.test.ts
  • kiloclaw/controller/src/routes/gmail-push.ts
  • kiloclaw/controller/src/supervisor.test.ts
  • kiloclaw/controller/src/supervisor.ts
  • kiloclaw/google-setup/push-dev.sh
  • kiloclaw/google-setup/setup.mjs
  • kiloclaw/packages/secret-catalog/src/__tests__/catalog.test.ts
  • kiloclaw/packages/secret-catalog/src/catalog.ts
  • kiloclaw/src/durable-objects/kiloclaw-instance.test.ts
  • kiloclaw/src/durable-objects/kiloclaw-instance/config.ts
  • kiloclaw/src/durable-objects/kiloclaw-instance/index.ts
  • kiloclaw/src/durable-objects/kiloclaw-instance/state.ts
  • kiloclaw/src/durable-objects/kiloclaw-instance/types.ts
  • kiloclaw/src/gateway/env.test.ts
  • kiloclaw/src/gateway/env.ts
  • kiloclaw/src/routes/platform.ts
  • kiloclaw/src/schemas/instance-config.ts
  • kiloclaw/start-openclaw.sh
  • pnpm-lock.yaml
  • pnpm-workspace.yaml
  • src/app/(app)/claw/components/SettingsTab.tsx
  • src/app/(app)/claw/components/withStatusQueryBoundary.test.ts
  • src/hooks/useKiloClaw.ts
  • src/lib/kiloclaw/kiloclaw-internal-client.ts
  • src/lib/kiloclaw/types.ts
  • src/routers/kiloclaw-router.ts

Reviewed by gpt-5.4-20260305 · 1,787,419 tokens

iscekic added 2 commits March 12, 2026 18:40
Derive a per-user token via HMAC-SHA256(INTERNAL_API_SECRET, userId)
and embed it in the Pub/Sub push subscription URL. The push worker
verifies this token before processing, preventing anyone who guesses
a userId from triggering fake pushes.

Route changes: /push/user/:userId → /push/user/:userId/:token
iscekic added 3 commits March 13, 2026 18:05
…otency DO

- Use BigInt instead of parseInt for historyId comparison in kiloclaw DO
- Replace fixed-delay message.retry() with manual exponential backoff
  (6 retries, 2x factor, base 60s, ~1h total)
- Add GmailPushIdempotency DO to deduplicate Pub/Sub redeliveries
  (keyed by userId, 24h TTL with batched cleanup)
- Extract messageId from Pub/Sub envelope via zod schema in push route
- Add unit tests for updateGmailHistoryId (5 cases) and idempotency DO
iscekic added 7 commits March 13, 2026 19:11
…enant forgery

Use per-user audience (OIDC_AUDIENCE/push/user/<userId>) so the OIDC
token is cryptographically bound to the specific user. A token minted
for one user's subscription cannot be replayed against another user's
push endpoint.
…worker

Was kiloclaw-gmail-push.kiloapps.ai (wrong subdomain + TLD), should be
kiloclaw-gmail.kiloapps.io matching the wrangler.jsonc route.
retryWithBackoff used void to discard the queue send promise, silently
dropping messages if the re-enqueue failed. Now awaits the send and
logs errors so failures are observable.
BigInt() throws SyntaxError on non-numeric strings. The DO's
updateGmailHistoryId wraps this in try/catch but patchGogHistoryId
did not, violating its documented "never throws" contract.
…ropping

Only 400 and 422 are truly permanent client errors. 401 (gateway token
drift during restart/rotation) and 404 (old image without the route)
are transient and should be retried with backoff.
If startWatchRenewal is called twice, the first timer set would leak
since only the last reference was stored. Now cleans up any existing
timers before creating new ones.
…flows

- Topic creation: distinguish "already exists" from real errors
- Publisher role: use actual stderr instead of generic err.message
- SA creation + token creator: escalate from warning to error with
  clear "Gmail push notifications will not work" message
- userId extraction: skip Pub/Sub instead of killing entire setup
- Gmail watch failure: drop incorrect manual retry command
- Subscription failures: track pushSetupOk flag
- Final summary: show conditional next steps based on push setup result;
  replace stale "Redeploy to activate" with correct instructions
iscekic added 9 commits March 13, 2026 19:29
…idation

Move the email_verified guard above the allowedEmail check so unverified
tokens are always rejected, regardless of whether allowedEmail is passed.
Add test covering the production code path (no allowedEmail).
The audience was hardcoded to the production URL while pushEndpoint
respected the GMAIL_PUSH_WORKER_URL override, causing audience mismatch
in dev/tunnel flows.
…tions

Without this, human-readable errors like "Cannot enable Gmail notifications
without a connected Google account" were swallowed and surfaced as opaque
KiloClawApiError messages in the UI toast.
Without a timeout, a hung gog process would block the entire controller
event loop indefinitely, stalling inbound push handling.
The worker implements its own retry-via-reenqueue with exponential backoff
and always calls message.ack(). Disable platform-level retries to prevent
them from bypassing the custom backoff logic on unhandled crashes.
HTTP error responses were silently swallowed since only exceptions hit the
catch block. A persistent kiloclaw-side failure would be invisible in logs.
Guard token-creator grant, subscription creation, and Gmail watch
registration behind pushSetupOk checks so dependent steps are skipped
when a prerequisite fails, avoiding confusing cascading errors.

Also remove redundant oidcAudience variable (identical to pushEndpoint).
Move startWatchRenewal() to after gmailWatchSupervisor.start() succeeds
so the renewal timer doesn't fire against a process that failed to spawn.
gog can return both 200 (hook delivered) and 202 (no new messages).
The type annotation now reflects both possible values.
iscekic added 3 commits March 13, 2026 19:44
The env var is a base URL, not the full audience — the worker appends
/push/user/<userId> at runtime. The old name was misleading and could
cause a double-path audience if someone set it to the full per-user URL.
Move gcpProject extraction inside the pushSetupOk-guarded block so it
doesn't run after earlier failures, preventing confusing cascading errors
from (unset) project IDs propagating into SA emails and topic paths.
If the child exits between kill('SIGKILL') and the await expression,
handleChildExit could null out childExitPromise, making the await
resolve immediately via the fallback instead of waiting for exit cleanup.
iscekic added 2 commits March 13, 2026 20:38
…and fix stale toast

- Let retryWithBackoff propagate queue send errors so message.ack() is
  skipped and Cloudflare redelivers the original message
- Use mutation response data for toast message instead of stale closure
- Fix README env var name: OIDC_AUDIENCE → OIDC_AUDIENCE_BASE
…g event loop

execFileSync blocks the Node.js event loop for up to 30s during Gmail
watch renewal, stalling all HTTP requests (health checks, push routes).
Switch to async execFile so the controller remains responsive.
batch: MessageBatch<GmailPushQueueMessage>,
env: AppEnv
): Promise<void> {
await Promise.allSettled(batch.messages.map(message => processMessage(message, env)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Promise.allSettled suppresses queue redelivery

If processMessage() rejects after a transient re-enqueue failure, this batch handler still resolves successfully because allSettled() swallows the rejection. Cloudflare then treats the batch as handled, so the original message is acknowledged instead of being redelivered, which reintroduces the notification-loss path this retry logic is trying to avoid.

The glob `**/src/**/*.test.ts` in testMatch was picking up
cloudflare-gmail-push vitest tests, causing Jest to fail with
"Vitest cannot be imported in a CommonJS module using require()".
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant