
feat(jobs): Add data retention jobs#4128

Open
TheodoreSpeaks wants to merge 16 commits into staging from feat/auto-redaction

Conversation

@TheodoreSpeaks
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Apr 13, 2026

Summary

Add data retention jobs. 3 jobs created:

  1. Clean up soft deleted resources (7 days free, 30 days paid, customizable enterprise)
  2. Log retention cleanup (7 days free, infinite paid, customizable enterprise)
  3. Task cleanup (7 days free, infinite paid, customizable enterprise)
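
As a rough illustration, the defaults listed above could be expressed as a lookup table. This is only a sketch of the description, not the PR's code: `DEFAULT_RETENTION_HOURS` and `resolveRetentionHours` are hypothetical names, and `null` is assumed to mean "never clean up" (the "infinite" case, or an enterprise workspace with no override set).

```typescript
// Hypothetical names; the values mirror the PR description only.
type Plan = "free" | "paid" | "enterprise";
type CleanupKind = "soft-deletes" | "logs" | "tasks";

const DAY_HOURS = 24;

// null is assumed to mean "never clean up" (the description's "infinite").
const DEFAULT_RETENTION_HOURS: Record<CleanupKind, Record<"free" | "paid", number | null>> = {
  "soft-deletes": { free: 7 * DAY_HOURS, paid: 30 * DAY_HOURS },
  logs: { free: 7 * DAY_HOURS, paid: null },
  tasks: { free: 7 * DAY_HOURS, paid: null },
};

// Enterprise workspaces use their own configured value; other plans use the table.
function resolveRetentionHours(
  kind: CleanupKind,
  plan: Plan,
  workspaceOverrideHours: number | null,
): number | null {
  if (plan === "enterprise") {
    return workspaceOverrideHours;
  }
  return DEFAULT_RETENTION_HOURS[kind][plan];
}
```

For example, `resolveRetentionHours("soft-deletes", "paid", null)` yields 720 hours (30 days), while an enterprise workspace returns whatever value it has configured.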

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

  • Tested locally. Validated that data is deleted from the sim and copilot DBs, and that the S3 buckets are cleaned up as well.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@vercel

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped Apr 19, 2026 2:27am


@waleedlatif1
Collaborator

@TheodoreSpeaks let's consolidate the migrations into a single one, just delete the existing ones and run it once over all the changes in schema.ts

@TheodoreSpeaks TheodoreSpeaks changed the title from "Feat/auto redaction (wip)" to "feat(jobs): Add data retention jobs" Apr 18, 2026
@TheodoreSpeaks
Collaborator Author

@BugBot review

@cursor

cursor bot commented Apr 18, 2026

PR Summary

High Risk
High risk because it introduces new automated deletion jobs (logs, soft-deleted records, copilot tasks/files) and new workspace-level retention fields, which could cause unintended data loss or performance impact if misconfigured or run at scale.

Overview
Adds a data retention system driven by three new cleanup job types (cleanup-logs, cleanup-soft-deletes, cleanup-tasks) with plan-based defaults and per-workspace enterprise overrides.

Replaces the previous inline /api/logs/cleanup deletion logic with a dispatcher-based cron flow, adds new cron endpoints for the other cleanup types, and implements the actual background tasks (DB batch deletes plus associated cloud storage + copilot backend cleanup).

Introduces an Enterprise-only Data Retention settings section and API (GET/PUT /api/workspaces/[id]/data-retention) backed by new workspace retention columns, plus a DB migration adding those columns and partial indexes to support cleanup queries.

Reviewed by Cursor Bugbot for commit 990d56a. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/background/cleanup-soft-deletes.ts Outdated
Comment thread packages/db/migrations/meta/_journal.json
Comment thread packages/db/migrations/meta/0191_snapshot.json Outdated
Comment thread apps/sim/background/cleanup-tasks.ts
@TheodoreSpeaks
Collaborator Author

@BugBot review

Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
@TheodoreSpeaks TheodoreSpeaks marked this pull request as ready for review April 18, 2026 19:34
@greptile-apps
Contributor

greptile-apps bot commented Apr 18, 2026

Greptile Summary

Adds three data retention background jobs (soft-delete cleanup, log cleanup, task/chat cleanup) dispatched via Trigger.dev or an inline fallback, with an enterprise-gated UI and API for per-workspace configuration. The migration replaces full soft-delete indexes with partial indexes and adds three retention columns to workspace.

  • P1 — S3 objects orphaned for workspace_file rows: cleanupWorkspaceFileStorage only queries workspaceFiles (plural) for S3 cleanup, but CLEANUP_TARGETS also includes workspaceFile (singular), which has its own key column. When cleanupTable hard-deletes those rows the corresponding S3 objects are never removed.
  • P2 — task cleanup defaults mismatch: The PR description states "7 days free" for task cleanup, but getRetentionDefaultHours returns null for both free and paid tiers (confirmed by the in-code comment "No task cleanup"). Clarify whether this is intentional or a description oversight.
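
One possible shape for the P1 fix is to gather storage keys from every file table before the rows are hard-deleted. The sketch below is an assumption about how that could look, not the PR's implementation: `FileRow` and `collectStorageKeys` are hypothetical, and only the existence of a key column on both tables comes from the review.

```typescript
// Hypothetical row shape; only the `key` column is taken from the review text.
interface FileRow {
  id: string;
  key: string | null;
}

// Gather storage keys from every table whose rows are about to be hard-deleted,
// so no table (e.g. the singular workspace_file table) is left out of S3 cleanup.
function collectStorageKeys(...tables: FileRow[][]): string[] {
  const keys = new Set<string>();
  for (const rows of tables) {
    for (const row of rows) {
      if (row.key) {
        keys.add(row.key); // the Set drops keys referenced by more than one row
      }
    }
  }
  return Array.from(keys);
}
```

The resulting list would be passed to the storage delete call before `cleanupTable` removes the rows, so a failed storage delete can still be retried on the next run.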

Confidence Score: 3/5

Not safe to merge as-is: the S3 cleanup gap will permanently orphan workspace_file objects in object storage on every cleanup run.

One confirmed P1 data-integrity bug (workspace_file S3 objects never deleted) that will silently accumulate orphaned cloud storage objects on each cron execution. Everything else — batching logic, auth, migration, Trigger.dev wiring, enterprise UI — is well-structured.

apps/sim/background/cleanup-soft-deletes.ts — cleanupWorkspaceFileStorage must also cover the workspaceFile (singular) table

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/background/cleanup-soft-deletes.ts | Batched soft-delete cleanup with partial-index support; P1 bug: workspaceFile S3 objects are never deleted when DB rows are purged |
| apps/sim/background/cleanup-tasks.ts | Task/chat/run cleanup with correct deletion ordering; duplicate copilotChats query for feedback deletion (addressed in prior thread) |
| apps/sim/background/cleanup-logs.ts | Batched execution log cleanup with S3 file handling; dynamic import inside loop (addressed in prior thread) |
| apps/sim/lib/billing/cleanup-dispatcher.ts | Dispatches free/paid/enterprise cleanup jobs; taskCleanupHours returns null for free/paid despite PR description claiming 7-day free default |
| apps/sim/lib/cleanup/chat-cleanup.ts | Collects file refs from workspaceFiles and JSONB messages, calls copilot backend, deletes S3 files with correct context per file |
| apps/sim/app/api/workspaces/[id]/data-retention/route.ts | GET/PUT API for data retention config; properly gates writes to enterprise plan with admin permission check and audit logging |
| packages/db/migrations/0193_unknown_franklin_richards.sql | Adds log/soft-delete/task retention columns to workspace; replaces full indexes with partial indexes on deleted_at/archived_at for query efficiency |
| apps/sim/ee/data-retention/components/data-retention-settings.tsx | Enterprise-gated UI for configuring retention periods; correctly renders locked vs. editable views based on plan |
| apps/sim/ee/data-retention/hooks/data-retention.ts | React Query hooks for data retention with correct staleTime, signal forwarding, and onSettled invalidation |
| apps/sim/lib/core/async-jobs/backends/trigger-dev.ts | Adds cleanup-logs, cleanup-soft-deletes, cleanup-tasks to the Trigger.dev job type mapping; no issues |

Sequence Diagram

sequenceDiagram
    participant Cron as Cron (GET /api/cron/*)
    participant Dispatcher as dispatchCleanupJobs
    participant Queue as JobQueue (Trigger.dev / DB)
    participant Task as Background Task
    participant DB as Database
    participant S3 as Object Storage
    participant Copilot as Copilot Backend

    Cron->>Dispatcher: dispatchCleanupJobs(jobType, retentionColumn)
    Dispatcher->>Queue: enqueue free-tier job
    Dispatcher->>Queue: enqueue paid-tier job
    Dispatcher->>DB: query enterprise workspaces with non-NULL retention
    Dispatcher->>Queue: batchTrigger enterprise jobs

    Queue->>Task: run(payload)
    Task->>DB: resolveTierWorkspaceIds or lookup workspace retention
    Task->>DB: SELECT expiring rows (batched, LIMIT 2000)
    Task->>S3: delete associated files (pre-deletion)
    Task->>Copilot: POST /api/tasks/cleanup (chat IDs)
    Task->>DB: DELETE rows by ID
    Task-->>Queue: complete
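
The task half of the diagram (select a batch, delete files, delete rows, repeat) can be sketched as a loop over injected dependencies. Everything here is illustrative: `CleanupDeps`, `runBatchedCleanup`, and the `MAX_BATCHES` cap are assumptions, and only the batch size of 2000 and the files-before-rows ordering come from the diagram.

```typescript
// Hypothetical dependency interface standing in for the DB and object storage.
interface CleanupDeps {
  selectExpired(limit: number): Promise<{ id: string; fileKeys: string[] }[]>;
  deleteFiles(keys: string[]): Promise<void>;
  deleteRows(ids: string[]): Promise<void>;
}

const BATCH_SIZE = 2000; // matches the LIMIT shown in the diagram
const MAX_BATCHES = 50; // assumed safety cap so one run cannot loop forever

async function runBatchedCleanup(
  deps: CleanupDeps,
  batchSize: number = BATCH_SIZE,
  maxBatches: number = MAX_BATCHES,
): Promise<number> {
  let deleted = 0;
  for (let i = 0; i < maxBatches; i++) {
    const rows = await deps.selectExpired(batchSize);
    if (rows.length === 0) break;

    // Delete files first: if this throws, the rows survive and are retried next run.
    const keys: string[] = [];
    for (const row of rows) keys.push(...row.fileKeys);
    if (keys.length > 0) await deps.deleteFiles(keys);

    await deps.deleteRows(rows.map((row) => row.id));
    deleted += rows.length;

    // A short batch means nothing else is expiring right now.
    if (rows.length < batchSize) break;
  }
  return deleted;
}
```

Deleting storage objects before the rows means a crash leaves rows that still point at missing files, rather than orphaned files with no row, which is the easier state to recover from on the next run.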

Reviews (2): Last reviewed commit: "fix lint"

Comment thread apps/sim/background/cleanup-soft-deletes.ts
Comment thread apps/sim/background/cleanup-tasks.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/background/cleanup-logs.ts Outdated
@TheodoreSpeaks
Collaborator Author

@greptile review

Comment thread apps/sim/background/cleanup-soft-deletes.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/app/api/workspaces/[id]/data-retention/route.ts
Comment thread apps/sim/background/cleanup-tasks.ts
Comment thread apps/sim/background/cleanup-soft-deletes.ts Outdated
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/ee/data-retention/components/data-retention-settings.tsx

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Reviewed by Cursor Bugbot for commit 990d56a. Configure here.

if (workspaceIds.length === 0) {
  logger.info(`[${label}] No workspaces to process`)
  return
}

Snapshot cleanup skipped when no free workspaces exist

Medium Severity

The early return when workspaceIds.length === 0 prevents cleanupOrphanedSnapshots from ever running if no free-tier workspaces exist. Since snapshot cleanup is a global operation (deletes orphaned snapshots across all workspaces), it's independent of whether there are free workspaces to process. In a deployment where all users have paid plans, orphaned snapshots would accumulate indefinitely. The old code in the logs cleanup route always ran snapshot cleanup regardless of workspace count — this refactoring introduced a regression.
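
One way to avoid this regression is to gate only the per-workspace work on the workspace list and let the global step run unconditionally. The sketch below is hypothetical: `LogCleanupSteps` and `runLogCleanup` are illustrative names, and the steps are synchronous stand-ins for what would be awaited calls in the real task.

```typescript
// Hypothetical stand-ins for the per-workspace and global cleanup steps.
interface LogCleanupSteps {
  cleanupWorkspaceLogs(workspaceIds: string[]): void;
  cleanupOrphanedSnapshots(): void;
}

function runLogCleanup(workspaceIds: string[], steps: LogCleanupSteps): void {
  // Per-workspace work is skipped when there is nothing to process...
  if (workspaceIds.length > 0) {
    steps.cleanupWorkspaceLogs(workspaceIds);
  }
  // ...but the global snapshot cleanup is not gated on the workspace list.
  steps.cleanupOrphanedSnapshots();
}
```

Whether the snapshot step should live in the tier job at all, or move to its own dispatch so it runs exactly once per cron tick, is a separate design question this sketch does not settle.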


Reviewed by Cursor Bugbot for commit 990d56a. Configure here.

)
)
.where(and(isNull(workspace.archivedAt), isNotNull(retentionCol)))


Missing DISTINCT causes duplicate enterprise cleanup jobs

Low Severity

The enterprise workspace query uses an INNER JOIN on subscription without DISTINCT. If a billedAccountUserId has multiple matching subscription rows (e.g., one active and one past_due, both included in ENTITLED_SUBSCRIPTION_STATUSES), the same workspace ID appears multiple times. Each duplicate triggers a separate cleanup job, causing redundant concurrent deletes against the same workspace data. The same issue exists in resolveWorkspaceIdsForPlan for pro/team plans, though there duplicates in an inArray clause are harmless.
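
Besides adding DISTINCT to the query itself, the dispatcher could defensively de-duplicate the joined rows before enqueueing. This is a sketch under assumptions: `EnterpriseRow` and `dedupeByWorkspace` are hypothetical, and the row shape is a guess at what the join returns.

```typescript
// Hypothetical shape of one row returned by the workspace/subscription join.
interface EnterpriseRow {
  workspaceId: string;
  retentionHours: number;
}

// Keep the first row per workspace so each workspace gets exactly one cleanup job.
function dedupeByWorkspace(rows: EnterpriseRow[]): EnterpriseRow[] {
  const seen = new Map<string, EnterpriseRow>();
  for (const row of rows) {
    if (!seen.has(row.workspaceId)) {
      seen.set(row.workspaceId, row);
    }
  }
  return Array.from(seen.values());
}
```

Since the retention value comes from the workspace rather than the subscription, duplicate rows for one workspace should carry the same hours, so keeping the first is assumed to be safe.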


Reviewed by Cursor Bugbot for commit 990d56a. Configure here.
