Remote comms: Implement distributed garbage collection (DGC) #779

@sirtimid

Description

Problem

Remote object references are never cleaned up, leading to unbounded memory growth in long-running kernels.

File: packages/ocap-kernel/src/remotes/kernel/RemoteHandle.ts:639

async deliverBringOutYourDead(): Promise<CrankResults> {
  // XXX Currently a no-op, but probably some further DGC action is warranted here
  return this.#myCrankResult;
}

Why This Matters

  • Memory Leaks: Each remote interaction creates object references that are never freed
  • Unbounded Growth: Long-running kernels accumulate references indefinitely
  • Crash Risk: Eventually leads to out-of-memory conditions
  • Stale References: The kernel keeps references to objects that no longer exist on the remote

Background

When Kernel A sends an object reference to Kernel B:

  • A must keep the object alive as long as B might use it
  • Without DGC, A has no way to know when B is done with the reference
  • The reference stays alive forever (memory leak)

Possible Approaches

1. Reference Counting

  • B sends addRef when receiving a reference
  • B sends dropRef when its proxy is garbage collected
  • A releases the object when its refCount reaches 0
  • Challenge: A message lost during a network partition leaves the count permanently wrong
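
A minimal sketch of the exporter-side bookkeeping for this approach, assuming kernel A keeps one counter per exported ref. The class name `ExportRefCounts` and its methods are hypothetical and do not exist in the codebase; only the addRef/dropRef semantics come from the list above.

```typescript
// Hypothetical exporter-side table for kernel A; names are illustrative only.
class ExportRefCounts {
  private readonly counts = new Map<string, number>();

  // Called when A receives an addRef for `ref` from B.
  addRef(ref: string): void {
    this.counts.set(ref, (this.counts.get(ref) ?? 0) + 1);
  }

  // Called when A receives a dropRef for `ref` from B.
  // Returns true when the count reached 0 and A may release the object.
  dropRef(ref: string): boolean {
    const current = this.counts.get(ref);
    if (current === undefined) {
      return false; // Unknown or already released: ignore the message.
    }
    if (current <= 1) {
      this.counts.delete(ref);
      return true;
    }
    this.counts.set(ref, current - 1);
    return false;
  }

  // True while at least one reference to `ref` is still counted.
  has(ref: string): boolean {
    return this.counts.has(ref);
  }
}
```

Note that this sketch has no defense against a lost dropRef: the entry would simply stay in the map, which is the failure mode named in the Challenge bullet.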

2. Lease-Based / Heartbeat

  • References have a time-to-live (TTL)
  • B must periodically renew the lease
  • If B crashes or network fails, lease expires and A cleans up
  • Trade-off: More network traffic but handles failures gracefully
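
The lease idea can be sketched as a table of expiry timestamps on the exporting side. This is an illustration under assumptions: `LeaseTable`, `renew`, and `sweep` are hypothetical names, and the clock is passed in explicitly so the behavior is easy to test.

```typescript
// Hypothetical exporter-side lease table; not part of the existing kernel.
class LeaseTable {
  private readonly expiries = new Map<string, number>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Grant or renew leases, e.g. when B sends a renewal for `refs`.
  renew(refs: string[], now: number): void {
    for (const ref of refs) {
      this.expiries.set(ref, now + this.ttlMs);
    }
  }

  // Remove and return every ref whose lease has lapsed at time `now`.
  sweep(now: number): string[] {
    const expired: string[] = [];
    this.expiries.forEach((expiry, ref) => {
      if (expiry <= now) {
        expired.push(ref);
      }
    });
    for (const ref of expired) {
      this.expiries.delete(ref);
    }
    return expired;
  }
}
```

If B crashes or is partitioned away, it stops calling `renew`, and the next `sweep` on A returns its refs without any message from B being required.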

3. Hybrid Approach

  • Reference counting for normal operation
  • Lease timeout as fallback for failure cases

Implementation

Files to Modify

| File | Changes |
| --- | --- |
| kernel/RemoteHandle.ts | Implement deliverBringOutYourDead(), track exported refs |
| kernel/RemoteManager.ts | Coordinate DGC across all remotes |
| remotes/types.ts | Add DGC message types |

New Message Types

type DGCMessage =
  | { method: 'dgc:addRef'; params: { refs: string[] } }
  | { method: 'dgc:dropRef'; params: { refs: string[] } }
  | { method: 'dgc:ping'; params: { refs: string[] } }  // Lease renewal
  | { method: 'dgc:pong'; params: { refs: string[] } }; // Lease acknowledgment
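
Since these messages arrive off the wire, the receiver probably wants a runtime check before dispatching on `method`. A rough sketch, assuming the union above is adopted as written; `isDGCMessage` and `describeDGCMessage` are hypothetical helpers, and the returned strings stand in for real handlers.

```typescript
type DGCMessage =
  | { method: 'dgc:addRef'; params: { refs: string[] } }
  | { method: 'dgc:dropRef'; params: { refs: string[] } }
  | { method: 'dgc:ping'; params: { refs: string[] } }
  | { method: 'dgc:pong'; params: { refs: string[] } };

const DGC_METHODS = ['dgc:addRef', 'dgc:dropRef', 'dgc:ping', 'dgc:pong'];

// Runtime guard for untrusted input received from a remote.
function isDGCMessage(value: unknown): value is DGCMessage {
  if (typeof value !== 'object' || value === null) {
    return false;
  }
  const { method, params } = value as { method?: unknown; params?: unknown };
  if (typeof method !== 'string' || DGC_METHODS.indexOf(method) === -1) {
    return false;
  }
  if (typeof params !== 'object' || params === null) {
    return false;
  }
  const { refs } = params as { refs?: unknown };
  return Array.isArray(refs) && refs.every((ref) => typeof ref === 'string');
}

// Exhaustive dispatch: the compiler flags any variant added to the union
// but not handled here.
function describeDGCMessage(message: DGCMessage): string {
  const count = message.params.refs.length;
  switch (message.method) {
    case 'dgc:addRef':
      return `increment ${count} ref(s)`;
    case 'dgc:dropRef':
      return `decrement ${count} ref(s)`;
    case 'dgc:ping':
      return `renew lease on ${count} ref(s)`;
    case 'dgc:pong':
      return `lease acknowledged for ${count} ref(s)`;
  }
}
```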

RemoteHandle Changes

class RemoteHandle {
  // (existing members elided)

  // Track references exported to this remote
  #exportedRefs: Map<string, {
    kref: string;
    refCount: number;
    lastAccessed: number;
  }> = new Map();

  // Track references imported from this remote
  #importedRefs: Map<string, {
    kref: string;
    leaseExpiry: number;
  }> = new Map();

  async deliverBringOutYourDead(): Promise<CrankResults> {
    // Scan for expired leases
    // Send dropRef for unreferenced imports
    // Clean up expired exports
    return this.#myCrankResult; // same result as the current no-op
  }
}
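
One way the scan inside deliverBringOutYourDead() could look, written as a pure function over the two maps so it can be unit tested without a RemoteHandle. This is a sketch under assumptions, not the intended implementation: `scanForDead` and `exportIdleLimitMs` are hypothetical, exports are released when refCount is 0 or when lastAccessed is older than the idle limit (a lease-style fallback), and imports whose leaseExpiry has passed are treated as stale and reported for a dropRef.

```typescript
// Entry shapes copied from the proposed RemoteHandle fields.
type ExportEntry = { kref: string; refCount: number; lastAccessed: number };
type ImportEntry = { kref: string; leaseExpiry: number };

type ScanResult = {
  releasedExports: string[]; // keys removed from the export table
  droppedImports: string[]; // keys to report in a dgc:dropRef message
};

// Hypothetical helper: one cleanup pass over both tables at time `now`.
function scanForDead(
  exportedRefs: Map<string, ExportEntry>,
  importedRefs: Map<string, ImportEntry>,
  now: number,
  exportIdleLimitMs: number,
): ScanResult {
  const releasedExports: string[] = [];
  const droppedImports: string[] = [];

  exportedRefs.forEach((entry, ref) => {
    const idle = now - entry.lastAccessed > exportIdleLimitMs;
    if (entry.refCount <= 0 || idle) {
      releasedExports.push(ref);
    }
  });
  importedRefs.forEach((entry, ref) => {
    if (entry.leaseExpiry <= now) {
      droppedImports.push(ref);
    }
  });

  // Delete after both passes so collection and mutation stay separate.
  for (const ref of releasedExports) {
    exportedRefs.delete(ref);
  }
  for (const ref of droppedImports) {
    importedRefs.delete(ref);
  }
  return { releasedExports, droppedImports };
}
```

The caller would then send `{ method: 'dgc:dropRef', params: { refs: droppedImports } }` to the remote and release the kernel objects behind `releasedExports`.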

Edge Cases

  1. Network Partition: Lease expiry handles this gracefully
  2. Simultaneous GC: Both sides may collect at the same time, so DGC messages need ordering guarantees (e.g., sequence numbers)
  3. Reference Cycles: May need distributed cycle detection
  4. Resurrection: An object is re-exported after the drop message for it has already been sent
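
To illustrate how sequence numbers could address the ordering and resurrection cases, here is a sketch in which every export is stamped with a monotonically increasing number and a drop is honored only if it refers to the latest export. Everything here is an assumption: `ExportSequencer` is a hypothetical name, and the message types proposed in this issue do not yet carry a sequence field.

```typescript
// Hypothetical exporter-side guard against stale drop messages.
class ExportSequencer {
  private nextSeq = 1; // never reused, even after a ref is released
  private readonly latestSeq = new Map<string, number>();

  // Record a (re-)export of `ref` and return the seq to send with it.
  recordExport(ref: string): number {
    const seq = this.nextSeq;
    this.nextSeq += 1;
    this.latestSeq.set(ref, seq);
    return seq;
  }

  // Decide whether a drop that echoes `seenSeq` may release the export.
  // A drop that predates the latest export is stale and is ignored.
  acceptDrop(ref: string, seenSeq: number): boolean {
    const latest = this.latestSeq.get(ref);
    if (latest === undefined || seenSeq < latest) {
      return false;
    }
    this.latestSeq.delete(ref);
    return true;
  }
}
```

In the resurrection case, B's drop carries the seq of the earlier export, A has since re-exported with a higher seq, and `acceptDrop` returns false, so the object stays alive.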

Acceptance Criteria

  • Exported references tracked per remote
  • Imported references tracked with lease/refcount
  • deliverBringOutYourDead() triggers cleanup scan
  • References cleaned up when no longer needed
  • Network partition doesn't cause permanent leaks
  • Memory usage bounded in long-running kernels
  • Unit tests for reference lifecycle
  • E2E test demonstrating memory reclamation
