Sandboxed Tasks

Spawn isolated Firecracker microVMs running a Pi coding agent with authenticated CLI access to the platform. The agent can do anything the caller's API key permits — scoped, metered, auto-killed. Tasks are how this platform turns "natural language instruction" into "executed multi-step work" without each caller having to write the orchestration.

Default stance

Pass an instruction, not a recipe. The whole point of the task system is that the caller doesn't know how, only what. The Pi agent has the CLI tools — let it figure out the steps. If you find yourself encoding step-by-step logic in the instruction, you probably want a regular API endpoint, not a task.

Always scope the sub-key to the minimum the task needs. The sub-agent gets the intersection of the caller's scopes and the scopes parameter, so favor smaller intersections. A task that only needs to read keys should not be passed keys:write "just in case."

Fire-and-forget by default. Subscribe to task.completed / task.failed webhooks instead of polling. The whole runtime is built around webhook notification.
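A minimal sketch of a receiver for these events follows. The source does not document the webhook payload, so the TaskWebhookEvent shape (type, taskId, exitCode) and both function names are hypothetical; check webhooks-and-events for the real fields.

```typescript
// Hypothetical payload shape: the real webhook body may differ.
interface TaskWebhookEvent {
  type: "task.completed" | "task.failed";
  taskId: string;
  exitCode: number | null;
}

// Validate a raw webhook body before trusting it.
function parseTaskEvent(rawBody: string): TaskWebhookEvent {
  const data = JSON.parse(rawBody) as Partial<TaskWebhookEvent>;
  if (data.type !== "task.completed" && data.type !== "task.failed") {
    throw new Error(`unexpected event type: ${String(data.type)}`);
  }
  if (typeof data.taskId !== "string") {
    throw new Error("missing taskId");
  }
  return {
    type: data.type,
    taskId: data.taskId,
    exitCode: typeof data.exitCode === "number" ? data.exitCode : null,
  };
}

// React to the event instead of polling for task status.
function handleTaskEvent(event: TaskWebhookEvent): string {
  if (event.type === "task.completed") {
    return `task ${event.taskId} completed`;
  }
  const code = event.exitCode === null ? "unknown" : String(event.exitCode);
  return `task ${event.taskId} failed (exit code ${code})`;
}
```

In a real route handler, the returned string would be replaced by whatever follow-up work the product needs, such as notifying the user or enqueueing a retry.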

Use this skill when

Implementing agent workflows, designing the task instruction surface for a product, debugging sandbox execution, changing Pi/sandbox snapshot behavior, or scoping sub-keys for delegated work.

Lifecycle

runTask()
  1. Insert task row (status: pending)
  2. Mint task-scoped sub-key (TTL = timeout + 30s buffer)
  3. Update to running
  4. createAndRunSandbox(instruction, seedApiKey, seedApiUrl, timeoutMs)
     → Pi agent runs in Firecracker VM with the CLI authenticated
  5. Update to completed/failed with stdout, stderr, exit code, tokens
  6. Record usage event
  7. Fire webhook: task.completed or task.failed
  8. (finally) Revoke the sub-key — cleanup always runs

The sub-key is the security boundary. It's revoked in finally, so even a thrown error or timeout cleans up.
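The lifecycle above can be sketched as a try/finally around the sandbox run. This is an illustration, not the code in lib/tasks.ts: the LifecycleDeps interface and all of its members are hypothetical stand-ins, and it assumes that a non-zero exit code or a thrown sandbox error both map to "failed".

```typescript
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface SandboxResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

// Hypothetical dependencies, injected so the flow can be exercised without a VM.
interface LifecycleDeps {
  setStatus(status: TaskStatus): Promise<void>;
  mintSubKey(ttlMs: number): Promise<{ id: string; secret: string }>;
  runSandbox(instruction: string, apiKey: string, timeoutMs: number): Promise<SandboxResult>;
  recordUsage(): Promise<void>;
  fireWebhook(event: "task.completed" | "task.failed"): Promise<void>;
  revokeSubKey(id: string): Promise<void>;
}

const KEY_TTL_BUFFER_MS = 30_000; // the "+ 30s buffer" from step 2

async function runTaskSketch(
  instruction: string,
  timeoutMs: number,
  deps: LifecycleDeps,
): Promise<TaskStatus> {
  await deps.setStatus("pending"); // step 1
  const subKey = await deps.mintSubKey(timeoutMs + KEY_TTL_BUFFER_MS); // step 2
  try {
    await deps.setStatus("running"); // step 3
    let status: TaskStatus;
    try {
      const result = await deps.runSandbox(instruction, subKey.secret, timeoutMs); // step 4
      status = result.exitCode === 0 ? "completed" : "failed";
    } catch {
      status = "failed"; // assumption: sandbox errors and timeouts count as failures
    }
    await deps.setStatus(status); // step 5
    await deps.recordUsage(); // step 6
    await deps.fireWebhook(status === "completed" ? "task.completed" : "task.failed"); // step 7
    return status;
  } finally {
    await deps.revokeSubKey(subKey.id); // step 8: runs on success, failure, or throw
  }
}
```

Because revocation sits in the outer finally, it still runs if a later step such as the webhook call throws.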

Create and run a task

# REST
curl -X POST https://your-app.com/api/v1/tasks/run \
  -H "Authorization: Bearer sk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Audit our API key inventory and revoke any expired ones",
    "scopes": ["keys:read", "keys:write"],
    "timeout": 60000
  }'

// Server-side
import { runTask } from "@/lib/tasks"

const result = await runTask({
  callerKeyId: apiKey.id,
  callerUserId: user.id,
  callerOrganizationId: organizationId,
  callerScopes: apiKey.scopes,
  instruction: "Audit our API key inventory and revoke any expired ones",
  scopes: ["keys:read", "keys:write"],
  timeoutMs: 60_000,
})

Scoping

The scopes parameter is intersected with the caller's scopes — you can only grant permissions you have. Omit scopes to inherit all caller permissions:

// Read-only audit — can't mutate anything
{ scopes: ["obs:read"] }

// Mixed read/write within a domain
{ scopes: ["keys:read", "keys:write", "obs:read"] }

// Inherit everything the caller has
{ /* no scopes */ }
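The intersection rule can be illustrated with a small helper. This is a sketch, not the logic in lib/tasks.ts: the function name intersectScopes is hypothetical, and it assumes scopes are compared as exact strings with no wildcard support.

```typescript
// Hypothetical helper: the sub-key gets only scopes that are both
// requested and already held by the caller.
function intersectScopes(callerScopes: string[], requested?: string[]): string[] {
  // Omitting scopes inherits everything the caller has.
  if (requested === undefined) {
    return Array.from(new Set(callerScopes));
  }
  const allowed = new Set(callerScopes);
  // Deduplicate the request, then drop anything the caller cannot grant.
  return Array.from(new Set(requested)).filter((scope) => allowed.has(scope));
}

const callerScopes = ["keys:read", "obs:read"];

// keys:write is silently dropped because the caller does not hold it.
const granted = intersectScopes(callerScopes, ["keys:read", "keys:write"]);

// No scopes parameter: the sub-key inherits both caller scopes.
const inherited = intersectScopes(callerScopes);
```

Under these assumptions, requesting a scope the caller lacks never widens access; it only shrinks what the sub-agent receives.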

Timeouts

Task type                        Recommended timeout
Simple lookup                    30 s
Multi-step workflow              60 s
Data processing / generation     180 s
Complex multi-tool agent work    300 s (default)
Maximum allowed                  600 s

clampTimeout() enforces [10s, 600s]. The sub-key TTL is timeout + 30s, so the key stays valid for the whole run plus a short buffer and then expires on its own; the agent cannot keep making follow-up calls long after the task ends.
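A sketch of the clamping and TTL arithmetic, assuming millisecond inputs and that a missing timeout falls back to the 300 s default. The names clampTimeoutSketch and subKeyTtlMs are hypothetical; the real clampTimeout() may handle edge cases differently.

```typescript
const MIN_TIMEOUT_MS = 10_000; // lower bound: 10 s
const MAX_TIMEOUT_MS = 600_000; // upper bound: 600 s
const DEFAULT_TIMEOUT_MS = 300_000; // default: 300 s
const SUB_KEY_BUFFER_MS = 30_000; // sub-key outlives the task by 30 s

// Clamp a requested timeout into [10 s, 600 s]; fall back to the default
// when no usable value is supplied (an assumption of this sketch).
function clampTimeoutSketch(timeoutMs?: number): number {
  if (timeoutMs === undefined || !Number.isFinite(timeoutMs)) {
    return DEFAULT_TIMEOUT_MS;
  }
  return Math.min(MAX_TIMEOUT_MS, Math.max(MIN_TIMEOUT_MS, Math.floor(timeoutMs)));
}

// TTL for the task-scoped sub-key: clamped timeout plus the buffer.
function subKeyTtlMs(timeoutMs?: number): number {
  return clampTimeoutSketch(timeoutMs) + SUB_KEY_BUFFER_MS;
}
```

For example, a requested timeout of 60_000 ms yields a sub-key TTL of 90_000 ms, and a request for 900_000 ms is clamped to 600_000 ms before the buffer is added.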

Snapshot management

SANDBOX_SNAPSHOT_ID points to a pre-built Firecracker snapshot containing Pi and its dependencies. It does NOT contain the seed CLI — the CLI is downloaded fresh per task so it's always the current version. Recreate the snapshot only when upgrading Pi or changing the default LLM model:

npx tsx scripts/setup-sandbox-snapshot.ts
# Then update SANDBOX_SNAPSHOT_ID in Vercel env vars

Hard rules

  • Don't pass full secrets in instruction. It ends up in the task record and webhook payload. Pass references; let the agent fetch via CLI.
  • Don't widen scopes to debug a failure. If the agent can't do something, that's the security boundary working — change the scope intentionally, not as a workaround.
  • Don't poll for status when you can subscribe. task.completed and task.failed webhooks fire automatically. See webhooks-and-events.
  • Don't bypass runTask to call sandbox directly. The wrapper handles sub-key minting, status transitions, usage recording, and revocation. Bypassing means you'll forget cleanup and leak access.

Cost model

  • Sandbox compute: ~$0.01-0.03 per 5-min task
  • LLM tokens: $0 with OpenRouter free tier (default)
  • CLI install: 2-3s per task
  • Vercel Pro $20/mo credit covers ~650 five-min tasks
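As a sanity check on these figures, the arithmetic can be worked through directly. The helper names below are hypothetical, and the calculation only covers sandbox compute, using the numbers quoted above.

```typescript
// Cost per task implied by a credit that covers a given number of tasks.
function impliedCostPerTaskUsd(creditUsd: number, tasksCovered: number): number {
  return creditUsd / tasksCovered;
}

// Whole tasks that fit into a credit at a given per-task cost.
function tasksCoveredByCredit(creditUsd: number, costPerTaskUsd: number): number {
  return Math.floor(creditUsd / costPerTaskUsd);
}

// $20 spread over ~650 five-minute tasks is roughly $0.031 per task.
const impliedCost = impliedCostPerTaskUsd(20, 650);

// At the upper estimate of $0.03 per task, $20 covers a similar number of tasks.
const tasksAtUpperEstimate = tasksCoveredByCredit(20, 0.03);
```

The ~650 figure is therefore consistent with the upper end of the $0.01-0.03 range; at the lower end the same credit would stretch considerably further.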

For forkers: adding domain-specific tasks

You don't write task-specific code. Pi figures out which CLI commands to call from the natural language instruction. Your job:

  1. Make the CLI comprehensive — every domain action should have a seed <resource> <verb> command.
  2. Define correct scopes: contacts:read, inventory:write, etc. (see add-api-endpoint).
  3. That's it. Pi discovers commands at runtime.

Where things live

File                                 Purpose
lib/tasks.ts                         runTask, getTask, stopTask, scope intersection logic
lib/sandbox.ts                       createAndRunSandbox, stopSandboxById
lib/api-keys.ts                      createSubKey, revokeKeyWithCascade
app/api/v1/tasks/                    REST endpoints
pages/api/mcp.ts                     MCP exposure (Claude Desktop, Cursor)
scripts/setup-sandbox-snapshot.ts    Build a new sandbox snapshot
scripts/test-sandbox.ts              Local sandbox smoke test
