Testing Infra

Six layers of test coverage: unit, integration, conformance, CLI (Go), build (Next.js), and deployed smoke. Each catches a different failure mode. Together they're what makes a fork credible — and skipping any of them is how silent regressions reach production.

Default stance

Add the narrowest fast test first. Add conformance and deployed smoke when behavior crosses a contract boundary. A new helper function probably wants a unit test. A new API endpoint also wants a conformance test (spec → handler) and probably deploy smoke. A schema change wants db:check (already a gate) and probably an integration test that exercises the migrated state.

Tests verify code correctness. Smoke verifies feature correctness. Type checking and unit tests are necessary but not sufficient. The deploy smoke against a live preview URL is what proves the feature actually works end-to-end with real auth, real DB, real env. If you can't run smoke (e.g. UI-only feature with no scriptable signal), say so explicitly rather than claiming success.

Stop at the first failing layer. Don't paper over a unit test failure to chase the integration test. Each layer is signal — fix it where it fails, then move down.

Use this skill when

Adding tests for new features, debugging missing test coverage, changing CI gates, restructuring test directories, auditing whether a fork has credible automated validation, or designing the test strategy for a new resource.

Test layers

| Layer | Where | Runs when | Catches |
|---|---|---|---|
| Unit | tests/unit/ | bun run test, gates, pre-commit | Logic bugs in isolated functions |
| Integration | tests/integration/ | bun run test | Multi-module behavior, real DB queries |
| Conformance | tests/conformance/ | gates, CI | Spec ↔ handler ↔ scope drift |
| CLI (Go) | cli/cmd/*_test.go | cd cli && go test ./... | CLI command correctness |
| Build | bun run build | gates | Type errors, route conflicts, import issues |
| Deployed smoke | scripts/post-deploy-smoke.mjs | promote-deployment | Real-world end-to-end against hosted URL |

Workflow

  1. Inspect what exists: ./scripts/testing-infra-preflight.sh reports current coverage.
  2. Map the change to the right layer:
    • New function → unit
    • New API endpoint → unit (logic) + conformance (spec sync) + deploy smoke (real call)
    • Schema change → integration (migrated state) + db:check (gate)
    • New CLI command → Go test + deploy smoke (smoke uses the binary)
    • UI change → manual verification (we don't have automated UI testing)
  3. Add the narrowest fast test first. Skip integration if a unit test covers the logic.
  4. Run the local gates: ./scripts/gates.sh.
  5. For changes that cross contract boundaries — add conformance and deploy smoke.
  6. Before promotion — see promote-deployment for the deployed smoke flow.
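Step 3's "narrowest fast test" usually means a pure-function unit test with no DB or network. A minimal sketch, assuming a hypothetical parseScope helper — the repo's real scope logic lives in lib/scopes.ts, and its Vitest cases would live in tests/unit/:

```typescript
// Hypothetical pure helper -- narrow enough to unit-test in isolation.
// (parseScope is illustrative; it is not asserted to exist in lib/scopes.ts.)
function parseScope(scope: string): { resource: string; action: string } {
  const [resource, action] = scope.split(":");
  if (!resource || !action) throw new Error(`malformed scope: ${scope}`);
  return { resource, action };
}

// The corresponding Vitest case in tests/unit/ would look like:
//   it("splits resource and action", () =>
//     expect(parseScope("keys:read")).toEqual({ resource: "keys", action: "read" }));
console.log(parseScope("keys:read")); // → { resource: 'keys', action: 'read' }
```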

What gets tested where

  • Unit (tests/unit/) — pure logic. No DB, no network. lib/scopes.ts, lib/api-keys.ts (with mocked DB), validation helpers.
  • Integration (tests/integration/) — multi-module flows with a real test DB. Org lifecycle, key creation cascade, webhook firing.
  • Conformance (tests/conformance/) — invariants between layers. Every OpenAPI path has a handler. Every scope in spec exists in lib/scopes.ts. Every CLI command maps to an endpoint.
  • CLI (cli/cmd/*_test.go) — Go tests next to source files. Tests run via go test ./... from cli/.
  • Build (bun run build) — Next.js production build. Type errors here are usually real bugs; don't disable types to make build pass.
  • Deployed smoke (scripts/post-deploy-smoke.mjs) — runs against the actual hosted URL with a real provisioned identity. The strongest signal short of users actually using the app.
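The deployed-smoke shape can be sketched as a probe plus a pure health check. This is a hedged sketch in the spirit of scripts/post-deploy-smoke.mjs — the endpoint path, response shape, and function names here are assumptions for illustration, not the script's real contract:

```typescript
// Pure check, kept separate so it can be unit-tested without a network.
// The { ok: boolean } body shape is an assumption for illustration.
function assertHealthy(status: number, body: { ok?: boolean }): void {
  if (status !== 200 || body.ok !== true) {
    throw new Error(`smoke failed: HTTP ${status}, body.ok=${body.ok}`);
  }
}

// The probe itself hits the live preview URL with a real provisioned token.
async function probe(baseUrl: string, token: string): Promise<void> {
  const res = await fetch(`${baseUrl}/api/health`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  assertHealthy(res.status, await res.json());
}
```

A failing probe should exit nonzero so the promotion flow goes red instead of silently passing.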

Hard rules

  • Don't write a test that asserts on implementation details that a refactor would break. Test behavior, not internals.
  • Don't disable a failing test "for now." Either fix it, delete it, or document why it's skipped. Skipped tests rot.
  • Don't run unit tests against the production database. tests/unit/ mocks DB; tests/integration/ uses an isolated test DB. Never the real Neon branch.
  • Don't claim a UI feature works without trying it in a browser. Type checking is not feature verification.
  • Don't skip deploy smoke for "low-risk" changes. Smoke is fast (~30s) and catches the env-drift class of bugs nothing else catches.

Conformance gates

The conformance layer is what keeps spec ↔ code in sync. Key validators:

| Validator | Asserts |
|---|---|
| scripts/verify-openapi-routes.py | Every spec path has a handler |
| scripts/verify-scope-sync.py | Scopes match between openapi/v1.yaml and lib/scopes.ts |
| scripts/verify-skill-graph.py | SKILL_GRAPH and skill cross-references stay consistent |
| scripts/verify-factory-contract.py | Fork manifest matches actual fork state |

These run in gates and CI. Failure means drift somewhere — fix the drift, don't disable the gate.
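The scope-sync invariant reduces to a set comparison in both directions. A sketch of the idea in TypeScript for illustration — the real validators are Python scripts that parse openapi/v1.yaml and lib/scopes.ts, while the inputs here are literals:

```typescript
// Sketch of verify-scope-sync's invariant: the scope sets declared in the
// spec and in code must be identical. Drift in either direction is a failure.
function scopeDrift(specScopes: string[], codeScopes: string[]) {
  const spec = new Set(specScopes);
  const code = new Set(codeScopes);
  return {
    missingInCode: specScopes.filter((s) => !code.has(s)),
    missingInSpec: codeScopes.filter((s) => !spec.has(s)),
  };
}

const drift = scopeDrift(["keys:read", "keys:write"], ["keys:read", "orgs:read"]);
console.log(drift);
// → { missingInCode: [ 'keys:write' ], missingInSpec: [ 'orgs:read' ] }
```

Reporting both directions matters: a scope added to code but not the spec is just as much drift as the reverse.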

Where things live

| File | Purpose |
|---|---|
| tests/unit/ | Fast, isolated unit tests |
| tests/integration/ | Multi-module integration tests |
| tests/conformance/ | Cross-layer invariants |
| tests/AGENTS.md | Test conventions |
| cli/cmd/*_test.go | CLI tests (Go) |
| scripts/gates.sh | Local gate entrypoint |
| scripts/release-gates.sh | Stricter superset for releases |
| scripts/post-deploy-smoke.mjs | Hosted smoke |
| scripts/provision-smoke-identity.ts | Real test user/org for smoke |
| vitest.config.ts | Vitest config |
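A vitest.config.ts for a unit/integration split like this one typically scopes the include globs to the test directories. This is a hedged sketch of what that config might contain, not the repo's actual file:

```typescript
// Hedged sketch -- the real vitest.config.ts may differ in globs and options.
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Pick up both fast and DB-backed suites; conformance and CLI tests
    // run through their own entrypoints (gates, go test).
    include: ["tests/unit/**/*.test.ts", "tests/integration/**/*.test.ts"],
  },
});
```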

Skill web

  • run-quality-gates — runs the local gates that include unit + build
  • promote-deployment — runs deploy smoke as part of release validation
  • add-api-endpoint — every new endpoint needs unit + conformance coverage
  • cli-development — every new CLI command needs Go tests
  • db-health — schema changes interact with integration tests
