# Testing Infra
Six layers of test coverage: unit, integration, conformance, CLI (Go), build (Next.js), and deployed smoke. Each catches a different failure mode. Together they're what makes a fork credible — and skipping any of them is how silent regressions reach production.
## Default stance
Add the narrowest fast test first. Add conformance and deployed smoke when behavior crosses a contract boundary. A new helper function probably wants a unit test. A new API endpoint also wants a conformance test (spec → handler) and probably deploy smoke. A schema change wants `db:check` (already a gate) and probably an integration test that exercises the migrated state.
Tests verify code correctness. Smoke verifies feature correctness. Type checking and unit tests are necessary but not sufficient. The deploy smoke against a live preview URL is what proves the feature actually works end-to-end with real auth, real DB, real env. If you can't run smoke (e.g. UI-only feature with no scriptable signal), say so explicitly rather than claiming success.
Stop at the first failing layer. Don't paper over a unit test failure to chase the integration test. Each layer is signal — fix it where it fails, then move down.
## Use this skill when
Adding tests for new features, debugging missing test coverage, changing CI gates, restructuring test directories, auditing whether a fork has credible automated validation, or designing the test strategy for a new resource.
## Test layers
| Layer | Where | Runs when | Catches |
|---|---|---|---|
| Unit | `tests/unit/` | `bun run test`, gates, pre-commit | Logic bugs in isolated functions |
| Integration | `tests/integration/` | `bun run test` | Multi-module behavior, real DB queries |
| Conformance | `tests/conformance/` | gates, CI | Spec ↔ handler ↔ scope drift |
| CLI (Go) | `cli/cmd/*_test.go` | `cd cli && go test ./...` | CLI command correctness |
| Build | `bun run build` | gates | Type errors, route conflicts, import issues |
| Deployed smoke | `scripts/post-deploy-smoke.mjs` | `promote-deployment` | Real-world end-to-end against hosted URL |
## Workflow
- Inspect what exists — `./scripts/testing-infra-preflight.sh` reports current coverage.
- Map the change to the right layer:
  - New function → unit
  - New API endpoint → unit (logic) + conformance (spec sync) + deploy smoke (real call)
  - Schema change → integration (migrated state; see the sketch after this list) + `db:check` (gate)
  - New CLI command → Go test + deploy smoke (smoke uses the binary)
  - UI change → manual verification (we don't have automated UI testing)
- Add the narrowest fast test first. Skip integration if a unit test covers the logic.
- Run the local gates — `./scripts/gates.sh`.
- For changes that cross contract boundaries — add conformance and deploy smoke.
- Before promotion — see `promote-deployment` for the deployed smoke flow.
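When the change is a schema migration, the integration layer is what proves the migrated state behaves. A minimal vitest sketch of that shape follows; the table names, the key-creation cascade, and the `TEST_DATABASE_URL` variable are all hypothetical illustrations, not taken from this repo:

```ts
// tests/integration/org-lifecycle.test.ts (every name here is hypothetical)
import { describe, it, expect, afterAll } from "vitest";
import { Pool } from "pg";

// Isolated test DB only; never the real Neon branch.
const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });

afterAll(() => pool.end());

describe("org lifecycle against the migrated schema", () => {
  it("creating an org cascades a default API key", async () => {
    const { rows } = await pool.query(
      "INSERT INTO orgs (name) VALUES ($1) RETURNING id",
      ["acme"],
    );
    const keys = await pool.query(
      "SELECT id FROM api_keys WHERE org_id = $1",
      [rows[0].id],
    );
    // The cascade is the multi-module behavior under test.
    expect(keys.rowCount).toBeGreaterThan(0);
  });
});
```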
## What gets tested where

- Unit (`tests/unit/`) — pure logic. No DB, no network. `lib/scopes.ts`, `lib/api-keys.ts` (with mocked DB), validation helpers. See the sketch after this list.
- Integration (`tests/integration/`) — multi-module flows with a real test DB. Org lifecycle, key creation cascade, webhook firing.
- Conformance (`tests/conformance/`) — invariants between layers. Every OpenAPI path has a handler. Every scope in the spec exists in `lib/scopes.ts`. Every CLI command maps to an endpoint.
- CLI (`cli/cmd/*_test.go`) — Go tests next to source files. Tests run via `go test ./...` from `cli/`.
- Build (`bun run build`) — Next.js production build. Type errors here are usually real bugs; don't disable types to make the build pass.
- Deployed smoke (`scripts/post-deploy-smoke.mjs`) — runs against the actual hosted URL with a real provisioned identity. The strongest signal short of users actually using the app.
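For the unit layer, the shape is plain vitest with no I/O. A sketch only: `hasScope`, its signature, and the `@/` import alias are hypothetical stand-ins for whatever `lib/scopes.ts` actually exports.

```ts
// tests/unit/scopes.test.ts (hasScope is a hypothetical helper, not a known export)
import { describe, it, expect } from "vitest";
import { hasScope } from "@/lib/scopes";

describe("hasScope", () => {
  // Pure logic: no DB, no network, cheap enough for the pre-commit path.
  it("grants when the required scope is present", () => {
    expect(hasScope(["keys:read", "keys:write"], "keys:read")).toBe(true);
  });

  it("denies when the required scope is missing", () => {
    expect(hasScope(["keys:read"], "keys:write")).toBe(false);
  });
});
```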
## Hard rules
- Don't write a test that asserts on implementation details that a refactor would break. Test behavior, not internals.
- Don't disable a failing test "for now." Either fix it, delete it, or document why it's skipped. Skipped tests rot.
- Don't run unit tests against the production database. `tests/unit/` mocks the DB; `tests/integration/` uses an isolated test DB. Never the real Neon branch.
- Don't claim a UI feature works without trying it in a browser. Type checking is not feature verification.
- Don't skip deploy smoke for "low-risk" changes. Smoke is fast (~30s) and catches the env-drift class of bugs nothing else catches.
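For orientation, here is the shape a deployed smoke check takes. This is a sketch in the spirit of `scripts/post-deploy-smoke.mjs`, not its actual contents; the endpoint path and the `DEPLOY_URL` / `SMOKE_API_KEY` env var names are assumptions.

```ts
// Hypothetical smoke check: real hosted URL, real provisioned key, real response.
const base = process.env.DEPLOY_URL; // assumed env var name
const key = process.env.SMOKE_API_KEY; // assumed env var name

const res = await fetch(`${base}/api/v1/orgs`, {
  headers: { Authorization: `Bearer ${key}` },
});

if (!res.ok) {
  console.error(`smoke failed: ${res.status} ${await res.text()}`);
  process.exit(1);
}
console.log(`smoke passed against ${base}`);
```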
## Conformance gates
The conformance layer is what keeps spec ↔ code in sync. Key validators:
| Validator | Asserts |
|---|---|
| `scripts/verify-openapi-routes.py` | Every spec path has a handler |
| `scripts/verify-scope-sync.py` | Scopes match between `openapi/v1.yaml` and `lib/scopes.ts` |
| `scripts/verify-skill-graph.py` | `SKILL_GRAPH` and skill cross-references stay consistent |
| `scripts/verify-factory-contract.py` | Fork manifest matches actual fork state |
These run in gates and CI. Failure means drift somewhere — fix the drift, don't disable the gate.
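The actual validators are Python scripts, but the invariant itself is simple to picture. A TypeScript sketch of the spec → handler check, assuming a Next.js app-router layout under `app/api/v1/` and the `js-yaml` parser; both are assumptions, not facts about this repo:

```ts
// tests/conformance/openapi-routes.test.ts (layout and parser are assumptions)
import { describe, it, expect } from "vitest";
import { existsSync, readFileSync } from "node:fs";
import { load } from "js-yaml";

const spec = load(readFileSync("openapi/v1.yaml", "utf8")) as {
  paths: Record<string, unknown>;
};

describe("every OpenAPI path has a handler", () => {
  for (const specPath of Object.keys(spec.paths)) {
    it(`has a route file for ${specPath}`, () => {
      // Assumes /orgs/{id} maps to app/api/v1/orgs/[id]/route.ts.
      const routeDir = specPath.replace(/\{(\w+)\}/g, "[$1]");
      expect(existsSync(`app/api/v1${routeDir}/route.ts`)).toBe(true);
    });
  }
});
```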
## Where things live
| File | Purpose |
|---|---|
| `tests/unit/` | Fast, isolated unit tests |
| `tests/integration/` | Multi-module integration tests |
| `tests/conformance/` | Cross-layer invariants |
| `tests/AGENTS.md` | Test conventions |
| `cli/cmd/*_test.go` | CLI tests (Go) |
| `scripts/gates.sh` | Local gate entrypoint |
| `scripts/release-gates.sh` | Stricter superset for releases |
| `scripts/post-deploy-smoke.mjs` | Hosted smoke |
| `scripts/provision-smoke-identity.ts` | Real test user/org for smoke |
| `vitest.config.ts` | Vitest config |
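For reference, a minimal shape `vitest.config.ts` could take to keep the unit/integration split explicit; a sketch under assumptions, not the repo's actual config:

```ts
// vitest.config.ts (illustrative; the real config may differ)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Unit and integration both run under `bun run test`;
    // conformance is exercised separately by the gates and CI.
    include: ["tests/unit/**/*.test.ts", "tests/integration/**/*.test.ts"],
    environment: "node",
  },
});
```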
## Skill web

- `run-quality-gates` — runs the local gates that include unit + build
- `promote-deployment` — runs deploy smoke as part of release validation
- `add-api-endpoint` — every new endpoint needs unit + conformance coverage
- `cli-development` — every new CLI command needs Go tests
- `db-health` — schema changes interact with integration tests
## Auxiliary content
- `references/original-guide.md` — operational reference (principles + workflow)
- `references/workflow.md` — step-by-step for adding test coverage
- `references/graph.md` — handoff boundaries to other skills
- `scripts/testing-infra-preflight.sh` — enumerates current test surfaces, gate status, and coverage gaps; run this first when auditing
- `scripts/verify-skill-graph.sh` — wrapper for the skill-graph conformance check
- `assets/test-plan-template.md` — fill-in template for designing test coverage on a new feature (which layers, why, what to assert)
- `assets/evals/basic.json` — eval cases for skill regression