# Testing Infra
Six layers of test coverage: unit, integration, conformance, CLI (Go), build (Next.js), and deployed smoke. Each catches a different failure mode. Together they're what makes a fork credible — and skipping any of them is how silent regressions reach production.
## Default stance
Add the narrowest fast test first. Add conformance and deployed smoke when behavior crosses a contract boundary. A new helper function probably wants a unit test. A new API endpoint also wants a conformance test (spec → handler) and probably deploy smoke. A schema change wants `db:check` (already a gate) and probably an integration test that exercises the migrated state.
Tests verify code correctness. Smoke verifies feature correctness. Type checking and unit tests are necessary but not sufficient. The deploy smoke against a live preview URL is what proves the feature actually works end-to-end with real auth, real DB, real env. If you can't run smoke (e.g. UI-only feature with no scriptable signal), say so explicitly rather than claiming success.
Stop at the first failing layer. Don't paper over a unit test failure to chase the integration test. Each layer is signal — fix it where it fails, then move down.
## Use this skill when
Adding tests for new features, debugging missing test coverage, changing CI gates, restructuring test directories, auditing whether a fork has credible automated validation, or designing the test strategy for a new resource.
## Test layers
| Layer | Where | Runs when | Catches |
|---|---|---|---|
| Unit | `tests/unit/` | `bun run test`, gates, pre-commit | Logic bugs in isolated functions |
| Integration | `tests/integration/` | `bun run test` | Multi-module behavior, real DB queries |
| Conformance | `tests/conformance/` | gates, CI | Spec ↔ handler ↔ scope drift |
| CLI (Go) | `cli/cmd/*_test.go` | `cd cli && go test ./...` | CLI command correctness |
| Build | `bun run build` | gates | Type errors, route conflicts, import issues |
| Deployed smoke | `scripts/post-deploy-smoke.mjs` | `promote-deployment` | Real-world end-to-end against hosted URL |
## Workflow
- Inspect what exists — `./scripts/testing-infra-preflight.sh` reports current coverage.
- Map the change to the right layer:
  - New function → unit
  - New API endpoint → unit (logic) + conformance (spec sync) + deploy smoke (real call)
  - Schema change → integration (migrated state; see the sketch after this list) + `db:check` (gate)
  - New CLI command → Go test + deploy smoke (smoke uses the binary)
  - UI change → manual verification (we don't have automated UI testing)
- Add the narrowest fast test first. Skip integration if a unit test covers the logic.
- Run the local gates — `./scripts/gates.sh`.
- For changes that cross contract boundaries — add conformance and deploy smoke.
- Before promotion — see `promote-deployment` for the deployed smoke flow.
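When the change is a schema migration, the integration layer is what proves the migrated state behaves. A minimal vitest sketch of that shape follows; the table names, the key-creation cascade, and the `TEST_DATABASE_URL` variable are all hypothetical illustrations, not taken from this repo:

```ts
// tests/integration/org-lifecycle.test.ts (every name here is hypothetical)
import { describe, it, expect, afterAll } from "vitest";
import { Pool } from "pg";

// Isolated test DB only; never the real Neon branch.
const pool = new Pool({ connectionString: process.env.TEST_DATABASE_URL });

afterAll(() => pool.end());

describe("org lifecycle against the migrated schema", () => {
  it("creating an org cascades a default API key", async () => {
    const { rows } = await pool.query(
      "INSERT INTO orgs (name) VALUES ($1) RETURNING id",
      ["acme"],
    );
    const keys = await pool.query(
      "SELECT id FROM api_keys WHERE org_id = $1",
      [rows[0].id],
    );
    // The cascade is the multi-module behavior under test.
    expect(keys.rowCount).toBeGreaterThan(0);
  });
});
```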
## What gets tested where

- Unit (`tests/unit/`) — pure logic. No DB, no network. `lib/scopes.ts`, `lib/api-keys.ts` (with mocked DB), validation helpers. See the sketch after this list.
- Integration (`tests/integration/`) — multi-module flows with a real test DB. Org lifecycle, key creation cascade, webhook firing.
- Conformance (`tests/conformance/`) — invariants between layers. Every OpenAPI path has a handler. Every scope in the spec exists in `lib/scopes.ts`. Every CLI command maps to an endpoint.
- CLI (`cli/cmd/*_test.go`) — Go tests next to source files. Tests run via `go test ./...` from `cli/`.
- Build (`bun run build`) — Next.js production build. Type errors here are usually real bugs; don't disable types to make the build pass.
- Deployed smoke (`scripts/post-deploy-smoke.mjs`) — runs against the actual hosted URL with a real provisioned identity. The strongest signal short of users actually using the app.
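For the unit layer, the shape is plain vitest with no I/O. A sketch only: `hasScope`, its signature, and the `@/` import alias are hypothetical stand-ins for whatever `lib/scopes.ts` actually exports.

```ts
// tests/unit/scopes.test.ts (hasScope is a hypothetical helper, not a known export)
import { describe, it, expect } from "vitest";
import { hasScope } from "@/lib/scopes";

describe("hasScope", () => {
  // Pure logic: no DB, no network, cheap enough for the pre-commit path.
  it("grants when the required scope is present", () => {
    expect(hasScope(["keys:read", "keys:write"], "keys:read")).toBe(true);
  });

  it("denies when the required scope is missing", () => {
    expect(hasScope(["keys:read"], "keys:write")).toBe(false);
  });
});
```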
## Hard rules
- Don't write a test that asserts on implementation details that a refactor would break. Test behavior, not internals.
- Don't disable a failing test "for now." Either fix it, delete it, or document why it's skipped. Skipped tests rot.
- Don't run unit tests against the production database. `tests/unit/` mocks the DB; `tests/integration/` uses an isolated test DB. Never the real Neon branch.
- Don't claim a UI feature works without trying it in a browser. Type checking is not feature verification.
- Don't skip deploy smoke for "low-risk" changes. Smoke is fast (~30s) and catches the env-drift class of bugs nothing else catches.
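For orientation, here is the shape a deployed smoke check takes. This is a sketch in the spirit of `scripts/post-deploy-smoke.mjs`, not its actual contents; the endpoint path and the `DEPLOY_URL` / `SMOKE_API_KEY` env var names are assumptions.

```ts
// Hypothetical smoke check: real hosted URL, real provisioned key, real response.
const base = process.env.DEPLOY_URL; // assumed env var name
const key = process.env.SMOKE_API_KEY; // assumed env var name

const res = await fetch(`${base}/api/v1/orgs`, {
  headers: { Authorization: `Bearer ${key}` },
});

if (!res.ok) {
  console.error(`smoke failed: ${res.status} ${await res.text()}`);
  process.exit(1);
}
console.log(`smoke passed against ${base}`);
```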
## Conformance gates
The conformance layer is what keeps spec ↔ code in sync. Key validators:
| Validator | Asserts |
|---|---|
| `scripts/verify-openapi-routes.py` | Every spec path has a handler |
| `scripts/verify-scope-sync.py` | Scopes match between `openapi/v1.yaml` and `lib/scopes.ts` |
| `scripts/verify-skill-graph.py` | `SKILL_GRAPH` and skill cross-references stay consistent |
| `scripts/verify-factory-contract.py` | Fork manifest matches actual fork state |
These run in gates and CI. Failure means drift somewhere — fix the drift, don't disable the gate.
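The actual validators are Python scripts, but the invariant itself is simple to picture. A TypeScript sketch of the spec → handler check, assuming a Next.js app-router layout under `app/api/v1/` and the `js-yaml` parser; both are assumptions, not facts about this repo:

```ts
// tests/conformance/openapi-routes.test.ts (layout and parser are assumptions)
import { describe, it, expect } from "vitest";
import { existsSync, readFileSync } from "node:fs";
import { load } from "js-yaml";

const spec = load(readFileSync("openapi/v1.yaml", "utf8")) as {
  paths: Record<string, unknown>;
};

describe("every OpenAPI path has a handler", () => {
  for (const specPath of Object.keys(spec.paths)) {
    it(`has a route file for ${specPath}`, () => {
      // Assumes /orgs/{id} maps to app/api/v1/orgs/[id]/route.ts.
      const routeDir = specPath.replace(/\{(\w+)\}/g, "[$1]");
      expect(existsSync(`app/api/v1${routeDir}/route.ts`)).toBe(true);
    });
  }
});
```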
## Where things live
| File | Purpose |
|---|---|
| `tests/unit/` | Fast, isolated unit tests |
| `tests/integration/` | Multi-module integration tests |
| `tests/conformance/` | Cross-layer invariants |
| `tests/AGENTS.md` | Test conventions |
| `cli/cmd/*_test.go` | CLI tests (Go) |
| `scripts/gates.sh` | Local gate entrypoint |
| `scripts/release-gates.sh` | Stricter superset for releases |
| `scripts/post-deploy-smoke.mjs` | Hosted smoke |
| `scripts/provision-smoke-identity.ts` | Real test user/org for smoke |
| `vitest.config.ts` | Vitest config |
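For reference, a minimal shape `vitest.config.ts` could take to keep the unit/integration split explicit; a sketch under assumptions, not the repo's actual config:

```ts
// vitest.config.ts (illustrative; the real config may differ)
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Unit and integration both run under `bun run test`;
    // conformance is exercised separately by the gates and CI.
    include: ["tests/unit/**/*.test.ts", "tests/integration/**/*.test.ts"],
    environment: "node",
  },
});
```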
## Skill web

- `run-quality-gates` — runs the local gates that include unit + build
- `promote-deployment` — runs deploy smoke as part of release validation
- `add-api-endpoint` — every new endpoint needs unit + conformance coverage
- `cli-development` — every new CLI command needs Go tests
- `db-health` — schema changes interact with integration tests
## Auxiliary content
- `references/original-guide.md` — operational reference (principles + workflow)
- `references/workflow.md` — step-by-step for adding test coverage
- `references/graph.md` — handoff boundaries to other skills
- `scripts/testing-infra-preflight.sh` — enumerates current test surfaces, gate status, and coverage gaps; run this first when auditing
- `scripts/verify-skill-graph.sh` — wrapper for the skill-graph conformance check
- `assets/test-plan-template.md` — fill-in template for designing test coverage on a new feature (which layers, why, what to assert)
- `assets/evals/basic.json` — eval cases for skill regression