Skip to content

Testing & Debugging

Claude Kit enforces quality through three connected workflows: TDD for building, systematic debugging for fixing, and verification before completion.

Triggers on: “implement”, “add feature”, “fix bug”, “write code”, “build”

The TDD skill enforces a strict red-green-refactor cycle for all production code changes:

1. Write a failing test → Run it → Confirm it fails (RED)
2. Write minimal code → Run it → Confirm it passes (GREEN)
3. Refactor if needed → Run it → Confirm it still passes
4. Commit
  • Tests document intent, not just behavior
  • Catches regressions immediately
  • Forces small, focused changes
  • Creates natural commit points
StackTest CommandFull Verify
Python/FastAPIpytest tests/test_<module>.py -vpytest -v && ruff check .
TypeScript/NestJSnpm test -- --testPathPattern=<module>npm test && npm run lint && npm run build
Next.js/Reactnpx vitest run <file>npm test && next lint && next build

Triggers on: “bug”, “error”, “failing”, “broken”, “doesn’t work”, “TypeError”, stack traces

The systematic-debugging skill follows a four-phase investigation:

Gather evidence before forming hypotheses:

  • Read the error message and stack trace
  • Reproduce the issue
  • Check logs and recent changes

Form specific, testable theories:

  • “The null check on line 42 doesn’t handle the empty array case”
  • Not: “Something is wrong with the data”

Verify each hypothesis systematically:

  • Add logging or breakpoints
  • Write a test that reproduces the bug
  • Isolate the failing component

Apply the minimal fix:

  • Fix the root cause, not the symptom
  • Add a regression test
  • Verify the original error is gone

Triggers on: deep bugs where the error location differs from the bug origin

For bugs that manifest far from their source, the root-cause-tracing skill traces the data flow backward to find where things first went wrong:

Error: NullPointerException at OrderService.getTotal()
↓ trace backward
OrderService.getTotal() receives null item
↓ trace backward
CartService.getItems() returns null for empty cart
↓ root cause
CartRepository.findByUserId() returns null instead of []

Auto-triggers on: “done”, “fixed”, “tests pass”, “build succeeds”

The verification skill prevents false completion claims. Before saying “done”, Claude must:

  1. Run the test suite and read the output
  2. Run the build and confirm it succeeds
  3. Check for regressions in related functionality
  4. Show evidence — actual command output, not assumptions
Without verification:
"I've fixed the bug" → Actually introduced a new failing test
With verification:
Run pytest → See 2 failures → Fix both → Run again → All green → "Fixed"

Triggers on: “mock”, “flaky test”, “test passes but bug ships”, “false positive”

The testing-anti-patterns skill catches common mistakes:

Anti-PatternProblemFix
Heavy mockingTests pass but production breaksTest real integrations
Testing implementationTests break on refactorTest behavior, not internals
No edge casesHappy path works, edge cases crashTest boundaries and errors
Flaky testsRandom failures erode trustFix or delete, never ignore

Triggers on: data validation bugs, “it slipped through”, bypass scenarios

The defense-in-depth skill adds validation at multiple layers so a single-point failure can’t cause data corruption:

API layer: Validate input shape (Pydantic/Zod)
Service layer: Validate business rules
Database layer: Constraints (NOT NULL, UNIQUE, CHECK)
AgentRole
testerRun test suites, analyze coverage, validate error handling
debuggerInvestigate bugs, check logs, reproduce issues
security-auditorSecurity-focused code review