📝 Written ● Intermediate Updated 2026-05-19

Test-driven development with LingCode

When a human writes the code, TDD is about design pressure. With an AI assistant it is about something else: the cheapest defense against plausible-looking code that doesn't actually work. Write the test first, and LingCode can't fake passing.

Why TDD with LingCode solves a different problem

Classic TDD literature pitches the red-green-refactor loop as a design discipline — writing the test first forces you to think about the interface before the implementation. That's still true. But when you're pairing with LingCode, a sharper benefit shows up:

LingCode writes code that looks right by default. Method names match, types line up, the structure resembles working examples it's seen. Whether the logic actually does what you asked is a different question.
"Looks right" passes review faster than "is right." If you skim a reasonable-looking function and ship it, you've shipped on plausibility, not correctness.
A test that fails before the function exists is unfakeable. LingCode can't satisfy a red test by writing prose; it has to write logic that makes the assertions hold.

So TDD with LingCode is, before anything else, an anti-hallucination tool. The skill in this tutorial keeps LingCode on the red-green-refactor loop so you can audit the output by running it, not by reading it carefully enough to spot a subtle bug.

What you need

LingCode — download the installer.
A test runner already wired into the project — pytest, jest, XCTest, go test, cargo test, etc. Don't introduce one mid-flight.
A feature description small enough to express as one or two test cases. If you can't, you need to split the work, not skip the tests.

RED — write a failing test before any implementation

The first step is asking LingCode for a test, not a function. Be explicit about wanting it to fail:

Write the first failing test for <feature>. Do NOT write the
implementation. The test must:
- Reference the public API by name (file, function, signature)
  even if it doesn't exist yet.
- Assert behavior the user actually cares about, not internal
  structure.
- Run and fail with a "no such symbol" / "not implemented" /
  "expected X got Y" error — not a syntax error.

Then run the test and paste the exact failure output.

The "paste the exact failure output" step is the audit. If LingCode says "test written, it fails" without showing the failure, run it yourself. A common drift mode is a test that crashes for an unrelated reason (a typo, a broken fixture, a bad import path) before it ever reaches an assertion. That is not a real red: it fails no matter what the implementation does, so it doesn't constrain the implementation.

If LingCode wrote a test that already passes, the test is wrong. Either it's testing a feature that already exists (you're not actually building something new) or the assertions are too weak. Stop and rewrite before continuing.
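A verified red might look like the following minimal Python sketch. The `slugify` feature, its stub, and the `run_and_report` helper are hypothetical stand-ins chosen for illustration; in a real project the test would live in your existing runner (pytest, for example) and be named `test_...` so the runner collects it.

```python
# Hypothetical feature for illustration: slugify(title) -> URL-safe slug.

def slugify(title: str) -> str:
    # Stub only: the public API exists by name, so the test fails with
    # "not implemented" instead of a syntax or import error.
    raise NotImplementedError("slugify is not implemented yet")


def slugify_joins_lowercased_words_with_hyphens() -> None:
    # Asserts behavior a user cares about, against a hand-computed value.
    # Under pytest this function would be named test_slugify_...
    assert slugify("Hello World") == "hello-world"


def run_and_report(test) -> str:
    """Run one test function and summarize the outcome (stand-in for a runner)."""
    try:
        test()
    except NotImplementedError as exc:
        return f"RED (not implemented): {exc}"
    except AssertionError:
        return "RED (assertion failed)"
    return "GREEN"


red_output = run_and_report(slugify_joins_lowercased_words_with_hyphens)
print(red_output)  # RED (not implemented): slugify is not implemented yet
```

The failure names the missing behavior rather than a typo, which is the kind of output worth asking LingCode to paste.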

GREEN — minimal implementation, nothing more

Once you have a verified red, ask LingCode for the smallest change that makes the test pass:

The test is red. Write the minimum implementation to make this
ONE test pass. Constraints:
- No code for behavior the test doesn't exercise yet.
- No extra abstractions ("in case we need them later").
- No error handling for cases this test doesn't cover.

Run the test. Paste the exact output. Confirm green.

"No code for behavior the test doesn't exercise yet" is the rule that prevents LingCode from helpfully adding edge cases that aren't tested and might be wrong. If the test covers happy-path string input and LingCode adds a null check, that null check is untested code — drop it or write a test for it.
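As a rough illustration, a minimal green for a single happy-path test could be as small as the sketch below. The `slugify` function and its test are hypothetical examples, not part of LingCode.

```python
def slugify(title: str) -> str:
    # Minimum code for the one test below: lowercase, spaces to hyphens.
    # No None check, no punctuation stripping, no Unicode handling yet,
    # because no test asks for any of that.
    return title.lower().replace(" ", "-")


def test_slugify_joins_lowercased_words_with_hyphens() -> None:
    assert slugify("Hello World") == "hello-world"


test_slugify_joins_lowercased_words_with_hyphens()
print("green")
```

Anything beyond that one line would be code no test exercises, so it would have to be reviewed by reading rather than by running.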

Add one test at a time

Now extend coverage one test at a time. Each new red exposes a missing behavior; each new green adds exactly enough code to cover it:

Add ONE more test case. Pick the next unhandled behavior:
- An edge case (empty input, boundary value, off-by-one).
- A different equivalence class (negative number, unicode,
  missing field).
- An error path (invalid input, dependency failure).

Confirm it fails. Then add minimum code to make it pass.
Confirm everything is still green.

Repeat until every behavior in <feature spec> has at least
one test.

The temptation is to write five tests at once and then implement them all together. Resist it. Writing tests in batches collapses back into "write the function, hope the tests pass", which is exactly the workflow TDD is designed to prevent.
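One cycle of this loop can be sketched as follows. The two `slugify` versions and the `check_*` functions are hypothetical; each check takes the implementation as an argument only so that both versions can be compared in one runnable file.

```python
def slugify_v1(title: str) -> str:
    # Implementation after the first green: handles single spaces only.
    return title.lower().replace(" ", "-")


def slugify_v2(title: str) -> str:
    # Smallest change that also covers the new test: split() collapses
    # runs of whitespace, so repeated spaces yield a single hyphen.
    return "-".join(title.lower().split())


def check_single_space(slugify) -> None:
    assert slugify("Hello World") == "hello-world"


def check_repeated_spaces(slugify) -> None:
    # The ONE new test for this cycle.
    assert slugify("Hello   World") == "hello-world"


def passes(check, slugify) -> bool:
    try:
        check(slugify)
    except AssertionError:
        return False
    return True


# Step 1: the new test is red against the old implementation.
new_test_red_first = not passes(check_repeated_spaces, slugify_v1)
# Step 2: after the minimal change, the old and the new test are both green.
all_green_after = all(
    passes(check, slugify_v2)
    for check in (check_single_space, check_repeated_spaces)
)
print(new_test_red_first, all_green_after)  # True True
```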

Watch for "false green" — the most common failure mode

A test passing doesn't always mean the test is doing what you think. False greens come from three places, and LingCode produces all three reliably:

The test mocks the thing under test. A mock that returns the expected value makes the assertion pass no matter what the real code does. If the function being tested is itself a mock, you're testing the mock.
The assertion is too weak. assertTrue(result) passes for any truthy value. assertEqual(result, expected) with a hand-computed expected is the right shape.
Setup matches the assertion. If the test inserts the value it later asserts is present, the test passes regardless of whether the function under test ran at all.
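The three shapes above can be reproduced in a few lines. In this sketch `apply_discount` and `record_sale` are hypothetical functions that are deliberately broken, so every check that still passes is a false green; only `unittest.mock.Mock` comes from the standard library.

```python
from unittest.mock import Mock


def apply_discount(price_cents: int, percent: int) -> int:
    # Deliberately broken: ignores the discount entirely.
    return price_cents


def record_sale(ledger: list, amount_cents: int) -> None:
    # Deliberately broken: never records anything.
    return None


def mocks_the_thing_under_test() -> None:
    fake_apply_discount = Mock(return_value=750)
    # Only the mock's canned value is checked; apply_discount never runs.
    assert fake_apply_discount(1000, 25) == 750


def assertion_too_weak() -> None:
    # Any non-zero integer is truthy, so the wrong answer 1000 passes.
    assert apply_discount(1000, 25)


def setup_matches_assertion() -> None:
    ledger = [750]  # setup already inserts the value asserted below
    record_sale(ledger, 750)
    assert 750 in ledger


def strong_assertion() -> None:
    # Hand-computed expected value: 25% off 1000 cents is 750 cents.
    assert apply_discount(1000, 25) == 750


def passes(check) -> bool:
    try:
        check()
    except AssertionError:
        return False
    return True


checks = (mocks_the_thing_under_test, assertion_too_weak,
          setup_matches_assertion, strong_assertion)
results = {check.__name__: passes(check) for check in checks}
print(results)
```

Against these broken implementations only `strong_assertion` goes red; the other three report success without the real logic ever being verified.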

The cheapest way to catch a false green is to break the implementation on purpose and re-run:

For each test that just turned green, mutate the implementation
in one of these ways and confirm the test now fails:
- Replace the return value with a hard-coded wrong value.
- Comment out the body and return early.
- Skip the side effect the function should produce.

If the test still passes after a destructive mutation, the test
is not actually exercising the behavior. Fix the test.

This is mutation testing in miniature — and it's the single highest-yield habit when pairing with LingCode. Five seconds of "break it, re-run, confirm red" saves an afternoon of investigating why a "fully tested" function ships a bug.
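A hand-rolled version of that check might look like the sketch below. It covers the first two mutations from the prompt (wrong return value, body removed); `slugify`, the mutants, and the two checks are hypothetical examples, and a dedicated mutation-testing tool would automate the same idea.

```python
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


def mutant_wrong_constant(title: str) -> str:
    return "wrong-value"  # hard-coded wrong return value


def mutant_early_return(title: str) -> str:
    return ""  # body removed, returns early


def strong_check(fn) -> None:
    assert fn("Hello World") == "hello-world"


def weak_check(fn) -> None:
    # Only checks the type, so it cannot tell right from wrong.
    assert isinstance(fn("Hello World"), str)


def passes(check, fn) -> bool:
    try:
        check(fn)
    except AssertionError:
        return False
    return True


def survivors(check, mutants) -> list:
    """Names of mutants the check fails to detect (each one is a false green)."""
    return [mutant.__name__ for mutant in mutants if passes(check, mutant)]


mutants = [mutant_wrong_constant, mutant_early_return]
strong_survivors = survivors(strong_check, mutants)
weak_survivors = survivors(weak_check, mutants)
print(strong_survivors)  # []
print(weak_survivors)    # ['mutant_wrong_constant', 'mutant_early_return']
```

An empty survivor list means the test really exercises the behavior; any surviving mutant points at a test to fix.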

REFACTOR — only with all tests green

Once the test suite covers the behavior and is fully green, refactoring is safe. Ask LingCode to improve structure without changing behavior:

All tests are green. Refactor the implementation for:
- Name clarity.
- Removing duplication (DRY only when names converge — don't
  over-extract).
- Extracting helper functions that have a single reason to
  change.
- Replacing conditional pyramids with early returns.

After each refactor step, re-run the full suite. If a test
goes red, the refactor changed behavior — revert and try
again. Don't "fix" the test to match new behavior.

"Don't fix the test to match new behavior" is the rule. The test is the spec. If a refactor breaks the spec, the refactor is wrong, not the spec.
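To make the "tests stay fixed, structure changes" idea concrete, here is a small sketch of one refactor step from the prompt: replacing a conditional pyramid with early returns. The `shipping_cents` functions, their prices, and the cases in `run_suite` are invented for illustration.

```python
def shipping_cents_before(weight_kg: int, express: bool) -> int:
    # Original shape: a conditional pyramid.
    if weight_kg > 0:
        if weight_kg <= 20:
            if express:
                return 1500
            else:
                return 500
        else:
            raise ValueError("too heavy")
    else:
        raise ValueError("weight must be positive")


def shipping_cents_after(weight_kg: int, express: bool) -> int:
    # Same behavior, restructured with early returns.
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg > 20:
        raise ValueError("too heavy")
    return 1500 if express else 500


def run_suite(shipping_cents) -> list:
    """Run the unchanged suite against one implementation; return failures."""
    failures = []
    cases = [((1, False), 500), ((1, True), 1500), ((20, False), 500)]
    for args, expected in cases:
        if shipping_cents(*args) != expected:
            failures.append(f"{args}: expected {expected}")
    for bad_args in [(0, False), (25, True)]:
        try:
            shipping_cents(*bad_args)
            failures.append(f"{bad_args}: expected ValueError")
        except ValueError:
            pass
    return failures


print(run_suite(shipping_cents_before))  # []
print(run_suite(shipping_cents_after))   # []
```

The suite is identical for both versions. If the second call returned any failures, the refactor would be reverted rather than the cases edited.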

When to skip TDD

TDD is the default for logic. It's overkill — or actively harmful — for a few cases:

Exploration spikes. Throwaway code answering "is this approach feasible?" doesn't need tests; it needs to be deleted as soon as you've learned.
UI tweaks. Pixel-pushing in SwiftUI / Compose / HTML is verified by eye, not by assertion. Write a test if there's a state machine; don't write one for "make the padding bigger."
Configuration changes. Editing JSON, YAML, or build scripts is verified by running the build, not by unit-testing the config.

Have LingCode ask up front:

Before starting, classify the task:
- LOGIC: anything with branches, state changes, or
  computed return values. Use TDD.
- EXPLORATION: throwaway code to learn. Skip TDD; delete after.
- UI: pure presentation. Verify by eye.
- CONFIG: build / env / data. Verify by running.

For LOGIC, do not write implementation code without a failing
test first.

The interaction with systematic debugging

When a bug shows up in code that wasn't written TDD-first, the cheapest path is to retrofit one test: write a failing test that reproduces the bug, then fix until green. That failing test is also the regression test the debugging protocol asks for — same artifact, double duty.
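A retrofit of this kind can be sketched as below. The `average` functions are hypothetical, and returning `0.0` for an empty list is an assumed product decision made only for the example; the point is that the reproducing test is red before the fix and stays in the suite afterwards.

```python
def average_buggy(values: list) -> float:
    # Pre-existing code, not written test-first: crashes on an empty list.
    return sum(values) / len(values)


def average_fixed(values: list) -> float:
    # Fix made only after the reproducing test was confirmed red.
    if not values:
        return 0.0
    return sum(values) / len(values)


def check_average_of_empty_list_is_zero(average) -> None:
    # Reproduces the reported bug; kept afterwards as the regression test.
    assert average([]) == 0.0


def outcome(check, average) -> str:
    try:
        check(average)
    except ZeroDivisionError:
        return "red: ZeroDivisionError"
    except AssertionError:
        return "red: assertion failed"
    return "green"


before_fix = outcome(check_average_of_empty_list_is_zero, average_buggy)
after_fix = outcome(check_average_of_empty_list_is_zero, average_fixed)
print(before_fix)  # red: ZeroDivisionError
print(after_fix)   # green
```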

Bugs in TDD-built code show up too, but they show up earlier and smaller — usually as a missing equivalence class no one thought to test. Add the test, watch it fail, fix it. The loop is the same as feature work.

A test suite written entirely by red-green-refactor is its own design document. A new contributor reading it knows what behavior is committed to versus what's incidental implementation — because uncommitted behavior has no test.

Use this in LingCode

The TDD discipline is packaged as a skill — drop it into your skills folder and LingCode will reach for it before writing logic:

---
name: test-driven-development
description: "Use when implementing any feature or bugfix, before writing implementation code. Triggers: 'write tests', 'add test coverage', 'TDD', new feature, new function, regression fix, 'verify it works', 'test first'. Actions: red (write failing test, paste output to verify real red), green-minimum (smallest code that passes), mutation-test each new green (catch false greens), refactor only while green. Reframes TDD as an anti-hallucination tool: stops AI from writing plausible code that doesn't actually work. The test is the spec; AI can't fake passing."
---

Write logic via red-green-refactor. With an AI assistant, TDD's
primary value is as an anti-hallucination tool — a failing test
is the one constraint plausible-looking code can't fake.

Classify first: LOGIC (use TDD), EXPLORATION / UI / CONFIG
(skip, verify another way).

For LOGIC:

1. RED. Write the first failing test before any implementation.
   Reference the public API by name. Assert user-visible
   behavior, not internal structure. Run it. Paste the exact
   failure output. If it already passes, the test is wrong —
   rewrite.

2. GREEN. Write the MINIMUM implementation to make this one
   test pass. No code for untested behavior. No "in case"
   abstractions. Run it. Paste the exact pass output.

3. ONE-AT-A-TIME. Add the next failing test (edge case,
   equivalence class, error path). Confirm red. Add minimum
   code. Confirm green. Repeat until the spec is covered.

4. WATCH FOR FALSE GREEN. For each new green test, mutate the
   implementation destructively (wrong return value, no-op,
   skipped side effect). If the test still passes, the test
   isn't exercising the behavior. Fix the test.

5. REFACTOR. Only with all tests green. Improve names, remove
   real duplication, simplify control flow. Re-run after each
   step. If a test goes red, REVERT the refactor — do not
   "fix" the test.

For bugs in pre-existing code: write a failing test that
reproduces the bug, then fix until green. The test becomes
the regression test.

Save as ~/.lingcode/skills/test-driven-development/SKILL.md — see Install a skill for the exact location and how skills get discovered.
