When a human writes the code, TDD is about design pressure. When an AI assistant writes it, TDD is about something else: it's the cheapest defense against plausible-looking code that doesn't actually work. Write the test first, and LingCode can't fake passing.
Classic TDD literature pitches the red-green-refactor loop as a design discipline: writing the test first forces you to think about the interface before the implementation. That's still true. But when you're pairing with LingCode, a sharper benefit shows up: a failing test is the one constraint that plausible-looking code can't fake.
So TDD with LingCode is, before anything else, an anti-hallucination tool. The skill in this tutorial keeps LingCode on the red-green-refactor loop so you can audit the output by running it, not by reading it carefully enough to spot a subtle bug.
The first step is asking LingCode for a test, not a function. Be explicit about wanting it to fail:
Write the first failing test for <feature>. Do NOT write the
implementation. The test must:
- Reference the public API by name (file, function, signature)
even if it doesn't exist yet.
- Assert behavior the user actually cares about, not internal
structure.
- Run and fail with a "no such symbol" / "not implemented" /
"expected X got Y" error, not a syntax error.
Then run the test and paste the exact failure output.
The "paste the exact failure output" step is the audit. If LingCode says "test written, it fails" without showing the failure, run it yourself. A common drift mode is writing a test that crashes before reaching any assertion. That's not a real red, and it doesn't constrain the implementation.
Once you have a verified red, ask LingCode for the smallest change that makes the test pass:
The test is red. Write the minimum implementation to make this
ONE test pass. Constraints:
- No code for behavior the test doesn't exercise yet.
- No extra abstractions ("in case we need them later").
- No error handling for cases this test doesn't cover.
Run the test. Paste the exact output. Confirm green.
"No code for behavior the test doesn't exercise yet" is the rule that prevents LingCode from helpfully adding edge-case handling that isn't tested and might be wrong. If the test covers happy-path string input and LingCode adds a null check, that null check is untested code: drop it or write a test for it.
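A minimal sketch of that rule in Python, assuming the first test says a hypothetical `slugify("Hello World")` must return `"hello-world"`. The implementation does exactly that much and no more: there is no `None` check and no punctuation handling, because no test asks for either yet.

```python
import unittest


def slugify(title: str) -> str:
    # Minimum to pass the single test: lowercase, then join the
    # whitespace-separated words with hyphens. No None check and no
    # punctuation handling, because no test exercises either yet.
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_hyphenates_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")


def run_suite(test_cls) -> unittest.TestResult:
    # Run every test in the class and return the result for inspection.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_cls)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite(SlugifyTest)
    print("green" if result.wasSuccessful() else "red", result.testsRun)
```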
Now extend coverage one test at a time. Each new red exposes a missing behavior, and each new green adds exactly enough code to cover it:
Add ONE more test case. Pick the next unhandled behavior:
- An edge case (empty input, boundary value, off-by-one).
- A different equivalence class (negative number, unicode,
missing field).
- An error path (invalid input, dependency failure).
Confirm it fails. Then add minimum code to make it pass.
Confirm everything is still green.
Repeat until every behavior in <feature spec> has at least
one test.
The temptation is to write five tests at once and then implement them all together. Resist it. Writing several tests at once collapses back into "write the function, hope the tests pass," which is exactly the workflow TDD is designed to prevent.
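Here is what one more turn of the loop might look like, sketched in Python with a hypothetical `slugify(title)` whose first test expects `"Hello World"` to become `"hello-world"`. Punctuation is the new equivalence class: the one-line version `"-".join(title.lower().split())` turns `"Hello, World!"` into `"hello,-world!"`, so the second test is a real red, and the smallest fix is a single regular expression. Whether `\w` (which keeps underscores and non-ASCII letters) is the right character class is a spec question that deserves its own test.

```python
import re
import unittest


def slugify(title: str) -> str:
    # Iteration 2: drop anything that is not a word character or
    # whitespace, then hyphenate as before. This is the minimum change
    # that makes the new punctuation test pass.
    cleaned = re.sub(r"[^\w\s]", "", title.lower())
    return "-".join(cleaned.split())


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_hyphenates_words(self):
        # Test 1: must still be green after the change.
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        # Test 2: added alone and confirmed red before slugify changed.
        self.assertEqual(slugify("Hello, World!"), "hello-world")


def run_suite(test_cls) -> unittest.TestResult:
    # Run every test in the class and return the result for inspection.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_cls)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite(SlugifyTest)
    print("green" if result.wasSuccessful() else "red", result.testsRun)
```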
A test passing doesn't always mean the test is doing what you think. False greens come from three places, and LingCode produces all three reliably:
assertTrue(result) passes for any truthy value. assertEqual(result, expected) with a hand-computed expected value is the right shape.
The cheapest way to catch a false green is to break the implementation on purpose and re-run:
For each test that just turned green, mutate the implementation
in one of these ways and confirm the test now fails:
- Replace the return value with a hard-coded wrong value.
- Comment out the body and return early.
- Skip the side effect the function should produce.
If the test still passes after a destructive mutation, the test
is not actually exercising the behavior. Fix the test.
This is mutation testing in miniature, and it's the single highest-yield habit when pairing with LingCode. Five seconds of "break it, re-run, confirm red" saves an afternoon of investigating why a "fully tested" function ships a bug.
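The same check done by hand in Python, assuming a hypothetical `slugify(title)`. `WeakTest` uses `assertTrue`, `StrongTest` compares against a hand-computed expected value, and `passes_against` (a hypothetical helper, not part of `unittest`) re-runs a test class with a mutant swapped in. Dedicated mutation-testing tools generate and run mutants automatically; this sketch only shows the idea with one hard-coded mutant.

```python
import unittest


def slugify(title: str) -> str:
    return "-".join(title.lower().split())


def mutant_wrong_constant(title: str) -> str:
    # Destructive mutation: ignore the input, return a hard-coded wrong value.
    return "wrong"


class WeakTest(unittest.TestCase):
    impl = staticmethod(slugify)

    def test_returns_something(self):
        # Passes for ANY truthy return value, including "wrong".
        self.assertTrue(self.impl("Hello World"))


class StrongTest(unittest.TestCase):
    impl = staticmethod(slugify)

    def test_exact_slug(self):
        # Hand-computed expected value: only the real behavior passes.
        self.assertEqual(self.impl("Hello World"), "hello-world")


def passes_against(test_cls, impl) -> bool:
    # Subclass the test case with `impl` bound as the implementation,
    # then run all of its tests.
    bound = type(test_cls.__name__, (test_cls,), {"impl": staticmethod(impl)})
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(bound)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    for cls in (WeakTest, StrongTest):
        survived = passes_against(cls, mutant_wrong_constant)
        print(cls.__name__, "FALSE GREEN" if survived else "caught the mutant")
```

Run as a script, it reports `WeakTest` as a false green and `StrongTest` as catching the mutant.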
Once the test suite covers the behavior and is fully green, refactoring is safe. Ask LingCode to improve structure without changing behavior:
All tests are green. Refactor the implementation for:
- Name clarity.
- Removing duplication (DRY only when names converge; don't
over-extract).
- Extracting helper functions that have a single reason to
change.
- Replacing conditional pyramids with early returns.
After each refactor step, re-run the full suite. If a test
goes red, the refactor changed behavior: revert and try
again. Don't "fix" the test to match new behavior.
"Don't fix the test to match new behavior" is the rule. The test is the spec. If a refactor breaks the spec, the refactor is wrong, not the spec.
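As a sketch of one item on that list, replacing a conditional pyramid with early returns, here is a hypothetical `shipping_fee` function (invented for illustration) before and after the refactor. The same test class runs against both versions; if the refactored one went red, the move would be to revert it, not to edit the test.

```python
import unittest


def shipping_fee_before(weight_kg: int, express: bool) -> int:
    # Original: nested conditionals.
    if weight_kg > 0:
        if express:
            if weight_kg > 10:
                return 40
            else:
                return 25
        else:
            return 10
    else:
        raise ValueError("weight must be positive")


def shipping_fee_after(weight_kg: int, express: bool) -> int:
    # Refactor: guard clause first, then early returns. Same behavior
    # for integer weights, which is all the tests below exercise.
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if not express:
        return 10
    if weight_kg > 10:
        return 40
    return 25


class ShippingFeeTest(unittest.TestCase):
    impl = staticmethod(shipping_fee_before)

    def test_standard_is_flat(self):
        self.assertEqual(self.impl(5, express=False), 10)

    def test_express_light(self):
        self.assertEqual(self.impl(5, express=True), 25)

    def test_express_heavy(self):
        self.assertEqual(self.impl(11, express=True), 40)

    def test_rejects_non_positive_weight(self):
        with self.assertRaises(ValueError):
            self.impl(0, express=True)


def suite_passes(impl) -> bool:
    # Re-run the full suite with `impl` as the implementation under test.
    bound = type("Bound", (ShippingFeeTest,), {"impl": staticmethod(impl)})
    result = unittest.TestResult()
    unittest.defaultTestLoader.loadTestsFromTestCase(bound).run(result)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("before:", suite_passes(shipping_fee_before))
    print("after: ", suite_passes(shipping_fee_after))
```

The "integer weights" comment matters: for a float NaN, `weight_kg > 0` and `weight_kg <= 0` are both false, so the two versions diverge, and no test here would notice. Green means "matches the spec as tested," nothing more.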
TDD is the default for logic. It's overkill, or actively harmful, for a few cases: throwaway exploration code, pure UI presentation, and build, environment, or data configuration.
Have LingCode classify the task up front:
Before starting, classify the task:
- LOGIC: anything with branches, state changes, or
computed return values. Use TDD.
- EXPLORATION: throwaway code to learn. Skip TDD; delete after.
- UI: pure presentation. Verify by eye.
- CONFIG: build / env / data. Verify by running.
For LOGIC, do not write implementation code without a failing
test first.
When a bug shows up in code that wasn't written TDD-first, the cheapest path is to retrofit one test: write a failing test that reproduces the bug, then fix until green. That failing test is also the regression test the debugging protocol asks for: same artifact, double duty.
Bugs in TDD-built code show up too, but they show up earlier and smaller β usually as a missing equivalence class no one thought to test. Add the test, watch it fail, fix it. The loop is the same as feature work.
The TDD discipline is packaged as a skill. Drop it into your skills folder and LingCode will reach for it before writing logic:
---
name: test-driven-development
description: "Use when implementing any feature or bugfix, before writing implementation code. Triggers: 'write tests', 'add test coverage', 'TDD', new feature, new function, regression fix, 'verify it works', 'test first'. Actions: red (write failing test, paste output to verify real red), green-minimum (smallest code that passes), mutation-test each new green (catch false greens), refactor only while green. Reframes TDD as an anti-hallucination tool that stops AI from writing plausible code that doesn't actually work. The test is the spec; AI can't fake passing."
---
Write logic via red-green-refactor. With an AI assistant, TDD's
primary value is as an anti-hallucination tool: a failing test
is the one constraint plausible-looking code can't fake.
Classify first: LOGIC (use TDD), EXPLORATION / UI / CONFIG
(skip, verify another way).
For LOGIC:
1. RED. Write the first failing test before any implementation.
Reference the public API by name. Assert user-visible
behavior, not internal structure. Run it. Paste the exact
failure output. If it already passes, the test is wrong;
rewrite it.
2. GREEN. Write the MINIMUM implementation to make this one
test pass. No code for untested behavior. No "in case"
abstractions. Run it. Paste the exact pass output.
3. ONE-AT-A-TIME. Add the next failing test (edge case,
equivalence class, error path). Confirm red. Add minimum
code. Confirm green. Repeat until the spec is covered.
4. WATCH FOR FALSE GREEN. For each new green test, mutate the
implementation destructively (wrong return value, no-op,
skipped side effect). If the test still passes, the test
isn't exercising the behavior. Fix the test.
5. REFACTOR. Only with all tests green. Improve names, remove
real duplication, simplify control flow. Re-run after each
step. If a test goes red, REVERT the refactor; do not
"fix" the test.
For bugs in pre-existing code: write a failing test that
reproduces the bug, then fix until green. The test becomes
the regression test.
Save as ~/.lingcode/skills/test-driven-development/SKILL.md. See Install a skill for the exact location and how skills get discovered.