📝 Written ● Intermediate Updated 2026-05-19

Verify before claiming done — evidence over assertion

"Should work" and "verified to work" are different states. LingCode can produce confident-sounding completions for code that doesn't compile, tests that don't run, and features that don't actually behave as claimed. The fix isn't a smarter model — it's a discipline you make explicit: evidence before assertion.

Why "looks right" is the trap

Coding agents — every one of them, including LingCode at default settings — have a pattern: write code, scan it, decide it looks right, declare done. Three things go wrong with that pattern:

The code doesn't compile. A misspelled symbol, a missing import, a type that drifted between files. LingCode never ran the compiler — it just looked plausible.
The tests verify nothing. A new test file got created but isn't wired into the test runner. LingCode claims "all tests pass" because the runner found zero tests and exited 0.
The feature is silent. The button gets added, the click handler exists, but the handler returns early on a condition LingCode didn't think through. "I implemented X" — yes, and X does nothing in the path the user takes.

None of these are model failures. They're process failures — claiming completion without running the verification step. The fix is to make verification a hard precondition for the words "done" and "fixed" and "passing."

The rule, stated once

Before LingCode (or you) says any of these:

"This is done."
"I fixed it."
"The tests pass."
"It builds clean."
"It's ready to commit."

There must be a tool call output in the same conversation showing the relevant command ran and produced the expected result. Not "I expect it to compile" — the build output. Not "the tests should pass" — the test runner's pass count.

This is the whole tutorial. The rest is how to actually enforce it.

The four verification gates

Map each completion claim to the command that proves it. If the command isn't in the transcript, the claim isn't supported.

Claim	Required evidence
"It builds."	`swift build`, `xcodebuild build`, `cargo build`, `npm run build`, `tsc --noEmit`: the output ends with success and shows no errors above it.
"Tests pass."	Test runner output showing a non-zero number of tests run and zero failures. "0 tests, 0 failures" is not passing; it's empty.
"Lints clean."	Linter output with no warnings or errors. Suppressions count as evidence of unfinished work, not completion.
"The feature works."	A manual reproduction of the user path — actual click, actual request, actual output — with the result captured (screenshot, command output, or "I ran X, observed Y").
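The first row of the table can be checked mechanically against a transcript. The sketch below is a minimal illustration in Python, not a LingCode feature: `ToolCall`, `BUILD_COMMANDS`, and `build_claim_supported` are hypothetical names, and treating exit code 0 as "success" is an assumption layered on top of reading the output. It accepts "It builds." only when the most recent build command in the transcript succeeded.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One command recorded in the conversation (hypothetical structure)."""
    command: str
    exit_code: int
    output: str


# Build commands taken from the table above.
BUILD_COMMANDS = (
    "swift build",
    "xcodebuild build",
    "cargo build",
    "npm run build",
    "tsc --noEmit",
)


def build_claim_supported(transcript: list[ToolCall]) -> bool:
    """Return True only if the latest build command in the transcript succeeded."""
    builds = [call for call in transcript if call.command.startswith(BUILD_COMMANDS)]
    # No build command at all means the claim has no evidence behind it.
    return bool(builds) and builds[-1].exit_code == 0
```

An empty transcript, or one whose last build run failed, returns `False`: the claim is unsupported, however confident the closing message sounds.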

Ask LingCode for the proof, not the assertion

When LingCode finishes a task and says "this should work" or "tests should pass," the right follow-up is one sentence:

Run the build and tests and paste the output before declaring
done.

That single prompt changes the closing assistant turn from a narrative ("I added the function, updated the call site, this should compile") to a verifiable artifact (the actual build output). If the output shows errors, you're back in the loop with a real lead. If it shows success, you have evidence.

Make this the default closing instruction when you start any non-trivial task: "Before saying done, run the build and tests and paste the output." Save it as a snippet or commit it to muscle memory.

The "zero tests" trap

The most embarrassing failure mode: LingCode writes a test, runs the runner, the runner exits 0 because no test was discovered, and LingCode declares "tests pass."

Defend against it by checking the count, not the exit code:

Run the tests. Tell me how many tests executed and how many
passed. If zero tests executed, the test file isn't wired in — fix
that first.

Common reasons a test runner finds zero tests:

File outside the expected test directory.
Filename doesn't match the runner's pattern (*Test.swift, *.test.ts, test_*.py).
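Test function or method name doesn't start with the prefix the runner looks for (such as test in XCTest and pytest), or the test isn't declared through the framework's it/describe calls.
The file isn't part of any test target in the Xcode/SwiftPM project: it exists on disk but isn't compiled.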

Each of those produces a green "all tests pass" message that means nothing.
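To make "check the count, not the exit code" concrete, here is a minimal sketch using Python's standard `unittest` module. The function name `run_and_verify` and the wording of its messages are assumptions for illustration. The point is that an empty run is reported as a failure even though the runner itself considers it successful.

```python
import io
import unittest


def run_and_verify(suite: unittest.TestSuite) -> tuple[bool, str]:
    """Run a suite and report success only when tests actually executed."""
    # Send the runner's own report to a buffer; we judge by the counts instead.
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    if result.testsRun == 0:
        # wasSuccessful() would be True here, which is exactly the trap.
        return False, "0 tests executed: the test file isn't wired in"
    if not result.wasSuccessful():
        failed = len(result.failures) + len(result.errors)
        return False, f"{failed} of {result.testsRun} tests failed"
    return True, f"{result.testsRun} tests executed, all passed"
```

A typical call would be `run_and_verify(unittest.TestLoader().discover("tests"))`, where the `tests` directory is a placeholder for wherever the project keeps its tests.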

The "compiles but does nothing" trap

Build green and tests green still don't prove behavior. The honest check for "the feature works" is a manual repro of the user's path. Three lightweight ways to force this in LingCode:

For CLI features: ask LingCode to invoke the new command with realistic inputs and show the output. "Run lingcode ask --provider claude 'hello' and paste the response."
For UI features: ask for a screenshot or a runtime log. If LingCode can't take a screenshot, ask "add a print statement at the entry of the new handler, run the app, click the button, and paste the log output."
For API features: ask for the actual HTTP call. "curl the new endpoint with a real payload and show me the status code and body."

"I checked the code and it should work" is not evidence. Push past it. The cost of one extra command is seconds; the cost of believing a phantom completion is hours of debugging the wrong layer.
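For CLI and API checks, the evidence is the command plus what it actually printed. The sketch below shows one way to capture that in Python with the standard `subprocess` module. `capture_evidence` is a hypothetical helper, and the record layout is an assumption chosen so the result can be pasted into the chat as-is.

```python
import shlex
import subprocess
import sys


def capture_evidence(command: list[str], timeout: float = 60.0) -> str:
    """Run a command and return a paste-ready record of what happened."""
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout
    )
    return (
        f"$ {shlex.join(command)}\n"
        f"exit code: {completed.returncode}\n"
        f"stdout:\n{completed.stdout}"
        f"stderr:\n{completed.stderr}"
    )


if __name__ == "__main__":
    # Stand-in for a real user-path command such as the new CLI subcommand.
    print(capture_evidence([sys.executable, "-c", "print('hello')"]))
```

The record includes the exit code and stderr alongside stdout, so a command that printed something plausible but failed is still visible as a failure.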

When verification surfaces a problem, don't paper over it

The verification step's job is to find failures. When it does, the right response is:

1. Read the actual error message in full.
2. Diagnose the root cause: what's wrong, not what's plausible.
3. Fix the root cause.
4. Re-run the verification. Loop.

The wrong responses, in increasing order of damage:

1. Skip the verification next time. ("It usually passes.")
2. Suppress the warning. (Adding // swiftlint:disable, // @ts-ignore, # noqa to make the linter shut up about the symptom.)
3. Skip or weaken the failing test. (Adding .skip, deleting an assertion, lowering a threshold.)
4. Declare done anyway and move on.

Each one converts a real signal into permanent technical debt. If LingCode proposes any of them, push back: "Don't suppress — explain why this is failing and fix it."
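Suppressions are easy to miss in a long diff, so a quick scan can help. The following Python sketch flags added lines in a unified diff that contain one of the markers named above. `find_suppressions` and `SUPPRESSION_MARKERS` are hypothetical names, and the substring match is a deliberately crude heuristic: it can flag innocent lines (for example any call that merely contains `.skip`), so treat hits as prompts for review, not verdicts.

```python
# Markers taken from the examples above; extend for your own toolchain.
SUPPRESSION_MARKERS = ("swiftlint:disable", "@ts-ignore", "# noqa", ".skip")


def find_suppressions(diff_text: str) -> list[str]:
    """Return added lines in a unified diff that contain a suppression marker."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(marker in line for marker in SUPPRESSION_MARKERS):
            hits.append(line[1:].strip())
    return hits
```

Only added lines are checked, so a suppression that already existed in unchanged context is not blamed on the current change.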

Make it cheap, so you'll actually do it

Verification only sticks if it's frictionless. Three things make it cheap:

Allowlist the build and test commands — see Reduce permission prompts. If npm test or swift build needs approval every time, LingCode (and you) will skip it.
Use a hook to run the build automatically (see Add custom hooks). A post-edit hook that runs tsc --noEmit or swift build after every write means verification happens whether or not anyone remembers to ask.
Write the standard verification prompt once. Add it to a project-level CLAUDE.md or a saved snippet so every session starts with the discipline baked in.

The closing checklist

Before LingCode commits, opens a PR, or types the word "done," confirm — out loud, in the chat:

Build: ran the build, output ends clean.
Tests: ran the tests, N tests executed, all passed (N > 0).
Behavior: manually exercised the user-facing path, observed the expected outcome.
No suppressions: didn't add ignores, disables, or skips to make the verification pass.

If any item is missing, the work isn't done — it's "pending verification." Naming the state honestly is the cheapest fix in software development.
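The checklist reduces to a small piece of state with exactly two outcomes. Here is a minimal Python sketch of that idea. The `Checklist` class and its field names are hypothetical, chosen to mirror the four items above. Nothing here runs the checks; it only refuses to say "done" until each one has been recorded.

```python
from dataclasses import dataclass


@dataclass
class Checklist:
    build_clean: bool = False        # Build: ran the build, output ends clean
    tests_executed: int = 0          # Tests: N tests executed (N > 0)
    tests_failed: int = 0            # Tests: all passed
    behavior_observed: bool = False  # Behavior: user-facing path exercised
    suppressions_added: bool = False # No suppressions: nothing silenced

    def missing(self) -> list[str]:
        """Names of checklist items that are not yet satisfied."""
        items = []
        if not self.build_clean:
            items.append("Build")
        if self.tests_executed <= 0 or self.tests_failed > 0:
            items.append("Tests")
        if not self.behavior_observed:
            items.append("Behavior")
        if self.suppressions_added:
            items.append("No suppressions")
        return items

    def status(self) -> str:
        """Either "done" or the honest alternative."""
        return "done" if not self.missing() else "pending verification"
```

For example, `Checklist(build_clean=True, tests_executed=0, behavior_observed=True).missing()` returns `["Tests"]`, because a run with zero tests does not count.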

Use this in LingCode

The discipline is packaged as a skill — drop it into your skills folder and ask LingCode for "verify before completion" or just rely on LingCode triggering it on "done" / "fixed" / "passing" claims:

---
name: verification-before-completion
description: Use when about to claim work is complete, fixed, or passing, before committing or creating PRs. Triggers: 'this is done', 'I fixed it', 'tests pass', 'it builds clean', 'ready to commit', 'all green', 'ship it'. Actions: require evidence per claim — build output for 'builds', test count + zero failures for 'tests pass', linter output for 'lints clean', manual reproduction for 'feature works'. Anti-patterns: 'should work' without running, 'tests pass' with zero tests run, suppressing/skipping/lowering test thresholds. Discipline: evidence before assertion, always.
---

Verify before claiming done. Required output in the conversation
before saying "done", "fixed", "passing", "ready", or
"works":

1. Build evidence
   Run the project's build command (swift build / xcodebuild build /
   cargo build / npm run build / tsc --noEmit). Output must end
   clean with no errors. Paste the relevant tail.

2. Test evidence
   Run the test runner. Confirm the output shows a NON-ZERO number
   of tests executed and zero failures. "0 tests, 0 failures" is
   empty — diagnose why the runner found nothing.

3. Behavior evidence
   For user-facing features, exercise the actual path (CLI command +
   output, HTTP call + response, button click + log line). Capture
   the observation.

4. No suppression
   If a check failed, do not silence it (no // @ts-ignore, no
   .skip, no disable comments, no lowered thresholds). Fix the root
   cause and re-run.

When the user asks "is it done?" or you'd say "it should work":
stop and run the verifications first. Report what you ran, what
you saw, and only then claim status. If any check is missing, name
the work as "pending verification" — not done.

Verification stays cheap when the commands are allowlisted in
settings.json (see the fewer-permission-prompts skill) and/or wrapped
in a post-edit hook that auto-runs the typecheck.

Save as ~/.lingcode/skills/verification-before-completion/SKILL.md — see Install a skill for the exact location and how skills get discovered.
