⚠️ Template / sample post. The author and specifics are placeholder. Use this file as the structural reference for new pieces, then replace with the real contributor's content.

How I use Claude agents in a 200K-line Xcode project

By Mei Chen · Engineering Lead, Cipher · 2026-05-11 · 12 min read

Six months ago we turned on autonomous AI agents in our main iOS codebase — 200K lines, eight years of accumulated architecture, eleven engineers — and let them ship real code. This is what worked, what didn't, and the five workflow patterns I've kept.

The setup, briefly

Our app is a mature B2B iOS product. Swift, UIKit + a growing SwiftUI surface, a custom dependency-injection layer, three feature modules each with their own owner. The kind of codebase where "make a small change" is rarely small.

The agent setup: Claude as the primary, DeepSeek as the cost-conscious fallback for routine work, and an explicit hand-off protocol I'll describe below. Agents run from the chat panel during business hours; I don't let them run unattended overnight.

Lesson 1: the project file is the danger zone

The first thing that broke was project.pbxproj. The agent added a new file, the file references landed in the wrong target, and a teammate's build started failing two days later because their target was missing the file. The agent's reasoning was correct; its edit to a dense, largely undocumented Apple format (an old-style property list keyed by opaque object IDs, not XML) was not.
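A wrong-target mistake like this can be caught mechanically before it reaches a teammate. The following is a minimal sketch, not a full pbxproj parser: it assumes Xcode's usual layout (24-character uppercase hex object IDs, `/* Name in Sources */` comments, `isa` as the first key), and the function names are hypothetical. A real project may need a proper parsing library instead.

```python
import re

# One Sources build phase: "<id> /* Sources */ = { isa = PBXSourcesBuildPhase; ... files = ( ... ); ..."
PHASE_RE = re.compile(
    r"(?P<id>[0-9A-F]{24}) /\* Sources \*/ = \{\s*isa = PBXSourcesBuildPhase;"
    r".*?files = \((?P<files>.*?)\);",
    re.DOTALL,
)

# One native target: "<id> /* Name */ = { isa = PBXNativeTarget; ... buildPhases = ( ... ); ..."
TARGET_RE = re.compile(
    r"[0-9A-F]{24} /\* (?P<name>[^*]+?) \*/ = \{\s*isa = PBXNativeTarget;"
    r".*?buildPhases = \((?P<phases>.*?)\);",
    re.DOTALL,
)


def targets_compiling(pbxproj_text: str) -> dict[str, set[str]]:
    """Map each native target name to the file names listed in its Sources phase."""
    phase_files = {
        m.group("id"): set(re.findall(r"/\* (.+?) in Sources \*/", m.group("files")))
        for m in PHASE_RE.finditer(pbxproj_text)
    }
    membership: dict[str, set[str]] = {}
    for m in TARGET_RE.finditer(pbxproj_text):
        files: set[str] = set()
        for phase_id in re.findall(r"[0-9A-F]{24}", m.group("phases")):
            files |= phase_files.get(phase_id, set())
        membership[m.group("name")] = files
    return membership


def missing_from(pbxproj_text: str, filename: str, expected_targets: list[str]) -> list[str]:
    """Return the expected targets whose Sources phase does not list `filename`."""
    membership = targets_compiling(pbxproj_text)
    return sorted(t for t in expected_targets if filename not in membership.get(t, set()))
```

Run against the project file after an agent adds a source file, a non-empty result from `missing_from` is a signal to stop and fix target membership by hand.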

What I do now:

Lesson 2: separate the "think" from the "do"

The single biggest workflow win was making "plan first, execute second" a habit. Plan mode → read-only analysis → review the proposed plan → switch to acceptEdits mode → let it execute.

The plan step catches the kind of bad decision that's hard to undo: choosing the wrong abstraction, splitting a refactor at the wrong boundary, deciding to "just rewrite this." The execute step then runs fast because you're past the judgment calls.

"Show me what you'd do before you do it" is the prompt I type more than any other.
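The plan-then-execute habit can be pictured as a small gate: nothing that writes is allowed until a plan exists and a human has approved it. This is a toy model of the idea, not Claude's API. The mode names come from the post; the tool names and the `Session` class are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tool names, used only to illustrate the read/write split.
READ_ONLY_TOOLS = {"read_file", "search"}
WRITE_TOOLS = {"edit_file", "run_command"}


@dataclass
class Session:
    mode: str = "plan"                      # "plan" or "acceptEdits"
    plan: list[str] = field(default_factory=list)
    approved: bool = False

    def propose(self, steps: list[str]) -> None:
        """Record a new plan; any earlier approval no longer applies."""
        self.plan = list(steps)
        self.approved = False
        self.mode = "plan"

    def approve(self) -> None:
        """Human review step: only a non-empty plan can unlock edits."""
        if not self.plan:
            raise ValueError("nothing to approve")
        self.approved = True
        self.mode = "acceptEdits"

    def allows(self, tool: str) -> bool:
        """Read-only tools always pass; write tools need an approved plan."""
        if tool in READ_ONLY_TOOLS:
            return True
        return tool in WRITE_TOOLS and self.approved and self.mode == "acceptEdits"
```

Proposing a revised plan drops the session back into plan mode, which mirrors the point of the workflow: the judgment call is reviewed again before anything executes.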

Lesson 3: providers aren't interchangeable

I ran the same six prompts on Claude and DeepSeek and graded the results. Claude won on complex refactors, multi-file edits, and anything that required inferring our conventions. DeepSeek won on raw code generation and on mechanical work (apply this pattern across 40 files) where the constraint is throughput, not judgment.

The split I ended up with:

Cost matters at our usage levels but it's not the main factor. Match the model to the task and you reduce both cost and latency.
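Matching the model to the task can be written down as a lookup table. The sketch below encodes only the comparison results described in this section. The task categories, the `pick_provider` helper, and the provider labels are illustrative placeholders, not real model identifiers or any tool's configuration format.

```python
from enum import Enum


class TaskKind(Enum):
    COMPLEX_REFACTOR = "complex refactor"
    MULTI_FILE_EDIT = "multi-file edit"
    CONVENTIONS_INFERENCE = "conventions inference"
    RAW_GENERATION = "raw code generation"
    MECHANICAL_BULK_EDIT = "mechanical bulk edit"


# Judgment-heavy work goes to Claude, throughput-bound work to DeepSeek.
ROUTES = {
    TaskKind.COMPLEX_REFACTOR: "claude",
    TaskKind.MULTI_FILE_EDIT: "claude",
    TaskKind.CONVENTIONS_INFERENCE: "claude",
    TaskKind.RAW_GENERATION: "deepseek",
    TaskKind.MECHANICAL_BULK_EDIT: "deepseek",
}


def pick_provider(kind, default: str = "claude") -> str:
    """Return the provider label for a task kind, falling back to the primary."""
    return ROUTES.get(kind, default)
```

Defaulting to the primary model means an unclassified task gets the provider that handles judgment calls, which is the safer failure mode when a task is harder than it looks.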

Lesson 4: subagents are for the boring scary stuff

I write more subagents than I expected. The pattern: any task that the main chat keeps getting wrong because of context dilution becomes a subagent with a tight system prompt and a restricted tool set.

Two subagents I rely on weekly:

The narrower the surface, the more useful the subagent. Counterintuitive at first; obvious after you've written three.
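A subagent in this sense is little more than a name, a tight system prompt, and an allowlist of tools. The sketch below models that shape so the "narrow surface" is explicit. The `Subagent` class, the example agent, and the tool names are hypothetical and do not reflect any particular tool's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subagent:
    name: str
    system_prompt: str
    allowed_tools: frozenset[str]

    def can_use(self, tool: str) -> bool:
        return tool in self.allowed_tools


def filter_tool_calls(agent: Subagent, requested: list[str]) -> tuple[list[str], list[str]]:
    """Split requested tool calls into (permitted, rejected), preserving order."""
    permitted = [t for t in requested if agent.can_use(t)]
    rejected = [t for t in requested if not agent.can_use(t)]
    return permitted, rejected


# Hypothetical example: a read-only checker that can inspect but never edit.
project_file_checker = Subagent(
    name="project-file-checker",
    system_prompt="Report target-membership problems in project.pbxproj. Never edit files.",
    allowed_tools=frozenset({"read_file", "search"}),
)
```

Because the allowlist is a frozen set on a frozen dataclass, widening a subagent's surface requires defining a new one, which keeps each agent's scope visible in review.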

Lesson 5: worktrees changed how I run experiments

I used to fight the impulse to let the agent try something speculative because "what if it makes a mess." Now I fork a worktree, let it run, and either keep what it built or delete the whole tree.

The unlock isn't the git plumbing — I knew worktrees existed before LingCode. It's that the IDE makes the fork-experiment-merge loop fast enough that I'll actually do it. Five minutes from "I want to try X" to "I've tried X and decided."
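For readers who have not used worktrees, the underlying git plumbing for the fork-and-discard loop is small. The sketch below builds the commands as plain lists and runs them with `subprocess`. The helper names, paths, and branch names are hypothetical, and this is not how any IDE implements the feature.

```python
import subprocess


def fork_cmd(path: str, branch: str) -> list[str]:
    """Command that creates a new worktree at `path` on a new branch from HEAD."""
    return ["git", "worktree", "add", "-b", branch, str(path)]


def discard_cmds(path: str, branch: str) -> list[list[str]]:
    """Commands that delete the worktree (even if dirty) and its branch."""
    return [
        ["git", "worktree", "remove", "--force", str(path)],
        ["git", "branch", "-D", branch],
    ]


def run_all(cmds: list[list[str]], repo: str) -> None:
    """Run each command inside `repo`, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo, check=True)


# Example usage (not executed here; paths and names are placeholders):
# run_all([fork_cmd("../experiment-x", "experiment/x")], repo=".")
# ... let the agent work inside ../experiment-x ...
# run_all(discard_cmds("../experiment-x", "experiment/x"), repo=".")
```

Keeping the experiment instead of discarding it is an ordinary merge of the experiment branch, followed by removing the worktree without deleting the branch.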

Where it still breaks

Not everything works. Three categories where I still drive manually:

What I'd tell a teammate starting tomorrow

Three rules:

The mental model that took the longest to internalize: the agent isn't a junior engineer who happens to type fast. It's a fundamentally different kind of collaborator with different strengths, different blind spots, and a different cost profile. Treating it like a junior leads to over-supervision; treating it like infrastructure leads to under-supervision. The right setting is somewhere in between, and it took me three months to find it.

Mei Chen is Engineering Lead at Cipher, where she leads the iOS team. Previously at Twitter and Etsy. She writes occasionally about working with AI tools in established codebases.
