Written · Advanced · Updated 2026-05-13

Run lingcode serve as a backend

Everything LingCode does for its own chat panel — the agent loop, the tool dispatch, the streaming output — is available as a local HTTP API. Run lingcode serve and you have a backend you can build your own UIs and bots on top of.

Once you've used an agent for a while, you start noticing places where you wish the agent could go. A Slack bot that answers questions about your codebase. A Raycast extension that takes a prompt and pipes the answer to your clipboard. A small webpage on your laptop that gives a friend access to the agent for one specific task. A CI step that asks the agent to review the PR. The agent is good. The agent should be everywhere you want it.

Most of those projects don't need a new agent loop. They need a way to call the agent loop you already have, with auth and streaming, over HTTP like any other service. That's what lingcode serve is. The Swift CLI exposes the same three independent loops the Mac app and the REPL use (Claude via the Anthropic SDK, DeepSeek native, and OpenAI-compatible) as a Server-Sent Events API behind a bearer token, listening on localhost.

The point of running it locally rather than as a hosted service is the credential boundary. Your API keys live in your Keychain. Your file access lives on your machine. lingcode serve doesn't change either; it just lets your other tools call into the agent over HTTP rather than spawning a subprocess. This tutorial covers starting the server, talking to it with curl, and the considerations that turn a local backend into something you'd actually build on.

Step 1: Start the server

One command, foreground or background

Assuming the CLI is installed, start it with:

lingcode serve

The server boots, prints the listening address (a localhost port), and reads ~/.lingcode/server.token — generating one if it doesn't exist. The token file is chmod 600; treat it like an SSH key. Run with --port <N> to override the port, with --token-file <path> to use a token from somewhere else.
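
A client that reuses the token can apply the same "treat it like an SSH key" rule before reading it. The following is a minimal Python sketch, not part of LingCode: the `load_token` helper is hypothetical, and the only details taken from the tutorial are the default path and the 600 mode.

```python
import os
import stat
from pathlib import Path

# Default location named in the tutorial.
DEFAULT_TOKEN_PATH = Path.home() / ".lingcode" / "server.token"


def load_token(path: Path = DEFAULT_TOKEN_PATH) -> str:
    """Read the bearer token, refusing files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        # Any group/other permission bit means the file is looser than chmod 600.
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0o600")
    token = path.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{path} is empty")
    return token
```

Calling `load_token()` with no arguments reads the default file; pass a different path if you started the server with --token-file.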

The server is NWListener-based — Apple's native networking stack, not a third-party HTTP framework. It binds to 127.0.0.1 by default and refuses to bind to any other interface without an explicit flag, on purpose.

Step 2: Talk to it with curl

One POST, one streaming response

The main endpoint is POST /v1/agent/ask. A minimal request:

TOKEN=$(cat ~/.lingcode/server.token)
curl -N -H "Authorization: Bearer $TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"prompt":"What does package.json declare?","provider":"claude"}' \
     http://127.0.0.1:<port>/v1/agent/ask

The response is a stream of Server-Sent Events. Each data: line is one event — text deltas, tool calls, tool results, the final assistant message. Your client reads the stream and renders the events however it wants. The shape matches what the Mac app's chat panel renders internally.

The -N flag tells curl not to buffer — you want the stream as it arrives, not when it's done.
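
The same request can be made from a script. This is a rough Python equivalent of the curl call using only the standard library; the helper names `build_ask_request` and `stream_lines` are hypothetical, and the port is whatever lingcode serve printed at startup.

```python
import json
import urllib.request
from typing import Iterator


def build_ask_request(port: int, token: str, prompt: str,
                      provider: str = "claude") -> urllib.request.Request:
    """Build the POST /v1/agent/ask request with the same headers as the curl example."""
    body = json.dumps({"prompt": prompt, "provider": provider}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/agent/ask",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def stream_lines(request: urllib.request.Request) -> Iterator[str]:
    """Yield response lines one at a time as they arrive, like curl -N."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8").rstrip("\r\n")
```

Iterating over `stream_lines(build_ask_request(port, token, "What does package.json declare?"))` prints the raw SSE lines; nothing here interprets them yet.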

Step 3: The SSE event types

What your client has to handle

The stream emits a small number of event types:

  • text_delta — chunks of the assistant's text response. Concatenate them as they arrive to render the message.
  • tool_call — the agent decided to call a tool. Includes the tool name and arguments. Your client can show this as "calling X with Y…" in the UI.
  • tool_result — what the tool returned. Often included for transparency but not always needed by the client.
  • message_stop — the assistant's turn ended. End of stream for this request.
  • error — something failed. The message field describes the failure; the stream closes after.

For a minimal UI, you can ignore everything except text_delta and message_stop. For a richer one, render tool calls inline. The exact shape is documented in the OpenAPI spec the server exposes at /openapi.json.
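
As a rough illustration of the minimal client described above, here is a Python sketch that keeps only text_delta and message_stop (plus error). The payload shape is an assumption: it supposes each data: line carries a JSON object with a `type` field and that text deltas carry their chunk in a `text` field. Check /openapi.json for the real field names before relying on it.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Decode each `data:` line into a dict; skip blank lines and other SSE fields."""
    for line in lines:
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas until the assistant's turn ends."""
    parts = []
    for event in iter_events(lines):
        kind = event.get("type")  # assumed field name
        if kind == "text_delta":
            parts.append(event.get("text", ""))  # assumed field name
        elif kind == "error":
            raise RuntimeError(event.get("message", "unknown error"))
        elif kind == "message_stop":
            break
        # tool_call and tool_result are ignored in this minimal client.
    return "".join(parts)
```

A richer client would branch on tool_call and tool_result in the same loop instead of dropping them.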

Step 4: Don't bind to 0.0.0.0

The bearer token is not strong enough for the open internet

The default 127.0.0.1 binding is intentional. The bearer token at ~/.lingcode/server.token is enough to keep local processes honest: your tools can talk to your server, and other users on the machine, who cannot read the chmod 600 token file, cannot. It is not, however, designed to defend against the open internet. The server doesn't rate-limit. It doesn't lock clients out after failed auth. It doesn't defensively reject malformed or unusual requests.

If you want to expose the agent over a network, do it through a real reverse proxy with TLS, IP allowlisting, and rate limiting. Or, more usefully: have each user run their own lingcode serve on their own machine, and have your shared tooling talk to localhost on each developer's box. That keeps the API keys per-user and the threat surface tiny.

Don't put the token in a public repo. Treat it like an API key. If you accidentally commit it, regenerate by deleting ~/.lingcode/server.token and restarting lingcode serve — a new one is created.
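
On the client side, one way to avoid leaking the token is to refuse to send it anywhere that is neither loopback nor TLS. The Python sketch below is a hypothetical guard, not something LingCode ships; it assumes the hostname `localhost` resolves to loopback on your machine.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True when the URL's host is a loopback address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":  # assumption: resolves to loopback locally
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is some other hostname.
        return False


def require_local_or_tls(url: str) -> str:
    """Allow loopback URLs, or https URLs (for example a TLS reverse proxy)."""
    if is_loopback_url(url) or urlsplit(url).scheme == "https":
        return url
    raise ValueError(f"refusing to send the bearer token to {url}")
```

Calling `require_local_or_tls(base_url)` before attaching the Authorization header turns an accidental plain-HTTP remote endpoint into an immediate error.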

Step 5: How keys flow

Keychain is the single source

When your client posts to /v1/agent/ask with provider:"claude", the server doesn't take an API key in the request — it pulls it from your macOS Keychain. The client just says "use Claude," and the server does the credentialing.

This means: the bearer token authenticates the request, but the API keys belong to the account running lingcode serve. If you and a teammate both want to use the server with your own keys, you both run your own server. There's no key delegation; you can't hand a token to a teammate and have it use their Claude key.

Step 6: Two integration patterns

Per-user vs. shared

Per-user. Each developer runs lingcode serve on their machine. Shared tooling (a Slack bot, an editor extension) is configured per-developer to point at their localhost. Each developer's token and API keys stay theirs, and usage is attributed to the right person. Right for personal-productivity tools.

Shared. One lingcode serve on a team box, behind a real reverse proxy with TLS and proper auth. Keys belong to the service account, and all usage is attributed to that account. Right for team-shared bots and CI integrations. Costs and rate limits accumulate centrally.

Pick the pattern that matches the use case. The same server binary supports both — the difference is in the deployment, not the code.

Run it as a launchd service. If you find yourself starting lingcode serve every morning, write a launchd plist that boots it on login. lingcode serve --background exits after detaching; pair with launchd's KeepAlive key for resilience.
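
A launchd job definition can be generated with Python's standard plistlib. This is a sketch under stated assumptions: the label `com.example.lingcode-serve` and the binary path are placeholders, and the job runs plain `lingcode serve` without --background, because launchd generally expects the process it starts to keep running rather than detach. Adjust if your setup differs.

```python
import plistlib


def build_launch_agent(binary_path: str,
                       label: str = "com.example.lingcode-serve") -> bytes:
    """Return an XML plist that starts `lingcode serve` at login and restarts it on exit."""
    job = {
        "Label": label,                            # hypothetical job label
        "ProgramArguments": [binary_path, "serve"],
        "RunAtLoad": True,                         # start when the agent is loaded
        "KeepAlive": True,                         # relaunch if the process exits
    }
    return plistlib.dumps(job)
```

Writing the returned bytes to a .plist file under ~/Library/LaunchAgents and loading it with launchctl is the usual way to install a per-user agent.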
