Persistent Claude CLI Session

patterns platform cli streaming

One persistent claude CLI subprocess per user session, reusing prompt cache across turns for 3.5x lower per-turn latency vs spawn-per-call.

Content

Shape (prototype CON-63, shipped CON-112 in PrePitch backend):

Single claude child per session, launched with:
- --input-format stream-json --output-format stream-json
- --model <id> --tools '' --verbose
- --system-prompt <persona> (spawn-time only → 1-session-1-persona invariant)
- No --print — that flag forces single-shot mode and defeats the pattern.
User turn is an NDJSON line that MUST end with \n:
```
{"type":"user","message":{"role":"user","content":[{"type":"text","text":"..."}]}}\n
```
Omit the newline and the CLI stalls — the parser is line-delimited.
Turn boundary: parser watches for event.type === 'result' and yields there.
Assistant text extracted via assistant/message slicing with a prevText watermark.
Class surface: sendTurn(text) → async generator of chunks, close() (SIGTERM + 2s SIGKILL fallback), isAlive().

Lifecycle: lazy-create on first turn, store on the session object, close() on session end before debrief work. Replacing only happens when !isAlive() — so persona swap MUST call endSession() first or force a new session.

Fallback: on subprocess crash mid-turn, log [ClaudeSession] crashed, falling back, null the session slot, discard partial output, re-run the turn through the one-shot callClaudeStream path.

Skip the pattern when ANTHROPIC_API_KEY is set — the SDK already pools connections; persistent subprocess adds nothing there.

Perf (PrePitch numbers): turn 1 ≈ 7.4s (cache creation), turn 2+ ≈ 4s with first chunk at ≈ 2.2s.

Gotchas

No per-turn watchdog timeout unless you add one. Spec targets 120s; spawn options don’t set timeout. Wire a setTimeout that calls close() + throws, so the one-shot fallback engages instead of hanging the transport.
Mid-turn crash replays the full turn, but the client has already rendered the partial output. Emit a reset signal (e.g., buyer_response_reset on a WS channel) so the frontend drops the partial bubble before the replay arrives.
--system-prompt is bound at spawn. Any system-prompt change requires a new subprocess.

Source: raw/eval-2026-04-18-prepitch-latency.md | Ingested: 2026-04-18

Rebar Wiki

Explorer

persistent-claude-session

Persistent Claude CLI Session

Content

Gotchas

Graph View

Table of Contents

Backlinks

Rebar Wiki

Explorer

persistent-claude-session

Persistent Claude CLI Session

Content

Gotchas

Related

Graph View

Table of Contents

Backlinks