Persistent Claude CLI Session
patterns platform cli streaming
One persistent claude CLI subprocess per user session, reusing prompt cache across turns for 3.5x lower per-turn latency vs spawn-per-call.
Content
Shape (prototype CON-63, shipped CON-112 in PrePitch backend):
- Single claude child per session, launched with:
  - --input-format stream-json --output-format stream-json
  - --model <id>
  - --tools ''
  - --verbose
  - --system-prompt <persona> (spawn-time only → 1-session-1-persona invariant)
  - No --print: that flag forces single-shot mode and defeats the pattern.
- User turn is an NDJSON line that MUST end with \n:
  {"type":"user","message":{"role":"user","content":[{"type":"text","text":"..."}]}}\n
  Omit the newline and the CLI stalls, because the parser is line-delimited.
- Turn boundary: parser watches for event.type === 'result' and yields there.
- Assistant text extracted via assistant/message slicing with a prevText watermark.
- Class surface: sendTurn(text) → async generator of chunks, close() (SIGTERM + 2s SIGKILL fallback), isAlive().
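The framing and parsing rules above can be sketched as a few pure helpers. This is a minimal illustration, not the PrePitch source: buildArgs, encodeUserTurn and TurnParser are hypothetical names, and the sketch assumes each assistant event carries its text under message.content and that successive events for a turn repeat the text emitted so far.

```typescript
// Hypothetical helpers for illustration; names are not from the PrePitch code.

/** Argument list for the persistent child, using the flags listed above (no --print). */
function buildArgs(model: string, persona: string): string[] {
  return [
    "--input-format", "stream-json",
    "--output-format", "stream-json",
    "--model", model,
    "--tools", "",
    "--verbose",
    "--system-prompt", persona,
  ];
}

/** One user turn as a single NDJSON line, always newline-terminated. */
function encodeUserTurn(text: string): string {
  const event = {
    type: "user",
    message: { role: "user", content: [{ type: "text", text }] },
  };
  return JSON.stringify(event) + "\n";
}

type TextBlock = { type: string; text?: string };
type CliEvent = { type: string; message?: { content?: TextBlock[] } };

/** Tracks how much assistant text has already been emitted in the current turn. */
class TurnParser {
  private prevText = "";

  /** Feed one stdout line; returns any new text and whether the turn has ended. */
  feed(line: string): { delta: string; done: boolean } {
    if (line.trim() === "") return { delta: "", done: false };
    let event: CliEvent;
    try {
      event = JSON.parse(line) as CliEvent;
    } catch {
      return { delta: "", done: false }; // skip lines that are not JSON
    }
    if (event.type === "result") {
      this.prevText = ""; // turn boundary: reset the watermark
      return { delta: "", done: true };
    }
    if (event.type !== "assistant") return { delta: "", done: false };
    const full = (event.message?.content ?? [])
      .filter((block) => block.type === "text")
      .map((block) => block.text ?? "")
      .join("");
    // Assumption: text is cumulative, so only the part past the watermark is new.
    const delta = full.startsWith(this.prevText)
      ? full.slice(this.prevText.length)
      : full;
    this.prevText = full;
    return { delta, done: false };
  }
}
```

A sendTurn implementation would write encodeUserTurn(text) to the child's stdin, feed each stdout line to the parser, yield every non-empty delta, and stop once done is true.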
Lifecycle: lazy-create on first turn, store on the session object, close() on session end before debrief work. Replacing only happens when !isAlive() — so persona swap MUST call endSession() first or force a new session.
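The lifecycle rules can be sketched as follows. The interfaces and the getOrCreate helper are hypothetical, and this endSession only mirrors the ordering described above (close the child, then run debrief work); the real signature is not given in the source.

```typescript
// Hypothetical shapes for illustration only.
interface ClaudeSessionLike {
  isAlive(): boolean;
  close(): Promise<void>;
}

interface UserSession<S extends ClaudeSessionLike> {
  persona: string;
  claude: S | null; // the slot on the session object
}

/** Lazy-create on first turn; replace only when the stored child is dead. */
function getOrCreate<S extends ClaudeSessionLike>(
  session: UserSession<S>,
  create: (persona: string) => S,
): S {
  if (session.claude === null || !session.claude.isAlive()) {
    session.claude = create(session.persona);
  }
  return session.claude;
}

/** Close the child first, then run debrief work. */
async function endSession<S extends ClaudeSessionLike>(
  session: UserSession<S>,
  debrief: () => Promise<void>,
): Promise<void> {
  const child = session.claude;
  session.claude = null;
  if (child !== null) await child.close();
  await debrief();
}
```

Because getOrCreate never replaces a live child, changing session.persona alone has no effect until the slot is cleared, which is why a persona swap has to go through endSession() or a fresh session.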
Fallback: on subprocess crash mid-turn, log [ClaudeSession] crashed, falling back, null the session slot, discard partial output, re-run the turn through the one-shot callClaudeStream path.
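A minimal sketch of the fallback path, assuming the persistent session and the one-shot path both expose an async stream of text chunks. SessionSlot, turnWithFallback and the onReset hook are hypothetical names; oneShot stands in for callClaudeStream. Chunks already yielded cannot be taken back, so the sketch calls onReset to let the caller tell the client to drop them before the replay starts.

```typescript
type TurnFn = (text: string) => AsyncIterable<string>;

// Hypothetical holder for the per-session child.
interface SessionSlot {
  claude: { sendTurn: TurnFn } | null;
}

/**
 * Streams a turn from the persistent session. If it throws mid-turn, the slot
 * is nulled and the whole turn is re-run through the one-shot path.
 */
async function* turnWithFallback(
  slot: SessionSlot,
  text: string,
  oneShot: TurnFn, // stands in for callClaudeStream
  onReset: () => void = () => {},
): AsyncGenerator<string> {
  const live = slot.claude;
  if (live !== null) {
    let emitted = false;
    try {
      for await (const chunk of live.sendTurn(text)) {
        emitted = true;
        yield chunk;
      }
      return; // turn completed on the persistent session
    } catch (err) {
      console.error("[ClaudeSession] crashed, falling back", err);
      slot.claude = null; // next turn lazily creates a fresh child
      if (emitted) onReset(); // partial output is now stale
    }
  }
  // No live session, or it crashed: replay the full turn one-shot.
  yield* oneShot(text);
}
```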
Skip the pattern when ANTHROPIC_API_KEY is set — the SDK already pools connections; persistent subprocess adds nothing there.
Perf (PrePitch numbers): turn 1 ≈ 7.4s (cache creation), turn 2+ ≈ 4s with first chunk at ≈ 2.2s.
Gotchas
- No per-turn watchdog timeout unless you add one. Spec targets 120s; spawn options don't set timeout. Wire a setTimeout that calls close() + throws, so the one-shot fallback engages instead of hanging the transport.
- Mid-turn crash replays the full turn, but the client has already rendered the partial output. Emit a reset signal (e.g., buyer_response_reset on a WS channel) so the frontend drops the partial bubble before the replay arrives.
- --system-prompt is bound at spawn. Any system-prompt change requires a new subprocess.
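One way to wire the watchdog from the first gotcha is to race each chunk read against a single per-turn deadline. This is a sketch under assumptions: withWatchdog is a hypothetical helper, onTimeout is where the session's close() would be called, and the caller is expected to catch the thrown error and switch to the one-shot path.

```typescript
/**
 * Wraps a turn's chunk stream with a per-turn deadline. When the deadline
 * passes, onTimeout runs and the generator throws.
 */
async function* withWatchdog<T>(
  source: AsyncIterable<T>,
  ms: number,
  onTimeout: () => void,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  // One deadline for the whole turn, not one per chunk.
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      onTimeout();
      reject(new Error(`turn exceeded ${ms}ms`));
    }, ms);
  });
  try {
    while (true) {
      // Whichever settles first wins: the next chunk or the deadline.
      const next = await Promise.race([it.next(), deadline]);
      if (next.done) return;
      yield next.value;
    }
  } finally {
    clearTimeout(timer); // no stray timer once the turn ends either way
  }
}
```

With the 120s target from the spec, a call might look like withWatchdog(session.sendTurn(text), 120_000, () => session.close()), where session is whatever object exposes the class surface described earlier.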
Related
Source: raw/eval-2026-04-18-prepitch-latency.md | Ingested: 2026-04-18