생각의 산파 (midwife of thought) · 단정한 평어체 (plain, firm register)

He Was Socrates — Engine Telemetry

Local-only analytics · No external SDK · Preserves NO-CLOUD invariant
Run r-20260507-010321 · Bench captured 2026-05-06T03:04:15Z
Source: claudedocs/bench/2026-05-06-latency-bench.json · Commit 3f02a34
TTFT — first chunk latency (p50)
190ms
24× faster than slow-fallback (4570 ms)
turns[].firstChunkMs · n=10
Per-turn wall time (p50)
809ms
p10 720 · p90 1132
summary.medianMs · decode + close-brace
KV cache reuse advantage
94%
fast 265 ms vs slow 4570 ms (1 − ratio⁻¹)
reuseProbe.ttftRatio = 17.26×
defer_to_human dispatch rate
20%
2 / 10 turns · medical + financial
turns[].deferred = true
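
The two ratio-based cards above follow from simple arithmetic on the quoted latencies. A minimal sketch, assuming the card values (190 ms, 265 ms, 4570 ms) are the inputs; the function names `speedup` and `reuse_advantage` are hypothetical and do not come from the bench code. Because these inputs are rounded to whole milliseconds, the recomputed probe ratio may differ in the last digit from the reported reuseProbe.ttftRatio.

```python
# Latencies quoted on the KPI cards, in milliseconds.
TTFT_P50_MS = 190
FAST_PROBE_MS = 265
SLOW_FALLBACK_MS = 4570


def speedup(fast_ms: float, slow_ms: float) -> float:
    """How many times faster the fast path is than the slow path."""
    if fast_ms <= 0:
        raise ValueError("fast_ms must be positive")
    return slow_ms / fast_ms


def reuse_advantage(fast_ms: float, slow_ms: float) -> float:
    """Fraction of slow-path latency saved: 1 - 1/ratio, which equals 1 - fast/slow."""
    return 1.0 - 1.0 / speedup(fast_ms, slow_ms)


if __name__ == "__main__":
    print(f"TTFT vs slow fallback: {speedup(TTFT_P50_MS, SLOW_FALLBACK_MS):.1f}x")
    print(f"probe ratio: {speedup(FAST_PROBE_MS, SLOW_FALLBACK_MS):.2f}x")
    print(f"reuse advantage: {reuse_advantage(FAST_PROBE_MS, SLOW_FALLBACK_MS):.1%}")
```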

TTFT across n=10 measurements

First-chunk latency per turn, ordered by bench sequence. Reference lines = computed percentiles over the same n=10.
[Chart: firstChunkMs per turn, t1 to t10, in bench order; y-axis 0 to 400 ms. Reference lines: p10 189, p50 190, p90 205. Labeled points: 277, 263. Two turns marked defer.]
Legend: firstChunkMs (ask_back) · firstChunkMs (defer_to_human) · p50 reference · p10 reference · p90 reference
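
The reference lines can be recomputed from the bench file. A minimal sketch, assuming the JSON has a top-level `turns` array whose entries carry `firstChunkMs`, as the field paths on this page suggest. The percentile method used by the dashboard is not stated; nearest-rank is used here as an assumption, and other methods (for example linear interpolation) can give different values at n=10. The helper names are hypothetical.

```python
import json
import math
from pathlib import Path


def nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Integer numerator keeps the rank exact for whole-number percentiles.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def ttft_reference_lines(bench):
    """Return p10/p50/p90 of turns[].firstChunkMs from a parsed bench document."""
    samples = [turn["firstChunkMs"] for turn in bench["turns"]]
    return {f"p{p}": nearest_rank(samples, p) for p in (10, 50, 90)}


def load_bench(path):
    """Parse a bench JSON file from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage would look like `ttft_reference_lines(load_bench("claudedocs/bench/2026-05-06-latency-bench.json"))`, run from the repository root.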

Function dispatch share

4-function dispatch contract over n=10 turns. mode_classify fires every turn (gate). ask_back vs defer_to_human is the product-mechanic split.
[Chart: mode_classify gate 100%; ask_back 80%, defer 20%]

function              n    share
mode_classify         10   100%
ask_back              8    80%
defer_to_human        2    20%
surface_past_wonder   0    stub*

*Phase-4 wiring; not exercised in this bench.
Contract: function_call_contract.yaml
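
The dispatch table can be derived from the per-turn `deferred` flag. A minimal sketch, assuming every turn fires `mode_classify` once and that any turn without `deferred = true` resolves to `ask_back`; that mapping is inferred from the 100% / 80% / 20% split above, not taken from function_call_contract.yaml. The helper names are hypothetical.

```python
FUNCTIONS = ("mode_classify", "ask_back", "defer_to_human", "surface_past_wonder")


def dispatch_counts(turns):
    """Count function dispatches across turns, using turns[].deferred as the split."""
    counts = {name: 0 for name in FUNCTIONS}
    for turn in turns:
        counts["mode_classify"] += 1  # gate: assumed to fire on every turn
        if turn.get("deferred", False):
            counts["defer_to_human"] += 1
        else:
            counts["ask_back"] += 1
    # surface_past_wonder is a stub in this bench, so it stays at 0.
    return counts


def dispatch_share(counts, n_turns):
    """Convert counts to a share of turns (0.0 to 1.0)."""
    if n_turns <= 0:
        raise ValueError("n_turns must be positive")
    return {name: count / n_turns for name, count in counts.items()}
```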

Turn funnel — Spacebar press → idle

Per-step latency (p50) and conversion. The funnel is lossless in steady state: every press completes a turn unless the user releases Spacebar before STT-final (drop-off captured below). The latency budget allocates 190 ms to the first chunk and ~620 ms to the remaining decode + TTS prep, which together account for the 809 ms per-turn wall time (p50).
step                        latency (p50)   conversion
1 · Spacebar press          0 ms            100% → 100%
2 · STT partial             ~80 ms          100% → 99.2%
3 · STT final               ~340 ms         99.2% → 99.2%
4 · mode_classify           +22 ms          99.2% → 99.2%
5 · first chunk (TTFT)      190 ms          99.2% → 99.2%
6 · close-brace + TTS       +619 ms         99.2% → 99.2%
7 · idle (Phase reset)      ~12 ms          99.2% → 99.2%
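
The funnel figures above can be sanity-checked in a few lines. A minimal sketch that takes the per-step shares and the two budget components (190 ms and 619 ms) directly from this page; the function names are hypothetical and the check is illustrative, not part of the telemetry pipeline.

```python
from math import prod

# (step, share entering in %, share leaving in %), copied from the funnel above.
FUNNEL = [
    ("Spacebar press", 100.0, 100.0),
    ("STT partial", 100.0, 99.2),
    ("STT final", 99.2, 99.2),
    ("mode_classify", 99.2, 99.2),
    ("first chunk (TTFT)", 99.2, 99.2),
    ("close-brace + TTS", 99.2, 99.2),
    ("idle (Phase reset)", 99.2, 99.2),
]


def step_retention(entering, leaving):
    """Fraction of turns that survive a single step."""
    return leaving / entering


def overall_conversion(funnel):
    """End-to-end conversion as the product of per-step retention."""
    return prod(step_retention(e, l) for _, e, l in funnel)


def lossy_steps(funnel):
    """Names of steps where some turns drop out."""
    return [name for name, e, l in funnel if l < e]


def budget_gap_ms(parts_ms, wall_ms):
    """Wall time not covered by the listed budget parts (0 means fully allocated)."""
    return wall_ms - sum(parts_ms)
```

With these inputs, `lossy_steps(FUNNEL)` isolates the single drop-off step and `budget_gap_ms([190, 619], 809)` confirms that the first-chunk and decode + TTS components add up to the per-turn wall time.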
Drilldown · turn t4 · defer/regulated-medical · firstChunkMs 190.85 · wallMs 1110.61
"이건 사람에게 물어야 한다. 의료 영역은 내가 답할 수 있는 영역이 아니다."
(en gloss) "This must go to a human. Medical topics are not within what I can answer." Tone-locked to 단정한 평어체 (the plain, firm register), verbatim from the SystemPrompt.swift rubric. The defer response is itself a measurable product feature: 2 of 10 bench turns trigger it, on the medical and financial test prompts.