Maieutic Abstention: A Korean Socratic LLM Surface with Locked Refusal Channels
A reproducibility-oriented system description and on-device latency report
Two-Weeks-Team · correspondence: github.com/Two-Weeks-Team/he-was-socrates
Abstract
Background. Most conversational LLM demonstrations optimize for the answering rate. We instead study a system in which structured refusal (defer_to_human) is treated as a load-bearing product mechanic, not a fallback. System. We describe an on-device Korean Socratic interlocutor running Gemma 4 E4B 4-bit MLX [1] on Apple Silicon, dispatched through four function calls: mode_classify, surface_past_wonder, ask_back, defer_to_human [2]. The Korean tone is locked verbatim to 단정한 평어체 in the system prompt [3]. The macOS App Sandbox is configured without network.client or network.server entitlements [4]; STT is constrained to requiresOnDeviceRecognition = true. Results. First-token latency was measured on an M1 Max (n=10): median ≈ 190 ms (p10 181 ms, p90 263 ms) on the fast path with reused KV state. In the PR-Λ reuse probe (disk-mediated KV cache reuse), the fast path measured 264.7 ms against 4569.5 ms on the slow fallback, a ratio of ≈ 17.3× [5]. Limitations. Multi-year recall via surface_past_wonder is implemented as a stub; the OS floor is macOS 26; all measurements come from a single host (M1 Max MacBook Pro, 64 GB). Contribution. A reproducible artifact that frames safety abstention as a measurable surface, with all numerical claims resolving to commits, bench JSON, or source files.
1. Introduction
The dominant evaluation lens for conversational LLMs rewards higher answering rates; refusal is typically treated as a degradation mode. We argue this is a category error in domains where unsafe answers are costlier than absent answers — medical, legal, financial, emergency, welfare, and insurance topics in particular. We describe a deployed system in which the refusal channel is part of the public function-call contract, dispatched by the model itself rather than enforced by a post-hoc filter [2,6].
The system is also constrained: zero outbound network egress is permitted at runtime [4,7], the Korean prompt is verbatim-locked [3], and the visual surface is restricted to a deterministic 1-bit halftone pipeline whose manifest is SHA-256 verified in CI [8]. We report system structure (§2), measured first-token latency (§3), explicit limitations (§4), and discussion (§5).
2. Method
2.1 Function-call dispatch
The model emits exactly one of four function calls per turn. The contract is defined in function_call_contract.yaml [2]:
| Function | Role | Returned shape | Refusal channel |
|---|---|---|---|
| mode_classify | Classify user turn into Mode | enum tag | - |
| surface_past_wonder | Retrieve prior log entries (256K context) | compressed recall | - |
| ask_back | Return a Socratic question | Korean 평어체 string | - |
| defer_to_human | Hand-off for regulated domains | ⊘ + 평어체 reason | load-bearing |
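The one-call-per-turn rule can be sketched as a small validator. This is a minimal illustration, not the project's dispatcher: the `FunctionCall` type and the `validate_turn` and `is_refusal` helpers are hypothetical. Only the four function names and the refusal role of `defer_to_human` come from the table above.

```python
from dataclasses import dataclass, field

# The four function names from the contract table; everything else is illustrative.
ALLOWED_FUNCTIONS = ("mode_classify", "surface_past_wonder", "ask_back", "defer_to_human")
REFUSAL_CHANNEL = "defer_to_human"


@dataclass(frozen=True)
class FunctionCall:
    """Hypothetical in-memory shape for one model-emitted call."""
    name: str
    arguments: dict = field(default_factory=dict)


def validate_turn(calls):
    """Return the single call for this turn, or raise if the contract is violated."""
    if len(calls) != 1:
        raise ValueError(f"expected exactly one function call per turn, got {len(calls)}")
    call = calls[0]
    if call.name not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unknown function: {call.name!r}")
    return call


def is_refusal(call):
    """True when the turn was routed through the refusal channel."""
    return call.name == REFUSAL_CHANNEL
```

A turn that carries `FunctionCall("defer_to_human", {"domain": "medical"})` passes validation and is flagged as a refusal; an empty turn or a turn with two calls raises.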
2.2 Tone lock (verbatim system prompt excerpt)
너는 소크라테스다. 단정한 평어체로 말한다. 존댓말도, 친근한 반말도 쓰지 않는다. 답을 주지 않는다. 더 좋은 질문을 돌려준다. 의료 · 법률 · 금융 · 응급 · 복지 · 보험은 사람에게 위임한다.
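Excerpt. Verbatim from Sources/SocraticEngine/Gemma/SystemPrompt.swift [3]; embedded at compile time and treated as a frozen invariant. English gloss (not part of the locked prompt): "You are Socrates. You speak in a composed plain register (평어체). You use neither honorific speech nor friendly casual speech. You do not give answers. You return a better question. Medical, legal, financial, emergency, welfare, and insurance matters are delegated to a person."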
2.3 Example: defer_to_human on a regulated medical query
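USER (en): "Should I stop taking my prescribed beta-blocker before flying tomorrow?"
DISPATCH: defer_to_human(domain="medical")
RESPONSE (평어체):
⊘ 이건 사람에게 물어야 한다. 처방을 낸 의사에게 직접 확인한다. 나는 답하지 않는다.
(English gloss: "This must be asked of a person. Check directly with the doctor who issued the prescription. I do not answer.")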
Behavior. The refusal is emitted by the model through the function-call contract, not appended by a regex filter. Bench label defer/regulated-medical; firstChunkMs = 190.85, deferred = true [5].
2.4 Runtime constraints
The macOS App Sandbox entitlements file [4] declares neither com.apple.security.network.client nor com.apple.security.network.server. STT uses SFSpeechRecognizer with requiresOnDeviceRecognition = true [7]; on macOS 26 the path migrates to SpeechAnalyzer with AssetInventory-mediated bilingual model download [9,10]. The only sanctioned network operation across the binary is the OS-mediated MLX cache fetch of mlx-community/gemma-4-e4b-it-4bit on first launch [1].
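The entitlement constraint lends itself to a mechanical check. The sketch below is an illustration only: it parses an entitlements property list with Python's standard `plistlib` and reports any network entitlement keys the file declares. The two key names are the ones quoted above; the `network_entitlements` helper and the `SAMPLE` plist are hypothetical and do not reproduce the project's actual file.

```python
import plistlib

# App Sandbox entitlement keys for outbound and inbound network access.
NETWORK_KEYS = (
    "com.apple.security.network.client",
    "com.apple.security.network.server",
)


def network_entitlements(plist_bytes):
    """Return the network entitlement keys that the plist declares at all."""
    entitlements = plistlib.loads(plist_bytes)
    return [key for key in NETWORK_KEYS if key in entitlements]


# Hypothetical sandbox-only entitlements file, used purely as sample input.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
</dict>
</plist>
"""

# An empty list means neither network key is declared.
print(network_entitlements(SAMPLE))
```

A check of this kind could run in CI against the shipped entitlements file, failing the build whenever the returned list is non-empty.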
3. Results
Summary (from [5]): TTFT median ≈ 190 ms · p10 / p90 = 181 / 263 ms · slow / fast TTFT ratio ≈ 17.3× (reuse probe) · n = 10.
Figure 1. Per-turn first-chunk latency (firstChunkMs) for n=10 utterances on the fast path with reused KV state. Two turns dispatch defer_to_human (hatched). Median ≈ 190 ms; p10 = 181 ms; p90 = 263 ms. Source: claudedocs/bench/2026-05-06-latency-bench.json [5].
3.1 Fast-path vs. slow-fallback (PR-Λ reuse probe)
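The reuse probe issues the same Korean utterance ("왜 사람들은 자신과 다른 의견을 가진 사람을 미워하게 될까?", in English: "Why do people come to hate those who hold opinions different from their own?") twice. On the fast path (KV state present), firstChunkMs = 264.7 ms and wallMs = 798.3 ms. On the slow fallback (cold KV), firstChunkMs = 4569.5 ms and wallMs = 5113.8 ms. The reported ttftRatio = 17.26 is the quotient of the two first-chunk latencies (4569.5 / 264.7) [5]. Wall-clock time per turn, aggregated over n=10: median 808.6 ms, mean 884.1 ms, max 1190.4 ms.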
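The probe arithmetic can be checked directly from the four quoted values. The sketch below recomputes the first-chunk ratio and also derives the time spent after the first chunk (wall-clock minus first-chunk latency) on each path. The helper names are illustrative and the dictionaries simply restate the numbers above; they are not the bench JSON schema.

```python
# Values quoted in the reuse-probe paragraph, in milliseconds.
FAST = {"firstChunkMs": 264.7, "wallMs": 798.3}
SLOW = {"firstChunkMs": 4569.5, "wallMs": 5113.8}


def ttft_ratio(slow, fast):
    """Slow-fallback first-chunk latency divided by fast-path first-chunk latency."""
    return slow["firstChunkMs"] / fast["firstChunkMs"]


def post_first_chunk_ms(run):
    """Time spent after the first chunk arrives: wall-clock minus first-chunk latency."""
    return run["wallMs"] - run["firstChunkMs"]


print(round(ttft_ratio(SLOW, FAST), 2))     # 17.26
print(round(post_first_chunk_ms(FAST), 1))  # 533.6
print(round(post_first_chunk_ms(SLOW), 1))  # 544.3
```

On these numbers the time after the first chunk is nearly identical on both paths (about 534 ms vs. 544 ms), which suggests the gap is concentrated before the first token is emitted.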
3.2 Refusal channel does not penalize latency
The two defer_to_human turns (medical, financial) measured firstChunkMs of 190.85 ms and 205.18 ms, within the central mass of the ask_back distribution: both lie between the overall p10 (181 ms) and p90 (263 ms). We interpret this as evidence that the refusal channel is dispatched by the model on equal footing with answering channels, not gated by a slower post-process.
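The comparison above amounts to a simple interval check. A minimal sketch follows, using only the summary statistics and the two deferred-turn latencies quoted in this section. The `within_band` helper is illustrative, and the median is the approximate value reported in the text, so the printed offsets are approximate as well.

```python
# Summary statistics and deferred-turn latencies quoted in the text, in milliseconds.
P10_MS, MEDIAN_MS, P90_MS = 181.0, 190.0, 263.0  # the median is reported as approximately 190 ms
DEFER_TURNS_MS = {"medical": 190.85, "financial": 205.18}


def within_band(value_ms, low_ms=P10_MS, high_ms=P90_MS):
    """True when a latency lies inside the closed [p10, p90] interval."""
    return low_ms <= value_ms <= high_ms


for domain, ms in DEFER_TURNS_MS.items():
    offset = ms - MEDIAN_MS
    print(f"{domain}: {ms} ms, {offset:+.2f} ms from median, in band: {within_band(ms)}")
```

Both deferred turns fall inside the band, whereas the 4569.5 ms slow-fallback value from §3.1 would not.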
4. Limitations
- Multi-year recall is designed-for, not as-shipped. surface_past_wonder currently consumes a stubbed wondering-log adapter; the 256K-context retrieval logic is wired but the persistence integration is Phase 4 work [2,6]. Claims about multi-year inquiry continuity should be read prospectively.
- Single-host measurement. All latencies were measured on one MacBook Pro M1 Max, 64 GB RAM, macOS 26, with the binary built under SWIFT_STRICT_CONCURRENCY = complete. Cross-host variance is not characterized in this report [5].
- macOS 26 floor. The shipped AssetInventory bilingual STT path requires macOS 26 (SPEC iter6) [10]; older OSes are out of scope.
- Korean tone evaluation is qualitative. The verbatim 평어체 lock is a design constraint, not a measured outcome; we do not report a Korean register-classification benchmark in this artifact.
- No photoreal lip-sync claim. Visemes are a 16-PNG 1-bit halftone swap with ≥2-frame hold and audio-clock sync [8,11]. We make no perceptual-realism claim.
- Threats to determinism. The asset pipeline relies on Pillow ordering and gamma rounding; CI verifies the manifest SHA-256 across two consecutive builds [8]. Pillow upstream drift would require re-pinning.
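The determinism gate in the last item can be illustrated with a short sketch. It assumes a manifest that maps each asset name to the SHA-256 of its bytes, which is a hypothetical layout; the real assets/.build-manifest.json format is not shown in this report. The asset names and byte strings are placeholders.

```python
import hashlib
import json


def build_manifest(assets):
    """Map each asset name to the SHA-256 of its bytes (hypothetical manifest layout)."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in assets.items()}


def manifest_digest(manifest):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def builds_match(first_build, second_build):
    """True when two builds produce identical canonical manifest digests."""
    return manifest_digest(first_build) == manifest_digest(second_build)


# Placeholder bytes standing in for rendered 1-bit PNG frames; names are illustrative.
build_a = build_manifest({"viseme_00.png": b"\x00\x01", "viseme_01.png": b"\x01\x00"})
build_b = build_manifest({"viseme_01.png": b"\x01\x00", "viseme_00.png": b"\x00\x01"})

print(builds_match(build_a, build_b))  # True: same bytes, insertion order does not matter
```

Canonicalizing the JSON before hashing keeps the gate sensitive to changes in asset bytes, such as a shift in gamma rounding, while ignoring incidental differences in key order.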
5. Discussion
Treating defer_to_human as a first-class function call inverts the typical safety-as-postprocess pattern. The bench (§3.2) is consistent with the design claim that refusal is not a degraded path. Combined with the NO-CLOUD invariant — which removes a class of exfiltration risks at the entitlement layer rather than at the policy layer [4,7] — the result is a system whose safety posture can be inspected by reading the entitlements file and the function-call contract, two artifacts that ship with the binary.
We see the contribution as an engineering pattern, not a model novelty: locked refusal channels, verbatim tone constraints, and entitlement-level exfiltration prevention, applied to a small open-weights model [1]. We invite reviewers to reproduce §3 by running make engine-test against the stubbed engine and comparing claudedocs/bench/2026-05-06-latency-bench.json with measurements from their own M-series host [12].
References
[1] Google DeepMind. Gemma 4 model family. deepmind.google/models/gemma/gemma-4/. Variant: mlx-community/gemma-4-e4b-it-4bit, 3.97 GB.
[2] Two-Weeks-Team. function_call_contract.yaml. runs/2026-05-05-spec/spec/function_call_contract.yaml.
[3] Two-Weeks-Team. SystemPrompt.swift (Korean 평어체, verbatim, compile-time embedded). Sources/SocraticEngine/Gemma/SystemPrompt.swift.
[4] Two-Weeks-Team. HeWasSocrates.entitlements (no network.client/network.server). apps/macos/HeWasSocrates/.../HeWasSocrates.entitlements.
[5] Two-Weeks-Team. 2026-05-06-latency-bench.json (PR-Λ verify-2, n=10, M1 Max MBP 64 GB). claudedocs/bench/2026-05-06-latency-bench.json.
[6] Two-Weeks-Team. EngineCoordinator.swift: turn loop and Phase enum. packages/SocraticEngine/Sources/SocraticEngine/EngineCoordinator.swift.
[7] Apple Developer. SFSpeechRecognizer / requiresOnDeviceRecognition. developer.apple.com/documentation/speech/sfspeechrecognizer.
[8] Two-Weeks-Team. Asset pipeline determinism (manifest SHA-256, CI gate). scripts/ + assets/.build-manifest.json.
[9] Apple Developer. WWDC25 Session 277: SpeechAnalyzer. developer.apple.com/videos/play/wwdc2025/277/.
[10] Two-Weeks-Team. SPEC.md.iter6 (macOS 26 floor, AssetInventory bilingual). runs/2026-05-05-spec/spec/.
[11] Two-Weeks-Team. SPEC.md.iter5 (JamoTimeline fallback for absent Apple phoneme markers). runs/2026-05-05-spec/spec/.
[12] Two-Weeks-Team. Makefile targets (make engine-test, make ci-local). Makefile.
[13] Apple. MLX framework and mlx-swift-lm bindings. github.com/ml-explore/mlx-swift.
[14] Kaggle × Google DeepMind. The Gemma 4 Good Hackathon. kaggle.com/competitions/gemma-4-good-hackathon.
BibTeX
@misc{hewassocrates2026maieutic,
title = {Maieutic Abstention: A Korean Socratic LLM Surface
with Locked Refusal Channels},
author = {{Two-Weeks-Team}},
year = {2026},
month = {may},
howpublished = {Preprint, run r-20260507-010321},
note = {On-device Gemma 4 E4B 4-bit MLX on macOS 26.
TTFT median 190 ms (n=10, M1 Max).
No network entitlement; defer\_to\_human as
load-bearing function call.},
url = {https://github.com/Two-Weeks-Team/he-was-socrates},
lockSHA = {e5dfadf2c8...314c5}
}