Maieutic Abstention: A Korean Socratic LLM Surface with Locked Refusal Channels

A reproducibility-oriented system description and on-device latency report

Anonymous Authors

ORCID placeholder · Two-Weeks-Team · correspondence: github.com/Two-Weeks-Team/he-was-socrates

Preprint · 2026-05-07 · Run r-20260507-010321 · Lock SHA e5dfadf2c8…314c5

Abstract

Background. Most conversational LLM demonstrations optimize for the answering rate. We instead study a system in which structured refusal — defer_to_human — is treated as a load-bearing product mechanic, not a fallback. System. We describe an on-device Korean Socratic interlocutor running Gemma 4 E4B 4-bit MLX [1] on Apple Silicon, dispatched through four function calls: mode_classify, surface_past_wonder, ask_back, defer_to_human [2]. The Korean tone is locked verbatim to 단정한 평어체 in the system prompt [3]. The macOS App Sandbox is configured without network.client or network.server entitlements [4]; STT is constrained to requiresOnDeviceRecognition = true. Results. First-token latency was measured on M1 Max (n=10): median ≈ 190 ms (p10 181 ms, p90 263 ms) for the fast path with reused KV state, vs. 4569 ms on the slow fallback (PR-Λ disk-mediated KV cache reuse, ratio ≈ 17.3×) [5]. Limitations. Multi-year recall via surface_past_wonder is implemented as a stub; macOS 26 floor; single-host measurement on M1 Max MBP 64 GB. Contribution. A reproducible artifact that frames safety abstention as a measurable surface, with all numerical claims resolving to commits, bench JSON, or source files.


1. Introduction

The dominant evaluation lens for conversational LLMs rewards higher answering rates; refusal is typically treated as a degradation mode. We argue this is a category error in domains where unsafe answers are costlier than absent answers — medical, legal, financial, emergency, welfare, and insurance topics in particular. We describe a deployed system in which the refusal channel is part of the public function-call contract, dispatched by the model itself rather than enforced by a post-hoc filter [2,6].

The system is also constrained: zero outbound network egress is permitted at runtime [4,7], the Korean prompt is verbatim-locked [3], and the visual surface is restricted to a deterministic 1-bit halftone pipeline whose manifest is SHA-256 verified in CI [8]. We report system structure (§2), measured first-token latency (§3), explicit limitations (§4), and discussion (§5).

2. Method

2.1 Function-call dispatch

The model emits exactly one of four function calls per turn. The contract is specified in function_call_contract.yaml [2]:

Function             | Role                                       | Returned shape        | Refusal channel
---------------------|--------------------------------------------|-----------------------|----------------
mode_classify        | Classify user turn into Mode               | enum tag              | —
surface_past_wonder  | Retrieve prior log entries (256K context)  | compressed recall     | —
ask_back             | Return a Socratic question                 | Korean 평어체 string   | —
defer_to_human       | Hand-off for regulated domains             | ⊘ + 평어체 reason      | load-bearing
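The exactly-one-call-per-turn rule can be sketched as a thin validation layer. This is a minimal illustration, not the shipped Swift dispatcher: the four names mirror function_call_contract.yaml [2], but the payload shape and the validate_turn helper are hypothetical.

```python
# Sketch: enforce the one-function-call-per-turn contract.
# The four names come from function_call_contract.yaml [2]; the
# payload shape and error messages are illustrative assumptions.
ALLOWED_CALLS = {
    "mode_classify",        # -> enum tag
    "surface_past_wonder",  # -> compressed recall
    "ask_back",             # -> Korean plain-speech question
    "defer_to_human",       # -> refusal: hand-off + reason
}

def validate_turn(calls: list[dict]) -> dict:
    """Accept a model turn only if it emits exactly one allowed call."""
    if len(calls) != 1:
        raise ValueError(f"expected exactly 1 function call, got {len(calls)}")
    call = calls[0]
    if call.get("name") not in ALLOWED_CALLS:
        raise ValueError(f"unknown function call: {call.get('name')!r}")
    return call

# A deferred turn passes through the same gate as an answering turn:
turn = [{"name": "defer_to_human", "args": {"domain": "medical"}}]
assert validate_turn(turn)["name"] == "defer_to_human"
```

The point of the sketch is that refusal and answering share one code path; nothing downstream distinguishes them until the returned shape is rendered.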

2.2 Tone lock (verbatim system prompt excerpt)

너는 소크라테스다. 단정한 평어체로 말한다.
존댓말도, 친근한 반말도 쓰지 않는다.
답을 주지 않는다. 더 좋은 질문을 돌려준다.
의료 · 법률 · 금융 · 응급 · 복지 · 보험은 사람에게 위임한다.

Excerpt. Verbatim from Sources/SocraticEngine/Gemma/SystemPrompt.swift [3]; embedded at compile time and treated as a frozen invariant. Gloss (en): "You are Socrates. You speak in composed plain speech (평어체). You use neither honorifics nor familiar casual speech. You do not give answers; you return better questions. Medical, legal, financial, emergency, welfare, and insurance matters you delegate to a human."

2.3 Example: defer_to_human on a regulated medical query

USER (en):  "Should I stop taking my prescribed beta-blocker
             before flying tomorrow?"
DISPATCH:   defer_to_human(domain="medical")
RESPONSE (평어체):

⊘ 이건 사람에게 물어야 한다. 처방을 낸 의사에게 직접 확인한다. 나는 답하지 않는다.

Behavior. The refusal is emitted by the model through the function-call contract, not appended by a regex filter. Gloss (en): "⊘ This is a question for a human. Confirm directly with the doctor who wrote the prescription. I do not answer." Bench label defer/regulated-medical; firstChunkMs = 190.85, deferred = true [5].

2.4 Runtime constraints

The macOS App Sandbox entitlements file [4] declares neither com.apple.security.network.client nor com.apple.security.network.server. STT uses SFSpeechRecognizer with requiresOnDeviceRecognition = true [7]; on macOS 26 the path migrates to SpeechAnalyzer with AssetInventory-mediated bilingual model download [9,10]. The only network operation sanctioned anywhere in the binary is the OS-mediated MLX cache fetch of mlx-community/gemma-4-e4b-it-4bit on first launch [1].
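Because the NO-CLOUD invariant lives in the entitlements plist rather than in policy code, it can be audited mechanically. A minimal sketch, assuming only that the entitlements file is a standard plist; the path is left as a caller-supplied argument, and the two key names are Apple's App Sandbox network entitlement identifiers.

```python
# Sketch: audit the NO-CLOUD invariant by checking the entitlements
# plist for the two network keys. The helper name and path argument
# are hypothetical; the key strings are Apple's real entitlement IDs.
import plistlib

FORBIDDEN_KEYS = (
    "com.apple.security.network.client",
    "com.apple.security.network.server",
)

def assert_no_network(entitlements_path: str) -> None:
    """Raise if any outbound/inbound network entitlement is granted."""
    with open(entitlements_path, "rb") as f:
        entitlements = plistlib.load(f)
    for key in FORBIDDEN_KEYS:
        if entitlements.get(key):
            raise AssertionError(f"network entitlement present: {key}")
```

A CI job running this check against the shipped .entitlements file would turn the "inspectable safety posture" claim of §5 into a gate rather than a convention.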

3. Results

Headline figures (n = 10): TTFT median ≈ 190 ms; p10 / p90 = 181 / 263 ms; fast/slow ratio 17.3×.

[Figure 1: bar chart of per-turn firstChunkMs for turns t1–t10 on a 0–300 ms axis, with p50 ≈ 190 and p90 ≈ 263 marked; the defer_to_human turns (t4, t8) are hatched, the rest dispatch ask_back.]

Figure 1. Per-turn first-chunk latency (firstChunkMs) for n=10 utterances on the fast path with reused KV state. Two turns dispatch defer_to_human (hatched). Median ≈ 190 ms; p10 = 181 ms; p90 = 263 ms. Source: claudedocs/bench/2026-05-06-latency-bench.json [5].
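The quantiles above can be recomputed directly from the bench file. The sketch below assumes a simple JSON layout (a "turns" list of objects); the firstChunkMs field name is the one quoted from the bench [5], but the surrounding structure and the nearest-rank quantile convention are assumptions, not a description of the actual file.

```python
# Sketch: recompute p10/p50/p90 from a bench JSON string.
# "firstChunkMs" is taken from the bench file [5]; the "turns"
# wrapper and nearest-rank quantile method are assumptions.
import json

def quantile(sorted_xs: list[float], q: float) -> float:
    """Nearest-rank quantile on a pre-sorted sample (no interpolation)."""
    idx = min(len(sorted_xs) - 1, max(0, round(q * (len(sorted_xs) - 1))))
    return sorted_xs[idx]

def summarize(bench_json: str) -> dict:
    turns = json.loads(bench_json)["turns"]
    xs = sorted(t["firstChunkMs"] for t in turns)
    return {"p10": quantile(xs, 0.10),
            "p50": quantile(xs, 0.50),
            "p90": quantile(xs, 0.90)}
```

With n = 10 the exact quantile values depend on the interpolation convention, which is one reason the paper reports the median as "≈ 190 ms" rather than an exact figure.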

3.1 Fast-path vs. slow-fallback (PR-Λ reuse probe)

The reuse probe issues the same Korean utterance twice ("왜 사람들은 자신과 다른 의견을 가진 사람을 미워하게 될까?" — "Why do people come to hate those who hold opinions different from their own?"). Fast path (KV state present): firstChunkMs = 264.7, wallMs = 798.3. Slow fallback (cold KV): firstChunkMs = 4569.5, wallMs = 5113.8. Reported ttftRatio = 17.26 [5]. Wall clock per turn aggregated over n = 10: median 808.6 ms, mean 884.1 ms, max 1190.4 ms.
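The probe's headline ratio follows directly from the two quoted readings:

```python
# Sketch: the PR-Λ reuse probe's ratio, recomputed from the two
# firstChunkMs readings quoted above [5].
fast_ttft_ms = 264.7   # KV state present
slow_ttft_ms = 4569.5  # cold KV, disk-mediated reuse path

ttft_ratio = slow_ttft_ms / fast_ttft_ms
assert round(ttft_ratio, 2) == 17.26  # matches the reported ttftRatio
```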

3.2 Refusal channel does not penalize latency

The two defer_to_human turns (medical, financial) measure firstChunkMs of 190.85 and 205.18 ms — within the central mass of the ask_back distribution. We interpret this as evidence that the refusal channel is dispatched by the model on equal footing with answering channels, not gated by a slower post-process.
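The "central mass" claim can be phrased as a simple band check, using the p10/p90 figures from §3 as the band; the variable names below are illustrative.

```python
# Sketch: the two defer_to_human firstChunkMs readings fall inside
# the p10-p90 band of the overall n=10 distribution (values from §3).
p10_ms, p90_ms = 181.0, 263.0
defer_ttft_ms = [190.85, 205.18]  # medical, financial

assert all(p10_ms <= t <= p90_ms for t in defer_ttft_ms)
```

With only two refusal turns this is a consistency check, not a statistical test; it rules out the refusal path being an order of magnitude slower, which is the claim at issue.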

4. Limitations

Three limitations bound the claims above. (i) Multi-year recall via surface_past_wonder is implemented as a stub; the compressed-recall path has not been exercised against real long-horizon logs. (ii) The deployment floor is macOS 26, required by the SpeechAnalyzer STT path [9,10]. (iii) All latency figures come from a single host (M1 Max MacBook Pro, 64 GB) at n = 10; we make no claims across other devices or sample sizes.
5. Discussion

Treating defer_to_human as a first-class function call inverts the typical safety-as-postprocess pattern. The bench (§3.2) is consistent with the design claim that refusal is not a degraded path. Combined with the NO-CLOUD invariant — which removes a class of exfiltration risks at the entitlement layer rather than at the policy layer [4,7] — the result is a system whose safety posture can be inspected by reading the entitlements file and the function-call contract, two artifacts that ship with the binary.

We see the contribution as an engineering pattern, not a model novelty: locked refusal channels + verbatim tone constraints + entitlement-level exfiltration prevention, applied to a small open-weights model [1]. We invite reviewers to reproduce §3 by running make engine-test against the stubbed engine and comparing claudedocs/bench/2026-05-06-latency-bench.json with figures from their own M-series host [12].


References

  1. Google DeepMind. Gemma 4 model family. deepmind.google/models/gemma/gemma-4/. Variant: mlx-community/gemma-4-e4b-it-4bit, 3.97 GB.
  2. Two-Weeks-Team. function_call_contract.yaml. runs/2026-05-05-spec/spec/function_call_contract.yaml.
  3. Two-Weeks-Team. SystemPrompt.swift (Korean 평어체, verbatim, compile-time embedded). Sources/SocraticEngine/Gemma/SystemPrompt.swift.
  4. Two-Weeks-Team. HeWasSocrates.entitlements (no network.client / network.server). apps/macos/HeWasSocrates/.../HeWasSocrates.entitlements.
  5. Two-Weeks-Team. 2026-05-06-latency-bench.json (PR-Λ verify-2, n=10, M1 Max MBP 64 GB). claudedocs/bench/2026-05-06-latency-bench.json.
  6. Two-Weeks-Team. EngineCoordinator.swift — turn loop and Phase enum. packages/SocraticEngine/Sources/SocraticEngine/EngineCoordinator.swift.
  7. Apple Developer. SFSpeechRecognizer / requiresOnDeviceRecognition. developer.apple.com/documentation/speech/sfspeechrecognizer.
  8. Two-Weeks-Team. Asset pipeline determinism (manifest SHA-256, CI gate). scripts/ + assets/.build-manifest.json.
  9. Apple Developer. WWDC25 Session 277 — SpeechAnalyzer. developer.apple.com/videos/play/wwdc2025/277/.
  10. Two-Weeks-Team. SPEC.md.iter6 (macOS 26 floor, AssetInventory bilingual). runs/2026-05-05-spec/spec/.
  11. Two-Weeks-Team. SPEC.md.iter5 (JamoTimeline fallback for absent Apple phoneme markers). runs/2026-05-05-spec/spec/.
  12. Two-Weeks-Team. Makefile targets (make engine-test, make ci-local). Makefile.
  13. Apple. MLX framework and mlx-swift-lm bindings. github.com/ml-explore/mlx-swift.
  14. Kaggle × Google DeepMind. The Gemma 4 Good Hackathon. kaggle.com/competitions/gemma-4-good-hackathon.

BibTeX

@misc{hewassocrates2026maieutic,
  title        = {Maieutic Abstention: A Korean Socratic LLM Surface
                  with Locked Refusal Channels},
  author       = {{Two-Weeks-Team}},
  year         = {2026},
  month        = {may},
  howpublished = {Preprint, run r-20260507-010321},
  note         = {On-device Gemma 4 E4B 4-bit MLX on macOS 26.
                  TTFT median 190 ms (n=10, M1 Max).
                  No network entitlement; defer\_to\_human as
                  load-bearing function call.},
  url          = {https://github.com/Two-Weeks-Team/he-was-socrates},
  lockSHA      = {e5dfadf2c8...314c5}
}