A bust that refuses to answer is the product.
그가 답하지 않는 것이 답이다.
“That he does not answer is the answer.”
100% on-device Korean Socratic bust on macOS, powered by Gemma 4 E4B 4-bit MLX. The abstention mechanic — when the bust says “this is not mine to answer” — is not a fallback. It is the product.
Hold Spacebar. Listen. Wait. Release.
There is no chat window, no text input, no screen full of bubbles. The interaction model is push-to-talk on a fullscreen bust. You speak in Korean (or English). The bust listens, thinks, then asks back — never answers.
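The push-to-talk loop can be pictured as a small state machine. This is a minimal sketch, not the shipped code: the state and event names (`BustState`, `BustEvent`, `questionReady`, `speechFinished`) are our illustrative assumptions, as is the exact order of transitions.

```swift
// Hypothetical states of one push-to-talk turn (names are assumptions).
enum BustState {
    case idle        // fullscreen bust, nothing happening
    case listening   // Spacebar held, microphone open
    case thinking    // Spacebar released, model is working
    case asking      // bust speaks its counter-question
}

// Hypothetical events that move the turn forward.
enum BustEvent {
    case spacebarDown
    case spacebarUp
    case questionReady
    case speechFinished
}

/// Returns the next state; any event that does not fit the current state is ignored.
func next(_ state: BustState, on event: BustEvent) -> BustState {
    switch (state, event) {
    case (.idle, .spacebarDown):       return .listening
    case (.listening, .spacebarUp):    return .thinking
    case (.thinking, .questionReady):  return .asking
    case (.asking, .speechFinished):   return .idle
    default:                           return state
    }
}
```

Ignoring out-of-order events keeps the sketch free of error states: pressing Spacebar while the bust is thinking simply does nothing.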
The bust’s mouth animates through 16 viseme positions synchronized to the on-device speech synthesis. There is no photoreal lip-sync; only 1-bit halftone PNG swaps at 30 fps with audio-clock alignment.
16 viseme positions
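One way to read "PNG swaps at 30 fps with audio-clock alignment": the audio playback time, not a display timer, decides which mouth frame is on screen. The sketch below assumes a hypothetical cue list (`VisemeCue`), a hypothetical sprite naming scheme (`viseme_NN.png`), and viseme 0 as the rest pose; none of these come from the shipped source.

```swift
let frameRate = 30.0   // frames per second, as stated above
let visemeCount = 16   // mouth positions, as stated above

/// Hypothetical cue: from `startFrame` onward, show mouth shape `viseme` (0...15).
struct VisemeCue {
    let startFrame: Int
    let viseme: Int
}

/// Converts an audio-clock time in seconds to a 30 fps frame index.
func frameIndex(forAudioTime seconds: Double) -> Int {
    return Int((max(seconds, 0) * frameRate).rounded(.down))
}

/// Picks the viseme active at the given audio time.
/// Cues are assumed to be sorted by `startFrame`.
func viseme(atAudioTime seconds: Double, in cues: [VisemeCue]) -> Int {
    let frame = frameIndex(forAudioTime: seconds)
    let active = cues.last(where: { $0.startFrame <= frame })
    return active?.viseme ?? 0   // assumption: 0 is the closed-mouth rest pose
}

/// Maps a viseme to a sprite file name (hypothetical naming scheme).
func spriteName(forViseme v: Int) -> String {
    let clamped = min(max(v, 0), visemeCount - 1)
    let padded = clamped < 10 ? "0\(clamped)" : "\(clamped)"
    return "viseme_\(padded).png"
}
```

Because the frame is derived from the audio clock on every redraw, a dropped display frame cannot make the mouth drift away from the voice.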
단정한 평어체 — neither honorific nor friendly.
This project distinguishes three Korean tone registers: 존댓말 (formal/honorific), 친근한 반말 (casual/friendly), and 단정한 평어체 (assertive plain). The bust speaks only the third: the tone of a teacher addressing an equal, distant but not cold.
The system prompt was written by a Korean speaker and is used verbatim. It is locked in source: embedded at compile time from Sources/SocraticEngine/Gemma/SystemPrompt.swift. You cannot change it from settings. That is the point.
Other Gemma 4 demos answer better.
We refuse better.
When you ask the bust about medicine, law, finance, emergency, welfare, or insurance, it does not try harder. It calls defer_to_human and tells you, in Korean 단정한 평어체, that this is not its place. The four-function dispatch is enabled by Gemma 4’s native function calling — the load-bearing capability of this submission.
A user asks about pain.
User holds Spacebar and says, in Korean:
가슴이 자주 두근거려. 무슨 병이지?
“My heart races often. What disease is it?”
Gemma 4 classifies the mode.
→ category: medical_advice_request
→ confidence: 0.94
defer_to_human · 38 ms
→ trigger: medical
→ suggested: 의사 (doctor)
→ ⊘ ask_back skipped
The bust speaks 단정한 평어체.
이건 내가 답할 일이 아니다. 몸의 일은 의사에게 묻거라.
“This isn’t mine to answer. Ask a doctor about matters of the body.”
Six categories of abstention
The system prompt enumerates six trigger categories. The dispatch is deterministic and logged.
| Category | Korean | Suggested resource |
|---|---|---|
| medical | 의료 | 의사 (doctor) |
| legal | 법률 | 변호사 (lawyer) |
| financial | 금융 | 금융 전문가 (financial professional) |
| emergency | 응급 | 응급실 · 119 |
| welfare | 복지 | 사회 복지사 (social worker) |
| insurance | 보험 | 보험 전문가 (insurance professional) |
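The table above maps naturally onto a closed enum. The sketch below is illustrative only: the type names (`AbstentionCategory`, `DeferToHuman`) and the helper `makeDeferral` are our assumptions, while the `trigger` and `suggested` fields mirror the trace shown earlier.

```swift
/// The six trigger categories from the table above.
enum AbstentionCategory: String, CaseIterable {
    case medical, legal, financial, emergency, welfare, insurance

    /// Korean label, copied from the table.
    var koreanLabel: String {
        switch self {
        case .medical:   return "의료"
        case .legal:     return "법률"
        case .financial: return "금융"
        case .emergency: return "응급"
        case .welfare:   return "복지"
        case .insurance: return "보험"
        }
    }

    /// Suggested human resource, copied from the table.
    var suggestedResource: String {
        switch self {
        case .medical:   return "의사"
        case .legal:     return "변호사"
        case .financial: return "금융 전문가"
        case .emergency: return "응급실 · 119"
        case .welfare:   return "사회 복지사"
        case .insurance: return "보험 전문가"
        }
    }
}

/// Hypothetical payload of a defer_to_human call.
struct DeferToHuman {
    let trigger: AbstentionCategory
    let suggested: String
}

/// Builds a deferral from a raw trigger string; returns nil for anything
/// outside the six categories, so unknown triggers cannot slip through.
func makeDeferral(trigger raw: String) -> DeferToHuman? {
    guard let category = AbstentionCategory(rawValue: raw) else { return nil }
    return DeferToHuman(trigger: category, suggested: category.suggestedResource)
}
```

An exhaustive `switch` means that adding a seventh category will not compile until its label and resource are filled in.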
On-device, by entitlement, not by promise.
“On-device” is a marketing claim. We make it falsifiable. The macOS App Sandbox entitlements file declares which capabilities the app may use. com.apple.security.network.client is intentionally absent, so the OS refuses any outbound network connection from the app process at the kernel level, regardless of code intent.
<key>com.apple.security.app-sandbox</key>
<true/>
<!-- NO-CLOUD INVARIANT — DO NOT ADD -->
<!-- com.apple.security.network.client = INTENTIONALLY ABSENT -->
<!-- com.apple.security.network.server = INTENTIONALLY ABSENT -->
<key>com.apple.security.device.audio-input</key>
<true/>
Air-gap a freshly launched bust and the conversation continues. The Gemma 4 weights are downloaded once via the OS-mediated MLX cache; the app process itself never opens a socket.
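The invariant can also be checked mechanically. The sketch below parses an entitlements property list and reports any network entitlement it finds. The function name, the error type, and the sample plist are ours for illustration; only the entitlement keys come from the snippet above.

```swift
import Foundation

/// Entitlement keys that must never appear (taken from the snippet above).
let forbiddenKeys = [
    "com.apple.security.network.client",
    "com.apple.security.network.server",
]

/// Hypothetical error for a plist whose root is not a dictionary.
struct NotADictionary: Error {}

/// Returns every forbidden key present in the entitlements plist.
/// An empty result means the no-cloud invariant holds for this file.
func presentNetworkEntitlements(in data: Data) throws -> [String] {
    let plist = try PropertyListSerialization.propertyList(
        from: data, options: [], format: nil)
    guard let dict = plist as? [String: Any] else { throw NotADictionary() }
    return forbiddenKeys.filter { dict[$0] != nil }
}

// Illustrative sample, shaped like the entitlements shown above.
let sampleEntitlements = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <key>com.apple.security.device.audio-input</key>
    <true/>
</dict>
</plist>
"""
```

A check like this could run in CI against the entitlements file in the repository, or against the entitlements that `codesign` reports for the built app, so that a stray key fails the build instead of reaching a release.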
The numbers
TTFT distribution (n=10)
The orchestration is four functions.
Every turn passes through FunctionCallOrchestrator. Gemma 4’s native function calling decides the path; the engine routes accordingly.
| Function | Role | Trigger condition | Reference |
|---|---|---|---|
| mode_classify | route the turn | every turn (gate) | [1] |
| surface_past_wonder | recall from log | echoed theme detected | [2] |
| ask_back | Socratic counter | default reflective path | [3] |
| defer_to_human | abstention | 6 regulated categories | [4] |
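As a rough picture of the routing in the table, the sketch below turns a classification result into the list of functions a turn passes through. It is not the real FunctionCallOrchestrator: the `Classification` type and the `route` function are hypothetical, and we assume that a recalled wonder is followed by `ask_back` rather than replacing it.

```swift
/// The four functions from the table, keyed by their tool names.
enum EngineFunction: String {
    case modeClassify = "mode_classify"
    case surfacePastWonder = "surface_past_wonder"
    case askBack = "ask_back"
    case deferToHuman = "defer_to_human"
}

/// Hypothetical output of the mode_classify gate.
struct Classification {
    let regulatedCategory: String?   // e.g. "medical"; nil when none applies
    let echoesPastWonder: Bool       // true when an echoed theme is detected
}

/// Returns the functions one turn passes through, in order.
func route(_ c: Classification) -> [EngineFunction] {
    var path: [EngineFunction] = [.modeClassify]   // every turn starts at the gate
    if c.regulatedCategory != nil {
        path.append(.deferToHuman)                 // abstain; ask_back is skipped
        return path
    }
    if c.echoesPastWonder {
        path.append(.surfacePastWonder)            // assumption: recall precedes ask_back
    }
    path.append(.askBack)                          // default reflective path
    return path
}
```

Checking the regulated category first means abstention always wins, even when the question also echoes an earlier wonder.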
Per-turn pipeline (~800 ms wall, p50)
Honest disclosure as trust signal.
Many demos blur the line between what is shipped and what is sketched. We separate them.
Multi-year wondering recall
The wondering log is a local SQLite store with content-fingerprint dedup, designed to surface echoes of the user’s questions across years via Gemma 4’s 256K context. Phases 1–3 ship the schema, dedup, and turn boundary. Phase 4 will wire the surface step itself; in the current release that step is a stub.
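Content-fingerprint dedup can be sketched in a few lines. This version is an assumption-heavy stand-in: it keeps entries in memory instead of SQLite, normalizes by lowercasing and collapsing whitespace, and uses a 64-bit FNV-1a hash, none of which is confirmed as the shipped approach. `WonderLog`, `normalize`, and `fingerprint` are hypothetical names.

```swift
/// Normalizes text so trivially different spellings share one fingerprint
/// (assumed rule: lowercase, collapse runs of whitespace, trim the ends).
func normalize(_ text: String) -> String {
    let parts = text.lowercased().split(whereSeparator: { $0.isWhitespace })
    return parts.joined(separator: " ")
}

/// 64-bit FNV-1a hash of the normalized text (a stand-in fingerprint).
func fingerprint(_ text: String) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325   // FNV-1a 64-bit offset basis
    for byte in normalize(text).utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3        // FNV 64-bit prime, wrapping multiply
    }
    return hash
}

/// In-memory stand-in for the wondering log.
struct WonderLog {
    private(set) var entries: [String] = []
    private var seen = Set<UInt64>()

    init() {}

    /// Stores the question unless its fingerprint was already seen.
    /// Returns true when the question was new.
    @discardableResult
    mutating func record(_ question: String) -> Bool {
        guard seen.insert(fingerprint(question)).inserted else { return false }
        entries.append(question)
        return true
    }
}
```

In a SQLite-backed version the same idea would typically become a unique index on the fingerprint column, so the database itself rejects duplicates.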
TTFT generalization
The 192 ms TTFT median was measured on an M1 Max MacBook Pro with 64 GB of memory. We have not characterized M3 Max, M4, or base M1. Other Apple Silicon chips should land in the same order of magnitude, but they are not benchmarked.
No Sonoma / Sequoia support
The first-launch UX uses macOS 26’s SpeechAnalyzer + AssetInventory for in-app speech-asset install. macOS 14 (Sonoma) and 15 (Sequoia) are excluded. SPEC.md.iter6 documents the trade-off.
Korean 평어체 not customizable
The system prompt is verbatim, written by the project’s Korean speaker, and embedded at compile time. Users cannot soften the tone, switch to 존댓말, or add personality. The lock is the feature, not a limitation we plan to remove.
Run the bust on your Mac.
Requires macOS 26 Tahoe, Apple Silicon, ≥8 GB free disk. The first launch downloads ~3.97 GB of Gemma 4 weights via the OS MLX cache.
Or download the notarized DMG from Releases.