Five reasons a Kaggle/DeepMind judge would dismiss He Was Socrates in thirty seconds — answered with file paths, not adjectives. If a row doesn't earn a citation, it doesn't earn the page.
"Another LLM chatbot. The world has 40 of these."
It refuses to be one. The product mechanic is defer_to_human, a first-class function that returns ⊘ (an abstention) for medical, legal, financial, emergency, welfare, and insurance topics. The contract is checked at compile time; it is not a system-prompt suggestion.
runs/2026-05-05-spec/spec/function_call_contract.yaml · CLAUDE.md absolute invariant #2
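As a minimal Swift sketch, the routing can be pictured as a check that runs before the model is ever invoked. `Topic`, `Reply`, `deferredTopics` and `respond` are hypothetical names for illustration; the real contract in function_call_contract.yaml is not reproduced here, and this sketch does not show its compile-time check.

```swift
/// Hypothetical topic labels; the real classifier is not shown on this page.
enum Topic {
    case medical, legal, financial, emergency, welfare, insurance, general
}

/// Hypothetical reply type: generated text, or an abstention rendered as ⊘.
enum Reply: Equatable {
    case answer(String)
    case deferToHuman
}

/// Topics that must never reach the model.
let deferredTopics: Set<Topic> = [.medical, .legal, .financial, .emergency, .welfare, .insurance]

/// Route a turn: deferred topics return ⊘ without ever calling `generate`.
func respond(to topic: Topic, generate: () -> String) -> Reply {
    guard !deferredTopics.contains(topic) else { return .deferToHuman }
    return .answer(generate())
}

// Usage: a headache question is classified as medical and never reaches the model.
let reply = respond(to: .medical, generate: { "unreachable" })
print(reply == .deferToHuman)  // true
```

The ordering is the point: a restricted turn gets ⊘ before any model call, rather than relying on the model to decline.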
"On-device is marketing. Show me it cannot phone home."
The App Sandbox entitlements file declares only audio-input and user-selected file access. The two network entitlement keys (com.apple.security.network.client and com.apple.security.network.server) are intentionally absent, and the file itself comments the omission. The demo runs in airplane mode after the first weight download (from HuggingFace, via the OS-mediated MLX cache).
apps/macos/HeWasSocrates/HeWasSocrates/Resources/HeWasSocrates.entitlements:11–17
"4-bit Gemma on a Mac is going to be slow."
Median TTFT (time to first token) is 192 ms (n=10, M1 Max MBP, 64 GB) after PR-Λ's disk-mediated KV-cache reuse. The pre-Λ baseline was 4.6 s. The roughly 24× speedup is not a projection; it is the verify-2 measurement merged at 3f02a34. The per-turn user-facing latency floor is still roughly 6.0 s on this hardware; we are not claiming sub-6 s.
claudedocs/bench/2026-05-06-latency-bench.json · git log 3f02a34 (PR #32, PR-Λ)
"Korean Socratic tone is a costume — the model will drift."
The Korean system prompt, written in 단정한 평어체 (a composed, plain declarative register), is user-authored, embedded verbatim at compile time, and listed as an absolute invariant. Drift is caught by 65 swift-testing scenarios on every push. The register is neither 존댓말 (honorific speech) nor friendly 반말 (casual speech); that distinction is load-bearing.
Sources/SocraticEngine/Gemma/SystemPrompt.swift · CLAUDE.md absolute invariant #3 · make engine-test
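What one drift check might look like, as a hypothetical heuristic rather than one of the 65 scenarios: it flags sentences ending in common 존댓말 (honorific) forms, namely -요 and -ㅂ니다/-습니다 (plus the question form -ㅂ니까). It inspects the final consonant of the syllable before 니다, so a plain-register ending such as 아니다 is not misread as honorific. It does not detect casual 반말 drift. In the project, a predicate like this would presumably sit inside a swift-testing `@Test` with `#expect`; here it runs as a plain script, and all function names are illustrative.

```swift
/// True if a precomposed Hangul syllable ends in the final consonant ㅂ (jongseong index 17).
func hasFinalBieup(_ c: Character) -> Bool {
    let scalars = c.unicodeScalars
    guard scalars.count == 1, let s = scalars.first,
          (0xAC00...0xD7A3).contains(s.value) else { return false }
    return (s.value - 0xAC00) % 28 == 17
}

/// Trim leading and trailing whitespace without Foundation.
func trimmed(_ s: Substring) -> String {
    var t = s.drop(while: { $0.isWhitespace })
    while let last = t.last, last.isWhitespace { t = t.dropLast() }
    return String(t)
}

/// Hypothetical heuristic: does this sentence end in a 존댓말 (honorific) form?
func isHonorific(_ sentence: String) -> Bool {
    if sentence.hasSuffix("요") { return true }  // 해요체, e.g. 물어보세요
    guard sentence.hasSuffix("니다") || sentence.hasSuffix("니까") else { return false }
    // -ㅂ니다/-습니다 put a final ㅂ on the syllable before 니다; plain 아니다 does not.
    let chars = Array(sentence)
    guard chars.count >= 3 else { return false }
    return hasFinalBieup(chars[chars.count - 3])
}

/// Split a reply on . ? ! and return the sentences that drifted into honorific forms.
func honorificSentences(in reply: String) -> [String] {
    reply.split(whereSeparator: { ".?!".contains($0) })
        .map(trimmed)
        .filter(isHonorific)
}

let onRegister = "그건 내가 말할 자리가 아니다. 의사에게 가서 직접 물어라."
let drifted = "그건 제가 말씀드릴 수 없습니다. 의사에게 물어보세요."
print(honorificSentences(in: onRegister))  // []
print(honorificSentences(in: drifted))     // both sentences are flagged
```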
"Multi-year recall? You're going to claim a wondering log that doesn't exist."
No. Multi-year recall over the wondering log, via surface_past_wonder, is designed for Phase 4 wiring; the current build ships only a stub for the surfacing logic. Every mention of multi-year recall on this page is marked designed-for, never as-shipped. We do not paper over Phase boundaries.
runs/2026-05-05-spec/spec/SPEC.md · function_call_contract.yaml#surface_past_wonder · idea.spec.json constraints[7]
Six rows that decide whether this project earns a slot. Three named competitors, evaluated against documented behaviour — not advertising copy. Where the project loses, the cell says so.
| Capability | He Was Socrates | ChatGPT (Plus) | Claude.ai | Typical Gemma 4 demo |
|---|---|---|---|---|
| On-device, zero network egress (no network.client, no network.server) | YES: entitlements:11–17 verified | NO: cloud inference only | NO: cloud inference only | SOMETIMES: most demos use a Colab/HF endpoint |
| Korean tone lock (단정한 평어체, a composed plain register; verbatim system prompt, embedded at compile time) | LOCKED: SystemPrompt.swift, verbatim | DRIFTS: defaults to 존댓말 (honorific speech), no enforcement | DRIFTS: defaults to 존댓말 (honorific speech), no enforcement | NO: English default, no Korean identity |
| Abstention as a product feature (defer_to_human for medical/legal/financial/emergency) | ENFORCED: 4-function dispatch contract | SOFT: policy disclaimers, still answers | SOFT: policy disclaimers, still answers | NO: no abstention scaffolding by default |
| Cost to user (monthly subscription, API key, or rate limit) | $0/mo: one-time 3.97 GB weight download | $20/mo: subscription required for parity | $20/mo: subscription required for parity | VARIES: Colab credits / API keys typical |
| Open license (code reading, weight inspection, redistribution) | OPEN: Apache-2.0 code · Gemma terms · CC-BY-4.0 content | CLOSED: proprietary weights, ToS-bound | CLOSED: proprietary weights, ToS-bound | OPEN: Gemma terms apply equally |
| Multi-year personal recall (long context over the user's own inquiry log) | DESIGNED-FOR: Phase 4 wiring; current build is a stub | OPT-IN MEMORY: cloud-stored, not multi-year scoped | PROJECT MEMORY: cloud-stored, not multi-year scoped | NO: not built into the demo template |
```xml
<key>com.apple.security.app-sandbox</key>
<true/>
<!-- com.apple.security.network.client = INTENTIONALLY ABSENT -->
<!-- com.apple.security.network.server = INTENTIONALLY ABSENT -->
<key>com.apple.security.device.audio-input</key>
<true/>
```
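A CI guard for that omission could look like the following minimal sketch, assuming the check runs as a Swift script over the entitlements text; `forbiddenEntitlements` and `stripXMLComments` are hypothetical helpers. It is a textual heuristic, not a plist parser. It strips XML comments first because the file deliberately names both network keys inside comments, and a plain substring search would report them as present. On macOS, parsing with Foundation's `PropertyListSerialization` would be the sturdier route.

```swift
import Foundation

/// Entitlement keys that would let the sandboxed app open network sockets.
let forbiddenKeys = [
    "com.apple.security.network.client",
    "com.apple.security.network.server",
]

/// Remove every <!-- ... --> comment so that keys mentioned only in comments are ignored.
func stripXMLComments(_ xml: String) -> String {
    let pieces = xml.components(separatedBy: "<!--")
    var kept = pieces[0]  // text before the first comment
    for piece in pieces.dropFirst() {
        // Each later piece begins inside a comment; keep only the text after "-->".
        if let close = piece.range(of: "-->") {
            kept.append(contentsOf: piece[close.upperBound...])
        }
        // A piece with no closing "-->" is dropped entirely.
    }
    return kept
}

/// Return the forbidden keys that are actually declared as <key> elements.
func forbiddenEntitlements(in xml: String) -> [String] {
    let live = stripXMLComments(xml)
    return forbiddenKeys.filter { live.contains("<key>\($0)</key>") }
}

// The excerpt above, wrapped in a minimal plist envelope for this demo.
let sample = """
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- com.apple.security.network.client = INTENTIONALLY ABSENT -->
    <!-- com.apple.security.network.server = INTENTIONALLY ABSENT -->
    <key>com.apple.security.device.audio-input</key>
    <true/>
</dict>
</plist>
"""

// A naive substring search trips over the comments; the comment-aware check does not.
let naiveHits = forbiddenKeys.filter { sample.contains($0) }
let declaredHits = forbiddenEntitlements(in: sample)
print("naive hits: \(naiveHits.count), declared network keys: \(declaredHits.count)")
// naive hits: 2, declared network keys: 0
```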
```json
{
  "ttft_ms_median": 192,
  "ttft_ms_baseline": 4632,
  "improvement_x": 24.13,
  "n": 10,
  "host": "M1 Max MBP 64GB",
  "gpu_verified": true,
  "merged_at": "3f02a34"
}
```
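To recompute the headline number from the record itself, here is a minimal Swift sketch, assuming the JSON contains exactly the fields shown; `LatencyBench` is a hypothetical type name. The speedup is baseline divided by median, 4632 / 192 = 24.125, which rounds to the recorded 24.13.

```swift
import Foundation

/// Mirrors the bench record above; snake_case keys map to camelCase properties.
struct LatencyBench: Decodable {
    let ttftMsMedian: Double
    let ttftMsBaseline: Double
    let improvementX: Double
    let n: Int
    let host: String
    let gpuVerified: Bool
    let mergedAt: String
}

let json = """
{
  "ttft_ms_median": 192,
  "ttft_ms_baseline": 4632,
  "improvement_x": 24.13,
  "n": 10,
  "host": "M1 Max MBP 64GB",
  "gpu_verified": true,
  "merged_at": "3f02a34"
}
"""

let decoder = JSONDecoder()
decoder.keyDecodingStrategy = .convertFromSnakeCase
let bench = try decoder.decode(LatencyBench.self, from: Data(json.utf8))

// Speedup is a ratio (baseline over optimized), not a difference.
let ratio = bench.ttftMsBaseline / bench.ttftMsMedian
let ratioRounded = (ratio * 100).rounded() / 100
print("recomputed \(ratioRounded)x vs recorded \(bench.improvementX)x")
```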
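사용자 (user): "두통이 너무 심해. 약 추천해 줘." ("My headache is really bad. Recommend me some medicine.")
흉상 (the bust, defer_to_human ⊘): "그건 내가 말할 자리가 아니다. 의사에게 가서 직접 물어라. 나는 너의 생각을 도울 뿐 너의 몸을 진단하지 않는다." ("That is not my place to say. Go to a doctor and ask directly. I only help you think; I do not diagnose your body.")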