P24 · The Reluctant Adopter · Preview Forge r-20260507-010321 Skeptic lens · honest scope · verifiable citations

Why should I use this?

Five reasons a Kaggle/DeepMind judge would dismiss He Was Socrates in thirty seconds — answered with file paths, not adjectives. If a row doesn't earn a citation, it doesn't earn the page.

Five reasons to walk away · answered

01

"Another LLM chatbot. The world has 40 of these."

It refuses to be one. The product mechanic is defer_to_human — a first-class function that returns ⊘ for medical, legal, financial, emergency, welfare, and insurance topics. The contract is checked at compile time; this is not a system-prompt suggestion.

runs/2026-05-05-spec/spec/function_call_contract.yaml · CLAUDE.md absolute invariant #2
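
To make the mechanic concrete, here is a minimal sketch of abstention as a typed return value rather than a prompt instruction. It is not the project's contract: the real one lives in function_call_contract.yaml and is checked at compile time, whereas this illustration is plain runtime Python. The names `Topic`, `Deferral`, `Reply`, `dispatch`, and the `GENERAL` catch-all are hypothetical; only the six deferred topic areas and the ⊘ marker come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Topic(Enum):
    # The six areas the page lists as deferred to a human.
    MEDICAL = "medical"
    LEGAL = "legal"
    FINANCIAL = "financial"
    EMERGENCY = "emergency"
    WELFARE = "welfare"
    INSURANCE = "insurance"
    # Hypothetical catch-all for everything else (an assumption of this sketch).
    GENERAL = "general"


DEFERRED_TOPICS = frozenset(Topic) - {Topic.GENERAL}


@dataclass(frozen=True)
class Deferral:
    """Abstention result: carries the topic and the marker, never an answer."""
    topic: Topic
    marker: str = "⊘"


@dataclass(frozen=True)
class Reply:
    """Ordinary result for topics that are not deferred."""
    text: str


def dispatch(topic: Topic, draft: str) -> Union[Deferral, Reply]:
    # The draft is discarded for deferred topics, so no answer text can leak.
    if topic in DEFERRED_TOPICS:
        return Deferral(topic)
    return Reply(draft)
```

Because a `Deferral` has no text field, a caller cannot accidentally render a medical or legal answer; that is the property a typed contract buys over a system-prompt suggestion.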

02

"On-device is marketing. Show me it cannot phone home."

The App Sandbox entitlements file declares only audio-input and user-selected file access. The two network entitlement keys are intentionally absent, and the file itself comments the omission. The demo runs in airplane mode after the first weight download (Hugging Face, via the OS-mediated MLX cache).

apps/macos/HeWasSocrates/HeWasSocrates/Resources/HeWasSocrates.entitlements:11–17
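
A reader who wants to check the claim rather than trust it can parse the entitlements file and confirm that neither network key is granted. The sketch below does that with Python's standard `plistlib`. The `SAMPLE` bytes are a hand-written stand-in modelled on the excerpt cited on this page, not the real file, and `network_entitlements` is a hypothetical helper name.

```python
import plistlib

NETWORK_KEYS = (
    "com.apple.security.network.client",
    "com.apple.security.network.server",
)


def network_entitlements(plist_bytes: bytes) -> list[str]:
    """Return the network entitlement keys that are present and set to true."""
    entitlements = plistlib.loads(plist_bytes)
    return [key for key in NETWORK_KEYS if entitlements.get(key) is True]


# Stand-in for the real file (an assumption); XML comments are ignored by the parser.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key>
    <true/>
    <!-- com.apple.security.network.client = INTENTIONALLY ABSENT -->
    <!-- com.apple.security.network.server = INTENTIONALLY ABSENT -->
    <key>com.apple.security.device.audio-input</key>
    <true/>
</dict>
</plist>
"""

granted = network_entitlements(SAMPLE)
```

An empty `granted` list is the pass condition. To run it against the real file, read the path cited above in binary mode and pass the bytes to the same function.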

03

"4-bit Gemma on a Mac is going to be slow."

Median time to first token (TTFT) is 192 ms (n=10, M1 Max MBP 64 GB) after PR-Λ disk-mediated KV cache reuse. The pre-Λ baseline was 4.6 s; the 24× delta is not a projection but the verify-2 measurement merged at 3f02a34. The per-turn user-facing latency floor is roughly 6.0 s on this hardware; we are not claiming sub-6 s.

claudedocs/bench/2026-05-06-latency-bench.json · git log 3f02a34 (PR #32, PR-Λ)
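
The 24× figure is plain arithmetic on two measured numbers, so it can be rechecked in a few lines. The sketch below copies the values from the bench excerpt quoted on this page and tests that the reported factor matches baseline divided by median. `improvement_factor` is a hypothetical helper name, and the 0.01 tolerance is an assumption that allows for two-decimal rounding.

```python
import json
import math

# Values copied from the bench excerpt on this page, not re-measured here.
BENCH = json.loads("""
{
  "ttft_ms_median": 192,
  "ttft_ms_baseline": 4632,
  "improvement_x": 24.13,
  "n": 10
}
""")


def improvement_factor(baseline_ms: float, median_ms: float) -> float:
    """Speed-up expressed as baseline latency divided by the new median."""
    if median_ms <= 0:
        raise ValueError("median_ms must be positive")
    return baseline_ms / median_ms


factor = improvement_factor(BENCH["ttft_ms_baseline"], BENCH["ttft_ms_median"])

# The reported figure is rounded to two decimals, so compare with a tolerance.
consistent = math.isclose(factor, BENCH["improvement_x"], abs_tol=0.01)
```

Here `factor` is 4632 / 192 = 24.125, which agrees with the reported 24.13 to within the tolerance. The check says nothing about how the samples were collected; it only confirms that the headline number follows from the two medians.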

04

"Korean Socratic tone is a costume — the model will drift."

The Korean 단정한 평어체 (composed plain-form register) system prompt is user-authored, embedded verbatim at compile time, and listed as an absolute invariant. Drift is caught by 65 swift-testing scenarios on every push. The register is neither 존댓말 (polite speech) nor friendly 반말 (casual speech); that distinction is load-bearing.

Sources/SocraticEngine/Gemma/SystemPrompt.swift · CLAUDE.md absolute invariant #3 · make engine-test
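
What a drift check can look like, in miniature: the sketch below flags reply lines that end in common 존댓말 endings (a final 요, or 니다 preceded by a syllable with a ㅂ final consonant, as in 습니다 and 입니다). It is a rough heuristic written for illustration, not one of the project's 65 swift-testing scenarios, and every function name in it is hypothetical. It does not detect drift toward friendly 반말, and a plain sentence that happens to end in a noun such as 필요 would be a false positive.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable (가)
HANGUL_LAST = 0xD7A3      # last precomposed Hangul syllable (힣)
JONGSEONG_COUNT = 28      # final-consonant slots per syllable block
JONGSEONG_BIEUP = 17      # index of the final consonant ㅂ


def final_consonant_index(ch: str):
    """Return the final-consonant index of a Hangul syllable, or None."""
    code = ord(ch)
    if not (HANGUL_BASE <= code <= HANGUL_LAST):
        return None
    return (code - HANGUL_BASE) % JONGSEONG_COUNT


def is_honorific_ending(sentence: str) -> bool:
    """Heuristic: True if the sentence ends in 요 or a ㅂ니다-style ending."""
    s = sentence.strip().rstrip(".!?\"' ")
    if s.endswith("요"):
        return True
    if s.endswith("니다") and len(s) >= 3:
        # 아니다 also ends in 니다, so require ㅂ under the preceding syllable.
        return final_consonant_index(s[-3]) == JONGSEONG_BIEUP
    return False


def drifted_lines(reply: str) -> list[str]:
    """Return the non-empty lines of a reply that look like polite speech."""
    lines = (line.strip() for line in reply.splitlines())
    return [line for line in lines if line and is_honorific_ending(line)]
```

Run against the defer_to_human excerpt on this page, the four reply lines pass, including 아니다, whose last two syllables would fool a naive suffix match. A production check would need a proper morphological analyser; this only shows why the register is testable at all.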

05

"Multi-year recall? You're going to claim a wondering log that doesn't exist."

No. The wondering-log multi-year recall via surface_past_wonder is designed for Phase 4 wiring; the current build ships a stub for the surfacing logic. Every mention of multi-year recall on this page is marked designed-for, never as-shipped. We do not paper over Phase boundaries.

runs/2026-05-05-spec/spec/SPEC.md · function_call_contract.yaml#surface_past_wonder · idea.spec.json constraints[7]

Capability matrix · He Was Socrates vs the alternatives

Six rows that decide whether this project earns a slot. Three alternatives (two named products and the typical Gemma 4 demo), evaluated against documented behaviour, not advertising copy. Where the project loses, the cell says so.

| Capability | He Was Socrates | ChatGPT (Plus) | Claude.ai | Typical Gemma 4 demo |
| --- | --- | --- | --- | --- |
| On-device, zero network egress (no network.client, no network.server) | YES (entitlements:11–17 verified) | NO (cloud inference only) | NO (cloud inference only) | SOMETIMES (most demos use a Colab/HF endpoint) |
| Korean tone lock: 단정한 평어체 (verbatim system-prompt, compile-time embedded) | LOCKED (SystemPrompt.swift verbatim) | DRIFTS (defaults to 존댓말, no enforcement) | DRIFTS (defaults to 존댓말, no enforcement) | NO (English-default, no Korean identity) |
| Abstention as product feature (defer_to_human for medical/legal/financial/emergency) | ENFORCED (4-function dispatch contract) | SOFT (policy disclaimers, still answers) | SOFT (policy disclaimers, still answers) | NO (no abstention scaffolding by default) |
| Cost to user (monthly subscription, API key, or rate limit) | $0/mo (one-time 3.97 GB weight download) | $20/mo (subscription required for parity) | $20/mo (subscription required for parity) | VARIES (Colab credits / API keys typical) |
| Open license (code reading, weight inspection, redistribution) | OPEN (Apache-2.0 code · Gemma terms · CC-BY-4.0 content) | CLOSED (proprietary weights, ToS-bound) | CLOSED (proprietary weights, ToS-bound) | OPEN (Gemma terms apply equally) |
| Multi-year personal recall (long context over the user's own inquiry log) | DESIGNED-FOR (Phase 4 wiring · current build is stub) | OPT-IN MEMORY (cloud-stored, not multi-year scoped) | PROJECT MEMORY (cloud-stored, not multi-year scoped) | NO (not built into the demo template) |

Where He Was Socrates loses · stated up front

Trust signals · the receipts

Entitlements file · the missing keys
<key>com.apple.security.app-sandbox</key>
<true/>
<!-- com.apple.security.network.client = INTENTIONALLY ABSENT -->
<!-- com.apple.security.network.server = INTENTIONALLY ABSENT -->
<key>com.apple.security.device.audio-input</key>
<true/>
apps/macos/HeWasSocrates/HeWasSocrates/Resources/HeWasSocrates.entitlements:7–17

Bench excerpt · PR-Λ verify-2
{
  "ttft_ms_median": 192,
  "ttft_ms_baseline": 4632,
  "improvement_x": 24.13,
  "n": 10,
  "host": "M1 Max MBP 64GB",
  "gpu_verified": true,
  "merged_at": "3f02a34"
}
claudedocs/bench/2026-05-06-latency-bench.json · PR #32

defer_to_human · Korean evidence excerpt
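사용자: "두통이 너무 심해. 약 추천해 줘."
(User: "My headache is really bad. Recommend some medicine.")

흉상 (defer_to_human ⊘):
  그건 내가 말할 자리가 아니다.
  의사에게 가서 직접 물어라.
  나는 너의 생각을 도울 뿐
  너의 몸을 진단하지 않는다.
(The bust, deferring to a human: "That is not my place to say. Go to a doctor and ask in person. I only help you think; I do not diagnose your body.")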
function_call_contract.yaml#defer_to_human · SystemPrompt.swift