# He Was Socrates
Submission: The Gemma 4 Good Hackathon (Kaggle × Google DeepMind, $200K, deadline 2026-05-19 08:59 KST). Categories: Impact-focused / Education (primary) · Technical (secondary). See § Hackathon submission.
## Quick start
Tested on macOS 26 Tahoe, M1 Max / M2 Pro / M3. Requires Xcode 15.2+ for the .app target; engine layer builds with Command Line Tools alone.
```shell
# clone
git clone https://github.com/Two-Weeks-Team/he-was-socrates.git
cd he-was-socrates

# 1. toolchain audit (Swift / Xcode / xcodegen / python3 / swift-format)
make doctor

# 2. engine layer — 65 swift-testing scenarios, ~1.5 s on M2 Pro
make engine-test

# 3. app target (regenerates xcodeproj, builds Debug, opens fullscreen bust)
make xcodeproj && make app
open apps/macos/HeWasSocrates/build/Debug/HeWasSocrates.app

# first launch downloads ~3.97 GB Gemma 4 E4B 4-bit weights via OS-mediated MLX cache
# (only sanctioned network egress — entitlements file has no network.client)
```
Press Space to talk. Release to think. The bust returns a question, not an answer.
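The press-to-talk / release-to-think loop can be sketched as a tiny state machine. All type and method names here are illustrative, not the app's actual API:

```swift
// Sketch of the push-to-talk interaction loop.
// States and transitions are illustrative; the app's real types differ.
enum BustPhase {
    case idle, listening, thinking, speaking
}

struct PushToTalk {
    private(set) var phase: BustPhase = .idle

    mutating func spacebarDown() {   // press Space: start on-device STT
        if phase == .idle { phase = .listening }
    }
    mutating func spacebarUp() {     // release: hand the transcript to Gemma
        if phase == .listening { phase = .thinking }
    }
    mutating func responseReady() {  // model returned a question: speak it
        if phase == .thinking { phase = .speaking }
    }
    mutating func speechFinished() { // TTS done: back to idle
        if phase == .speaking { phase = .idle }
    }
}
```

Guarding every transition on the current phase means a stray key repeat or late callback cannot skip a state.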
## Why on-device
- Zero bytes leave the device. No `network.client` / `network.server` entitlement. Verify: `apps/macos/HeWasSocrates/Resources/HeWasSocrates.entitlements`.
- STT is on-device. `SFSpeechRecognizer` with `requiresOnDeviceRecognition = true`. Pull the Wi-Fi mid-demo — nothing breaks.
- The abstention mechanic is the product. `defer_to_human` handles medical / legal / financial / emergency / welfare / insurance topics. An honest hand-off, not an answering machine.
- Korean tone is locked. 단정한 평어체 (a composed, plain register) — neither 존댓말 (formal-polite speech) nor friendly banmal (casual speech). The system prompt is verbatim user-authored at `Sources/SocraticEngine/Gemma/SystemPrompt.swift`.
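The zero-egress claim reduces to a mechanical check: the entitlements plist must contain no `com.apple.security.network.*` keys. A minimal sketch of that check, using an inline sample plist for illustration (the real file lives at the path above):

```swift
import Foundation

// Illustrative sandbox-only entitlements XML; not the repo's actual file.
let sample = """
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>com.apple.security.app-sandbox</key><true/>
</dict>
</plist>
"""

// Parse the plist and look for any network entitlement keys.
let plist = try! PropertyListSerialization.propertyList(
    from: Data(sample.utf8), format: nil) as! [String: Any]
let networkKeys = plist.keys.filter { $0.hasPrefix("com.apple.security.network.") }
print(networkKeys.isEmpty ? "no network egress entitlements" : "FAIL: \(networkKeys)")
// prints "no network egress entitlements"
```

Pointing the same check at the checked-in entitlements file (instead of the inline sample) makes it a one-line CI guard.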
## Architecture
```
Spacebar press
 → AudioInputManager         // SFSpeechRecognizer + AVAudioEngine, push-to-talk
 → FunctionCallOrchestrator  // Gemma 4 native function calling
     ├── mode_classify
     ├── surface_past_wonder // 256K context, multi-year recall (Phase 4 wiring)
     ├── ask_back            // returns a Socratic question
     └── defer_to_human      // ⊘ for med/legal/fin/emerg/welfare/insurance
 → GemmaService              // mlx-swift-lm, 4-bit MLX, 3.97 GB
 → TTSManager                // AVSpeechSynthesizer (Yuna ko / Samantha en)
 → VisemeDriver              // 30 fps, audio-clock synced, 16 PNG swap
 → WonderingLog              // Core Data, SHA-256 dedup, never leaves device
```
See `runs/2026-05-05-spec/spec/SPEC.md` (lock SHA `e5dfadf2c8…314c5`) and `function_call_contract.yaml` for the frozen contract.
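The four-function dispatch above can be illustrated as a decode-and-route step over the model's function-call JSON. The tool names follow the contract; the routing code itself is a hypothetical sketch, not the repo's `FunctionCallOrchestrator.swift`:

```swift
import Foundation

// The four tools from function_call_contract.yaml.
enum SocraticTool: String, Decodable {
    case mode_classify, surface_past_wonder, ask_back, defer_to_human
}

struct FunctionCall: Decodable {
    let name: SocraticTool
    let arguments: [String: String]
}

// Route a decoded call; real handlers would drive TTS and WonderingLog.
func dispatch(_ call: FunctionCall) -> String {
    switch call.name {
    case .mode_classify:       return "mode=\(call.arguments["mode"] ?? "?")"
    case .surface_past_wonder: return "recalling past wonder (Phase 4 stub)"
    case .ask_back:            return call.arguments["question"] ?? "…?"
    case .defer_to_human:      return "⊘ deferring: \(call.arguments["topic"] ?? "sensitive")"
    }
}

let raw = #"{"name": "defer_to_human", "arguments": {"topic": "medical"}}"#
let call = try! JSONDecoder().decode(FunctionCall.self, from: Data(raw.utf8))
print(dispatch(call))  // prints "⊘ deferring: medical"
```

Decoding the tool name into an enum means an unknown function name fails at the decode step rather than falling through silently.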
## Performance
| Metric | Value | Source |
|---|---|---|
| TTFT (time to first token; median, n=10) | 192 ms | `claudedocs/bench/2026-05-06-latency-bench.json` |
| TTFT improvement (PR-Λ) | 4.6 s → 192 ms (24×) | PR #32 |
| Per-turn user-facing latency | ~2 s | decode + STT endpoint + TTS prep |
| Network egress at runtime | 0 bytes | entitlements + Charles proxy log |
| Tests | 65 / 65 passing | `make engine-test` |
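The median-of-n TTFT figure is a standard median over the per-run samples. A sketch with made-up sample values (the real data lives under `claudedocs/bench/`):

```swift
// Median of latency samples, as used for the TTFT row (n = 10 in the bench).
func medianMs(_ samples: [Double]) -> Double {
    let sorted = samples.sorted()
    let mid = sorted.count / 2
    return sorted.count % 2 == 0
        ? (sorted[mid - 1] + sorted[mid]) / 2
        : sorted[mid]
}

// Hypothetical samples; not the bench's actual measurements.
let ttft: [Double] = [201, 188, 195, 190, 192, 187, 199, 192, 194, 189]
print(medianMs(ttft))  // 192.0
```

Reporting the median rather than the mean keeps a single slow cold-start run from distorting the headline number.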
## Hackathon submission
This repo is the canonical artifact for The Gemma 4 Good Hackathon. The hosted Kaggle preview at two-weeks-team.github.io/he-was-socrates is a reference implementation of the README — judges should treat the repo as source of truth.
| Rubric axis | Evidence in this repo |
|---|---|
| Impact / Education (primary) | `SystemPrompt.swift` · `function_call_contract.yaml` § `defer_to_human` |
| Technical execution (secondary) | PR #32 PR-Λ · `claudedocs/bench/` · PR #33 Preflight |
| Constrained environments | `HeWasSocrates.entitlements` · App Sandbox + 0-byte egress |
| Function calling | 4-function dispatch · `FunctionCallOrchestrator.swift` |
| Long context (256K) | `surface_past_wonder` over `WonderingLog` (Phase 4 wiring; current: stub for surfacing logic) |
| Configurable thinking mode | `Phase.thinking` soft pulse on bust · `EngineCoordinator.swift` |
## Contributing
See `CONTRIBUTING.md`. TL;DR: feature branches only; Conventional Commits with scopes from `{engine, viseme, audio, gemma, app, scripts, ci, docs, spec}`; run `make ci-local` before pushing; PRs merge with `gh pr merge --merge` (squash is forbidden — history is preserved).
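A commit-subject check matching that convention can be sketched as follows. The regex, the accepted type list, and the helper name are illustrative, not part of the repo's CI:

```swift
import Foundation

// Allowed scopes from CONTRIBUTING.md.
let scopes = ["engine", "viseme", "audio", "gemma", "app", "scripts", "ci", "docs", "spec"]

// Conventional Commits subject shape: type(scope): description
func isValidSubject(_ subject: String) -> Bool {
    let pattern = #"^(feat|fix|docs|refactor|test|chore)\((\#(scopes.joined(separator: "|")))\): .+"#
    return subject.range(of: pattern, options: .regularExpression) != nil
}

print(isValidSubject("feat(engine): add defer_to_human topic guard"))  // prints "true"
print(isValidSubject("update stuff"))                                  // prints "false"
```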
Open issues are labeled `good-first-issue` (asset-pipeline determinism) and `help-wanted` (additional viseme phoneme markers). View open issues →
## License
Code under Apache-2.0. Content (Korean system prompt, halftone aesthetic, 16 viseme PNGs, README copy) under CC-BY-4.0. Gemma 4 weights subject to Google DeepMind's Gemma Terms of Use.
Attribution: Gemma 4 — Google DeepMind. MLX / mlx-swift-lm — Apple ML team. Korean 단정한 평어체 (composed plain-register) system prompt — Two-Weeks-Team (verbatim, frozen invariant).