🎬 NEW · Interactive Demo Prototype (v0.2 · 2026-05-15)

3-minute auto-play walkthrough · Judge × Participant viewports side by side · 2026 design trends (OKLCH · Bento · 3D constellation) · zero lines of backend code

Onboarding Brief

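Glasshat: Comprehensive Onboarding Report

"The panel that audits itself." A single handoff document that gives new team members the current status, decision history, technical analysis, and winning strategy.

Version: v1.1 · 2026-05-15 KST · prototype added
Status: planning, validation, and prototype complete / production code not started
Generated by: Claude Code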

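§0 TL;DR: Key Summary

Glasshat is a non-chatbot evaluation pipeline that takes a pitch-deck PDF and a GitHub repository as input and has a 6-perspective AI panel (Six Thinking Hats) score them against a 17-item, 100-point BMAD rubric. Its differentiator is that the audit-the-auditor loop, in which the panel detects its own bias in real time and self-corrects, is visible on screen. A single codebase is submitted to both the Qdrant VSD hackathon (2026-06-01) and the Google Cloud Rapid Agent hackathon, Arize track (2026-06-11), each with a different narration.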

Submission deadline #1: Jun 01, Qdrant VSD, 23:59 PT
Submission deadline #2: Jun 11, Rapid Agent, 14:00 PT
P(Qdrant top-3): 38-45% (post-spike estimate)
P(Arize top-3): 62-68% (post-spike estimate)
Status: PLANNED (spikes 7/7 PASS · no code written yet)
APPLY features: 33 (+9 STRETCH · 5 CUT)

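💡 In one sentence: Glasshat is a single project that aims to win both hackathons by narrating one wow moment, "an AI panel that audits itself," differently for each. All advanced-feature decisions, the wow-moment design, and the technical validation spikes are complete; the next step is writing Phase 1 code.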

§1 What Is Glasshat

1.1 Product definition

Glasshat is an artifact-ingesting fair-evaluation agent. It is not a chat interface. The user drops material into two drop zones (PDF deck + GitHub URL), and Glasshat autonomously runs the following pipeline:

Ingest: PDF → Gemini multimodal parsing → chunking → Qdrant embeddings / repo shallow clone → static heuristics → sampled code → Qdrant embeddings
Plan: the Blue orchestrator issues an inspectable plan object (which hats, which BMAD items, which techniques, search budget, and so on). The user reviews and edits the plan (Human Gate 1)
Run Panel: the 6 hats run in parallel (White facts / Red intuition / Yellow value / Black risk / Green creative / Blue synthesis)
Audit-the-Auditor: the Black hat detects bias in Yellow's score → the Blue planner queries Phoenix MCP for historical drift → the Qdrant Recommendation API retrieves anti-pattern anchors → the score self-corrects on screen
Score: scoring against the 17-item, 100-point BMAD rubric, with evidence attached to every sub-score
Report: a signed audit trail, a 2D radar plus a 3D evaluation graph, and a vector-search page. The user can override any score (Human Gate 2)
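
The plan object issued at the Plan step could be modeled roughly as follows. This is a minimal sketch, not the project's schema: the field names (`hats`, `bmad_items`, `techniques`, `search_budget`, `approved`) and the `approve` helper are hypothetical and only illustrate how Human Gate 1 might apply user edits before the panel runs.

```python
from dataclasses import dataclass, replace

HATS = ("white", "red", "yellow", "black", "green", "blue")


@dataclass(frozen=True)
class EvaluationPlan:
    """Hypothetical plan object that the user can inspect and edit."""
    hats: tuple[str, ...] = HATS
    bmad_items: tuple[str, ...] = ("A1", "A2", "A3", "A4")
    techniques: tuple[str, ...] = ()
    search_budget: int = 20          # assumed unit: max retrieval calls
    approved: bool = False

    def validate(self) -> None:
        unknown = set(self.hats) - set(HATS)
        if unknown:
            raise ValueError(f"unknown hats: {sorted(unknown)}")
        if self.search_budget < 0:
            raise ValueError("search_budget must be non-negative")


def approve(plan: EvaluationPlan, **edits) -> EvaluationPlan:
    """Apply the user's edits, validate, and mark the plan as approved."""
    edited = replace(plan, **edits, approved=True)
    edited.validate()
    return edited


draft = EvaluationPlan()
final = approve(draft, search_budget=5)
```

Keeping the plan immutable means the draft and the approved version can both be written to the audit trail unchanged.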

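1.2 Why "non-chatbot" (Qdrant rule)

The Qdrant VSD hackathon has a hard rule: "submissions that are only chatbots are not allowed." Glasshat's UI consists of drop zones, a plan card, a monitor, a report, a 3D graph, and a vector-search page; there is no chat box anywhere.

1.3 Why "agent that does a task" (Rapid Agent rule)

The input is an artifact rather than a prompt, and the flow satisfies the pattern plan → tool use → execute → user stays in control → deliverable produced. Running Google ADK on Cloud Run, registering with Vertex AI Agent Builder, and integrating the Phoenix MCP partner tool together meet the partner-track eligibility requirements.

1.4 fairthon concept lineage (no code reuse)

Glasshat's methodology (Six Thinking Hats + BMAD rubric + 75 techniques + 3D graph) derives from the concept behind fairthon.com (same team). That methodology is design and specification, not code. No source code from fairthon or any earlier project is reused, and all code in this repo is newly written within the hackathon period (first commit 2026-05-13). This complies with the "all code in period" rule of both hackathons.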

§2 Dual-Hackathon Context

2.1 Qdrant VSD "Think Outside the Bot" 2026 (Primary)

Item	Value
Deadline	`2026-06-01 23:59 PT` (= 2026-06-02 ~16:00 KST)
Judging criteria	Functionality · Originality · User Experience (equal weight)
Submission channel	`https://forms.gle/YDQ2TDUi8MqS9Vx28` (Google Form, not Devpost)
Prizes	1st $5K · 2nd $3K · 3rd $2K + Best-in-Category sponsor bonuses (2026 sponsors not yet announced; 2025 = Mistral $3K · CrewAI 1yr · Neo4j credits · Superlinked $1K · TwelveLabs $1K)
Submission requirements	Public/private GitHub repo + README.md + demo video ≤3 min (Loom/YouTube/Dropbox) + basic code comments
Hard rules	No chatbots · Qdrant DB required (material part) · All code in period · team of 1-4 · 18+ · no sharing work between teams
2025 winner pattern	🥇 Vector Vintage (3D terrain commerce · R3F · Qdrant+Mistral+Neo4j) · 🥈 RoboBank (robot trajectory memory) · 🥉 Spatio-Temporal NPCs (game memory). R3F appears in 3 of the top winners.

2.2 Google Cloud Rapid Agent Hackathon: Arize Track (Secondary, repackaging)

Item	Value
Period	2026-05-05 12:00 PT → `2026-06-11 14:00 PT` (= 2026-06-12 ~06:00 KST)
Judging (4 equally weighted axes)	(1) Tech Implementation ← first tiebreaker (2) Design (3) Potential Impact (4) Quality of Idea. Ties are broken by comparing in this order.
Stage 1 pass/fail	Evidence that Gemini 3 + Agent Builder + Partner MCP + Google Cloud are all used
Arize track specifics	OpenInference instrumentation + sending Phoenix traces + Phoenix MCP runtime introspection + LLM-as-judge evals + (bonus) self-improvement loop. Runtime: Cloud Run / ADK / Gemini CLI / Agent Runtime / Gemini Enterprise SDK
Prizes	1st $5K · 2nd $3K · 3rd $2K per track ($60K total · 6 tracks: Arize, Elastic, Dynatrace, Fivetran, GitLab, MongoDB)
Submission requirements	Hosted URL · public OSS repo (Apache-2.0 detectable in About sidebar) · demo video ≤3 min (YouTube/Vimeo only) · English subtitles · Devpost submission form
Eligibility	Korea is not on the excluded-country list ✓ · 18+ · team of 1-4 · Google/partner employees excluded
Competition constraint	"Services that compete directly with the chosen partner may not be used" → Arize (observability) and Qdrant (vector DB) are in different categories; README §7.2 states the distinction explicitly
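
The PT-to-KST conversions quoted in the two deadline rows above can be double-checked with the standard library. A small sketch, assuming "PT" means the `America/Los_Angeles` zone (daylight time in June); the helper name `pt_to_kst` is illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PT = ZoneInfo("America/Los_Angeles")   # assumption: "PT" = US Pacific local time
KST = ZoneInfo("Asia/Seoul")


def pt_to_kst(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Interpret a wall-clock time in Pacific Time and convert it to KST."""
    return datetime(year, month, day, hour, minute, tzinfo=PT).astimezone(KST)


qdrant_deadline = pt_to_kst(2026, 6, 1, 23, 59)
rapid_agent_deadline = pt_to_kst(2026, 6, 11, 14, 0)

print("Qdrant VSD :", qdrant_deadline.isoformat())
print("Rapid Agent:", rapid_agent_deadline.isoformat())
```

Both deadlines fall on the following calendar day in Korea, which matters when scheduling the final submission session.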

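2.3 Dual-submission strategy: one engine, two narrations

🎯 Qdrant demo emphasis: Visual + theatrical · 3D evaluation graph · vector-anchored score self-correction · 524-corpus meta-narrative ("we evaluated 524 of the Gemini 3 Hackathon's 4,499 submissions")

🔧 Arize demo emphasis: Technical depth · trace-tree visualization of the 4-call Phoenix MCP consultation chain · before/after score delta from the self-improvement loop · 10 architecture stack badges

📌 The two submissions share roughly 80% of the same engine's code paths. The differences are the narration, the foreground UI, and whether the Phoenix UI is recorded side by side.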

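§3 Decision History

Each decision is recorded with its source (who) and rationale. The user (app.2weeks@gmail.com) is the final decision-maker, a 5-expert panel (Porter/Christensen/Godin/Doumont/Drucker) advises, and Claude Code executes.

2026-05-13

Repo scaffold · concept lock-in

New repo Two-Weeks-Team/glasshat, Apache-2.0 · fairthon concept lineage published in README · decision to submit one project to both hackathons

2026-05-13/14

Full GCP setup + live validation of the Gemini 3-tier lineup

Project panelyst-hackathon (916178791322), SA panelyst-dev, 13 APIs. Live measurements across 6 models × 7 locations. gemini-3.1-pro-preview / 3-flash-preview / 3.1-flash-lite GA all confirmed working on the global endpoint. Details in docs/gcp-setup.md

2026-05-14 (morning)

User directive: "Complete the plan to maximize the odds of winning, under both rule sets, for both awards"

Decision to hold off on Phase 1 coding. Ran a 7-dimension max-wins plan, multi-expert analysis, and live web research.

2026-05-14 (afternoon)

Max-Wins Plan v1 completed (docs/max-wins-plan.md · 599 lines)

5-expert panel recommendations: (1) Qdrant primary, Arize repackaging (2) rename to Glasshat (3) promote the 3D graph from stretch to must-build (4) cut KO i18n (5) the audit-the-auditor moment is the wow factor for both. The user approved all recommendations.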

2026-05-14 (evening)

Wow Moment Design + Gemini 3 corpus decision

5-step technical breakdown of the wow moment (docs/wow-moment-design.md). The user proposed using the 4,499 submissions at gemini3.devpost.com/project-gallery → decided on a stratified sample of 524 (24+ winners + 500 random).

2026-05-14 (night)

Technical Apex Pass: explicit decisions on 47 features

User question "Have we used every technical peak?" → decision matrix of 33 APPLY + 9 STRETCH + 5 CUT (docs/technical-apex-features.md). The demo script was also densified to 7-8 wow/kick beats. P(Qdrant) 28-35% → 35-42%, P(Arize) 50-55% → 58-65%.

2026-05-14/15 (midnight to early morning)

7-Spike technical validation: all PASS

A (Phoenix MCP smoke) · B (ADK LoopAgent) · C (ADK+MCPToolset wiring) · D (calibration policy 35.4% MAE↓) · E (SSE 802ms) · F (Phoenix Online Eval OSS) · G (Annotation R/W). Results: docs/spike-results.md. P(Qdrant) → 38-45%, P(Arize) → 62-68%.

2026-05-15 (morning)

Glasshat rename migration

Name-collision check: GitHub Two-Weeks-Team/glasshat is unused ✓; an SEO tool of the same name is listed on SaaSWorthy (different category; the user decided to proceed). docs/.env/.env.example/README all updated on branch chore/rename-glasshat. gh repo rename + folder rename are pending.

2026-05-15 (now)

New-team-member onboarding material created (this document)

Plans, validation, and decisions are all consolidated. Ready to enter the Phase 1 build.

3.1 Twelve key decisions (locked)

#	Decision	Source
1	Qdrant VSD = primary submission. Arize = 7-day repackaging.	Panel + User 2026-05-14
2	Rename Panelyst → Glasshat	Panel + User 2026-05-14
3	3D evaluation graph = must-build (no date gate)	User 2026-05-14
4	Audit-the-auditor moment = wow factor for both	Panel 2026-05-14
5	Cut KO i18n · cut fairthon history · cut full display of all 17 BMAD items (cut from the demo only; still built)	Panel + User
6	Phoenix MCP runtime consultation = the moat feature for Arize	Panel + rules analysis
7	Hosting = Cloud Run (single runtime for frontend + backend)	Resource allocation
8	past_evals seed = no fairthon history; only the 524 stratified Gemini 3 hackathon submissions	User 2026-05-14
9	Agent runtime = Google ADK on Cloud Run (Agent Builder for registration only)	Spike+research 2026-05-14
10	524 corpus = 24+ winners + 500 stratified random	User 2026-05-14
11	Meta-narrative = stated explicitly only in the Qdrant demo; Arize uses only the Phoenix experiment framing	User 2026-05-14
12	All demo videos: on-screen English captions, no live voice-over	Doumont recommendation

§4 Technical Architecture

4.1 Stack at a glance

Layer	Choice	Notes
Intelligence	Gemini 3 on Vertex AI 3-tier	3.1 Pro (Blue/Black, thinking=high) · 3 Flash (White/Red/Yellow, thinking=medium) · 3.1 Flash-Lite (Code Grader/corpus seed Batch, thinking=minimal)
Agent runtime	Google ADK on Cloud Run	LoopAgent + ParallelAgent + Custom BaseAgent + MCPToolset(Stdio) + 6 callback types + session state Firestore + native streaming. Validated by Spike C.
Agent Builder registration	Vertex AI Agent Builder	The Blue planner is registered → Rapid Agent partner-track compliance. Execution itself is handled by ADK.
Partner MCP	Arize Phoenix (Cloud free tier / self-host fallback)	27 tools via `npx @arizeai/phoenix-mcp@latest`. OpenInference auto-instrumentation. Online Evals + Custom Evaluators + Annotations + Datasets + Experiments. Validated by Spikes A/F/G.
Vector DB	Qdrant Cloud (local docker-compose for dev)	6 collections with hybrid dense+sparse RRF fusion + Recommendation API (anti-pattern retrieval) + group-by aggregation + Scalar Quantization on past_evals.
Document of record	Firestore Native	Run records, technique registry, users, audit trail (signed).
Compute	Cloud Run + Cloud Run Jobs	pipeline-orchestrator service + code-grader-job
Ingest pipeline	Cloud Storage + Eventarc	GCS upload → Eventarc trigger → ingest-svc
Web search	Vertex AI Grounding with Google Search	White hat citations. URL snapshots surfaced in the UI.
Auth	Firebase Authentication	Google sign-in (from Phase 2).
Frontend	Next.js on Cloud Run	Drop zones + plan card + monitor + report + 3D graph (react-three-fiber) + vector-search page + English SSE + 800ms-paced wow beats. KO cut.
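
The "hybrid dense+sparse RRF fusion" in the Vector DB row refers to Reciprocal Rank Fusion. Qdrant performs the fusion server-side; the pure-Python sketch below only illustrates the scoring rule. The constant `k=60` is the conventional default from the RRF literature and is an assumption here, as are the chunk IDs and the optional per-ranking `weights`.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, weights=None):
    """Fuse ranked ID lists: score(d) = sum_i weight_i / (k + rank_i(d))."""
    weights = weights or [1.0] * len(rankings)
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["c1", "c2", "c3"]    # hypothetical chunk IDs from dense search
sparse_hits = ["c3", "c1", "c4"]   # hypothetical chunk IDs from sparse search

fused = rrf_fuse([dense_hits, sparse_hits])
sparse_heavy = rrf_fuse([dense_hits, sparse_hits], weights=[1.0, 3.0])
print(fused, sparse_heavy)
```

Because RRF uses only ranks, dense and sparse scores never need to be normalized against each other; the weighted variant shifts the result toward whichever retriever is trusted more.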

4.2 BMAD Rubric: 17 items / 100 points

A. Problem & Vision (25) — A1 clarity 8 · A2 target users 5 · A3 differentiation 7 · A4 market impact 5
B. Tech & Architecture (25) — B1 stack fit 7 · B2 system design 6 · B3 scalability 6 · B4 feasibility 6
C. Implementation & Code (30) — C1-C5
D. Documentation & Presentation (20) — D1-D4

The rubric is a version-controlled external config (packages/rubric/bmad-rubric.yaml). The LLM does not improvise scoring.
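
A loader for that config could guard against weight drift with a check like the one below. This is a sketch under assumptions: the dictionary layout is hypothetical (the real YAML schema is not shown in this report), and the C1-C5 and D1-D4 item weights are left unspecified because the report lists only their category totals.

```python
RUBRIC = {
    "A": {"max": 25, "items": {"A1": 8, "A2": 5, "A3": 7, "A4": 5}},
    "B": {"max": 25, "items": {"B1": 7, "B2": 6, "B3": 6, "B4": 6}},
    "C": {"max": 30, "items": None},  # C1-C5 weights not listed in this report
    "D": {"max": 20, "items": None},  # D1-D4 weights not listed in this report
}


def check_rubric(rubric: dict) -> int:
    """Verify category totals sum to 100 and listed items sum to their category."""
    total = sum(category["max"] for category in rubric.values())
    if total != 100:
        raise ValueError(f"category totals sum to {total}, expected 100")
    for name, category in rubric.items():
        items = category["items"]
        if items is not None and sum(items.values()) != category["max"]:
            raise ValueError(f"items in category {name} do not sum to {category['max']}")
    return total


print(check_rubric(RUBRIC))
```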

[Chart: BMAD rubric distribution]

4.3 Agent topology (including the audit loop)

topology
GlasshatRootAgent  // CustomAgent: orchestrator
├─ IngestAgent  // PDF parser + repo cloner
├─ BluePlannerAgent  // LlmAgent · Gemini 3.1 Pro · thinking=high
├─ HatsPanel  // ParallelAgent (6 hats in parallel)
│  ├─ WhiteAgent    // + Qdrant hybrid + Vertex Grounding
│  ├─ RedAgent
│  ├─ YellowAgent   // + Qdrant hybrid
│  ├─ BlackAgent    // + Qdrant hybrid · must_cite_precedent
│  ├─ GreenAgent
│  └─ BlueSynthesisAgent
├─ AuditLoop  // LoopAgent · max_iterations=2
│  ├─ InconsistencyDetectorAgent  // union of 3 redundant detectors
│  │   ├─ Path 1: Phoenix Online Eval annotation
│  │   ├─ Path 2: Phoenix Custom Python Evaluator
│  │   └─ Path 3: Black hat counter-claim
│  ├─ PhoenixConsultantAgent  // + Phoenix MCPToolset + Qdrant Recommend
│  │   ├─ get-experiment-by-id // drift aggregation
│  │   ├─ get-span-annotations // evidence chain
│  │   ├─ get-dataset-examples // anchor profiles
│  │   └─ qdrant.recommend(positive, negative, "average_vector")
│  └─ ScoreCalibrationAgent  // clip(pred - 0.8*delta, p25, p75)
├─ BMADScorerAgent  // + group_by + context cache
└─ ReportAssemblerAgent  // Firestore audit + SSE 800ms-paced

The full topology, sequence diagrams, and per-phase deployment matrix are in docs/architecture.md and docs/wow-moment-design.md §4.
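
The control flow of HatsPanel (parallel fan-out) followed by AuditLoop (at most two iterations) can be illustrated without ADK. The sketch below is a plain-asyncio stand-in, not ADK code: `run_hat`, `detect_inconsistency`, the fixed scores, the 1.5-point tolerance, and the 1.4-point correction are all hypothetical placeholders.

```python
import asyncio

HATS = ("white", "red", "yellow", "black", "green", "blue")


async def run_hat(name: str, artifact: str) -> dict:
    await asyncio.sleep(0)  # placeholder for an LLM call
    return {"hat": name, "score": 9.0 if name == "yellow" else 7.0}


async def run_panel(artifact: str) -> dict:
    """Fan out to all six hats concurrently, like a ParallelAgent."""
    results = await asyncio.gather(*(run_hat(hat, artifact) for hat in HATS))
    return {result["hat"]: result["score"] for result in results}


def detect_inconsistency(scores: dict, tolerance: float = 1.5) -> bool:
    """Toy detector: Yellow is far more optimistic than Black."""
    return scores["yellow"] - scores["black"] > tolerance


async def audit_loop(scores: dict, max_iterations: int = 2):
    """Re-check after each correction; stop on convergence or at the cap."""
    iterations = 0
    while iterations < max_iterations and detect_inconsistency(scores):
        scores = {**scores, "yellow": scores["yellow"] - 1.4}  # stand-in correction
        iterations += 1
    return scores, iterations


async def main():
    scores = await run_panel("deck.pdf")
    return await audit_loop(scores)


final_scores, iterations = asyncio.run(main())
print(final_scores, iterations)
```

The iteration cap is what keeps the loop from oscillating if a correction never satisfies the detector.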

§5 The Audit-the-Auditor Moment

This is the linchpin of the whole strategy. The wow factor in both demos renders the same engineering under a different narration.

📌 In one sentence: "The Black hat flags Yellow's A1 score as inconsistent with evidence_depth → the Blue planner queries Phoenix MCP for historical drift → the Qdrant Recommendation API retrieves anti-pattern anchors → Yellow's A1 score self-corrects on screen from 9.0 to 7.6 and the 3D graph rearranges."

5.1 Five-step breakdown + feasibility (all spike-validated)

Step	Owning agent	Implementation	Feasibility
1. Detection	InconsistencyDetectorAgent	3-redundant: Phoenix Online Eval LLM-as-judge / Phoenix Custom Python rule / Black hat counter-claim	✓ Spike F (5/5 accuracy)
2. Phoenix consultation	PhoenixConsultantAgent	`get-experiment-by-id` (drift) + `get-span-annotations` (proof) + `get-dataset-examples` (anchors)	✓ Spike A (27 tools, 6-27ms)
3. Qdrant anchor retrieval	(same agent)	`recommend(positive, negative, "average_vector")`	✓ standard vector search
4. Score correction	ScoreCalibrationAgent	`new = clip(pred - 0.8 × delta, p25, p75)`, ±2.0 cap	✓ Spike D (35.4% MAE↓)
5. UI animation	Frontend SSE consumer	6 events · 800ms gaps · r3f node migration	✓ Spike E (802ms ±20)
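
Step 5's pacing (6 events with 800ms gaps) could be produced by an async generator that formats Server-Sent Events frames. This is a sketch only: the event names and payloads are hypothetical, and a real endpoint would stream the frames through the web framework's response object instead of collecting them.

```python
import asyncio
import json
import time


def format_sse(event: str, data: dict) -> str:
    """Serialize one Server-Sent Events frame."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def paced_events(events, gap_s: float = 0.8):
    """Yield frames with a fixed gap between consecutive events."""
    for index, (name, payload) in enumerate(events):
        if index:
            await asyncio.sleep(gap_s)
        yield format_sse(name, payload)


# Hypothetical event sequence for the audit moment (names are illustrative).
AUDIT_EVENTS = [
    ("detection", {"hat": "yellow", "criterion": "A1"}),
    ("phoenix_consult", {"tool": "get-experiment-by-id"}),
    ("anchors", {"count": 3}),
    ("score_update", {"from": 9.0, "to": 7.6}),
    ("graph_migrate", {"node": "yellow-A1"}),
    ("audit_done", {"iterations": 1}),
]


async def collect(gap_s: float):
    """Collect (timestamp, frame) pairs; useful for measuring the real gaps."""
    stamped = []
    async for frame in paced_events(AUDIT_EVENTS, gap_s):
        stamped.append((time.monotonic(), frame))
    return stamped
```

Calling `asyncio.run(collect(0.8))` takes about four seconds and yields the timestamps needed to compare measured gaps against the 800±100ms target.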

5.2 MCP call chain (consultation moment: 4 parallel calls, 3 Phoenix MCP + 1 Qdrant recommend, <800ms)

mcp · python
# Phoenix MCP via ADK MCPToolset (Stdio transport)
import asyncio
import os

from google.adk.tools.mcp_tool import MCPToolset, StdioConnectionParams
from mcp import StdioServerParameters

phoenix_mcp = MCPToolset(
    connection_params=StdioConnectionParams(
        server_params=StdioServerParameters(
            command="npx",
            args=["-y", "@arizeai/phoenix-mcp@latest",
                  "--baseUrl", os.environ["PHOENIX_BASE_URL"],
                  "--apiKey", os.environ["PHOENIX_API_KEY"]],
        ),
        timeout=30.0,
    ),
)

# The consultation moment's 4 parallel calls (Spike A measured 6-27ms per Phoenix call).
# NOTE: `phoenix_mcp.call(...)` is shorthand for invoking the named MCP tool, not a
# confirmed MCPToolset method. `qdrant_client` is assumed to be an async Qdrant client
# so that recommend() is awaitable. Run this inside an async function.
results = await asyncio.gather(
    phoenix_mcp.call("get-experiment-by-id",
        experiment_id="glasshat-calibration-v1",
        filters={"hat": "yellow", "criterion": "A1",
                 "evidence_depth_bucket": "<0.4"}),
    phoenix_mcp.call("get-span-annotations",
        span_ids=[yellow_a1_span_id], project_identifier="glasshat-prod"),
    phoenix_mcp.call("get-dataset-examples",
        dataset="calibration_corpus_v1", limit=3,
        filter={"hat": "yellow", "evidence_depth_bucket": "<0.4"}),
    qdrant_client.recommend(
        collection_name="past_evals",
        positive=anchor_overconfident_ids,
        negative=anchor_accurate_ids,
        strategy="average_vector",
        limit=3),
)

5.3 Calibration formula (validated by Spike D)

python
def calibrate(predicted: float, phoenix_findings: dict, anchors: dict) -> float:
    """Pull an over-confident score toward the anchor band (Spike D policy)."""
    mean_delta = phoenix_findings["mean_delta"]  # e.g. 1.2 (over-confident bias)
    raw = predicted - 0.8 * mean_delta           # conservative pull
    delta = raw - predicted
    delta = max(-2.0, min(2.0, delta))           # ±2.0 absolute cap
    candidate = predicted + delta
    p25, p75 = anchors["p25"], anchors["p75"]    # anchors is a dict of percentile bounds
    return max(p25, min(p75, candidate))         # clip to anchor band

# Spike D synthetic test (50 train, 50 holdout):
#   Uncalibrated MAE: 0.517
#   Calibrated MAE:   0.334  → 35.4% improvement
#   Yellow A1 low-evidence bucket: 1.476 → 0.505 (66% reduction in the target bucket)
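
To make the policy above concrete, here is a worked example of its three regimes: plain pull, cap, and anchor-band clip. The inputs are hypothetical; `mean_delta=1.75` with an anchor band of 6.0-8.0 is simply one combination that reproduces the 9.0 → 7.6 correction used in the demo narrative, not a measured value.

```python
def calibrate(predicted, mean_delta, p25, p75, pull=0.8, cap=2.0):
    """Same policy as above, with flat arguments for easy experimentation."""
    delta = -pull * mean_delta            # conservative pull toward history
    delta = max(-cap, min(cap, delta))    # absolute cap on the correction
    return max(p25, min(p75, predicted + delta))  # clip to the anchor band


# Plain pull: 9.0 - 0.8 * 1.75 = 7.6 (inside the band, under the cap)
corrected = calibrate(9.0, mean_delta=1.75, p25=6.0, p75=8.0)

# Cap: 0.8 * 3.0 = 2.4 exceeds the cap, so only 2.0 is subtracted
capped = calibrate(9.0, mean_delta=3.0, p25=6.0, p75=8.0)

# Clip: 9.0 - 0.4 = 8.6 lies above p75, so the result is pulled to 8.0
clipped = calibrate(9.0, mean_delta=0.5, p25=6.0, p75=8.0)

print(round(corrected, 2), capped, clipped)
```

The cap and the clip act as independent safety rails: the cap bounds how far one audit can move a score, and the clip keeps the result within what comparable anchors support.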

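5.4 Pre-seed dependency (★ demo blocker)

⚠ Biggest operational risk: if the Phoenix corpus has no historical data, the consultation step comes back empty-handed. Every detection still fires, but with no quantitative drift data the demo falls flat. Phase 1.12 (scraping the 524-item Gemini 3 corpus) and Phase 1.13 (seeding the Phoenix Experiment) must therefore be finished before the demo is recorded. Details in docs/wow-moment-design.md §6.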

§6 33 APPLY · 9 STRETCH · 5 CUT: Apex Pass Decisions

Explicit decisions on 47 advanced features across the Qdrant, Phoenix, Gemini 3, and Google ADK stacks. Details in docs/technical-apex-features.md.

[Chart: feature decision distribution]

6.1 APPLY: 33 features (required for v1)

Qdrant (6)

1.1 dense+sparse hybrid + RRF fusion
1.2 weighted RRF (v1.17+)
1.3 Recommendation API ★ (audit anti-pattern)
1.5 group-by/aggregations (anchor 1-call)
1.6 payload index high-cardinality fields
1.7 Scalar Quantization on past_evals

Phoenix (9)

2.1 OpenInference auto-instrumentation
2.2 semantic conventions for custom spans
2.3 Online Evals Task ★★
2.4 Custom Python Evaluators
2.5 Datasets API
2.6 Experiments + MCP get-experiment-by-id ★
2.7 Annotations (LLM/CODE/HUMAN): closes the human-override loop
2.9 built-in templates (groundedness, hallucination)
2.11 self-hosted fallback

Gemini 3 (8)

3.1 thinking_level per agent tier
3.2 thinking tokens visible UI ★
3.3 context caching (90% cost reduction, 4096+ tokens)
3.4 responseSchema strict on all hat outputs
3.6 Batch prediction (524 corpus seed)
3.8 Vertex Grounding with Google Search
3.9 citation snapshot UI
3.10 responseMimeType strict JSON

ADK (8)

4.1 LoopAgent (AuditLoop)
4.2 ParallelAgent (HatsPanel)
4.3 Custom BaseAgent (orchestrator)
4.4 MCPToolset (Stdio) for Phoenix MCP
4.5 before/after tool callbacks
4.6 before/after model callbacks
4.8 session state with Firestore
4.9 native streaming (SSE 800ms)

6.2 STRETCH: 9 features (if the Phase 1/2 schedule allows)

Qdrant: 1.4 Discovery API · 1.8 ColBERT multi-vector · 1.10 cross-collection lookup
Phoenix: 2.8 Prompt versioning · 2.10 Agent trajectory evaluator
Gemini: 3.5 multi-modal single-call ingestion · 3.7 Code execution tool (memorable in a demo)
ADK: 4.7 before/after agent callbacks · 4.12 Plugins for guardrails

6.3 CUT: 5 features (explicitly excluded from v1)

Qdrant: 1.9 DBSF (RRF is sufficient) · 1.11 HA/snapshots/sharding (operational concern, irrelevant to judging)
Gemini: 3.11 long-context stuffing (anti-pattern)
ADK: 4.10 ADK Eval framework (Phoenix evals와 중복) · 4.11 A2A protocol (May 2026 immature)

§7 7-Spike Validation Results (all PASS)

Seven spike tests run before writing Phase 1 code to verify that the chosen tech stack actually works. Details in docs/spike-results.md; the scripts are spikes/0[1-7]_*.py.

#	Spike	Result	Key metric
A	Phoenix MCP smoke test	✓ PASS	27 tools discovered / list-projects 27ms / get-spans 6ms
B	ADK LoopAgent + escalation	✓ PASS	Converges in 2 iterations + max_iter cap works, state_delta persisted
C	ADK + Phoenix MCPToolset wiring	✓ PASS	LlmAgent 1 run = 7 Phoenix spans (MCP tool calls captured automatically)
D	Calibration policy on toy data	✓ PASS	35.4% MAE↓ held-out, Yellow A1 bucket 66%↓, 0 catastrophic over-correction
E	SSE animation latency	✓ PASS	802ms mean gap (target 800±100), 16ms max delivery delay
F	Phoenix Online Eval OSS	✓ PASS	5/5 accuracy, eval+5 writes in 30ms, MCP read 1.72s
G	Phoenix Annotation R/W	✓ PASS	Write 12ms · SDK read 10ms · MCP read 2.1s · full fidelity
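
The Spike D percentages in the table can be reproduced from the MAE figures reported in §5.3. The sketch below also checks the held-out acceptance gate from §8.2 (calibrated MAE ≤ 0.85 × uncalibrated MAE); the function names are illustrative.

```python
def mae(predictions, targets):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def relative_improvement(uncalibrated_mae: float, calibrated_mae: float) -> float:
    """Fractional MAE reduction achieved by calibration."""
    return (uncalibrated_mae - calibrated_mae) / uncalibrated_mae


def passes_gate(uncalibrated_mae: float, calibrated_mae: float, ratio: float = 0.85) -> bool:
    """Held-out gate: calibrated MAE must be at most ratio x uncalibrated MAE."""
    return calibrated_mae <= ratio * uncalibrated_mae


overall = relative_improvement(0.517, 0.334)   # overall holdout figures
bucket = relative_improvement(1.476, 0.505)    # Yellow A1 low-evidence bucket

print(f"overall: {overall:.1%}, bucket: {bucket:.1%}, gate: {passes_gate(0.517, 0.334)}")
```

The gate is deliberately looser than the spike result (15% required versus 35.4% observed), which leaves headroom for the real corpus to behave worse than the synthetic data.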

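7.1 Decisive findings

Found the correct ADK MCPToolset wiring (not spelled out in the docs): MCPToolset(connection_params=StdioConnectionParams(server_params=mcp.StdioServerParameters(...))). The two-level wrap is the answer.
Spike C captured 7 Phoenix spans automatically, so the demo's "the agent sees its own reasoning" narrative is visualized through OpenInference auto-instrumentation with no extra code.
The calibration math works in practice: a synthetic Yellow A1 bias of +1.4 was recovered closely, as mean_delta 1.453.
Event requires EventActions() (None is rejected).
Annotations are returned as dicts (unlike the SDK signature) → access them safely through a helper.
MCP cold-start takes 1.7-2.1s (npx spin-up). Reusing one session brings calls under 100ms, so the AuditLoop keeps the same toolset session alive.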

7.2 Environment + cost

Python: 3.12.4 (uv venv at spikes/.venv/)
Node: v24 (npx phoenix-mcp@latest)
Phoenix: 15.9.0 (in-process, no Docker)
Total cost: $0.0001 (one Vertex Flash-Lite call in Spike C)

§8 The 524-Item Gemini 3 Corpus Strategy

A stratified sample of 524 is drawn from the 4,499 public submissions at gemini3.devpost.com and used as Glasshat's calibration corpus.

Total submissions: 4,499 (35,580 participants)
Sampled (stratified): 524 (24 winners + 500 random)
Prize pool: $100K (Grand $50K + 2nd $20K + 3rd $10K + 10 honorable $2K)
Estimated seeding cost: $20-50 (within the $500 GCP credit)

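8.1 Why this corpus

A 30-90× larger candidate pool (4,499 submissions versus the earlier "50-150 random Devpost projects"), with the 524-item sample itself about 3.5-10× larger than that earlier plan
Every project is built on the Gemini 3 stack, so the corpus is in-distribution with Glasshat itself
Recent data (Dec 2025 to Feb 2026), so the calibration is fresh
Required assets: Devpost description + public GitHub repo + demo video, exactly the format Glasshat evaluates
Published judging weights (Tech 40% · Innovation 30% · Impact 20% · Presentation 10%) map almost 1:1 onto Glasshat's four BMAD axes
The 24+ winner ribbons provide ground-truth labels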

8.2 Scraping pipeline (Phase 1.12)

Check robots.txt and apply a 1 RPS throttle
Fetch the index: 188 pages × up to 24 entries (the last page is partial) = 4,499 entries → seed/gemini3-index.jsonl
Stratified sampling: 24+ winners + 500 random non-winners → seed/gemini3-sample-524.jsonl
Fetch the 524 detail pages (1 RPS, ~9 min)
Keep only entries with a GitHub repo URL (expected 70-90%)
For each repo: shallow clone + README + static heuristics
Run the Glasshat pipeline (Flash-Lite, batch prediction, ~6-12 hours)
OpenInference auto-instrumentation → Phoenix Cloud project glasshat-prod
Aggregate into Phoenix Experiment glasshat-calibration-v1
Held-out validation: calibrated MAE ≤ 0.85 × uncalibrated MAE
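
The stratified sampling step in the pipeline above (all winners plus 500 random non-winners) might look like this. It is a sketch on synthetic data: the `winner` field and the fixed seed are assumptions, and the real index would be read from seed/gemini3-index.jsonl.

```python
import random


def stratified_sample(index, n_random=500, seed=42):
    """Keep every winner, then add a reproducible random draw of non-winners."""
    winners = [entry for entry in index if entry["winner"]]
    others = [entry for entry in index if not entry["winner"]]
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    return winners + rng.sample(others, min(n_random, len(others)))


# Synthetic stand-in for the scraped index: 4,499 entries, 24 flagged as winners.
index = [{"id": i, "winner": i < 24} for i in range(4499)]
sample = stratified_sample(index)

print(len(sample), sum(entry["winner"] for entry in sample))
```

Seeding the generator makes the 524-item list reproducible, which matters because the sample is later cited as provenance in the demo.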

8.3 Compliance

✓ Compliant with both hackathons' rules: consuming public data as a calibration corpus is not "modification or extension of existing work." Glasshat's own code is 100% new. README §7.2 discloses this explicitly: "Glasshat's calibration corpus is seeded from 524 public submissions to the Gemini 3 Hackathon ... used as evaluation calibration data only. No code or content from those submissions is reused in Glasshat."

8.4 Demo meta-narrative (Qdrant demo only)

🎬 Qdrant demo caption at 2:30-2:50: "Glasshat evaluated 524 of the Gemini 3 Hackathon's 4,499 submissions — including all 24+ winners — to calibrate its own bias before scoring yours."

This single line turns an abstract "meta-evaluation" claim into concrete provenance numbers. The Arize demo does not use it and frames the corpus only as a Phoenix Experiment dataset (user decision 2026-05-14).

§9 Demo Scripts: Both 3:00

Both demos are ≤3 minutes, use English captions with no voice-over, and follow 800ms-paced wow beats. Details in docs/max-wins-plan.md §5.

9.1 Qdrant VSD demo: 7 wow/kick beats

Time	Type	Content
0:00-0:10	WOW#0 hook	"Who audits the AI evaluator?" text fades in
0:10-0:30	KICK#1	Cost dashboard starts, hybrid search (dense+sparse) visualized, Blue planner thinking shown
0:30-1:00	KICK#2	Blue planner thinking trace + 6 hats parallel + White hat citation URL with favicon
1:00-1:30	★★ WOW#1A	Phoenix Online Eval auto-fire on Yellow A1, Black hat counter-claim parallel (backup beat)
1:30-1:45	★★ WOW#1B	Phoenix MCP consultation + Qdrant Recommendation API, Yellow score 9.0 → 7.6 animation, contradicting chunks highlight
1:45-2:30	★ WOW#2	3D evaluation graph: the 524-anchor constellation rotates (2D radar fallback ready)
2:30-2:50	KICK#3	group_by anchor comparison + 524-corpus meta-narrative caption + signed audit trail
2:50-3:00	close	Logo + "The panel that audits itself."

9.2 Rapid Agent / Arize demo: 8 wow/kick beats

Time	Type	Content
0:00-0:15	WOW#0 hook	"Most agents claim self-improvement. Watch ours mid-correction." + 10 stack badges fade
0:15-0:45	KICK#1	Full stack visible, Phoenix UI split view, 6 ADK ParallelAgent cards with model name + thinking_level
0:45-1:00	KICK#2	Blue thinking trace + structured plan JSON (responseSchema), Phoenix spans with OpenInference attributes
1:00-1:30	★★ WOW#1A	Phoenix Online Eval auto-fire (with Custom Python Evaluator backup parallel)
1:30-2:00	★★ WOW#1B	4 MCP calls visible in trace tree, score delta caption "Pre-Phoenix: 9.0 / Post-Phoenix: 7.6 / Δ=-1.4"
2:00-2:30	KICK#3	Self-improvement loop completes, Phoenix Annotation written (human-in-loop closes), full trace tree visible
2:30-2:50	KICK#4	Score delta panel + final cost dashboard + 10-stack architecture card slide
2:50-3:00	close	"Glasshat. An agent that reads its own mistakes." + LICENSE-in-About 1-frame proof (Stage 1 pass/fail)

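🎬 Common production rules: ≤3:00 (Rapid Agent: anything past 3:00 is ignored), burned-in English captions, YouTube/Vimeo upload (Rapid Agent), Phoenix UI live during recording recommended, and a user test with 5 non-team members one week before recording (pass if ≥4/5 can restate "the AI catches its own mistake and fixes it" within 30 seconds)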

§10 Risks + Mitigation

#	Risk	Severity × Likelihood	Mitigation
1	Rapid Agent competition constraint: a judge reads Qdrant as an Arize competitor	High × Low	Explicit category distinction in README §7.2; the demo narration also avoids the phrase "Phoenix vector search"
2	Dual submission splits effort → both end up mediocre	High × High	Lock: zero Arize-only code before the Qdrant submission. Arize gets narration + Phoenix UI side-by-side + README expansion
3	3D graph under-delivers visually	High × Medium	The 2D radar is guaranteed to show the same data. The 3D view uses cluster-reveal-on-rotation as its discriminator. Kill it only after visual verification (user directive: "ignore dates")
4	Audit moment does not read in a 3-minute demo	Critical × Medium	Acceptance: user test with 5 non-team members, ≥4/5 pass. On failure, add on-screen captions and slow the pacing by 5-8 seconds
5	Cloud Run cold-start mid-demo	High × Medium	Pre-warm before recording; upload a locally cached demo run to YouTube (live is the backup); pre-cache "Try a sample"
6	Phoenix Cloud outage at judging	High × Low	Self-hosted Phoenix in Cloud Run as failover (`MONITOR_BACKEND=phoenix-local`). Phoenix UI screenshots as a further backup
7	Gemini 3.1 Pro preview API changes	Medium × Medium	The LLM adapter falls back automatically to `gemini-2.5-pro` (measured in `docs/gcp-setup.md`)
8	Stage 1 pass/fail rejection (Rapid Agent)	Critical × Low	Hosted URL "Demo run" button + Phoenix UI link + Cloud Run URL + GitHub About sidebar, all screenshotted in README §7
9	Glasshat name collision (already confirmed: SaaSWorthy SEO tool)	Medium × Low	Different category. Zero impact on the hackathon submissions. Re-examine trademarks if commercialized.
10	2026 Qdrant Best-in-Category sponsors not yet announced	Low × Medium	Monitor weekly. If a matching sponsor appears, add a secondary narrative angle
11	★ Phoenix corpus pre-seed missing (failure mode #1)	Critical × Medium	Phases 1.12+1.13 are a completion gate before demo recording. Design the prepared demo deck (evidence_depth ≈ 0.31 fixed) so that Path 2 (deterministic) always fires even if a bucket is missing
12	Scraping violates ToS (Devpost)	Medium × Medium	Check robots.txt, 1 RPS throttle, descriptive User-Agent. If blocked, retry the next day

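§11 Honest Win-Probability Estimates

An honest estimate based on the 5-expert panel, rules analysis, past-winner patterns, and spike validation. No marketing language.

P(top-3) trajectory: cumulative by stage

Stage	P(Qdrant top-3)	P(Arize top-3)	Main drivers
Initial (pre-plan)	~13%	~33%	Generic baseline. Dual-submission strategy not yet defined.
Post Max-Wins Plan	28-35%	50-55%	Qdrant-primary strategy, 3D must-build, audit-the-auditor wow defined, KO cut.
Post Apex Pass	35-42%	58-65%	Qdrant Recommendation API + Phoenix Online Evals + Annotations + 3-redundant detection.
Post 7-Spike validation	38-45%	62-68%	Architectural risk is close to zero. Only execution risk remains.

11.1 The single biggest lever for raising the odds further

💡 Highest-ROI action: the demo quality of the audit-the-auditor moment, meaning functional stability + 30-second readability + theatrical timing. Every other feature can be cut, but not this one. Measure with the 5-person non-team user test and iterate.

11.2 Limitations we consciously acknowledge

top-3 ≠ 1st place. The probabilities above are the chance of landing in the judges' top 3; the chance of first place itself is lower.
Combined expected value (1st-place equiv): pre-plan ≈ $1.7K, post-plan ≈ $4.2K. This is order-of-magnitude reasoning, not a financial forecast.
2026 Qdrant Best-in-Category sponsors are not yet announced → that upside is not included.
Judges' subjective view of the fairthon lineage is not reflected in the probabilities.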

§12 Phase 1 Build Entry Points

All planning and validation are complete. Ready to start writing Phase 1 (local E2E) code. Priorities and dependencies are listed below:

#	Task	Dependencies	Est. LoC
1.1	Python project setup (pyproject.toml, ruff/mypy/pytest, services/ layout)	none	~200
1.2	LLM adapter (Vertex global/regional routing + 3-tier + thinking_level + context cache + responseSchema + automatic Phoenix spans)	1.1	~400
1.3	6 Qdrant collections + local docker-compose (hybrid + payload index + quantization)	none	~300
1.4	Phoenix integration (auto-instrument + Online Eval Task script + Custom Evaluator + Annotations endpoint + self-host fallback)	1.2	~250
1.5	PDF ingest (Cloud Run service, Gemini multimodal)	1.2, 1.3	~300
1.6	Code Grader Job (clone + 15-20 heuristics + sampled code)	1.2, 1.3	~400
1.7	BMAD rubric YAML + 75-technique registry	none	~600 (config)
1.8	6 Hats system prompts	1.7	~300 (prompts)
1.9	Pipeline orchestrator (ADK): GlasshatRootAgent + HatsPanel + AuditLoop + BMADScorer + ReportAssembler	1.2-1.8	~800
1.10	Next.js frontend (drop zones + plan + monitor + report + 3D + search + thinking panel + cost dashboard)	1.9	~1,500
1.11	`make demo` · `make doctor`	1.9, 1.10	~100
1.12	Gemini 3 corpus scraping (524 stratified)	1.5 (if reusable)	~400
1.13	Phoenix calibration experiment seeding + validation	1.9, 1.12	~200

Estimated total: ~6,000 lines. Because the user directive is to "ignore time," no calendar constraints are applied; the Qdrant 6/1 and Rapid Agent 6/11 deadlines still determine the ordering naturally.
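
A valid build order can be derived mechanically from the dependency column with the standard-library `graphlib`. In this sketch the "1.2-1.8" range is expanded to individual tasks and the conditional 1.12 → 1.5 dependency is treated as a hard one, which is an assumption.

```python
from graphlib import TopologicalSorter

# task -> (dependencies, estimated LoC), transcribed from the table above
TASKS = {
    "1.1": ((), 200),
    "1.2": (("1.1",), 400),
    "1.3": ((), 300),
    "1.4": (("1.2",), 250),
    "1.5": (("1.2", "1.3"), 300),
    "1.6": (("1.2", "1.3"), 400),
    "1.7": ((), 600),
    "1.8": (("1.7",), 300),
    "1.9": (("1.2", "1.3", "1.4", "1.5", "1.6", "1.7", "1.8"), 800),
    "1.10": (("1.9",), 1500),
    "1.11": (("1.9", "1.10"), 100),
    "1.12": (("1.5",), 400),   # assumption: the "if reusable" dependency is hard
    "1.13": (("1.9", "1.12"), 200),
}

graph = {task: set(deps) for task, (deps, _) in TASKS.items()}
build_order = list(TopologicalSorter(graph).static_order())
total_loc = sum(loc for _, loc in TASKS.values())

print(build_order)
print(total_loc)  # the per-task estimates sum to 5,750, i.e. roughly 6,000
```

Tasks with no dependencies (1.1, 1.3, 1.7) surface first, which matches the self-contained starter tasks suggested for a new team member.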

12.1 Phase 1 completion criteria

✓ Phase 1 done = cp .env.example .env && docker compose up && make demo → within 90 seconds, a score report + 3D graph + live monitor traces + the audit-the-auditor moment can be demonstrated.

§13 First-Day Checklist for New Team Members

Actions for the first 90 minutes after receiving this report. Work through the items one at a time.

13.1 Environment (15 min)

Read this report to the end (15 min)
Also read these files:
- README.md: one-page concept summary
- docs/max-wins-plan.md §0 (one paragraph) + §12 (decisions), 5 min
- docs/wow-moment-design.md §1 + §4 (agent topology), 5 min

13.2 Verify the local environment (15 min)

bash
# Working directory (Glasshat rename in progress: the folder may still be named panelyst)
cd ~/Documents/GitHub/glasshat

# 1) Check tooling
python3 --version  # 3.12.x recommended
node --version     # v24+
uv --version       # https://docs.astral.sh/uv/

# 2) Check the .env file (a .env.backup-pre-rename backup already exists)
grep -c "GLASSHAT_" .env  # expect 14 (post-rename)

# 3) Check GCP credentials
ls -l ~/.config/gcloud/panelyst-dev-sa-key.json  # mode 600
export GOOGLE_APPLICATION_CREDENTIALS=~/.config/gcloud/panelyst-dev-sa-key.json
export GOOGLE_CLOUD_PROJECT=panelyst-hackathon  # the GCP project ID is kept as-is

13.3 Reproduce the spike tests (30 min)

bash
cd spikes/
uv sync                       # about 60-90 s

# Start with the shortest spikes:
uv run python 02_spike_b_adk_loop.py        # ~5 s, no API calls
uv run python 04_spike_d_calibration_toy.py # ~1 s, no API calls
uv run python 05_spike_e_sse_animation.py   # ~5 s
uv run python 01_spike_a_phoenix_mcp_smoke.py    # ~30 s (Phoenix in-process + npx)
uv run python 07_spike_g_phoenix_annotations.py  # ~30 s
uv run python 06_spike_f_phoenix_online_evals.py # ~30 s
uv run python 03_spike_c_adk_mcptoolset.py       # ~60 s (~$0.0001 Vertex call)

# All results: spikes/results/spike_*.json (confirm overall_pass=true)
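
Checking `overall_pass` across all result files by hand is error-prone, so a short helper can do it. This is a sketch that assumes each result file is a JSON object with a top-level `overall_pass` boolean, as the comment above suggests; the function name is illustrative, and the script is meant to be run from the repo root.

```python
import json
from pathlib import Path


def failing_spikes(results_dir: str) -> list[str]:
    """Return names of result files whose overall_pass is not exactly true."""
    failures = []
    for path in sorted(Path(results_dir).glob("spike_*.json")):
        report = json.loads(path.read_text(encoding="utf-8"))
        if report.get("overall_pass") is not True:
            failures.append(path.name)
    return failures


if __name__ == "__main__":
    bad = failing_spikes("spikes/results")
    print("no failing spike results found" if not bad else f"failing: {bad}")
```

An empty or missing results directory also yields an empty list, so confirm that seven result files exist before trusting a clean run.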

13.4 Check your understanding of the key decisions (15 min)

You should be able to answer these questions immediately:

Can you explain Glasshat's wow moment in 30 seconds?
Why is Qdrant primary and Arize secondary?
Why ADK rather than Agent Builder?
What are the three paths of the 3-redundant detection?
Where does the 524 corpus come from, and why is it stratified?
Which tools does the Phoenix MCP 4-call chain invoke?
Why is Tech Implementation the first tiebreaker in Rapid Agent?
How is Gemini 3 thinking_level set for each hat tier?

If you cannot answer an item, reread the relevant document (see 13.1 above).

13.5 Questions + opinions (15 min)

Feel free to push back or offer new ideas in these areas:

Which of the 9 Apex Pass STRETCH features deserve promotion into Phase 1?
Additional backup beats for the wow moment?
Is the 0:30-1:00 slot of the Qdrant demo dense enough?
Additional calibration data sources beyond the 524 corpus?
r3f design ideas for the 3D graph?
Further mitigation for risk #4 (audit-moment readability)?

13.6 First task (optional)

Pick one of the following to enter Phase 1; each is a self-contained Phase 1.X task:

A. 1.7+1.8 (content): BMAD rubric YAML + 6 hats system prompts. No code dependencies. 1 day.

B. 1.3 (Qdrant): local docker-compose + 6 collection schemas + hybrid validation script. ~1-2 days.

C. 1.12 (corpus scraping): stratified scrape of 524 Gemini 3 hackathon submissions. Can proceed without pipeline code. ~1-2 days.

D. 1.10 partial (frontend scaffold): Next.js scaffold + drop zones + SSE consumer skeleton. Runs against a mocked backend. ~2 days.

§14 Reference File Index

14.1 Authoritative documents (must read)

File	Role	Size
`README.md`	One-page concept summary + narration for both demos + compliance disclosure	~9KB
`docs/max-wins-plan.md`	Dual-submission winning thesis + 12 locked decisions + demo scripts	780 lines · 68KB
`docs/wow-moment-design.md`	Audit-the-auditor 5-step breakdown + topology + pre-seed dependency + 12 fallbacks	515 lines · 38KB
`docs/technical-apex-features.md`	47-feature decision matrix (33 APPLY + 9 STRETCH + 5 CUT) + judging-axis coverage map	225 lines · 27KB
`docs/spike-results.md`	7-spike results + findings + Phase 1 entry recommendation	251 lines · 17KB
`docs/architecture.md`	Topology + agent graph + sequences + per-phase deployment + interface abstractions	~12KB
`docs/gcp-setup.md`	GCP bootstrap + Gemini 3 panel measurements + 5 gotchas	~8KB
`PLAN.md`	Engineering inventory (umbrella mirror); the §1 ADDENDUM points to max-wins-plan.md	~32KB
`HANDOFF.md`	2026-05-14 session handoff (to be updated after the folder rename)	~7.6KB

14.2 Spike scripts (runnable + reproducible)

spikes/01_spike_a_phoenix_mcp_smoke.py — Phoenix in-process + MCP discovery
spikes/02_spike_b_adk_loop.py — LoopAgent + escalation
spikes/03_spike_c_adk_mcptoolset.py — ADK + Phoenix MCPToolset
spikes/04_spike_d_calibration_toy.py — Calibration policy
spikes/05_spike_e_sse_animation.py — SSE timing
spikes/06_spike_f_phoenix_online_evals.py — Online Eval OSS
spikes/07_spike_g_phoenix_annotations.py — Annotation R/W
spikes/README.md: spike index
spikes/results/*.json: run results (all overall_pass=true)

14.3 Persistent memory (auto-loaded across Claude Code sessions)

Path: /Users/kimsejun/.claude/projects/-Users-kimsejun-Documents-GitHub-hackathon-submissions/memory/

MEMORY.md: index
panelyst-project.md: Glasshat (formerly Panelyst) master (reflects the post-rename state)
glasshat-max-wins-decisions.md: the 12 locked decisions
glasshat-technical-apex.md: the 33 APPLY decisions
glasshat-spike-validation.md: 7-spike results
gemini-model-panel-verified.md: Gemini 3 model measurements + gotchas
gcp-panelyst-hackathon.md: GCP setup (KEEP the "panelyst-hackathon" name)
qdrant-vsd-hackathon.md · rapid-agent-hackathon.md · qdrant-collection-design.md · fairthon-lineage.md · hackathon-pipeline-2026-may-jun.md
user-*.md · production-safety-rules.md: user work preferences + safety rules

14.4 External sources (URLs)

Qdrant VSD: try.qdrant.tech/hackathon-vsd · submission form forms.gle/YDQ2TDUi8MqS9Vx28
Rapid Agent: rapid-agent.devpost.com · rules /rules · Arize /details/arize-resources
Qdrant 2025 winners: qdrant.tech/blog/vector-space-hackathon-winners-2025
Phoenix MCP README: github.com/Arize-ai/phoenix
Arize Gemini hackathon starter kit: github.com/Arize-ai/gemini-hackathon
ADK LoopAgent: adk.dev/agents/workflow-agents/loop-agents
ADK Custom agents: adk.dev/agents/custom-agents
Gemini 3 hackathon (corpus source): gemini3.devpost.com

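If you have questions, ask the user (app.2weeks@gmail.com) directly, or continue in a new session with cd ~/Documents/GitHub/glasshat && /handon.
This report reflects the state as of 2026-05-15 KST; subsequent progress is tracked through git log, memory, and the handoff doc.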