Six Thinking Hats
RubricSynthesizer · Pro · thinking_high
3D Evaluation Constellation · UMAP(768d Gemini emb → 3d)
503 anchors · color = outcome tier · size = evidence depth
Qdrant Live: recommend() 0 · positive 0 · negative 0
Winner Gravity:
72% similar to the winner cluster, but pulled toward the non-winner pattern by anchor "Aegis" (a 2025 finalist with similar evidence depth that did not win). Threshold for likely winner: 80%.
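One way to read the Winner Gravity check is as a similarity score compared against a fixed threshold. The sketch below assumes cosine similarity to the centroid of the winner embeddings; the panel does not say how its percentage is computed, so `cosine`, `centroid`, `winner_gravity`, and the toy 3-d vectors are all hypothetical. Only the 0.80 threshold comes from the panel.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def winner_gravity(submission, winner_vectors, threshold=0.80):
    """Return (similarity, likely_winner) for a submission embedding.

    Assumption: similarity is the cosine to the winner centroid.
    """
    sim = cosine(submission, centroid(winner_vectors))
    return sim, sim >= threshold

# Toy vectors for illustration only; real embeddings would be 768-d.
winners = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
sim, likely = winner_gravity([3.0, 4.0, 0.0], winners)
print(f"{sim:.0%} similar, likely winner: {likely}")
```

With a measured similarity of 72%, the same comparison against 0.80 would report the submission as below the likely-winner threshold.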
⚠ Anti-Pattern Radar
37 of 503 past Gemini 3 submissions matched this profile. Winners: 0.
Common failure: vague user definition, weak repo evidence, no working demo.
Phoenix Monitor (recorded trace)
glasshat-demo
⚠ Phoenix Online Eval fired
span:hat_yellow_score_a1
eval.calibration.label:over_confident
eval.calibration.score:0.31
evidence_depth_bucket:shallow
predicted_score:9.0
Why -1.4? · Score Receipt
From rubric A1 (Problem Clarity, weight 8/100):
new = clip(9.0 - 0.8×1.75, p25=6.8, p75=8.1) = 7.6
3 anchors retrieved via qdrant.recommend():
· Globot (winner) → A1=8.2
· Aegis (finalist) → A1=7.5
· Netra (winner) → A1=7.2
|Δ| 1.4 ≤ 2.0 cap · n=3 ≥ 3 ✓ · LoopAgent iter 1/2
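The arithmetic in this receipt can be reproduced with a short Python sketch. The parameter names `shrink` (0.8) and `offset` (1.75) are hypothetical labels, since the receipt does not say what the two factors represent. The guard limits mirror the "≤ 2.0 cap" and "n ≥ 3" checks shown on the receipt, and the function names are illustrative rather than taken from the tool.

```python
def adjust_score(predicted, shrink, offset, p25, p75):
    """Lower the predicted score by shrink * offset, then clip to [p25, p75]."""
    raw = predicted - shrink * offset
    return min(max(raw, p25), p75)

def receipt_guards(predicted, adjusted, n_anchors, max_delta=2.0, min_anchors=3):
    """Check the two receipt conditions: bounded adjustment and enough anchors."""
    delta = abs(predicted - adjusted)
    return {
        "delta": round(delta, 2),
        "delta_ok": delta <= max_delta,
        "anchors_ok": n_anchors >= min_anchors,
    }

# Values copied from the receipt for rubric A1.
new = adjust_score(predicted=9.0, shrink=0.8, offset=1.75, p25=6.8, p75=8.1)
guards = receipt_guards(predicted=9.0, adjusted=new, n_anchors=3)
print(round(new, 1), guards)
```

Here 0.8 × 1.75 = 1.4, so the raw value 9.0 − 1.4 = 7.6 already lies inside the [6.8, 8.1] band and the clip leaves it unchanged.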
Score Receipts · Dual-Rubric Variance · EU AI Act Art. 12 ready
Globot · Multi-Agent · 2M-token compliance analysis
Qdrant rubric
Functionality · Originality · UX
–
Functionality: –
Originality: –
User Experience: –
Rapid Agent rubric
Tech 40 · Inn 30 · Imp 20 · Pres 10 · Tech tie-break
–
Tech (★): –
Innovation: –
Impact: –
Presentation: –
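The Rapid Agent weighting (Tech 40, Innovation 30, Impact 20, Presentation 10, with Tech as tie-break) can be sketched as a weighted total plus a secondary sort key. This is a minimal illustration, assuming each criterion is scored on a 0 to 10 scale; the entry names and scores below are made up for the example and are not results from the panel.

```python
# Weights from the Rapid Agent rubric line; they sum to 100.
WEIGHTS = {"tech": 40, "innovation": 30, "impact": 20, "presentation": 10}

def weighted_total(scores):
    """Weighted average of per-criterion scores (assumed 0-10 scale)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / 100

def rank(entries):
    """Order entry names by total, breaking ties on the Tech score."""
    return sorted(
        entries,
        key=lambda name: (weighted_total(entries[name]), entries[name]["tech"]),
        reverse=True,
    )

# Hypothetical entries chosen so that the totals tie.
entries = {
    "entry_b": {"tech": 7, "innovation": 8, "impact": 7, "presentation": 6},
    "entry_a": {"tech": 8, "innovation": 6, "impact": 7, "presentation": 8},
}
for name in rank(entries):
    print(name, weighted_total(entries[name]))
```

Both hypothetical entries total 7.2, so the ordering is decided by the Tech score alone, which places `entry_a` first.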
Rubric Sanity Layer (always-on):
① Reproducibility ✓ ② Inter-hat consistency ✓ ③ Calibration vs 13 known winners ✓ ④ Evidence depth threshold ✓