
Six Thinking Hats

RubricSynthesizer · Pro · thinking_high

3D Evaluation Constellation · UMAP (768-d Gemini embeddings → 3-d)

503 anchors · color = outcome tier · size = evidence depth
Qdrant Live: recommend() 0 · positive 0 · negative 0
Winner Gravity: 72% similar to the winner cluster, but pulled toward the non-winner pattern by anchor "Aegis" (a 2025 finalist with similar evidence depth that did not win). Threshold for likely winner: 80%.
⚠ Anti-Pattern Radar: 37 of 503 past Gemini 3 submissions matched this profile. Winners: 0.
Common failure: vague user definition, weak repo evidence, no working demo.
Legend: Winners (13) · Honorable (50) · Non-winners (440) · This submission
x: rubric-final · y: tech-depth · z: evidence-depth

Phoenix Monitor (recorded trace)

glasshat-demo
⚠ Phoenix Online Eval fired
span:hat_yellow_score_a1
eval.calibration.label:over_confident
eval.calibration.score:0.31
evidence_depth_bucket:shallow
predicted_score:9.0
Why -1.4? · Score Receipt
From rubric A1 (Problem Clarity, weight 8/100):
new = clip(9.0 - 0.8×1.75, p25=6.8, p75=8.1) = 7.6
3 anchors retrieved via qdrant.recommend():
· Globot (winner) → A1=8.2
· Aegis (finalist) → A1=7.5
· Netra (winner) → A1=7.2
|Δ| 1.4 ≤ 2.0 cap · n=3 ≥ 3 ✓ · LoopAgent iter 1/2
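
The receipt arithmetic above can be reproduced with a minimal Python sketch. The names `adjust_score`, `factor`, `gap`, and `Receipt` are hypothetical. The receipt does not say what 0.8 and 1.75 represent, so they are passed through as plain inputs. The acceptance flag simply mirrors the two checks printed on the receipt (|Δ| within the 2.0 cap, at least 3 anchors) and is an assumption about how those checks are applied.

```python
from dataclasses import dataclass


@dataclass
class Receipt:
    predicted: float
    adjusted: float
    delta: float
    accepted: bool


def adjust_score(predicted, factor, gap, p25, p75, n_anchors,
                 cap=2.0, min_anchors=3):
    """Shift a predicted score down by factor * gap, then clip to [p25, p75].

    `factor` and `gap` are hypothetical names for the 0.8 and 1.75 terms
    on the receipt; their meaning is not stated in the source.
    """
    raw = predicted - factor * gap
    adjusted = min(max(raw, p25), p75)  # clip into the interquartile band
    delta = adjusted - predicted
    # Assumed guard: accept only if the shift is within the cap and
    # enough anchors were retrieved.
    accepted = abs(delta) <= cap and n_anchors >= min_anchors
    return Receipt(predicted, round(adjusted, 2), round(delta, 2), accepted)


receipt = adjust_score(9.0, 0.8, 1.75, p25=6.8, p75=8.1, n_anchors=3)
print(receipt)
```

With the receipt's inputs, the adjusted score is 7.6 and the shift is -1.4, which matches the "Why -1.4?" header.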

Score Receipts · Dual-Rubric Variance · EU AI Act Art. 12 ready

Globot · Multi-Agent · 2M-token compliance analysis
Qdrant rubric: Functionality · Originality · User Experience
Rapid Agent rubric: Tech 40 · Innovation 30 · Impact 20 · Presentation 10 (★ Tech is the tie-break)
Δ = 14 points between the two rubrics. Rubric-faithful variance, not bias.
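
As an illustration of how two rubrics can rate the same project differently, here is a small Python sketch of a weighted total. Only the Rapid Agent weights (Tech 40 · Inn 30 · Imp 20 · Pres 10) come from the panel above. The equal weighting for the Qdrant rubric, the 0-10 per-criterion scale, and every per-criterion score below are assumptions made up for the example, not Globot's actual scores.

```python
# Weights taken from the Rapid Agent rubric line above.
RAPID_AGENT_WEIGHTS = {"tech": 40, "innovation": 30, "impact": 20, "presentation": 10}
# Assumption: the Qdrant rubric weights its three criteria equally.
QDRANT_WEIGHTS = {"functionality": 1, "originality": 1, "ux": 1}


def weighted_total(scores, weights):
    """Combine 0-10 per-criterion scores into a 0-100 weighted total."""
    total_weight = sum(weights.values())
    weighted = sum(weights[c] * scores[c] / 10 for c in weights)
    return 100 * weighted / total_weight


# Hypothetical scores, for illustration only.
rapid_scores = {"tech": 8, "innovation": 7, "impact": 6, "presentation": 9}
qdrant_scores = {"functionality": 9, "originality": 9, "ux": 8}

rapid_total = weighted_total(rapid_scores, RAPID_AGENT_WEIGHTS)
qdrant_total = weighted_total(qdrant_scores, QDRANT_WEIGHTS)
print(f"Rapid Agent: {rapid_total:.1f}  Qdrant: {qdrant_total:.1f}  "
      f"delta: {abs(rapid_total - qdrant_total):.1f}")
```

The gap between the two totals comes from the rubrics weighting different criteria, which is the sense in which the panel calls the variance rubric-faithful.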
Rubric Sanity Layer (always-on): ① Reproducibility ✓   ② Inter-hat consistency ✓   ③ Calibration vs 13 known winners ✓   ④ Evidence depth threshold ✓