live · streaming traces | Phoenix MCP · OpenInference | Gemini ADK · Cloud Run
Run target: anonymized YC-shaped JD · "B2B vertical-AI for ops teams" → spec-fit converged in 7 iterations
CONVERGED · judge_pass=5/6 · Δ<1.5pt · t=4h 12m
Spec-fit (current)
96.2% · +25.1 vs t0
LLM-as-judge · 6-flow weighted
Iterations
7 · 3 regen-only
stop rule: Δ<2pt for 2 cycles
Token spend
$4.18 · +$0.62 vs budget
2.41M in · 0.87M out · Gemini 2.x (cost arithmetic sketched below)
Time to deploy
11m 04s · vs YC: 6mo
first Cloud Run URL live · loop continues in background
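The Token spend card above is straightforward arithmetic over metered token counts. A minimal sketch of that computation in Python, with placeholder per-million rates (actual Gemini 2.x pricing varies by model and tier; the $4.18 figure comes from the run's own metering, not from these numbers):

```python
def run_cost_usd(tokens_in: int, tokens_out: int,
                 rate_in_per_m: float, rate_out_per_m: float) -> float:
    """Dollar cost of a run from raw token counts.

    Rates are USD per 1M tokens. Callers supply them; the values used
    below are placeholders, not actual Gemini 2.x pricing.
    """
    return (tokens_in * rate_in_per_m + tokens_out * rate_out_per_m) / 1_000_000

# Counts from the card above: 2.41M input, 0.87M output tokens.
# HYPOTHETICAL rates, for illustration only.
print(f"${run_cost_usd(2_410_000, 870_000, rate_in_per_m=1.00, rate_out_per_m=2.00):.2f}")
```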
Spec-fit over time · per iteration
[chart · series: spec-fit %, judge confidence, regen events]
Token spend per iteration
[chart · series: input, output]
Regenerated-flow heatmap · iteration × flow
[heatmap · cell scale: 0 / 1 / 2+]
Flow-by-flow funnel · final iteration (i7)
judge: gemini-2.x · weight = traffic share (aggregation sketched after the table)
Flow          Spec-fit   Regens   Judge
/landing      98.4%      0        PASS
/pricing      96.1%      2        PASS
/onboarding   94.8%      2        PASS
/api-doc      91.2%      2        SOFT
/dashboard    97.0%      0        PASS
/auth         99.1%      1        PASS
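The headline spec-fit is a traffic-weighted mean of the per-flow scores above ("weight = traffic share"). A minimal sketch of that aggregation, using made-up traffic shares since the dashboard doesn't expose the real ones:

```python
# Per-flow spec-fit from the funnel table (final iteration, i7).
spec_fit = {
    "/landing": 98.4, "/pricing": 96.1, "/onboarding": 94.8,
    "/api-doc": 91.2, "/dashboard": 97.0, "/auth": 99.1,
}

# HYPOTHETICAL traffic shares (must sum to 1.0); illustration only.
traffic_share = {
    "/landing": 0.30, "/pricing": 0.15, "/onboarding": 0.15,
    "/api-doc": 0.05, "/dashboard": 0.25, "/auth": 0.10,
}

weighted = sum(spec_fit[f] * traffic_share[f] for f in spec_fit)
print(f"weighted spec-fit: {weighted:.1f}%")
```

Note that a SOFT verdict like /api-doc's still contributes its score to the weighted mean, which would explain how judge_pass=5/6 coexists with a 96%+ aggregate.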
Opus 4.7 trace narrative · summarized from a 220K-token span tree:
Run started at spec-fit 41% — Gemini extractor missed the pricing-tier comparison that the JD's "vertical-AI for ops" framing implied. Phoenix MCP flagged a 12-point judge-confidence drop on /pricing at i2; the planner regenerated only that flow (cost: $0.58, not a full rebuild). The same pattern repeated on /onboarding (i4, judge cited missing role-selector) and /api-doc (i6, missing auth-token snippet). At i7 the stop rule triggered (Δ<2pt across two cycles). Net: 7 iterations, $4.18, 11m to first deploy, 4h12m to convergence. YC comparable: 0 deploys in 6 months — see header banner.
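The narrative above describes the loop's control flow: judge every flow, regenerate only the flows the judge flags, stop once the aggregate moves less than 2 points for two consecutive cycles. A minimal sketch of that loop, assuming hypothetical judge_flow and regenerate_flow helpers (the real run drives these through Gemini ADK with Phoenix MCP tracing):

```python
def converge(flows, judge_flow, regenerate_flow,
             delta_pt=2.0, window=2, max_iters=20):
    """Regenerate weak flows until spec-fit plateaus.

    judge_flow(flow) -> (score_pct, passed); regenerate_flow(flow)
    rebuilds one flow in place. Both are hypothetical stand-ins for
    the ADK-driven calls. Stops when the aggregate moves < delta_pt
    for `window` consecutive cycles (the "Δ<2pt for 2 cycles" rule).
    """
    history = []
    for i in range(1, max_iters + 1):
        scores = {f: judge_flow(f) for f in flows}
        # Plain mean here; the real aggregate is traffic-weighted.
        aggregate = sum(s for s, _ in scores.values()) / len(scores)
        history.append(aggregate)

        # Stop rule: deltas below threshold across the last `window` cycles.
        deltas = [abs(history[-k] - history[-k - 1])
                  for k in range(1, min(window, len(history) - 1) + 1)]
        if len(deltas) == window and all(d < delta_pt for d in deltas):
            return i, aggregate

        # Regenerate only flagged flows, never the whole app.
        for f, (_, passed) in scores.items():
            if not passed:
                regenerate_flow(f)
    return max_iters, history[-1]
```

Regenerating only the flagged flow is the cost lever the narrative points at: the /pricing rebuild at i2 cost $0.58 against $4.18 for the entire run.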