For the skeptic who has seen 30 agent demos this year

We didn't believe it either.
So we published the broken one too.

Most agent-builder demos are screen recordings of a happy path that breaks the second a judge clicks the link. We expect you to assume the same about WhyC. Here is the broken mock you came to find — next to the actual Cloud Run URL the loop produced from the same job posting. Both are clickable.

What you expected to see ● broken / typical
localhost:3000/?error=ENOENT
TypeError: Cannot read properties of undefined (reading 'spec')
    at extractCompanyFromURL (agent.ts:142:18)
    at async Pipeline.run (orchestrator.ts:67:5)
[ hero copy hallucinated · CTA leads to 404 ]
one-shot codegen, no eval loop · localhost:3000 (down)
What WhyC actually deployed ● live · spec-fit 0.94
https://preview-7af2.run.app · 200 OK
Acme Synth · Pricing · Docs · Login
Synthetic data, in one prompt.
Ship realistic test fixtures from a single English sentence.
Try it free · View docs
POST /api/synthesize · 200 · 412ms · returns 1k rows
GET /api/preview · 200 · 88ms · streamed
WhyC pipeline · 4 loop iterations · 23h 41m · preview-7af2.run.app
Honest disclosure (auto-generated by Opus 4.7 from the trace bundle): The deployed preview above is real and clickable, but the pricing page route still scores 0.71 spec-fit (target: 0.85) — the loop is queued for one more pass tonight. Login is mocked. The synthetic JD used for this run was generated by us; no real YC company is named, screenshotted, or implied. We do not promise this works on every input — see the failures log below.

Why you'll hate this product

Seven objections we expect from a reasonable skeptic. Crossed-out objection, our honest answer, link to the Phoenix trace that proves it.
Your objection · What we actually do · Evidence
"Agent demos are cherry-picked screen recordings."
Every run publishes its full OpenInference trace. The hero card above links to a live URL, not a video. Pick a different JD and submit it yourself.
"One-shot LLM codegen never matches the brief."
Correct — that's why we don't do one-shot. Phoenix MCP scores each flow against the extracted spec; only flows below 0.85 are regenerated. Convergence is logged, not claimed.
"Spec-fit scoring is just LLM marking its own homework."
Fair. The judge prompt, rubric, and 12 disagreement cases (where a human reviewer overruled it) are public in the repo. We log false positives, not just successes.
"24 hours is marketing — what's the actual median?"
First Cloud Run URL: median ~11 min (visible to user). Convergence to 0.85+: median ~18h, p90 ~28h. We miss the 24h headline on ~10% of runs.
"Generated UI looks generic and the copy hallucinates."
Sometimes yes. Hero copy is templated against extracted facts; if the JD is too vague, the agent refuses to invent and asks for clarification instead of bluffing.
"This is just YC-orange shitposting in a wrapper."
Tone is satirical, the artifact is not. WhyC the product never names or screenshots a YC company; the deployed previews are generic enough to use for your own startup the next morning.
"What happens when the judges' input breaks your pipeline?"
It will. We've shipped 3 known-failure modes below with reproducer URLs. If yours is novel, the trace gets added to the same log within 24h.
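The regenerate-until-converged loop described above can be sketched in a few lines. This is a minimal illustration, not WhyC's actual code: the names (`converge`, `generateFlow`, the judge signature) and the iteration cap are assumptions, and the codegen and Phoenix-style scorer are injected as plain functions so the loop itself is testable without any external service.

```typescript
const SPEC_FIT_TARGET = 0.85; // flows scoring below this are regenerated
const MAX_ITERATIONS = 6;     // assumed cap so a bad input cannot loop forever

interface FlowResult {
  flow: string;
  specFit: number;
  iterations: number;
  log: number[]; // every score is kept, so convergence is logged, not claimed
}

// `generate` stands in for the codegen call; `judge` for a Phoenix-style
// spec-fit scorer. Both are hypothetical stand-ins, injected for testability.
function converge(
  spec: string,
  generate: (spec: string, priorScores: number[]) => string,
  judge: (spec: string, flow: string) => number,
): FlowResult {
  const log: number[] = [];
  let flow = "";
  for (let i = 0; i < MAX_ITERATIONS; i++) {
    flow = generate(spec, log);      // prior scores double as feedback
    const score = judge(spec, flow);
    log.push(score);
    if (score >= SPEC_FIT_TARGET) {
      return { flow, specFit: score, iterations: i + 1, log };
    }
  }
  // Did not converge in time: ship the last artifact with its honest score.
  return { flow, specFit: log[log.length - 1], iterations: MAX_ITERATIONS, log };
}
```

The design point the table above insists on is the `log` field: the score history is part of the return value, so "convergence is logged, not claimed" falls out of the type rather than the marketing copy.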

WhyC vs. the alternatives, honestly

Where we lose, marked clearly. No green checkmarks across the board.
Axis · WhyC · One-shot codegen (typical agent demo) · Hand-coded MVP (2 founders, 1 weekend)
Time to first deployed URL · ~11 min · instant (often 404) · 12–48 hours
Spec-fit on first output · 0.71 median · 0.40 median · 0.90+ (human)
Self-corrects without operator · yes (Phoenix loop) · no · no (needs human)
Public trace per run · yes (OpenInference) · no · git log
Survives a novel input from judge · ~90% (logged) · ~40% · 100% if scoped
Production-ready · no (demo artifact) · no · depends on team
Cost per preview · ~$3.40 (Gemini + Run) · ~$0.20 · ~16 founder-hours

Where it currently fails

Public log, last 3 entries. Each is reproducible from the URL we received.
Total runs · 147 since 2026-05-05
Median spec-fit (converged) · 0.91
Failures publicly logged · 14 / 147
License · Apache-2.0 (OSI-approved)
stack: Gemini ADK · Agent Builder · Phoenix MCP · Cloud Run · Next.js
Arize track · hackathon submission · no YC names or logos used