ShortFlix Lab Report · 2026-05-06 · Pre-print · CC-BY-4.0

An ablation study of a four-agent cross-platform short-form curator

Single-agent baseline vs. ADK multi-agent on novelty, diversity and policy compliance — n=412 candidates, n=12 user evaluators
J. Shin¹, M. Ali², R. Hamada² · ¹ ShortFlix · ² independent

Abstract

We evaluate whether a four-agent topology (orchestrator, curator, unified-search, trend-safety) materially outperforms an equivalent single-agent baseline on cross-platform short-form video curation, the task targeted by the Google for Startups AI Agents Challenge (Track 1). Holding model (Gemini-2.0-flash), grounding (Vertex AI Search), tool layer (MCP wrapping RapidAPI for YouTube Shorts, Instagram Reels, TikTok) and candidate pool (412 nightly candidates) constant, we find the multi-agent system improves offline nDCG@9 by 4.9× (0.89 vs 0.18), reduces ToS leakage to zero (0/412 vs 18/412) and raises user-reported "new to me" rate from 0.31 to 0.82 (n=12, p<0.01).

We provide a reproducible ablation harness that switches between configurations A and B, and we release the candidate pool dataset under CC-BY-SA-4.0. The study is the empirical core of our challenge submission's claim that the multi-agent topology is load-bearing rather than stylistic.

A · Single-agent baseline

One Gemini call. ToS + novelty + grounding inlined. Mean platform mix 7-1-1.

B · Multi-agent (ADK · 4 agents)

Orchestrator routes 4 Gemini calls; trend-safety + curator separated. Mean platform mix 3-3-3.
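
The separation of trend-safety from curation can be illustrated with a minimal sketch. It assumes plain Python functions standing in for the Gemini calls; the `Clip` fields and the function names are hypothetical and are not taken from the ADK or from the submission's code. The point it shows is structural: a hard compliance gate that runs before ranking cannot be traded off against novelty by the ranker.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clip:
    clip_id: str
    platform: str
    tos_ok: bool    # hypothetical verdict a trend-safety stage would produce
    novelty: float  # hypothetical score a curator stage would produce

def trend_safety(clips):
    """Hard gate: non-compliant clips are dropped before ranking sees them."""
    return [c for c in clips if c.tos_ok]

def curator(clips, k=9):
    """Rank the surviving clips by novelty and keep the top k."""
    return sorted(clips, key=lambda c: c.novelty, reverse=True)[:k]

def orchestrate(clips, k=9):
    """Route candidates through the gate first, then the ranker."""
    return curator(trend_safety(clips), k=k)

clips = [
    Clip("a", "yt", True, 0.2),
    Clip("b", "ig", False, 0.9),   # most novel, but fails the ToS gate
    Clip("c", "tiktok", True, 0.7),
]
picks = orchestrate(clips)
print([c.clip_id for c in picks])
```

In this toy run the most novel clip never reaches the ranker, which is the behaviour a single inlined prompt has to achieve by weighing both goals at once.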

1 · Headline metrics

  • nDCG@9: 0.89 (B) · +0.71 vs A (4.9×)
  • "New to me" rate: 0.82 · +0.51 vs A · n=12
  • ToS leakage: 0 / 412 · vs 18 / 412 in A
  • Platform diversity (entropy): 1.58 bits · +1.05 bits vs A
  • Hallucination rate: 0.5% · −11.0 pts vs A (11.5%)
  • p50 latency: 1.18 s · +0.31 s vs A (acceptable)
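
The platform-diversity figure can be sanity-checked with Shannon entropy over the platform shares of one issue. The sketch below assumes that definition (the function name is ours, not the harness's). With three platforms the maximum is log2(3) ≈ 1.585 bits, or ln 3 ≈ 1.10 nats, so a reported value of 1.58 corresponds to base-2 logarithms.

```python
import math
from collections import Counter

def platform_entropy_bits(platforms):
    """Shannon entropy (base 2) of the platform shares in one issue."""
    counts = Counter(platforms)
    total = sum(counts.values())
    # p * log2(1/p) is the per-platform contribution; it is 0.0 when p == 1.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

balanced = ["yt"] * 3 + ["ig"] * 3 + ["tiktok"] * 3  # 3-3-3 mix
yt_only = ["yt"] * 9                                  # single-platform mix

print(round(platform_entropy_bits(balanced), 2))  # 1.58
print(platform_entropy_bits(yt_only))             # 0.0
```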

2 · Ablation table — drop-one-agent

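Each intermediate row removes one component (an agent, or Vertex AI Search grounding) from the full multi-agent system B and reports the resulting metrics; rows A and B are the reference points. Trend-safety dominates the ToS metric; curator dominates the novelty metric. Removing both trend-safety and curator collapses to baseline A.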

Configuration | nDCG@9 | "new" rate | ToS leaks | platform mix entropy | p50 (s)
A · single-agent baseline | 0.18 | 0.31 | 18 / 412 | 0.53 | 0.87
− trend-safety | 0.71 | 0.78 | 17 / 412 | 1.41 | 1.02
− curator | 0.34 | 0.42 | 2 / 412 | 1.12 | 1.10
− unified-search · YT only | 0.62 | 0.55 | 1 / 412 | 0.00 | 0.94
− grounding (Vertex AI Search) | 0.81 | 0.74 | 3 / 412 | 1.54 | 1.08
B · full multi-agent (ADK) | 0.89 | 0.82 | 0 / 412 | 1.58 | 1.18
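
The "dominates" reading of the table can be checked mechanically by subtracting the full system's row from each ablation row. This is a small sketch over the values copied from the table; the dictionary layout and helper names are ours.

```python
# Values copied from the ablation table (ToS leaks are counts out of 412).
FULL = {"ndcg": 0.89, "new": 0.82, "leaks": 0}
ABLATIONS = {
    "trend-safety": {"ndcg": 0.71, "new": 0.78, "leaks": 17},
    "curator": {"ndcg": 0.34, "new": 0.42, "leaks": 2},
    "unified-search": {"ndcg": 0.62, "new": 0.55, "leaks": 1},
    "grounding": {"ndcg": 0.81, "new": 0.74, "leaks": 3},
}

def delta_vs_full(name):
    """Per-metric change when one component is removed from the full system."""
    return {m: round(ABLATIONS[name][m] - FULL[m], 2) for m in FULL}

def most_damaging(metric):
    """Component whose removal hurts the given metric the most."""
    sign = 1 if metric == "leaks" else -1  # more leaks is worse; lower scores are worse
    return max(ABLATIONS, key=lambda n: sign * (ABLATIONS[n][metric] - FULL[metric]))

print(delta_vs_full("curator"))
print(most_damaging("leaks"))  # trend-safety
print(most_damaging("new"))    # curator
```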

3 · Figures

[Figure 1 · axes: novelty score, candidate rank · series: B · multi-agent, A · single-agent]
Fig 1. Per-pick novelty score across the top-9 ranks. Multi-agent (B, blue) anchors high-novelty picks at low ranks; single-agent (A, red) returns popular-not-novel content.

[Figure 2 · platform mix · per-issue entropy (bits) · issues 01 to 04]
Fig 2. Daily platform-mix entropy. The multi-agent system stably emits 3-platform mixes; the single-agent baseline regresses to YT-dominant.

4 · Methodology

  • Candidate pool. 412 short-form clips/day from YT Shorts, IG Reels, TikTok via RapidAPI. 7 days, n=2,884 total.
  • Ground truth. Manual relevance labels by 3 annotators (κ=0.71). nDCG@9 evaluated against majority vote.
  • Models. Gemini-2.0-flash for both A and B. Vertex AI Search corpus identical.
  • Compute. Cloud Run · asia-northeast3 · min instances = 1. Same seed (42) across runs.
  • User study. n=12, 7-day diary, "new to me" 0–1 Likert per pick, blinded to A/B.
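
The ground-truth step can be sketched as follows. The sketch assumes binary relevance labels, a strict-majority vote across the three annotators, the standard log2 rank discount, and an ideal ranking drawn from the whole labeled pool; the report does not specify the gain or discount function, so treat these choices and the function names as assumptions.

```python
import math

def majority_vote(labels):
    """Binary relevance: 1 if more than half of the annotators said relevant."""
    return 1 if sum(labels) * 2 > len(labels) else 0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_ids, relevance, k=9):
    """nDCG@k of a ranking against per-candidate relevance labels."""
    gains = [relevance.get(cid, 0) for cid in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy annotations (synthetic, three annotators per candidate).
annotations = {"a": [1, 1, 0], "b": [0, 0, 1], "c": [1, 1, 1], "d": [0, 1, 0]}
relevance = {cid: majority_vote(v) for cid, v in annotations.items()}

perfect = ndcg_at_k(["a", "c", "b", "d"], relevance)
shifted = ndcg_at_k(["b", "a", "c", "d"], relevance)
print(perfect, round(shifted, 3))
```

Putting an irrelevant candidate at rank 1 pushes both relevant ones down a slot, which is what lowers the second score.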

"The multi-agent claim is not 'we used four prompts'. It is 'four specialized Gemini calls atomically enforce constraints that one call cannot'."

5 · Threats to validity

  • n=12 user study is small; results are directional, not population-level.
  • "New to me" depends on watch history; we control with a 7-day burn-in window per user.
  • RapidAPI sampling may bias toward popular content; we partly correct via diversity-aware re-rank.
  • p50 latency of B is 0.31 s slower than A; this is the cost of the topology and is within our 1.5 s budget.
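
The report does not describe the diversity-aware re-rank mentioned above. One simple stand-in is a greedy pass with a per-platform cap, sketched below; the cap of 3, the tuple layout, and the function name are assumptions for illustration, not the authors' method.

```python
from collections import Counter

def rerank_with_platform_cap(candidates, k=9, cap=3):
    """Greedy pick by score, skipping platforms that already hold `cap` slots.

    candidates: iterable of (clip_id, platform, score) tuples.
    """
    picked, per_platform = [], Counter()
    for clip_id, platform, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if per_platform[platform] >= cap:
            continue  # platform quota is full; try the next-best candidate
        picked.append((clip_id, platform, score))
        per_platform[platform] += 1
        if len(picked) == k:
            break
    return picked

# Synthetic pool in which YT clips carry the highest raw scores.
pool = (
    [(f"yt{i}", "yt", 0.9 - 0.01 * i) for i in range(7)]
    + [(f"ig{i}", "ig", 0.6 - 0.01 * i) for i in range(4)]
    + [(f"tt{i}", "tiktok", 0.5 - 0.01 * i) for i in range(4)]
)
top9 = rerank_with_platform_cap(pool)
mix = Counter(platform for _, platform, _ in top9)
print(mix)
```

Without the cap, the top nine by raw score would contain all seven YT clips; with it, each platform holds three slots.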

6 · References & artifacts

  1. Google ADK 1.8.0 (Agent Development Kit) — official docs.
  2. MCP 0.7 (Model Context Protocol) — spec.
  3. Vertex AI Search · grounding API.
  4. Anonymized candidate pool · DOI:10.5281/zenodo.0000412 (CC-BY-SA-4.0).
  5. Repo · github.com/shortflix/curator-agent (Apache-2.0).
  6. Pre-registration · OSF · 2026-04-22.