1 · Flash over Pro
Use
gemini-2.0-flash for curator + search. Reserve
gemini-2.0-pro only for trend-safety reasoning step (5% of calls).
saves ≈ $140 / month
2 · Cache > Recompute
Curator runs
nightly on Cloud Scheduler, not per-request. Per-user re-rank is cheap; embedding is expensive.
saves ≈ $90 / month
3 · Cloud Run min=1
Avoid GKE entirely (over-engineered for demo). Single Cloud Run service, scale-to-1, not zero (judge cold-start).
saves ≈ $200 vs GKE
4 · MCP tool call dedup
Hash query → 60s in-memory cache for RapidAPI fanout. Identical search hits return free.
saves ≈ $30 / month
5 · Vertex Search 1-corpus
Single grounding corpus (cross-cultural metadata) instead of 3. Cheaper, still kills hallucination.
saves ≈ $60 / month
6 · Free-tier defaults
Cloud SQL f1-micro · Logging 50GB free · CDN signed-URL caching · all default-on.
stays under credit ceiling