Why we cache every LLM call — the economics under the dossier
Operator feedback was direct: 'The overview a request returns should be persisted to the database, so it isn't regenerated on every request and wasted.' We audited every call_llm site, fixed the two that weren't cached, and added a refresh flag everywhere. Per-KOL LLM spend dropped to whatever the operator deliberately chose.
The principle
The audit that started it
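Operator feedback came in two sentences: "The overview a request returns should be persisted to the database, so it isn't regenerated on every request and wasted. All other copy that is produced by calling an LLM must be persisted to disk as well." So we grepped for call_llm and generateText across the repo. Eight LLM-driven features. Six were already cached. Two weren't.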
Already cached (no change)
- kol_overview: DBKolOverviewSummary, 30 days
- kol_background: DBKolBackgroundResearch, 30 days
- kol_deep_overview: DBKolDeepOverview, 7 days
- kol_activity: DBKolActivitySummary, 24 hours
- translate: writes bio_zh / display_name_zh / category_zh directly to author columns; only runs once per author per field
- The 4 insight cards: DBSearchInsight via the insight_* services
Newly cached
- kol_outreach: new kol_outreach_drafts table (migration 013), 30 days, locale-gated
- kol_audience_insights: new kol_audience_insights table (migration 013), 30 days, locale-gated + snapshot-hash gate
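As a rough illustration, the TTLs listed in this post could live in one lookup table with a single staleness check. This is a sketch, not the project's code: the names CACHE_TTL and is_stale are hypothetical, and translate and the insight cards are left out because no TTL is stated for them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# TTLs as stated in this post. The dict and helper names are illustrative only.
CACHE_TTL = {
    "kol_overview": timedelta(days=30),
    "kol_background": timedelta(days=30),
    "kol_deep_overview": timedelta(days=7),
    "kol_activity": timedelta(hours=24),
    "kol_outreach": timedelta(days=30),
    "kol_audience_insights": timedelta(days=30),
}


def is_stale(feature: str, generated_at: datetime,
             now: Optional[datetime] = None) -> bool:
    """Return True when a cached row is older than its feature's TTL."""
    now = now or datetime.now(timezone.utc)
    return now - generated_at > CACHE_TTL[feature]
```

Keeping the windows in one place makes it easy to see at a glance which features are allowed to drift for a month and which are expected to refresh daily.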
Three call shapes per feature
Each cached LLM feature exposes the same three entry points so the calling layers (HTTP route, MCP tool, scheduler) can choose:
- read_cached(...): never calls the LLM. Returns None on miss. Used for cache-only paths like the watchlist's brand-fit star rating.
- get_or_generate(...): cache-first. LLM only on miss or stale.
- regenerate(...): always calls. Maps to the UI's Refresh button + the ?refresh=true query.
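A minimal sketch of the three entry points, under loud assumptions: the CachedFeature class is hypothetical, an in-memory dict stands in for the DB table, the generate callable stands in for the LLM call, and treating a stale row as a miss inside read_cached is a guess rather than confirmed behaviour.

```python
from typing import Callable, Dict, Optional


class CachedFeature:
    """Hypothetical wrapper showing the three call shapes for one feature."""

    def __init__(self, generate: Callable[[str], str],
                 is_stale: Callable[[dict], bool] = lambda row: False):
        self._generate = generate          # stands in for the LLM call
        self._is_stale = is_stale          # stands in for the TTL check
        self._rows: Dict[str, dict] = {}   # stands in for the cache table

    def read_cached(self, kol_id: str) -> Optional[str]:
        """Never calls the LLM; returns None on a miss (or a stale row)."""
        row = self._rows.get(kol_id)
        if row is None or self._is_stale(row):
            return None
        return row["content"]

    def get_or_generate(self, kol_id: str) -> str:
        """Cache-first: the LLM runs only on a miss or a stale row."""
        cached = self.read_cached(kol_id)
        if cached is not None:
            return cached
        return self.regenerate(kol_id)

    def regenerate(self, kol_id: str) -> str:
        """Always calls the LLM and overwrites the cached row."""
        content = self._generate(kol_id)
        self._rows[kol_id] = {"content": content}
        return content
```

With this shape, a route handling ?refresh=true would call regenerate, a normal page load would call get_or_generate, and a list view that must stay free would call read_cached and render a placeholder on None.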
Locale as a first-class cache key dimension
Adding zh-CN to the dossier in May 2026 surfaced a subtle cache bug: an existing English row served the zh-CN caller until they hit Refresh. So locale moved into the cache row's JSON payload, and _read_cached now treats a locale mismatch as a miss.
Backwards-compatible: pre-feature rows have no locale field and default to "en". Existing English caches keep serving English users — only the new languages need a fresh generation pass.
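The locale gate can be sketched in a few lines. The function name read_cached_payload and the "locale" key are assumptions for illustration; the source only says the locale lives inside the row's JSON payload and that rows without it default to "en".

```python
from typing import Optional

DEFAULT_LOCALE = "en"  # rows written before the locale gate carry no locale field


def read_cached_payload(payload: Optional[dict],
                        caller_locale: str) -> Optional[dict]:
    """Return the cached payload, or None if it is missing or in another locale."""
    if payload is None:
        return None
    # Assumed key name: the post does not state what the field is called.
    row_locale = payload.get("locale", DEFAULT_LOCALE)
    if row_locale != caller_locale:
        return None  # locale mismatch counts as a cache miss
    return payload
```

A legacy row such as {"text": "..."} keeps serving an "en" caller, while a "zh-CN" caller gets None and falls through to generation.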
Snapshot-hash gate (audience insights only)
Audience insights is the odd one out: its input is a sampled-followers snapshot the operator can re-run. We SHA-256 the snapshot dict on write; on read, if the caller supplies a snapshot whose hash differs from the cached row, it's an automatic invalidation, with no manual refresh required.
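One way to hash a snapshot dict deterministically is to serialize it as canonical JSON first. That serialization step is an assumption here (the post only says the dict is SHA-256'd), and the helper names and the "snapshot_hash" key are hypothetical.

```python
import hashlib
import json
from typing import Optional


def snapshot_hash(snapshot: dict) -> str:
    """SHA-256 of a snapshot dict, stable across key ordering."""
    # Assumption: canonical JSON (sorted keys, fixed separators) is the byte form.
    canonical = json.dumps(snapshot, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_hit(row: dict, caller_snapshot: Optional[dict]) -> bool:
    """Hash gate: a supplied snapshot must match the hash stored at write time."""
    if caller_snapshot is None:
        return True  # no snapshot supplied, so there is nothing to compare
    return row.get("snapshot_hash") == snapshot_hash(caller_snapshot)
```

Sorting keys matters: without it, two dicts with identical content but different insertion order could serialize differently and invalidate the cache for no reason.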
Charging happens at the route, not the service
The service layer doesn't know about credits. The route layer calls charge() first (workspace pool → personal fallback), and only on success invokes the regenerate path. On AI Gateway 503 after the debit, the route refunds the workspace half immediately.
Why this split: the cache row should only exist when the operator has been billed and the LLM has returned valid content. Two-step commits would deadlock on retries; the debit-then-refund pattern is simpler and matches the way /api/scrape already worked.
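The debit-then-refund flow might look roughly like the sketch below. Everything here is hypothetical scaffolding: handle_refresh, the injected charge / refund_workspace / regenerate callables, and the GatewayUnavailable exception (standing in for an AI Gateway 503) are illustrative names, not the project's API.

```python
from typing import Any, Callable


class GatewayUnavailable(Exception):
    """Stands in for an AI Gateway 503 raised after the debit."""


def handle_refresh(kol_id: str, *,
                   charge: Callable[[str], Any],
                   refund_workspace: Callable[[Any], None],
                   regenerate: Callable[[str], str]) -> str:
    """Route-level flow: debit first, call the service, refund on gateway failure."""
    # If charge() raises (e.g. no credits), the LLM is never invoked.
    receipt = charge(kol_id)
    try:
        # The service layer knows nothing about credits; it only regenerates.
        return regenerate(kol_id)
    except GatewayUnavailable:
        # Only the workspace half is refunded automatically, per the post.
        refund_workspace(receipt)
        raise
```

Because the service writes the cache row only when regenerate returns, a failed call leaves no row behind and the refund restores the workspace balance.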
Effective LLM spend per workspace
Order-of-magnitude estimate, for a workspace that touches 50 KOL dossiers per week:
- Pre-cache: 50 dossier opens × 2 LLM-driven cards on the page = ~100 LLM calls per week
- Post-cache: 50 first-time dossier generations (~100 LLM calls) the first week, then ~5-10 dossier refreshes (~10-20 LLM calls) per week thereafter
- 5x-10x reduction on the steady state, assuming each refresh regenerates both cards; first week is identical because everything misses
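The arithmetic behind that estimate, as a quick check. It assumes each refresh regenerates both LLM-driven cards on the dossier, which is what makes 5-10 refreshes line up with a 5x-10x reduction; the constant and function names are illustrative.

```python
DOSSIERS_PER_WEEK = 50
CARDS_PER_DOSSIER = 2  # LLM-driven cards on the dossier page

# Pre-cache: every open regenerates every card.
pre_cache_calls = DOSSIERS_PER_WEEK * CARDS_PER_DOSSIER


def steady_state_calls(refreshes_per_week: int) -> int:
    """Post-cache weekly calls, assuming a refresh regenerates both cards."""
    return refreshes_per_week * CARDS_PER_DOSSIER


for refreshes in (5, 10):
    calls = steady_state_calls(refreshes)
    print(f"{refreshes} refreshes -> {calls} calls, "
          f"{pre_cache_calls / calls:.0f}x fewer than {pre_cache_calls}")
```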
The deep-analysis tier (5 credits, opt-in button) doesn't change — that one is meant to be deliberate.
Frequently asked
- Which features were uncached before this fix?
- Two: cold outreach drafts (kolens-web/app/api/kols/[u]/outreach) and audience-insight interpretation (kolens-web/app/api/kols/[u]/audience/insights). Both lived on Vercel BFFs that called the AI Gateway directly via generateText() — no DB persistence layer. Operators hitting the 'Generate' button twice paid twice.
- How is the locale gate implemented?
- Every cache table stashes the generation locale inside the JSON payload. read_cached compares that string to the caller's locale; a mismatch counts as a cache miss. Rows that predate the locale-gate feature default to 'en' so existing English caches keep serving English users unchanged.
- What about the audience-snapshot hash gate?
- Audience insights is special — the input is a sampled-followers snapshot that can change. We SHA-256 the snapshot dict at write time and re-check at read time. Same locale + same snapshot bytes → cache hit. Same locale + different snapshot (e.g. you re-sampled) → automatic invalidation, no manual refresh.
- Do you ever charge twice for the same cache row?
- Never on cache hits. Only the regenerate / refresh paths charge. Refunds fire on the failure path (AI Gateway 503 after the debit) for the workspace side; personal-side refunds need the operator to claim because Supabase write paths from Railway are limited.
- How is the cache row invalidated when I update my brand profile?
- Honestly: it isn't yet — that's an explicit out-of-scope on PRs #255 and #256. For now the operator clicks Refresh on the dossier after a brand-profile edit. Cache-key-includes-profile-hash is a clean follow-up.
- What's the 30-day TTL story?
- Creator Overview: 30d — themes are stable. Background Research: 30d — identity facts drift slowly. Cold Outreach: 30d — the creator's profile doesn't change in a week. Audience Insights: 30d (hash gate handles the case where the snapshot did change). Activity Summary: 24h — explicitly tracks recent posts so shorter window. Deep Analysis: 7d — negotiation / partnership insights drift faster.