The corpus meets its org: a content_items ledger connects 500 fetched files to canonical entities through a publisher / about / mentions model
528 markdown files on a laptop and 129 organizations in the cloud database knew nothing about each other. Tonight a reconcile pass joins them: every fetched URL becomes a content_items row that records who published it, who it's about, and (soon) everyone it mentions. The result is 500 rows and zero leakage, from a pass that is idempotent and safe to re-run.
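
To make "idempotent and re-runnable" concrete, here is a minimal sketch of one way such a pass could work. It is an illustration under assumptions, not the actual reconcile code: an in-memory SQLite database stands in for the cloud database, and the column names (`url`, `publisher_id`, `about_id`), the `reconcile` function, and the sample URLs and org ids are all hypothetical. Mentions are left out because they are not recorded yet.

```python
import sqlite3

# Hypothetical schema: only the table name content_items comes from the post;
# the columns are illustrative stand-ins for the real ledger.
SCHEMA = """
CREATE TABLE content_items (
    url          TEXT PRIMARY KEY,   -- one row per fetched URL
    publisher_id TEXT NOT NULL,      -- who published it
    about_id     TEXT                -- who it is about (may be unknown)
);
"""

# Keying the upsert on url is what makes a second run a no-op for row count:
# an existing row is updated in place instead of being inserted again.
UPSERT = """
INSERT INTO content_items (url, publisher_id, about_id)
VALUES (:url, :publisher_id, :about_id)
ON CONFLICT(url) DO UPDATE SET
    publisher_id = excluded.publisher_id,
    about_id     = excluded.about_id;
"""


def reconcile(conn, fetched):
    """Upsert one row per fetched URL and return the resulting row count."""
    with conn:  # commit on success, roll back on error
        conn.executemany(UPSERT, fetched)
    return conn.execute("SELECT COUNT(*) FROM content_items").fetchone()[0]


conn = sqlite3.connect(":memory:")  # stand-in for the cloud database
conn.executescript(SCHEMA)

# Made-up sample data: two fetched URLs resolved to made-up org ids.
fetched = [
    {"url": "https://example.com/a", "publisher_id": "org-1", "about_id": "org-1"},
    {"url": "https://example.com/b", "publisher_id": "org-1", "about_id": "org-2"},
]

first = reconcile(conn, fetched)   # initial pass creates the rows
second = reconcile(conn, fetched)  # re-run touches the same rows, adds none
```

Running the pass twice leaves the row count unchanged, which is the property the re-runnable claim depends on. The `ON CONFLICT ... DO UPDATE` form needs SQLite 3.24 or newer.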