← Changelog

Corpus chips stop lying — the records sheet column the operator was already editing becomes the join key; and Jina's `Published Time:` stops sitting unread in the body of every capture, lifting to top-level frontmatter on every write going forward and backfilled into 244 .md files retroactively

Tonight started with a small symptom — three rows in the Sort & Filter Lens on the v10 record set showing `corpus 0` despite per-funder corpus directories full of files on disk: hewlett-foundation (2 files), howard-schultz-foundation (6), kellogg-foundation (6). The yesterday-night issue file had named four others (sobrato, stand-together, cohen, todd-fisher) and proposed a per-client `corpus-overrides.yaml` as the defense-in-depth fix. Hard-refresh resolved those four cleanly — but exposed these three. A second NATS probe confirmed the backend was returning the correct counts (2 / 6 / 6) end-to-end for the new symptoms too. The actual bug was in the wire, not the join: `corpus.list_for_record` had no `CAPABILITY_TIMEOUTS_MS` entry so dispatch fell through to the 5000ms default, the content-ingest handler is a serial `for await` loop, and each call walked every funder directory under `clients/<id>/corpus/` (~40 dirs × ~300 .md files). With 96 visible rows firing 96 parallel requests on view load, late ones in the burst timed out and the lens swallowed the error silently — race timing explained why sobrato et al. recovered post-refresh and hewlett/schultz/kellogg didn't. The operator's framing: *'shouldn't there be a pretend join where the column in the records sheet for corpus references a folder name/relative path?'* The records sheet already populates `corpus_funder_slug` per row. Wiring that into `listForRecord` as the primary join key dissolved THREE things at once: the bug (operator's explicit assertion routes around every lineage edge case), the timeout race (per-request walk went from 300 files to ≤13), and the planned `corpus-overrides.yaml` proposal (the cell IS the override surface — no new file format needed). Lineage stays as the fallback for rows without a slug. While the door was open we noticed something else: Jina's response body has a `Published Time:` preamble line on every capture, and we'd been writing it to the body but never lifting it into frontmatter. The sort/filter UIs had no first-class authored-date field. A six-line addition to `jina.ts` parses the preamble + a `liftPublishedAt` helper in `corpus.ts` promotes it between `fetched_at:` and `record_id:` on both writers. A sibling backfill script rescued 244 dates from existing files (some going back to Wikipedia article creation dates from 2004 — the corpus carries deep history we couldn't see). Also tonight: 226 hand-curated captures across 20 funder directories + 86 inbox triages from the operator's v10 hand-search rhythm, including the first `manual-local-pdf` convention (operator drops a ResearchGate PDF from a local download; sidecar .md flags the missing canonical URL for later replacement). The slug-join means each of those 20 new funder dirs surfaces on its row's chip the moment the dir exists, no record_uuid plumbing required.