perplexed 2026-05-10

Directory templates — per-folder content generation paradigm with auto-seeding

Ships the cf/cft codefence DSL, four directory templates (concept, vocabulary, source, toolkit), streaming + cleanup pipeline with image marker placement and fallback, anti-incumbent editorial stance, cite-wide-compatible sources footer, frontmatter run-stamps, Google Books URL harvesting for book sources, and a first-run seeder that drops shipped templates plus a README into the user's vault on plugin load.


Why Care?

Perplexed already had useful one-shot commands (Generate One-Page Article, Enhance Selected Text, Find Images for Selection), but each was a manual editor callback that ran on whatever file the user happened to have open. Filling out 1,600 nearly empty profile files in Tooling/, hundreds of concepts in concepts/, plus the vocabulary and sources folders, one file at a time, was untenable. The directory-template paradigm closes that gap: one template per directory describes how to fill its files, and the runtime applies the template to one file or a whole folder via Perplexity research, with streaming writes, image embedding, citation hygiene aligned to cite-wide's spec, and frontmatter stamps so files can be queried for staleness later.

The paradigm has three primitives:

Templates — markdown files with frontmatter (applies-to-paths glob), a fenced cft configuration block (model, return flags, system prompt with interpolation tokens), and a heading skeleton terminated by ***. The skeleton's bullets are model-facing instructions, not literal output.
Commands — Apply directory template to current file (auto-matches via glob) and Apply directory template to folder (batch). Streaming writes land in the file as Perplexity returns them, so failures surface within seconds instead of after a 60-second silent wait.
Cleanup pipeline — post-stream, the runtime wraps <think> blocks, swaps [IMAGE N: …] markers for real embeds (with a fallback # Images section when the model didn't emit markers but Perplexity returned images), strips unreplaced placeholder bullets, appends a cite-wide-compatible # Sources footer, and stamps run metadata (cf_last_run, cf_last_run_model) into frontmatter.
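
The streaming write described under Commands can be sketched as a small accumulator that parses SSE `data:` lines and reports when a throttled write is due. This is a minimal sketch, not the plugin's code: `SseAccumulator` and `push` are hypothetical names, the `choices[0].delta.content` chunk shape is an assumption based on the OpenAI-style streaming format, and the 500 ms default mirrors the flush cadence reported in this note.

```typescript
// Hypothetical sketch: accumulate SSE text deltas and throttle file writes.
interface SseChunk {
  choices?: { delta?: { content?: string } }[]; // assumed OpenAI-style delta shape
  search_results?: unknown[];
  images?: unknown[];
}

class SseAccumulator {
  body = "";
  searchResults: unknown[] = [];
  images: unknown[] = [];
  private pending = ""; // partial line carried across network chunks
  private lastFlush = 0;
  private flushEveryMs: number;

  constructor(flushEveryMs = 500) {
    this.flushEveryMs = flushEveryMs;
  }

  // Feed one raw network chunk; returns the text to write if a flush is due, else null.
  push(raw: string, nowMs: number): string | null {
    this.pending += raw;
    const lines = this.pending.split("\n");
    this.pending = lines.pop() ?? ""; // keep the incomplete tail for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith("data:")) continue;
      const payload = trimmed.slice(5).trim();
      if (payload === "" || payload === "[DONE]") continue;
      let chunk: SseChunk;
      try {
        chunk = JSON.parse(payload) as SseChunk;
      } catch {
        continue; // skip malformed events rather than abort the stream
      }
      this.body += chunk.choices?.[0]?.delta?.content ?? "";
      if (chunk.search_results) this.searchResults = chunk.search_results;
      if (chunk.images) this.images = chunk.images;
    }
    if (nowMs - this.lastFlush >= this.flushEveryMs) {
      this.lastFlush = nowMs;
      return this.body;
    }
    return null;
  }
}
```

In a real run, each decoded chunk from the `fetch()` response reader would be passed to `push` along with `Date.now()`, and any non-null return value would be written to the target file. Keeping the parser synchronous makes the throttling easy to test without a network.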

Four templates ship with the plugin and are seeded into the user's vault on first plugin load, so a fresh install doesn't require the user to copy-paste templates from the source tree.

What Was Built

The cf/cft codefence DSL

A template file has three zones:

````markdown
---
title: My Template
applies-to-paths: ["MyDir/**"]
---

# Free-form intro (ignored)

```cft
provider: perplexity
model: sonar-pro
return-citations: true
return-images: true
system: |
  System prompt with {{basename}}, {{frontmatter}}, {{today}}, {{frontmatter.<key>}} tokens.
```

# Heading skeleton (the user prompt)
- Bullet instructions to the model.

***

# User Notes (excluded from request)
````

The cft codefence block carries the runtime configuration as YAML; the heading skeleton between the closing cft fence and the first *** divider is the user prompt. Anything above cft is documentation; anything below *** is excluded from the request entirely.
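
The three-zone split and the interpolation tokens can be sketched as follows. This is an illustrative sketch under stated assumptions, not the plugin's parser: `splitTemplate` and `interpolate` are hypothetical names, the cft body is returned as raw YAML text (parsing is left to a YAML library), a template with no `***` divider is assumed to treat the rest of the file as the prompt, and the bare `{{frontmatter}}` token is left untouched because its rendering is not specified in this note.

```typescript
// Hypothetical sketch of the three-zone split; not the plugin's actual parser.
interface TemplateZones {
  configYaml: string; // raw text inside the cft fence (YAML, left unparsed here)
  userPrompt: string; // heading skeleton between the closing fence and the first ***
}

// Built at runtime so this example nests cleanly inside a markdown document.
const FENCE = "`".repeat(3);

function splitTemplate(source: string): TemplateZones | null {
  const lines = source.split("\n");
  const open = lines.findIndex((l) => l.trim() === FENCE + "cft");
  if (open === -1) return null;
  const close = lines.findIndex((l, i) => i > open && l.trim() === FENCE);
  if (close === -1) return null;
  const divider = lines.findIndex((l, i) => i > close && l.trim() === "***");
  // Assumption: with no divider, everything after the fence is the prompt.
  const end = divider === -1 ? lines.length : divider;
  return {
    configYaml: lines.slice(open + 1, close).join("\n"),
    userPrompt: lines.slice(close + 1, end).join("\n").trim(),
  };
}

interface TokenContext {
  basename: string;
  today: string;
  frontmatter: Record<string, unknown>;
}

// Replace {{basename}}, {{today}}, and {{frontmatter.<key>}}; unknown tokens stay as-is.
function interpolate(text: string, ctx: TokenContext): string {
  return text.replace(/\{\{\s*([\w.-]+)\s*\}\}/g, (match: string, token: string) => {
    if (token === "basename") return ctx.basename;
    if (token === "today") return ctx.today;
    if (token.startsWith("frontmatter.")) {
      const value = ctx.frontmatter[token.slice("frontmatter.".length)];
      return value === undefined ? match : String(value);
    }
    return match;
  });
}
```

Leaving unresolved tokens in place, rather than replacing them with an empty string, keeps a missing frontmatter key visible in the outgoing prompt.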

Four shipped templates

| File | Targets | Use case |
| --- | --- | --- |
| `concept-profile.md` | `concepts/**` | Encyclopedia-style entries on ideas, patterns, mental models |
| `vocabulary-profile.md` | `Vocabulary/**` | Term definitions with disambiguation through innovation-consulting lens |
| `source-profile.md` | `Sources/**` | Profiles of trusted sources: books, people, podcasts, magazines, journals, reports, events. Type-aware: emphasis adapts based on detected type |
| `toolkit-profile.md` | `Tooling/**` | Profiles of tools, products, platforms, frameworks |
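
Auto-matching a file to one of these templates comes down to testing its vault path against each template's `applies-to-paths` globs. The sketch below is an assumption-laden illustration: `globToRegExp` and `pickTemplate` are hypothetical names, only `*` and `**` are supported, and "first match wins" is a guess at the tie-break rule. The plugin may well use a full glob library instead.

```typescript
// Hypothetical sketch: match a vault path against applies-to-paths globs.
// Supports only "**" (any depth) and "*" (within one path segment).
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .split("**")
    .map((part) =>
      part
        .split("*")
        .map((literal) => literal.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
        .join("[^/]*"),
    )
    .join(".*");
  return new RegExp("^" + pattern + "$");
}

interface TemplateEntry {
  file: string;
  appliesTo: string[];
}

// Assumption: the first template whose glob matches wins.
function pickTemplate(path: string, templates: TemplateEntry[]): TemplateEntry | undefined {
  return templates.find((t) => t.appliesTo.some((g) => globToRegExp(g).test(path)));
}

// The four shipped templates and their targets, as listed in the table above.
const shipped: TemplateEntry[] = [
  { file: "concept-profile.md", appliesTo: ["concepts/**"] },
  { file: "vocabulary-profile.md", appliesTo: ["Vocabulary/**"] },
  { file: "source-profile.md", appliesTo: ["Sources/**"] },
  { file: "toolkit-profile.md", appliesTo: ["Tooling/**"] },
];
```

With this table, a path such as `Tooling/AI/Zapier.md` resolves to `toolkit-profile.md`, while a path outside all four folders resolves to `undefined` and the command has nothing to apply.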

source-profile is the trickiest because Sources/ is heterogeneous. Solution: one outline, type-conditional content. The system prompt enumerates seven canonical types and tells the model to pick one from frontmatter signals (youtube_channel_url → channel, aliases → likely book, etc.); each section has per-type bullet shapes the model picks from. Books additionally get Google Books URL handling: the template uses frontmatter google_books_url if present, otherwise the model finds the URL. Either way, a post-processor regex harvests the URL into frontmatter so subsequent runs skip the search.

Anti-incumbent editorial stance

Concept-profile, vocabulary-profile, and source-profile all embed an "editorial stance — attribute innovation correctly" block in their system prompts. Rules: tech giants (Microsoft, Google, Amazon, Apple, Meta, Oracle, Salesforce, IBM post-1990s) treated as adopters/popularizers, not innovators, unless documented heyday-era origination or a research-lab paper supports innovator framing. Origins favor founder/paper/originating-startup attribution; "Best Real-World Examples" caps big tech at 1–2 of 5–7 entries; case studies prefer narratives of smaller innovators outpacing incumbents. Saved as project memory so future content templates inherit the rule.

Streaming + cleanup pipeline

In directoryTemplateService.ts, streamPerplexityToFile opens a real fetch() SSE stream (Obsidian's request() buffers the full response before returning), accumulates the response into a local string, captures the search_results and images arrays as they arrive, and flushes intermediate writes to the target file every 500 ms so the user sees progress. After the loop:

wrapThinkBlocks converts <think>...</think> to `think-output` fenced blocks.
processContentWithImages swaps [IMAGE N: …] markers for ![desc](image_url) using the images array; permissive regex matches [Image …], [IMAGE …], and the markdown-image-shaped ![IMAGE N](…)/[IMAGE N](…) variants.
Fallback when the regex misses but images.length > 0: buildFallbackImagesSection emits a # Images block before the sources footer (mirrors the article-generator's existing fallback so images never silently vanish).
stripUnreplacedImagePlaceholders removes any [Image embed placeholder …] lines the model didn't replace, so instruction text doesn't leak into the document.
buildSourcesFooter emits a *** divider, a # Sources h1, and [N]: [Title](URL) reference definitions in the canonical Lossless format that cite-wide's REFDEF_NUM_RE accepts. Run provenance lives in frontmatter only; there is no in-body provenance line, since cf_last_run and cf_last_run_model already carry that data.
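
The image and footer steps above can be sketched as pure string functions. This is a minimal sketch, not the shipped implementation: the function bodies, the `finalize` wrapper, and the `ImageResult` / `SearchResult` shapes are assumptions, the marker regex covers only the bracketed `[IMAGE N: …]` form (not the markdown-image-shaped variants), and leaving a marker in place when no image exists for its index is a guess at the intended behavior.

```typescript
// Hypothetical sketch of two cleanup steps; shapes and bodies are assumptions.
interface ImageResult { image_url: string }
interface SearchResult { title: string; url: string }

// Swap [IMAGE N: description] markers (1-based N) for markdown embeds.
function replaceImageMarkers(body: string, images: ImageResult[]): { body: string; replaced: number } {
  let replaced = 0;
  const out = body.replace(
    /\[image\s+(\d+)\s*:\s*([^\]]*)\]/gi,
    (match: string, n: string, desc: string) => {
      const image = images[Number(n) - 1];
      if (!image) return match; // no image for this index: leave the marker alone
      replaced += 1;
      return `![${desc.trim()}](${image.image_url})`;
    },
  );
  return { body: out, replaced };
}

// Fallback: images came back but no marker was replaced.
function buildFallbackImagesSection(images: ImageResult[]): string {
  const embeds = images.map((img, i) => `![Image ${i + 1}](${img.image_url})`);
  return ["# Images", "", ...embeds].join("\n");
}

// Footer in the [N]: [Title](URL) reference-definition form.
function buildSourcesFooter(results: SearchResult[]): string {
  const defs = results.map((r, i) => `[${i + 1}]: [${r.title}](${r.url})`);
  return ["***", "", "# Sources", "", ...defs].join("\n");
}

// Order matters: images first, then the fallback section, then the sources footer.
function finalize(body: string, images: ImageResult[], results: SearchResult[]): string {
  const swapped = replaceImageMarkers(body, images);
  const parts = [swapped.body.trimEnd()];
  if (swapped.replaced === 0 && images.length > 0) parts.push(buildFallbackImagesSection(images));
  if (results.length > 0) parts.push(buildSourcesFooter(results));
  return parts.join("\n\n") + "\n";
}
```

Counting replacements, rather than re-scanning the body for markers, gives the fallback a simple trigger: zero replacements plus a non-empty images array.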

Frontmatter stamps

Every successful run stamps up to three keys into the target's frontmatter via processFrontMatter (the third applies only to book sources):

cf_last_run — ISO timestamp.
cf_last_run_model — Provider model label (e.g., Perplexity sonar-pro).
google_books_url — for book sources, harvested from generated body via regex; only stamped when the field isn't already present, so user-curated URLs are never overwritten.
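
The stamping rules above can be sketched against a plain frontmatter object. Everything here is illustrative: `applyRunStamps`, its options, and the `isBook` flag are hypothetical, and `GOOGLE_BOOKS_RE` is an assumed pattern covering two common Google Books URL shapes, not the regex the plugin uses.

```typescript
// Hypothetical sketch of the stamping rules; key names follow this note.
type Frontmatter = Record<string, unknown>;

// Assumption: a Google Books link looks like one of these two URL shapes.
const GOOGLE_BOOKS_RE =
  /https?:\/\/(?:books\.google\.com\/books\?[^\s)\]]+|www\.google\.com\/books\/edition\/[^\s)\]]+)/i;

interface StampOptions {
  modelLabel: string; // e.g. "Perplexity sonar-pro"
  body: string; // generated document body to harvest from
  now: Date;
  isBook: boolean;
}

function applyRunStamps(fm: Frontmatter, opts: StampOptions): Frontmatter {
  const next: Frontmatter = { ...fm };
  next.cf_last_run = opts.now.toISOString();
  next.cf_last_run_model = opts.modelLabel;

  // Never overwrite a user-curated URL; only harvest when the field is empty.
  const existing = next.google_books_url;
  const hasUrl = typeof existing === "string" && existing.trim() !== "";
  if (opts.isBook && !hasUrl) {
    const found = opts.body.match(GOOGLE_BOOKS_RE);
    if (found) next.google_books_url = found[0];
  }
  return next;
}
```

Inside Obsidian, the returned keys could be copied onto the mutable object that `processFrontMatter` passes to its callback. Returning a new object here keeps the sketch free of Obsidian types.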

First-run seeder

templateSeederService.ts ships the four templates plus a README, all inlined into main.js via esbuild's text loader for .md files (loader: { '.md': 'text' }). During onload, seedTemplatesIfMissing writes the bundled files to the configured templates root with a two-tier policy:

README (docs) — always ensured present if missing, regardless of folder state. Covers cf/cft anatomy, interpolation tokens, commands, image markers and fallback, citation behavior, frontmatter stamps, writing custom templates, re-seeding semantics.
Templates (user-managed content) — only seeded when the folder has no non-README markdown. A folder with even one shipped template is treated as user-managed and left alone, so a user who deleted concept-profile.md intentionally won't get it resurrected.

A Re-seed templates button in the Directory templates settings section fills any shipped file whose filename doesn't already exist — for pulling in new templates after a plugin update without overwriting edits to existing ones.
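
The two-tier policy and the re-seed button reduce to one decision: which bundled filenames to write. The sketch below is a hypothetical model of that decision (`planSeed` and its `mode` parameter are invented for illustration), working on filename lists so it can be reasoned about without a vault.

```typescript
// Hypothetical sketch of the seeding decision; not the plugin's code.
const README = "README.md";

// existing: markdown filenames already in the templates folder.
// shipped: filenames bundled with the plugin (README included).
function planSeed(existing: string[], shipped: string[], mode: "first-run" | "re-seed"): string[] {
  const present = new Set(existing);
  const missing = shipped.filter((name) => !present.has(name));

  // Re-seed: fill every shipped file that is absent, never overwrite.
  if (mode === "re-seed") return missing;

  // First run: the folder is "fresh" only if it holds no non-README markdown.
  const folderIsFresh = existing.every((name) => name === README);
  return missing.filter((name) => name === README || folderIsFresh);
}

const bundled = [
  README,
  "concept-profile.md",
  "vocabulary-profile.md",
  "source-profile.md",
  "toolkit-profile.md",
];
```

For example, a folder holding only `vocabulary-profile.md` gets just the README on plugin load, while pressing Re-seed templates in the same folder would also add the three missing templates.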

Verification

pnpm run build (eslint + tsc + esbuild production) green at every commit. The flow was exercised end-to-end on a vocabulary entry (sonar-pro returning images that embedded correctly), a concept entry (model emitting markers, fallback exercised), and a source-profile run on a book (Google Books URL harvest stamped to frontmatter as expected). Seeder verified by deleting the README from the vault — next plugin load wrote it back without touching the four templates.

What Changed in Approach

| Pattern this rejects | Pattern this adopts |
| --- | --- |
| One mega-prompt per command, hand-tuned per file | Per-directory template with `applies-to-paths` glob: one outline serves a whole folder |
| Buffer the entire response, then write at the end | Stream to file as chunks arrive (500 ms flushes), so failures surface within seconds |
| Discard SSE metadata after the prose ends | Capture `search_results` and `images` arrays during the loop, return alongside the streamed body |
| Tell the model "if you can't find X, describe failure modes" | Never offer the model permission to be lazy (saved as a project memory) |
| Treat tech giants as the canonical example for any concept | Editorial stance: incumbents are adopters/popularizers, not innovators, unless documented heyday or research-lab origin |
| When the image-marker regex misses, lose the images silently | Fallback `# Images` section when markers don't replace but `images.length > 0` (mirrors article-generator's existing behavior) |
| Frontmatter for output metadata, body for provenance | Frontmatter is canonical home for run-stamps and IDs (`cf_last_run`, `cf_last_run_model`, `google_books_url`), with no duplication in body |
| Distribute templates as documentation that users must copy by hand | Inline templates into `main.js` via esbuild text loader, seed the user's vault on first plugin load; README always written, templates only when folder is fresh |
| Return the model to the same broken default after every change | Switch concept-profile from `sonar-deep-research` to `sonar-pro`: deep research is unstable for image return per the article-generator's existing compatibility warning, while sonar-pro has been reliable |

The generalizable point: a content-generation paradigm is the template + the runtime + the editorial stance, not just the prompt. Each piece reinforces the others — the runtime's marker-placement contract means templates can use [IMAGE N: …] syntax confidently; the editorial stance means concept and vocabulary entries don't have to fight model bias one prompt at a time; the seeder means a user opening the plugin for the first time has working templates without copy-paste; the citation footer alignment with cite-wide's REFDEF_NUM_RE means the same conversion command that handles other LLM-generated content also handles ours.

Open Items

Image quality on abstract concepts. Perplexity image search keyword-matches against alt text, so it favors marketing heroes over feature/dashboard screenshots even when the latter would be more illustrative. For concept and vocabulary entries with no canonical visual referent, image search often returns nothing useful. Tracked in context-v/issues/Nudgeing-AI-Search-to-Return-Contextually-Appriate-Images.md. Tier 1 mitigation (port the article-generator's processContentWithImages pattern + fallback) is shipped here. Tier 2 (multimodal re-rank, URL-path preference) and Tier 3 (Ideogram-generated illustrations, headless screenshot service for Lottie/SVG sites, Gemini with Google Search grounding) are deferred until Tier 1 results are evaluated.
Auto-hyperlink feature names in tables. When toolkit-profile generates a clean feature table, the leftmost column (feature names) is plain text. Each named feature usually has a dedicated page on the entity's site (/features/zapi, etc.); rewriting cells as markdown links would let readers jump to source. Plan documented at context-v/plans/Auto-Hyperlink-Feature-Names-In-Tables.md. Defer until headless screenshot service exists (shares crawler infrastructure).
<think> block streaming UX. When the model uses <think> blocks (sonar-deep-research does), the raw blocks land in the file during streaming and only get wrapped to a fenced think-output block at the end. A live wrap during streaming would be cleaner. Held — separate intent from the paradigm work.
Multi-cft per template. Right now each template has one cft block. A multi-block template would let a single template define per-section refresh prompts (e.g., a "freshen the Examples section" block separate from the "regenerate the whole entry" block). Unblocked but not designed; a follow-up.
Pretty-name mapping for model labels. cf_last_run_model records the raw provider/model strings (Perplexity sonar-pro). A mapping to human-readable labels (Perplexity Sonar Pro) would read better in frontmatter and downstream queries. Trivial; not done because the raw form is still grep-friendly.
Defensive model capture from API response. Currently the stamped model name comes from the cft config. If Perplexity silently substitutes a different model (e.g., on rate-limit fallback), we won't reflect that. Capturing the model name from the SSE response and stamping that instead would close the loop.
Citation/backlink preservation post-filter. The image-placeholder strip and [IMAGE N: …] replacement run on the streamed string before any cite-wide processing. If a user runs cite-wide convert-all afterward, citations move through cleanly — but we don't have a pre-flight check for that ordering. Worth a smoke-test pass next time we touch the pipeline.

Files Touched

```text
perplexed/
├── main.ts                                                  (commands + settings UI for directory templates; seeder hook in onload; Re-seed templates button)
├── esbuild.config.mjs                                       (added `.md` text loader so markdown files are inlined into main.js)
├── src/
│   ├── docs/templates/
│   │   ├── README.md                                        (new — end-user docs for the directory-template system)
│   │   ├── concept-profile.md                               (new — anti-incumbent editorial stance, sonar-pro)
│   │   ├── vocabulary-profile.md                            (new — innovation-consulting lens, sonar-pro)
│   │   ├── source-profile.md                                (new — type-aware, Google Books rule, sonar-pro)
│   │   └── toolkit-profile.md                               (new — Tooling/** profiles)
│   ├── services/
│   │   ├── directoryTemplateService.ts                      (new — template loader, payload builder, SSE streamer, post-stream cleanup pipeline, frontmatter stamps, Google Books URL harvest)
│   │   ├── findImagesService.ts                             (new — selection-anchored image search with on-domain restriction)
│   │   └── templateSeederService.ts                         (new — first-run + re-seed logic, two-tier policy: README always, templates only when fresh)
│   ├── modals/
│   │   ├── DirectoryTemplatePickerModal.ts                  (new — fuzzy template picker)
│   │   ├── FolderPickerModal.ts                             (new — folder picker for batch runs)
│   │   └── BatchConfirmModal.ts                             (new — pre-batch confirmation with file count)
│   └── types/
│       └── markdown.d.ts                                    (new — TS shim for `*.md` raw imports)
└── changelog/
    └── 2026-05-10_01.md                                     (this file)
```

Reference

Predecessor changelog: changelog/2026-05-02_01.md — dependency refresh + streaming-citations bug fix; this paradigm builds on the streaming primitive that was hardened there.
Cross-cutting docs in content-farm:
- context-v/specs/Per-Directory-Profile-Templates.md — v0.1 spike spec.
- context-v/explorations/Moving-Beyond-Simple-API-Calls.md — full architecture exploration with locked decisions.
- context-v/issues/Nudgeing-AI-Search-to-Return-Contextually-Appriate-Images.md — image acquisition issues and tiered mitigations.
- context-v/plans/Auto-Hyperlink-Feature-Names-In-Tables.md — deferred feature plan.
cite-wide compatibility: cite-wide/src/services/llmCitationParserService.ts:91 REFDEF_NUM_RE — [N]: [Title](URL) form is what Convert All Citations to Hex Format accepts. Sources footer aligned to this.
Project memories saved this session:
- feedback_anti_incumbent_bias.md — for concept/practice templates, treat tech giants as adopters/popularizers; cap big tech in Examples.
- feedback_prompts_no_lazy_outs.md — never tell a model it can describe failure modes or disclose "limited info."