perplexed 2026-04-30

Claude provider integration: Anthropic SDK + Ask Claude modal + two-pass citation extraction

Claude is now a first-class research provider in Perplexed — wired via the Anthropic SDK with server-side web_search, adaptive thinking, streaming, an Ask Claude command, and a citations pipeline that handles both per-claim text-block citations and bare web_search_tool_result fallbacks.

All ship notes

Why Care?

Before this pass, Perplexed had two providers — Perplexity (cloud, web-grounded, Sonar models) and Perplexica (local, self-hosted, SearXNG-backed). Both work well for their respective use cases, but neither covers the case the user actually wanted next: server-side web search by a frontier model with native per-claim citations, the pattern Anthropic ships with web_search_20250305.

The motivation isn't "have one more provider." It's that Claude's research output, when it works, is structurally different from Perplexity's:

Per-claim citations attached to text blocks, not appended at the end. Each sentence the model considers source-grounded gets its own web_search_result_location citation with cited_text (the verbatim quote) and url. That maps cleanly onto Lossless Citation Spec refdefs with the > {cited_text} blockquote suffix.
Adaptive thinking — Claude reasons before answering on hard questions, which produces noticeably different results from one-shot answers on agentic / multi-source synthesis.
Larger context window, which matters when piping in long prior-conversation context or a draft to research against.

The strategic point: Claude isn't a replacement for Perplexity. It's a different shape of research — slower, deeper, with citations bound to specific claims rather than appended as a list. Adding Claude as a peer provider lets the user pick the right tool per question type.

The integration also surfaced a non-obvious gotcha that ate most of the session: Anthropic ships two web-search tool versions (web_search_20250305 and the newer web_search_20260209), and the newer one breaks per-claim citations as a side effect of its dynamic-filtering pass. The older tool is the right choice for this plugin's use case, even though it consumes more tokens. That's documented at the call site and in context-v/issues/ so the next person doesn't have to rediscover it.

What Was Built

`ClaudeService` — the provider plumbing

A new service module wrapping @anthropic-ai/sdk's Anthropic client. ~330 lines. Public surface:

export interface ClaudeOptions {
    enableThinking?: boolean;
    enableWebSearch?: boolean;
    effort?: 'low' | 'medium' | 'high' | 'xhigh' | 'max';
    maxTokens?: number;
}

export interface ClaudeSettings {
    anthropicApiKey: string;
    promptsService?: PromptsService | null;
    headerPosition?: 'top' | 'bottom';
}

export class ClaudeService {
    constructor(settings: ClaudeSettings);
    public updateApiKey(apiKey: string): void;
    public async queryClaude(
        query: string,
        model: string,
        stream: boolean,
        editor: Editor,
        options?: ClaudeOptions
    ): Promise<void>;
}

Construction-time defenses:

dangerouslyAllowBrowser: true is required because Obsidian runs Electron with a browser-like global, and the SDK's default refuses to instantiate without that flag. The flag's name is the Anthropic SDK's deliberate "you should not be calling our API from a user's browser" warning; it's appropriate here because the Obsidian plugin runs locally on the user's machine with the user's own key, not in a public-web context.
API key sourced from settings, which fall back to process.env.ANTHROPIC_API_KEY (loaded via dotenv at plugin init). Lets the user keep keys in .env instead of pasting into Obsidian settings, which is the friendlier dev workflow.
updateApiKey rebuilds the client in place — no need to re-instantiate the service when the user pastes a new key in settings.

Web search tool — `web_search_20250305`, deliberately

The non-obvious choice that drove most of this session. The call site is annotated:

const tools: Anthropic.Messages.ToolUnion[] = [];
if (options?.enableWebSearch !== false) {
    // Use web_search_20250305 (older tool, no dynamic filtering) instead of
    // web_search_20260209. The newer tool's dynamic-filtering pass post-processes
    // search results in a code-execution sandbox, and per-claim
    // web_search_result_location citations don't survive that round-trip — text
    // blocks come back with citations: null. The 20250305 tool reliably attaches
    // per-claim citations to text blocks, which is what gives us inline [N] markers
    // in the rendered prose. Trade-off: more tokens consumed (no result filtering),
    // but traceable per-claim citations are the whole point of this plugin.
    tools.push({ type: 'web_search_20250305', name: 'web_search' });
}

This is the kind of decision that's invisible from the API surface and obvious-in-hindsight from observing the response shape. The trade-off is real (more token consumption) but the alternative — losing per-claim citations — defeats the purpose of using Claude for research at all.

Two-pass citation extraction with priority merge

Claude's response is a content array of typed blocks. Citations can appear in two different places, and a robust extractor has to walk both:

private extractWebCitations(message: Anthropic.Messages.Message): ClaudeWebCitation[] {
    const byUrl = new Map<string, ClaudeWebCitation>();

    // Pass 1: collect raw search results as fallback — URL + title only
    for (const block of message.content) {
        if (block.type !== 'web_search_tool_result') continue;
        const content = block.content;
        if (!Array.isArray(content)) continue; // error case
        for (const result of content) {
            if (result.type !== 'web_search_result') continue;
            if (byUrl.has(result.url)) continue;
            byUrl.set(result.url, {
                url: result.url,
                title: result.title || 'Source',
                citedText: '',
            });
        }
    }

    // Pass 2: overwrite with text-block citations (they carry cited_text)
    for (const block of message.content) {
        if (block.type !== 'text') continue;
        const blockCitations = block.citations;
        if (!blockCitations) continue;
        for (const citation of blockCitations) {
            if (citation.type !== 'web_search_result_location') continue;
            byUrl.set(citation.url, {
                url: citation.url,
                title: citation.title ?? 'Source',
                citedText: citation.cited_text ?? '',
            });
        }
    }
    // …diagnostic console.log of block types + counts (see Open Items)…
    return Array.from(byUrl.values());
}

Priority order is deliberate: pass 1 collects every search result as a URL+title-only fallback, pass 2 overwrites with the richer per-claim text-block citations (which carry cited_text — the verbatim quote needed for Lossless-spec refdefs). Dedupe key is the URL.

The fallback exists because Claude sometimes uses the search tool but writes its answer narratively — citing sources in prose rather than attaching formal web_search_result_location citations to text blocks. Without the fallback, those answers arrive with empty citation lists and the user sees no ### Citations section even though the model clearly searched. With the fallback, the user at least gets URL + title for every source the model consulted, even if the cited-quote field is empty.

Streaming via SDK + `finalMessage`

Uses the Anthropic SDK's stream helper rather than rolling SSE parsing by hand:

const stream = this.client.messages.stream(params);
const currentPos = { ...responseCursor };

stream.on('text', (delta: string) => {
    if (!delta) return;
    editor.replaceRange(delta, currentPos);
    // …advance currentPos by delta length / newlines…
    editor.scrollIntoView({ from: currentPos, to: currentPos }, true);
});

const finalMessage = await stream.finalMessage();
await this.afterMessage(finalMessage, editor, headerText);

Two design notes:

The text event fires per-delta (not per-chunk) so the cursor advance is always over real prose, not SSE framing. The per-chunk decode + JSON parse loop that PerplexityService hand-rolls is unnecessary here because the SDK does it.
Non-streaming path also uses stream + finalMessage rather than messages.create. Reason: long responses (Opus + adaptive thinking + web search) can take 60-90 seconds, and the SDK's one-shot create call hits HTTP timeouts on those. The stream helper keeps the connection warm and waits for completion. From the user's perspective the UX is identical (the editor doesn't update until the end) — the difference is just whether the request survives.

Citations rendering — Lossless Citation Spec, Claude flavor

The ### Citations section follows the same shape Perplexity service uses, with the verbatim quote suffix from the Lossless spec extension:

// Per-citation refdef — collapsed to single line, blockquote suffix optional
const collapsedCitedText = c.citedText.replace(/\s*\n\s*/g, ' ').trim();
const citedTextSuffix = collapsedCitedText ? ` > ${collapsedCitedText}` : '';
const safeTitle = c.title.replace(/\]/g, '\\]');  // escape ] in titles
newCitationsText += `[${citationNumber + index}]: [${safeTitle}](${c.url}).${citedTextSuffix}\n\n`;

The format:

[1]: [The Title of the Source](https://example.com/path). > The verbatim cited text from the source, on a single line.

Why single-line refdefs: Lossless Citation Spec requires the refdef stay on one line so downstream parsers (cite-wide and others) can match the standard [N]: [title](url). > text shape with a single regex. Internal newlines in cited text are collapsed to spaces.

Why numeric identifiers: matches what PerplexityService emits, so a vault containing both Claude and Perplexity research notes has homogeneous citation markup. cite-wide can later promote the numbers to hex codes via its substitution pass.

Append-vs-create logic: if a previous query already wrote a ### Citations section in the same note, the next query appends to it (with continued numbering) rather than creating a duplicate section. The regex-driven detection finds the existing section, parses out the highest-numbered refdef, and starts the new citations from max + 1.

A dedicated modal class so users don't have to remember command parameters. ~195 lines. Surfaces every meaningful knob:

Question — multiline textarea for the prompt.
Model — dropdown over the four current Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6, Haiku 4.5).
Effort — low / medium / high / xhigh / max. xhigh and max are model-bound (Opus only); the modal documents that in the option labels.
Enable Web Search — toggle.
Adaptive Thinking — toggle.
Stream Response — toggle (default on; recommended for long answers).

Submit is wired both to the Ask Claude button and to Cmd/Ctrl+Enter in the textarea. Cancel closes without submitting. Every option is captured into a ClaudeOptions object and forwarded to claudeService.queryClaude.

(Note: this initial modal lands as a functional but visually plain UI. The wide-modal CSS unlock and the unified sectioned layout that turns this into the polished form land the next day — see 2026-05-01_01.md.)

Settings + commands wiring (in `main.ts`)

Two new commands registered at plugin init:

ask-claude / "Ask Claude" — editorCallback opens the Ask Claude modal. Notice with actionable copy if the service isn't initialized: "Set ANTHROPIC_API_KEY in .env or settings, then reinitialize services."
claude-service-status / "Check Claude Service Status" — diagnostic command. Three states: initialized + key present / initialized + no key / not initialized.

Settings UI gets a new "Claude (Anthropic)" section with the Anthropic API Key input. The setting is bound to the same anthropicApiKey field that defaults from .env — so changing it in the UI overrides the env var, which matches the "settings-take-priority-over-env" convention the plugin already uses for Perplexity.

Typed error handling at the SDK boundary

Surfacing Anthropic SDK errors as actionable Notices instead of raw stack traces:

if (error instanceof Anthropic.AuthenticationError) {
    userMessage = 'Invalid Anthropic API key';
} else if (error instanceof Anthropic.RateLimitError) {
    userMessage = 'Claude rate limit exceeded — wait and retry';
} else if (error instanceof Anthropic.BadRequestError) {
    userMessage = `Claude request invalid: ${error.message}`;
} else if (error instanceof Anthropic.APIError) {
    userMessage = `Claude API error ${error.status}: ${error.message}`;
} else if (error instanceof Error) {
    userMessage = `Claude error: ${error.message}`;
} else {
    userMessage = `Claude error: ${String(error)}`;
}

Each surfaces as both a new Notice(...) (transient toast) and an **Error:** ... line written to the editor at the cursor — so the failure is visible whether or not the user was looking at the toast at the moment it fired.

Verification

pnpm run build green after the service + modal landed and again after the citation-extraction fallback was added.
Claude SDK request shape confirmed against the Anthropic Messages API reference.
Diagnostic console logging added to extractWebCitations to print block types + counts, so the user can copy that into later sessions when triaging "why no citations" cases.

What Changed in Approach (the meta-lesson)

Pattern this rejects	Pattern this adopts
Pick the newest version of a tool ID by default (`web_search_20260209`)	Read the response-shape contract end to end before locking in a tool version. Tool versions in Anthropic's Messages API are not strictly forward-compatible — the newer web_search post-processes results in a code-execution sandbox and silently drops per-claim citations. The right version depends on what output you need, not which is most recent
Roll SSE parsing by hand for every provider	Use the SDK's stream helper when one exists (`messages.stream` + `stream.on('text', …)` + `finalMessage()`). The hand-rolled `getReader()` + `TextDecoder` + per-line JSON parse loop in `PerplexityService` exists because that provider's SDK doesn't emit text events the same way; if Anthropic's helper exists, use it and don't reinvent
Use `messages.create` for non-streaming UX	Use `stream + finalMessage` even when the UX is "wait then write all at once" — the streaming HTTP keeps the connection alive past the timeout window that one-shot `create` hits on long responses (Opus + thinking + web_search can run 60-90s)
Walk one place in the response for citations	Walk both places: the formal text-block `web_search_result_location` citations and the raw `web_search_tool_result` blocks. Priority-merge by URL with the richer source winning. The fallback covers the failure mode where Claude searches but doesn't formally cite, which happens often enough to matter
Treat `dangerouslyAllowBrowser: true` as a smell	Read the actual constraint: the flag name is Anthropic's warning to public-web apps, not to Obsidian-plugin contexts. The Obsidian plugin runs locally on the user's machine with the user's own key. Setting the flag is the right choice; the warning doesn't apply
Hardcode the API key into the service	Read from settings; settings default from `process.env.ANTHROPIC_API_KEY` via `dotenv`. Lets the user pick the workflow that fits — UI for casual setup, `.env` for dev — and the env var doesn't leak into UI state if the user later prefers settings
Surface SDK errors as raw stack traces	Type-narrow against `Anthropic.AuthenticationError`, `RateLimitError`, `BadRequestError`, `APIError` and emit actionable Notices for each. The user sees "Invalid Anthropic API key," not a 12-line stack

The generalizable point: a provider integration is not done when the service compiles. It's done when the response shape is actually being read end-to-end and the failure modes (no citations, timed-out long requests, invalid key, rate-limit) are each traced to a user-visible signal. Half of this session was the failure-mode coverage.

Open Items

Tool-choice is auto, not any. The model decides whether to actually invoke web_search. With Opus 4.7's January 2026 training cutoff, recent factual questions sometimes get answered from training without a search call — and no search call means no web_search_tool_result blocks and no citations, even if the answer is correct. Two possible fixes, neither shipped this pass:
- tool_choice: { type: 'any' } — forces at least one tool call. Since web_search is the only tool, this forces a search every time.
- System prompt: "ALWAYS use the web_search tool to verify and ground every factual claim. Do not answer from training knowledge." Held until the diagnostic logs show how often this actually happens in practice. Documented in context-v/issues/Getting-Claude-to-Respond-With-Research.md.
Diagnostic logging is console-only. The [ClaudeService] extractWebCitations — block types: [...]; text blocks with citations: N; extracted: N line goes to Obsidian's dev console (Cmd+Opt+I). Users who hit the "no citations" failure mode have to know to open the console. Either move it to a Notice when extracted count is 0, or surface a "Why no citations?" inline note in the rendered output. Held.
Other parseable response data is not used. The response contains server_tool_use blocks (the actual query string Claude searched for), code_execution_tool_result blocks (output of the dynamic-filter pass on the newer tool — not applicable here), page_age per result. All currently ignored. Could be surfaced in the rendered output: "Claude searched for: …" preamble, date-aware refdefs using page_age. Held — scope creep on this pass.
Effort xhigh / max are model-bound but not validated. The modal's effort dropdown lists xhigh (Opus 4.7) and max (Opus only, highest cost) in the labels, but pairing Haiku with max doesn't fail until request time. Could dynamically filter the effort options based on the model selection. Held.
Modal is functional but visually plain. This first version uses Obsidian's narrow default modal width and a stack-of- Settings layout. The unified sectioned design with the wide-modal CSS unlock lands the next day; tracked in 2026-05-01_01.md.
No retry logic for rate limits. A RateLimitError surfaces a Notice and an editor error line, but doesn't auto-retry with backoff. For research workflows where the user hit limit during a streaming response, manual retry is fine for now. Held.

Files Touched

perplexed/
├── package.json                                       (added @anthropic-ai/sdk to dependencies)
├── main.ts                                            (Settings: claudeDefaultModel, anthropicApiKey defaulting from process.env; service init at plugin onload, re-init on settings change; registerClaudeCommands wiring 'ask-claude' + 'claude-service-status' commands; settings UI Claude section with API key input)
├── src/
│   ├── services/
│   │   └── claudeService.ts                           (created — ~330 lines; ClaudeService class wrapping @anthropic-ai/sdk; queryClaude with stream / non-stream paths both via messages.stream + finalMessage; extractWebCitations two-pass priority-merge; addCitations writing Lossless-spec refdefs with cited_text blockquote suffix; typed error handling for Anthropic SDK error classes)
│   └── modals/
│       └── ClaudeModal.ts                             (created — ~195 lines; functional but visually plain initial version: model dropdown, effort dropdown, web-search/thinking/stream toggles, textarea with Cmd/Ctrl+Enter submit; redesigned the next day per 2026-05-01_01.md)
└── context-v/
    └── issues/
        └── Getting-Claude-to-Respond-With-Research.md (created — diagnosis of the two failure modes for missing citations: tool_choice gives Claude discretion to skip search; or Claude searches but emits only web_search_tool_result without per-claim text-block citations. Includes the diagnostic-logging recipe, the tool_choice: any vs system-prompt fix options, and notes on other parseable response data — server_tool_use, page_age, code_execution_tool_result — that could be surfaced)

Reference

The web-search tool decision: src/services/claudeService.ts:79-90 — the inline comment is the canonical record of why web_search_20250305 is the right choice for this plugin even though web_search_20260209 is newer.
The two-pass extractor: src/services/claudeService.ts:204-249 — extractWebCitations with the priority-merge logic and diagnostic logging.
Citation rendering: src/services/claudeService.ts:257-305 — addCitations emitting Lossless-spec refdefs with cited-text blockquote suffix and append-to-existing logic.
The struggle, in detail: context-v/issues/Getting-Claude-to-Respond-With-Research.md — written during this pass; the two failure modes, the diagnostic logging recipe, the proposed tool_choice: any / system-prompt fixes, and the broader catalog of parseable response data that's currently ignored.
Commands: main.ts:682-712 — registerClaudeCommands for ask-claude and claude-service-status.
Settings UI: main.ts:1107+ — Claude (Anthropic) section with the API key input; defaults to process.env.ANTHROPIC_API_KEY.
Anthropic SDK reference: @anthropic-ai/sdk v0.92.0 (the version pinned in package.json); Anthropic.Messages.MessageStreamParams, messages.stream(), stream.on('text', …), stream.finalMessage().
Lossless Citation Spec — Claude extension: the [N]: [title](url). > {cited_text} shape that addCitations emits. cite-wide can later promote numeric identifiers to hex codes via its substitution pass; the numeric form here is intentional for cross-provider homogeneity with PerplexityService.
Successor changelog: 2026-05-01_01.md — modal redesign pass that brings Ask Claude (and Ask Perplexity, Ask Perplexica) to the unified sectioned layout with the wide-modal CSS unlock.
Recent commits on development (this pass): 9d5bc80 progress(model-provider): attempt to add new model, claude. Struggling with getting returned research objects. (initial ClaudeService + ClaudeModal + main.ts wiring), b5a70c7 stuck(claude): claude not returning research citations or reference definitions, just plain text (diagnosing the missing-citations failure mode), d7551d7 stuck(model): stuck trying to get model Claude API working (refinements to claudeService + extractor fallback).