

Path
blueprints/Parse-Common-Citation-Formats.md

Troubleshooting Implementation

A multi-citation marker sitting next to already-converted hex markers:

[1, 2, 3] [^01xra2] [^i52jf3]

turned into

[1] [^ozc1mc] [3] [^01xra2] [^i52jf3]

Only [2] was converted; [1] and [3] were left behind as numerics.

Problem Statement

In more files than we care to count, we have copy-pasted content from multiple sources with different citation formats, one after another. To complicate matters, in many of those multiple-citation-format files we may have started, but not finished, transforming citations to our standards. (It’s arduous, and takes sustained use of the Cite Wide plugin we are iterating on.) The existing commands/scripts will likely leave gaps, or throw errors, on these files because they cannot reason about the content and cannot operate on a selection within a file. On top of that, Obsidian has no snapshot/undo system for scripted changes, so if we run a script that alters the whole file’s set of citations, we may not be able to see what changed, much less undo it if we make a mistake. (We do use GitHub on our content repository, but it’s quite the hassle when trying to quickly move from relevant-content identification to LLM response output, copy, paste, and move on.)

Existing Scripts likely won’t work on a Messy File

I am not sure exactly how it works, but when I run it, it feels like one fell swoop through the whole file. That works when there are no numeric citation/ref-def collisions across different sections of the file; I’m fairly sure it breaks when there are.

  • citationService.convertAllCitations(content) converts numeric singles + reference defs to hex. ✓
  • The new dedup-by-URL command consolidates same-URL-different-hex collisions. ✓
  • The two gaps for the script: (a) multi-citation inline expansion ([1, 2, 3] → three hex markers), (b) Perplexity-style adjacent multi ([1][2] → [^a] [^b]).
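Both gaps are mechanical once a numeric-to-hex mapping exists. A minimal sketch, assuming a hypothetical hexFor callback standing in for however the plugin maps a numeric citation to its generated hex id:

```typescript
// Gap (a): expand Google-style multi-citation markers like [1, 2, 3]
// into individual hex footnote markers. Requires at least one comma, so
// singles like [12] and hex markers like [^znfkr0] are left alone.
function expandMultiCitations(
  text: string,
  hexFor: (n: number) => string,
): string {
  return text.replace(/\[(\d+(?:,\s*\d+)+)\]/g, (_m, nums: string) =>
    nums
      .split(",")
      .map((n) => `[^${hexFor(parseInt(n.trim(), 10))}]`)
      .join(" "),
  );
}

// Gap (b): expand Perplexity-style adjacent markers like [1][2].
// Requires a run of two or more, so a lone [3] is untouched.
function expandAdjacentCitations(
  text: string,
  hexFor: (n: number) => string,
): string {
  return text.replace(/(?:\[\d+\]){2,}/g, (run) =>
    run.match(/\d+/g)!.map((n) => `[^${hexFor(+n)}]`).join(" "),
  );
}
```

Both passes would run before convertAllCitations, so the remaining singles fall through to the existing conversion.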

Challenge: The existing script cannot reason. It has two modes: perform on a text selection (Obsidian API) or perform on the entire file (CLI). The CLI cannot perform on a selection within a file, and neither mode can analyze for potential issues across copy-pasted content from multiple sources.

An LLM-provider-specific parser, possibly with AI model oversight

This LLM provider parser would be able to

  1. Identify clusters of citations by recognizing the pattern a specific LLM provider uses, ignoring duplicate numerics elsewhere in the section that do not fit that pattern.
  2. Group those clusters and handle them independently.
  3. Ignore human improvements if they meet Lossless Standards (there could be a few citations already converted, in the middle of those that are not).
  4. Transform what can be transformed, and flag what cannot because it escapes the patterns that the LLM provider uses.
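Step 1 could start as a cheap fingerprint check before any AI model gets involved. A sketch, with the provider fingerprints assumed from the examples later in this note and hex ids assumed to be six lowercase alphanumerics:

```typescript
type Provider = "google" | "perplexity" | "lossless" | "unknown";

// Classify which provider's citation pattern a chunk of text follows.
// This is a first-match heuristic; a real implementation would collect
// every matching pattern, since one chunk can mix several.
function detectProvider(chunk: string): Provider {
  if (/\[\d+(?:,\s*\d+)+\]/.test(chunk)) return "google";   // [1, 2, 3]
  if (/\[\d+\]\[\d+\]/.test(chunk)) return "perplexity";    // [1][2]
  if (/\[\^[0-9a-z]{6}\]/.test(chunk)) return "lossless";   // [^znfkr0]
  return "unknown";
}
```

Anything classified "lossless" satisfies step 3 (leave it alone); "unknown" chunks with numerics are what step 4 flags for review.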

Parse Analysis on One File

It’s a mixed-format chaos file — six different citation patterns coexisting:

| Pattern | Example | Existing code handles? |
| --- | --- | --- |
| Already-hex inline | [^znfkr0] | ✓ (preserved) |
| Already-hex reference def | [^znfkr0]: Title | ✓ |
| Numeric single inline | [12] | ✓ via convertAllCitations |
| Numeric multi inline (the LLM output style) | [1, 2, 3], [11, 13, 14, 15] | ✗ regex only matches [1]-like |
| Numeric reference def without caret | [2] Title | partial — pattern matches but coexists with hex refs in same list |
| Orphan refs / dangling markers | [^7b69gi] alone on its own line | ✗ |
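The orphan-refs row is the easiest to detect mechanically: a hex marker used in the body with no matching reference definition. A sketch, assuming defs always start a line as [^hex]: the way our converted files do:

```typescript
// Return hex ids that appear as inline markers but have no matching
// reference definition anywhere in the file.
function findOrphanMarkers(content: string): string[] {
  // Reference defs: "[^hex]: ..." at the start of a line
  const defs = new Set(
    [...content.matchAll(/^\[\^([0-9a-z]+)\]:/gm)].map((m) => m[1]),
  );
  // Inline uses: "[^hex]" not immediately followed by ":" (i.e. not a def)
  const used = new Set(
    [...content.matchAll(/\[\^([0-9a-z]+)\](?!:)/g)].map((m) => m[1]),
  );
  return [...used].filter((id) => !defs.has(id));
}
```

Running this before and after any transform would also catch markers a buggy pass leaves dangling.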

Plus URL-bound issues: same source cited with different titles in different LLM responses (one says “AGIX Technologies”, another shortens to “agixtech”); the dedup-by-URL command we just shipped covers that.
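For reference, the dedup-by-URL idea reduces to: group reference defs by URL, keep the first hex id per URL, rewrite markers that used the losers, and drop the losing defs. A sketch of that logic, not the shipped command's actual code:

```typescript
// Consolidate same-URL-different-hex collisions: the first hex id seen
// for a URL survives; later ids are rewritten to it and their defs removed.
function dedupByUrl(content: string): string {
  const defRe = /^\[\^([0-9a-z]+)\]:.*?(https?:\/\/\S+)/gm;
  const firstFor = new Map<string, string>(); // url -> surviving hex id
  const rename = new Map<string, string>();   // losing id -> surviving id
  for (const [, id, url] of content.matchAll(defRe)) {
    const keep = firstFor.get(url);
    if (keep) rename.set(id, keep);
    else firstFor.set(url, id);
  }
  let out = content;
  for (const [lose, keep] of rename) {
    // Rewrite inline markers (not defs, hence the (?!:) lookahead)...
    out = out.replace(new RegExp(`\\[\\^${lose}\\](?!:)`, "g"), `[^${keep}]`);
    // ...then drop the losing definition line.
    out = out.replace(new RegExp(`^\\[\\^${lose}\\]:.*\\n?`, "m"), "");
  }
  return out;
}
```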

Patterns from Major Providers

  1. Google AI — inline [1, 2, 3] (multi); reference list [1] Title plain bracket. Sometimes hybrid [1, 2] form.
  2. Perplexity — inline [1][2] adjacent (no comma, no space); reference list [1] Title https://url (URL not always linkified).
  3. Claude — prose-style with bolded source names + attribution paragraphs; no bracket markers at all. (Claude doesn’t provide citations in the native app; we need to use the API with citations enabled and review those responses in a later iteration. Here as a placeholder.)
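The Perplexity reference-list shape in item 2 parses cleanly when we assume the line is [n] Title URL, whitespace-separated, URL last. A sketch under that assumption:

```typescript
// Parse one Perplexity-style reference line, e.g.
// "[2] Control and data planes https://tailscale.com/docs/..."
// Returns null for lines that don't fit (e.g. no trailing URL).
function parsePerplexityRef(
  line: string,
): { n: number; title: string; url: string } | null {
  const m = line.match(/^\[(\d+)\]\s+(.*?)\s+(https?:\/\/\S+)\s*$/);
  return m ? { n: +m[1], title: m[2], url: m[3] } : null;
}
```

A null result is exactly the "flag what cannot be transformed" case from the parser blueprint above.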

Example Citations

Example citations from Google AI Chat:

The vector database market is currently characterized by explosive growth and a shift from "hype" to becoming the "external brain" for AI agents and enterprise applications. As of 2026, these databases have evolved into critical infrastructure that provides the long-term memory necessary for [Retrieval-Augmented Generation (RAG)](https://aws.amazon.com/blogs/database/the-role-of-vector-datastores-in-generative-ai-applications/) and complex autonomous reasoning. [1, 2, 3] 
## Market Research & Growth Predictions
Analysts project continued high-velocity expansion, driven by the massive volumes of unstructured data and the universal mandate to integrate AI into enterprise workflows. [1, 4] 

* Market Size & CAGR:
   * The global market was valued at approximately $3.0 billion in 2025.
   * Forecasts for 2030 range from $8.7 billion to $10.6 billion.
   * Compound Annual Growth Rates (CAGR) are consistently projected between 22% and 27.5% through 2030.
* Regional Dominance: North America held the largest market share (approx. 37%) in 2025, while Asia-Pacific is expected to be the fastest-growing region due to aggressive digital transformation.
* Sector Highlights: Retail & e-commerce is projected to grow the fastest (33.8% CAGR), utilizing vector search for hyper-personalized shopping experiences. [4, 5, 6, 7, 8, 9, 10, 11] 

## The "Hype" vs. Reality: Role in AI Ecosystems
While the early hype focused on simple semantic search, the 2026 reality centers on Agentic AI—autonomous digital workers that require persistent, stateful memory to execute multi-step tasks. [1, 2] 

* Memory for AI: Vector databases act as a "hippocampus," allowing AI to remember past interactions and use successful past strategies for current problems.
* Reduction of Hallucinations: By grounding [Large Language Models (LLMs)](https://omdia.tech.informa.com/om122887/market-landscape-vector-databases-powering-generative-ai) in real-time private data via RAG, vector databases are the primary tool for making AI "hallucination-free" and enterprise-ready.
* Bifurcation of the Industry: The market has split into Native Vector Databases (e.g., [Pinecone](https://www.pinecone.io/), [Weaviate](https://weaviate.io/), [Milvus](https://milvus.io/)) built specifically for vector math, and Multimodal/General Databases (e.g., [PostgreSQL](https://dev.to/actiandev/whats-changing-in-vector-databases-in-2026-3pbo) with pgvector, [MongoDB Atlas](https://www.mongodb.com/products/platform/atlas-vector-search)) that have added vector search as a feature. [1, 2, 3, 12] 

## Investment Opportunity Analysis
Whether this is a "good" investment depends on where the capital is deployed, as the market is undergoing significant consolidation. [13] 

* The Bull Case: The 24-30% CAGR represents one of the fastest-growing segments in data infrastructure. Specialized providers that offer unique capabilities—like GPU acceleration for trillion-scale vectors or edge-native search for offline devices—continue to see rapid capital injection.
* The Bear Case: Many analysts warn that "vector search" is becoming a standard feature (plumbing) rather than a standalone product category. Growth is increasingly flowing to established giants like Microsoft, MongoDB, and Elastic who have integrated vector support, potentially squeezing out niche startups that lack deep enterprise ecosystems.
* Key Risk - "SaaS Tax": In 2026, enterprises are hitting a "financial ceiling" with usage-based managed services. High-volume users are increasingly moving to self-hosted or open-source solutions (like [Qdrant](https://qdrant.tech/) or Weaviate) to avoid exponential scaling costs, which could impact the margins of pure-SaaS providers. [1, 2, 4, 13, 14, 15, 16] 

Would you like to see a comparison of pricing models for the top vector databases to better evaluate their unit economics?

[1] [https://www.researchandmarkets.com](https://www.researchandmarkets.com/reports/6216016/vector-database-global-market-insights)
[2] [https://www.zignuts.com](https://www.zignuts.com/blog/vector-database-guide-gvg64)
[3] [https://dev.to](https://dev.to/actiandev/whats-changing-in-vector-databases-in-2026-3pbo)
[4] [https://www.marketsandmarkets.com](https://www.marketsandmarkets.com/Market-Reports/vector-database-market-112683895.html)
[5] [https://www.meticulousresearch.com](https://www.meticulousresearch.com/product/vector-database-market-6450)
[6] [https://www.thebusinessresearchcompany.com](https://www.thebusinessresearchcompany.com/report/vector-database-global-market-report)
[7] [https://www.thebusinessresearchcompany.com](https://www.thebusinessresearchcompany.com/report/vector-database-global-market-report)
[8] [https://finance.yahoo.com](https://finance.yahoo.com/news/vector-database-market-reach-usd-150000060.html)
[9] [https://finance.yahoo.com](https://finance.yahoo.com/news/vector-database-market-8-945-150100035.html)
[10] [https://www.thebusinessresearchcompany.com](https://www.thebusinessresearchcompany.com/report/vector-databases-for-financial-search-global-market-report?utm_source=OpenPR&utm_medium=Paid&utm_campaign=Feb_PR)
[11] [https://www.grandviewresearch.com](https://www.grandviewresearch.com/industry-analysis/vector-database-market-report)
[12] [https://omdia.tech.informa.com](https://omdia.tech.informa.com/om122887/market-landscape-vector-databases-powering-generative-ai)
[13] [https://medium.com](https://medium.com/data-science-collective/vector-databases-are-dying-heres-the-production-evidence-8c17b54687e2)
[14] [https://www.researchandmarkets.com](https://www.researchandmarkets.com/reports/6216016/vector-database-global-market-insights)
[15] [https://ranksquire.com](https://ranksquire.com/2026/03/04/vector-database-pricing-comparison-2026/)
[16] [https://medium.com](https://medium.com/data-science-collective/vector-databases-are-dying-heres-the-production-evidence-8c17b54687e2)

I have also seen Google use inline:

[[1](https://www.researchandmarkets.com/reports/6216016/vector-database-global-market-insights), [2](https://www.zignuts.com/blog/vector-database-guide-gvg64), [3](https://dev.to/actiandev/whats-changing-in-vector-databases-in-2026-3pbo)] 
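That hybrid linked form carries the URL inline, so it can be converted without consulting a reference list at all. A sketch of extracting the (number, URL) pairs from such a cluster:

```typescript
// Extract (number, url) pairs from Google's linked inline form:
// [[1](https://a), [2](https://b), ...]
// Returns [] when the text contains no such cluster.
function parseLinkedCluster(
  text: string,
): Array<{ n: number; url: string }> {
  // Outer match: a bracketed, comma-separated run of [n](url) links
  const cluster = text.match(
    /\[((?:\[\d+\]\(https?:\/\/[^)]+\)(?:,\s*)?)+)\]/,
  );
  if (!cluster) return [];
  // Inner match: each individual [n](url) pair
  return [...cluster[1].matchAll(/\[(\d+)\]\((https?:\/\/[^)]+)\)/g)].map(
    (m) => ({ n: +m[1], url: m[2] }),
  );
}
```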

Perplexity

A control server in Tailscale (officially called a "coordination server") is a centralized component that manages network coordination and device registration, but doesn't handle actual data traffic. It acts as the control plane in Tailscale's hybrid architecture, where control is centralized but data flows directly between devices in a peer-to-peer mesh.[1][2]

## Core Functions

The coordination server performs several essential tasks:[2]

- **Device registry and discovery**: Maintains a complete registry of all devices (nodes) in your tailnet, including IP addresses, client versions, public keys, locations, and operating systems
- **Key distribution**: Exchanges WireGuard public keys between nodes so they can establish encrypted connections[3]
- **Authentication**: Handles user authentication and device authorization
- **Policy enforcement**: Distributes security policies, access control lists, and firewall rules to all devices[4][1]
- **NAT traversal coordination**: Manages endpoint information between devices and selects optimal DERP relay servers when direct peer-to-peer connections aren't possible[2]

## Control Plane vs. Data Plane

Tailscale separates its architecture into two distinct planes:[1]

- **Control plane** (hub-and-spoke): The coordination server exchanges tiny encryption keys and policies with minimal traffic
- **Data plane** (mesh): Actual encrypted traffic flows directly between devices peer-to-peer using WireGuard, not through the coordination server

This design means the coordination server is never a bottleneck for your data, and network performance scales with the number of nodes rather than being limited by a central gateway.[1]

## Self-Hosted Alternative

**Headscale** is an open-source, self-hosted reimplementation of Tailscale's coordination server. It allows you to run your own control plane on infrastructure you control while still using official Tailscale clients on your devices. You can configure Tailscale clients to connect to a custom control server URL instead of Tailscale's default `https://controlplane.tailscale.com`.[5][6][7][3]

Sources
[1] Tailscale: How it works https://tailscale.com/blog/how-tailscale-works
[2] Control and data planes https://tailscale.com/docs/concepts/control-data-planes
[3] How to Set Up Headscale (Self-Hosted Tailscale) on Ubuntu https://oneuptime.com/blog/post/2026-03-02-how-to-set-up-headscale-self-hosted-tailscale-on-ubuntu/view
[4] What happens if the coordination server is down? https://tailscale.com/docs/reference/coordination-server-down
[5] Configure Tailscale clients to use a custom control server https://tailscale.com/docs/how-to/set-up-custom-control-server
[6] Open source at Tailscale https://tailscale.com/opensource
[7] Headscale https://headscale.net
[8] juanfont/headscale: An open source, self-hosted ... https://github.com/juanfont/headscale
[9] Tailscale and the OSI model https://tailscale.com/docs/concepts/tailscale-osi
[10] An open source, self-hosted implementation of the ... https://news.ycombinator.com/item?id=43563396
[11] Setting up Tailscale for remote control ((Kali Linux + ... https://www.reddit.com/r/Tailscale/comments/1rj1aus/setting_up_tailscale_for_remote_control_kali/
[12] How doesn't it go through Tailscale's servers? https://www.reddit.com/r/Tailscale/comments/1hv2u8i/how_doesnt_it_go_through_tailscales_servers/
[13] Tailscale vs. Headscale : r/selfhosted https://www.reddit.com/r/selfhosted/comments/1lnnc4e/tailscale_vs_headscale/
[14] tailscale-rs/ARCHITECTURE.md at main https://github.com/tailscale/tailscale-rs/blob/main/ARCHITECTURE.md
[15] Headscale & Tailscale https://www.lucasjanin.com/2025/01/03/headscale-tailscale-in-a-self-hosted-environment/

Claude

I'll research this thoroughly for you, Michael — given the Avalanche pivot and your thought leadership work, I want to make sure I'm pulling from primary sources rather than the recycled hot takes. Let me gather the latest credible analyses.

Here's a synthesis of the most credible voices and primary sources on AI-driven labor displacement, organized by perspective. I've leaned on primary research and named thought leaders rather than aggregator commentary.

## The Headline Institutional Forecasts

**World Economic Forum (Future of Jobs Report 2025)** — surveyed over 1,000 employers covering 14M+ workers across 22 industries. Projects 170 million new roles created and 92 million displaced by 2030, a net gain of 78 million but with churn equal to 22% of current jobs. The fastest-declining roles cluster in clerical, secretarial, and routine administrative categories.

**Goldman Sachs (Jan Hatzius, Joseph Briggs, Devesh Kodnani)** — the most-cited macro estimate. Roughly two-thirds of US occupations are exposed to some degree of AI automation, and a quarter to half of the workload of exposed occupations could be replaced, equivalent to about 300 million full-time jobs globally. Highest task-automation exposure: office and administrative support (46%), legal (44%), architecture and engineering (37%), life/physical/social sciences (36%), and business and financial operations (35%). The bottom of the exposure list is physical, hands-on work (cleaning, maintenance, construction).

**McKinsey Global Institute** — the most granular sector analysis. Their generative-AI-accelerated forecast projects fewer jobs in customer service, office support, and food services, with growth concentrated in healthcare, STEM, and managerial roles. Up to 30% of US work hours could be automated by 2030 in their midpoint scenario. Their November 2025 update raised eyebrows by suggesting the most automatable roles — concentrated in legal/administrative services and physically routine work like driving and machine operation — make up roughly 40% of total US jobs, though they stress most evolve rather than disappear.

## The AI-Insider Warning

**Dario Amodei (Anthropic CEO)** — May 2025 *Axios* interview. Amodei warned that AI could eliminate half of entry-level white-collar jobs and push unemployment to 10–20% within one to five years, naming technology, finance, law, and consulting as the most exposed sectors. He frames this as a duty-to-warn obligation given that Anthropic itself is building the capability. He proposes a "token tax" on AI revenues to fund redistribution. The reason this carries weight: Amodei is the rare CEO whose financial incentive runs *opposite* to his public warning.

**Aneesh Raman (LinkedIn, Chief Economic Opportunity Officer)** — has been making a parallel argument that AI is "breaking first" the bottom rung of the career ladder, with LinkedIn data showing collapsing entry-level demand.

## The Empirical Evidence (Most Important for VC Diligence)

This is where the conversation moved in 2025 — from forecasts to measured displacement.

**Erik Brynjolfsson, Bharat Chandar, Ruyu Chen (Stanford Digital Economy Lab)** — "Canaries in the Coal Mine?" (August 2025). Using ADP payroll records covering ~25M US workers, they found early-career workers (ages 22-25) in the most AI-exposed occupations have experienced a 16 percent relative decline in employment since the widespread adoption of generative AI, even after controlling for firm-level shocks. Entry-level software engineering and customer service roles fell by nearly 20% between late 2022 and mid-2025, while employment for older workers in those same roles actually rose. Critically, declines were concentrated where AI automates rather than augments tasks. This is the first large-scale, near-real-time confirmation that the forecasts are starting to land.

**Anthropic Economic Index (Jan 2025–Mar 2026 reports)** — uses anonymized Claude usage data mapped to O*NET tasks. The latest reports show coding remains the dominant use case, and a notable directional shift: directive automation has risen from 27% to 39% of conversations since December 2024, with businesses automating far more than consumers. The April 2026 update flagged customer service representatives as most exposed in the near term, with business sales reps and market researchers next as API workflows in those areas doubled between November 2025 and February 2026. Notably, when adjusted for Claude's task success rate, data entry keyers and radiologists score much higher on AI exposure than raw task coverage suggests, while teachers and software developers score lower.

## The Credible Skeptic

**Daron Acemoglu (MIT, 2024 Nobel laureate in Economics)** — the sharpest pushback on the catastrophist view. In his paper "The Simple Macroeconomics of AI" and in commentary across *Project Syndicate* and MIT Sloan, he argues the AI productivity story is overstated. He projects AI will increase US GDP by 1.1% to 1.6% over the next decade with a roughly 0.05% annual productivity gain — orders of magnitude below Goldman's 7% global GDP claim. His view of which jobs get hit is narrower: a bounded set of office work involving data summary, visual matching, and pattern recognition — roughly 5 percent of the economy. Worth weighing against Goldman/McKinsey if you're sizing a thesis.

## Synthesis — Where Sources Converge

Despite wide disagreement on magnitude, sources converge on the same *categories*:

- **Entry-level white-collar** (the strongest empirical signal — Brynjolfsson's data, Amodei's warning, LinkedIn's data, Anthropic's index all point here)
- **Customer service and support** (highest near-term automation in Anthropic's API data; Goldman/McKinsey both flag)
- **Office and administrative support** (Goldman's #1 exposure category)
- **Legal services**, especially document review and paralegal work (Goldman, McKinsey)
- **Financial operations and analysis** (Goldman, Amodei)
- **Software engineering — junior tier specifically** (Stanford ADP data shows this most starkly; senior engineers actually growing)
- **Routine knowledge work**: data entry, translation, copywriting, basic research, content moderation

For your Avalanche LP narrative, the most defensible framing is probably Brynjolfsson's empirical work paired with the Anthropic Economic Index — actual labor data beats forecasts, and both point to a structural reshaping of the entry-level pipeline rather than a generalized white-collar collapse. That tension (junior eroding, senior holding) has real implications for how Deep Tech founders should think about team composition, which is a thesis hook worth developing.

Want me to pull this into a formatted memo or one-pager for LP/founder distribution?