
Improve Investment Memo Output

Improve the quality and depth of investment memos generated by the Investment Memo Orchestrator.

Path
issue-resolution/Improving-Memo-Output.md
Authors
Michael Staton, Tugce Ergul
Augmented with
Claude Code (Sonnet 4.5)
Tags
Workflow · Investment-Analysis · Content-Generation · Venture-Capital · AI-Assisted-Writing

Improving Memo Output: Section Improvement & Key Information Rewrite

Status: Feature #1 Implemented ✅ | Feature #2 Planned | Date: 2025-11-20 | Last Updated: 2025-11-20 | Author: AI Labs Team | Related: Multi-Agent-Orchestration-for-Investment-Memo-Generation.md

Implementation Status

Feature #1: Section Improvement with Sonar Pro ✅

Status: COMPLETED (Steps 1-2)

Completed Work:

  • ✅ Step 1: Sonar Pro Integration (commit: 6fbafe5)
    • Replaced Claude with Perplexity Sonar Pro in improve-section.py
    • Added comprehensive citation instructions to prompt
    • Citations now added during improvement (one-step process)
  • ✅ Step 2: Automatic Final Draft Reassembly (commit: 6fbafe5)
    • Implemented reassemble_final_draft() function
    • Automatic reassembly after section improvement
    • Includes header.md (company trademark) and all sections
  • ✅ Testing: Successfully tested on Avalanche Team section
    • 11 citations added automatically
    • Section quality improved significantly
    • Final draft reassembled correctly

Test Results (Avalanche Team Section, 2025-11-20):

Input: output/Avalanche-v0.0.1/2-sections/04-team.md (existing weak section)
Command: python improve-section.py "Avalanche" "Team" --version v0.0.1
API: Perplexity Sonar Pro
Time: ~60 seconds

Results:
  ✓ Citations added: 11
  ✓ Obsidian-style formatting: Correct
  ✓ Section saved: 2-sections/04-team.md
  ✓ Final draft reassembled: 4-final-draft.md
  ✓ Quality improvement: Significant (specific metrics, company names, concrete details)

Pending Work:

  • ⏳ Step 3: Before/After Preview Mode (not yet implemented)
  • ⏳ Step 4: Error Handling & Edge Cases (not yet implemented)
  • 🔄 Step 5: Documentation & Testing (IN PROGRESS)

Documentation:

  • ✅ README.md: Section improvement usage documented
  • 🔄 CLAUDE.md: Update in progress
  • 🔄 This spec: Update in progress

Feature #2: Key Information Rewrite Agent 📋

Status: PLANNED (Steps 5-10)

Next Steps:

  1. Create src/agents/key_info_rewrite.py
  2. Create rewrite-key-info.py CLI script
  3. Test on Avalanche $50M → $10M correction

Executive Summary

This document specifies two complementary features for improving memo quality without regenerating entire memos:

  1. Section Improvement: Enhance individual sections with better research and citations using Perplexity Sonar Pro
  2. Key Information Rewrite: Correct crucial facts that appear across multiple sections (e.g., fund size, dates, names)

Both features leverage the section-by-section architecture introduced in the 2025-11-20 refactor, allowing targeted improvements while preserving the artifact trail.


Problem Statement

Current Limitations

Issue #1: No Targeted Section Improvements

  • When one section is weak, users must regenerate the entire memo
  • Full regeneration is expensive (10 LLM calls + research)
  • Good sections may degrade during regeneration
  • No way to iteratively improve specific sections

Issue #2: No Global Fact Correction

  • Factual errors often appear in multiple sections
  • Example: Avalanche memo states “$50M fund” in 7 different sections, but actual size is “$10M”
  • Manually editing each section is error-prone
  • Citations may reference the wrong information

Requirements

Feature #1 Requirements:

  • Improve a single section without touching others
  • Use Perplexity Sonar Pro for real-time research
  • Add citations automatically during improvement
  • Preserve existing artifact structure
  • Allow reassembly of final draft

Feature #2 Requirements:

  • Identify all sections affected by a correction
  • Apply corrections consistently across sections
  • Preserve citations and formatting
  • Update research data if needed
  • Reassemble final draft automatically

Feature #1: Section Improvement with Sonar Pro

Current Implementation Review

Existing File: improve-section.py (created 2025-11-18)

Current Behavior:

  • Loads artifacts (state, research, other sections)
  • Uses Claude to improve section content
  • Saves to 2-sections/ directory
  • Missing: Citations must be added separately

What Exists:

def improve_section_with_agent(
    section_name: str,
    artifacts: dict,
    artifact_dir: Path,
    console: Console
) -> str:
    """Use agents to improve or create a specific section."""
    # Uses ChatAnthropic (Claude)
    # Does NOT add citations
    # Requires separate citation enrichment step

What’s Needed:

  • Replace Claude with Perplexity Sonar Pro
  • Citations added during improvement (not after)
  • One-step process instead of two-step

Target Architecture

Improved Function:

def improve_section_with_sonar_pro(
    section_name: str,
    artifacts: dict,
    artifact_dir: Path,
    console: Console
) -> str:
    """Use Perplexity Sonar Pro to improve section with citations."""
    from openai import OpenAI

    # Initialize Perplexity client
    perplexity_client = OpenAI(
        api_key=os.getenv("PERPLEXITY_API_KEY"),
        base_url="https://api.perplexity.ai"
    )

    # Build comprehensive improvement prompt
    prompt = build_improvement_prompt(
        section_name=section_name,
        existing_content=artifacts["sections"].get(section_file, ""),
        company_name=artifacts["state"]["company_name"],
        research_data=artifacts["research"],
        other_sections=artifacts["sections"],
        investment_type=artifacts["state"]["investment_type"],
        memo_mode=artifacts["state"]["memo_mode"]
    )

    # Call Sonar Pro with improvement + citation instructions
    response = perplexity_client.chat.completions.create(
        model="sonar-pro",
        messages=[{"role": "user", "content": prompt}]
    )

    improved_content = response.choices[0].message.content

    # Save improved section
    save_section_artifact(artifact_dir, section_num, section_name, improved_content)

    return improved_content

Prompt Design

Sonar Pro Improvement Prompt Structure:

You are improving the '{section_name}' section for an investment memo about {company_name}.

INVESTMENT TYPE: {investment_type.upper()}
MEMO MODE: {memo_mode.upper()} ({'retrospective justification' if memo_mode == 'justify' else 'prospective analysis'})

CURRENT SECTION CONTENT:
{existing_content}

RESEARCH DATA AVAILABLE:
{research_data_json}

CONTEXT FROM OTHER SECTIONS:
{other_sections_summary}

TASK: Significantly improve this section by:
1. Adding specific metrics and data from authoritative sources
2. Removing vague or speculative language ("could potentially", "might be", etc.)
3. Strengthening analysis with concrete evidence
4. Adding inline citations [^1], [^2], [^3] for ALL factual claims
5. Including a comprehensive Citations section at the end

REQUIREMENTS:
- Use Obsidian-style citations: [^1], [^2], etc.
- Place citations AFTER punctuation: "text. [^1]" not "text[^1]."
- Always include ONE SPACE before each citation: "text. [^1] [^2]"
- Use quality sources:
  * Company websites, blogs, press releases
  * TechCrunch, The Information, Sifted, Protocol, Axios
  * Crunchbase, PitchBook (for funding data)
  * SEC filings, investor letters
  * Industry analyst reports (Gartner, CB Insights, McKinsey)
  * Bloomberg, Reuters, WSJ, FT (for news)
- Match the analytical tone of professional VC memos
- Be specific, not promotional or dismissive
- For {memo_mode} mode: {'justify the investment decision' if memo_mode == 'justify' else 'objectively assess'}

CITATION FORMAT:
[^1]: YYYY, MMM DD. [Source Title](https://full-url-here.com). Publisher Name. Published: YYYY-MM-DD | Updated: YYYY-MM-DD

IMPROVED SECTION CONTENT:

Key Differences from Citation Enrichment Agent:

  • Citation Enrichment: Preserves narrative, only adds citations
  • Section Improvement: Rewrites for quality AND adds citations
  • Both use same citation format (Obsidian-style)
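The citation placement rules in the prompt above (after punctuation, exactly one space before each marker) can also be checked mechanically on the returned content. A minimal sketch of such a linter; the regexes and function name are illustrative, not part of the current implementation:

```python
import re

# Citations must follow punctuation with one space, e.g. "text. [^1] [^2]"
BAD_PLACEMENT_RE = re.compile(r"\[\^\d+\][.!?,]")   # citation before punctuation
MISSING_SPACE_RE = re.compile(r"[^\s\]]\[\^\d+\]")  # no space before citation

def check_citation_style(text: str) -> list[str]:
    """Return a list of Obsidian-style citation formatting violations."""
    issues = []
    for m in BAD_PLACEMENT_RE.finditer(text):
        issues.append(f"citation before punctuation: {m.group(0)!r}")
    for m in MISSING_SPACE_RE.finditer(text):
        issues.append(f"missing space before citation: {m.group(0)!r}")
    return issues
```

A check like this could run after the Sonar Pro call and flag malformed markers before the section is saved.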

CLI Interface

Usage:

# Activate venv first (recommended)
source .venv/bin/activate

# Basic usage: improve section
python improve-section.py "Avalanche" "Team"

# Specify version
python improve-section.py "Avalanche" "Team" --version v0.0.1

# With final draft reassembly
python improve-section.py "Avalanche" "Team" --rebuild-final

# Direct path to artifact directory
python improve-section.py output/Avalanche-v0.0.1 "Market Context"

New Flags:

  • --rebuild-final: Reassemble 4-final-draft.md after improvement
  • --preview: Show before/after comparison without saving

Output:

✓ Loading artifacts from: output/Avalanche-v0.0.1/
✓ Loaded state.json
✓ Loaded research data
✓ Loaded 10 existing sections

🔧 Improving section: Team
  Using Perplexity Sonar Pro for real-time research...

✓ Section improved with 8 new citations added
✓ Saved to: output/Avalanche-v0.0.1/2-sections/04-team.md

📊 Changes Summary:
  - Original length: 850 words
  - Improved length: 1,200 words
  - Citations added: 8
  - Vague claims removed: 5
  - Specific metrics added: 12

✓ Reassembled final draft: 4-final-draft.md

Next steps:
  1. Review improved section in: output/Avalanche-v0.0.1/2-sections/
  2. Export to HTML: python export-branded.py output/Avalanche-v0.0.1/4-final-draft.md

Implementation Steps

Step 1: Update improve-section.py for Sonar Pro

Files Modified:

  • improve-section.py

Changes:

  1. Replace improve_section_with_agent() with improve_section_with_sonar_pro()
  2. Import OpenAI client for Perplexity
  3. Update prompt to include citation instructions
  4. Test with PERPLEXITY_API_KEY

Testing:

# Test on weak section
python improve-section.py "Avalanche" "Team" --version v0.0.1

# Verify:
# - Section has inline citations [^1], [^2]
# - Citations section at end with URLs
# - Content quality improved
# - Vague language removed

Step 2: Add Reassembly Feature

Changes:

  1. Add --rebuild-final flag
  2. Implement reassemble_final_draft() function:
    • Load header.md if exists
    • Load all sections from 2-sections/ in order
    • Concatenate with proper spacing
    • Save as 4-final-draft.md

Code:

def reassemble_final_draft(artifact_dir: Path, console: Console):
    """Reassemble 4-final-draft.md from section files."""
    console.print("\n[bold]Reassembling final draft...[/bold]")

    # Load header if exists
    header_file = artifact_dir / "header.md"
    if header_file.exists():
        with open(header_file) as f:
            content = f.read() + "\n"
    else:
        content = ""

    # Load sections in order
    sections_dir = artifact_dir / "2-sections"
    section_files = sorted(sections_dir.glob("*.md"))

    for section_file in section_files:
        with open(section_file) as f:
            content += f.read() + "\n\n"

    # Save final draft
    final_draft = artifact_dir / "4-final-draft.md"
    with open(final_draft, "w") as f:
        f.write(content.strip())

    console.print(f"[green]✓ Final draft reassembled:[/green] {final_draft}")

Step 3: Add Before/After Comparison

Changes:

  1. Add --preview flag
  2. Show diff before saving
  3. Require confirmation

Output Example:

📊 Section Improvement Preview:

BEFORE (850 words):
  "The team has extensive experience in the industry..."

AFTER (1,200 words):
  "The founding team brings 40+ years of combined experience. [^1]

   CEO Jane Doe previously scaled Acme Corp from $5M to $150M ARR
   over 6 years (2015-2021). [^2] CTO John Smith led engineering at..."

Changes:
  ✓ Removed 5 vague claims
  ✓ Added 12 specific metrics
  ✓ Added 8 citations
  ✓ Increased depth by 41%

Save improved section? [y/N]:
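The preview flow above (diff, then confirmation before saving) could be built on the standard library; a sketch, with function names chosen here for illustration:

```python
import difflib

def preview_section_diff(original: str, improved: str) -> str:
    """Render a unified diff of the section for the --preview flag."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        improved.splitlines(keepends=True),
        fromfile="before", tofile="after",
    ))

def confirm_save() -> bool:
    """Prompt the user before overwriting the section on disk."""
    return input("Save improved section? [y/N]: ").strip().lower() == "y"
```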

Step 4: Error Handling & Edge Cases

Handle:

  • Missing PERPLEXITY_API_KEY
  • Invalid section names
  • Missing artifact directories
  • Network errors during API calls
  • Malformed citations in response

Code:

def validate_environment():
    """Check required environment variables."""
    if not os.getenv("PERPLEXITY_API_KEY"):
        console.print("[red]Error: PERPLEXITY_API_KEY not set[/red]")
        console.print("[yellow]Set it in .env file or export it[/yellow]")
        sys.exit(1)

def validate_section_name(section_name: str) -> bool:
    """Validate section name against known sections."""
    if section_name not in SECTION_MAP:
        console.print(f"[red]Error: Unknown section '{section_name}'[/red]")
        console.print("\n[yellow]Available sections:[/yellow]")
        for name in sorted(SECTION_MAP.keys()):
            console.print(f"  • {name}")
        return False
    return True

Step 5: Documentation & Testing

Update Files:

  • CLAUDE.md: Add Section Improvement section
  • README.md: Add to “Remaining Enhancements” → “Completed”
  • Create examples in docs/EXAMPLES.md

Test Cases:

  1. ✅ Improve existing weak section
  2. ✅ Create missing section from scratch
  3. ✅ Handle section with existing citations (preserve them)
  4. ✅ Error: Invalid section name
  5. ✅ Error: Missing artifacts
  6. ✅ Reassemble final draft after improvement

Feature #2: Key Information Rewrite Agent

Use Cases

Scenario 1: Fund Size Correction

  • Error: Memo states “$50M fund” in 7 sections
  • Correction: Actual size is “$10M”
  • Impact: Affects deployment strategy, check sizes, portfolio construction, economics

Scenario 2: Person Title Correction

  • Error: “Katelyn Donnelly, Partner at Avalanche”
  • Correction: “Katelyn Donnelly, Managing Partner and Founder at Avalanche”
  • Impact: Affects GP Background, Track Record, decision-making authority

Scenario 3: Date Correction

  • Error: “Company founded in 2020”
  • Correction: “Company founded in 2019”
  • Impact: Affects traction timeline, milestones, growth metrics

Scenario 4: Investment Stage Correction

  • Error: “Series B company”
  • Correction: “Series A company”
  • Impact: Affects valuation expectations, metrics benchmarks, competitive positioning

YAML-Based Correction System

Rationale

Why YAML over CLI flags?

The original design used simple CLI corrections: --correction "Fund size is $10M, not $50M"

Problems with CLI approach:

  • Can only handle one correction at a time
  • No way to provide source verification
  • Cannot distinguish between inaccurate, incomplete, and narrative guidance
  • Difficult to track/audit corrections
  • Not reusable across memo versions

YAML Template Benefits:

  • Structured corrections: Explicit categories (inaccurate vs incomplete vs narrative)
  • Batch corrections: Multiple corrections in one file
  • Source references: Authoritative sources for verification
  • Auditable: Corrections file becomes part of project history
  • Reusable: Save correction templates for common issues
  • Version control: Track changes to correction guidance over time
  • Rich guidance: Narrative shaping comments guide tone and framing

YAML Template Structure

Template Location: data/{CompanyName}-corrections.yaml

Schema:

# Correction template for investment memo improvements
company: "Avalanche"

# VERSION MANAGEMENT
source_version: "v0.0.3"  # Which version to correct (required, can be path or version tag)
# source_version: "output/Avalanche-v0.0.3"  # Alternative: full path
output_mode: "new_version"  # "new_version" or "in_place"
# output_mode: "in_place"  # Overwrites source version artifacts

date_created: "2025-11-20"

corrections:
  # Correction Object 1: Inaccurate Information
  - type: "inaccurate"
    inaccurate_information: |
      The memo states that Avalanche VC Fund II is raising $50M, appearing
      in multiple sections (Fund Strategy, Economics, Portfolio Construction).
    correct_information: |
      Avalanche VC Fund II is raising $10M, not $50M. The fund targets
      $10M with a hard cap at $12M.
    affected_sections:
      - "Fund Strategy & Thesis"
      - "Portfolio Construction"
      - "Fee Structure & Economics"
      - "Executive Summary"
    sources:
      - "https://avalanche.vc/fund-ii"
      - "data/Avalanche-v0.0.1/0-deck-analysis.json"
    narrative_shaping_comments:
      - "Emphasize that the $10M fund size is intentional for boutique, high-touch approach"
      - "Connect fund size to check size strategy ($250K-$500K initial)"
      - "Frame smaller fund as competitive advantage for emerging EdTech companies"

  # Correction Object 2: Incomplete Information
  - type: "incomplete"
    incomplete_information: |
      The Team section mentions Katelyn Donnelly but doesn't specify her
      previous role at Pearson Ventures or the fund's performance metrics.
    additional_information: |
      Katelyn Donnelly was Managing Director at Pearson Ventures, where she
      oversaw a $65M fund that delivered an 18% IRR. She's also a Kauffman
      Fellow (Class 21) and was featured on Forbes 30 Under 30 in 2014.
    affected_sections:
      - "GP Background & Track Record"
      - "Executive Summary"
    sources:
      - "https://www.linkedin.com/in/katelyndonnelly/"
      - "https://avalanche.vc/team"
    narrative_shaping_comments:
      - "Highlight the 18% IRR as significantly above industry average"
      - "Connect Pearson Ventures experience to EdTech sector expertise"
      - "Emphasize operational experience (co-founded Delivery Associates, $40M revenue)"

  # Correction Object 3: Narrative Shaping Only
  - type: "narrative"
    section: "Investment Thesis"
    narrative_shaping_comments:
      - "Reduce promotional language about 'revolutionary' and 'game-changing'"
      - "Add more balanced risk discussion alongside opportunity"
      - "Include specific competitive comparisons (Reach Capital, Learn Capital)"
      - "Quantify claims wherever possible (e.g., 'market leader' → 'top 3 in sector')"
    sources:
      - "https://www.crunchbase.com/organization/reach-capital"
      - "https://www.crunchbase.com/organization/learn-capital"

  # Correction Object 4: Multiple Facts + Narrative
  - type: "mixed"
    inaccurate_information: "Portfolio construction assumes 25 investments"
    correct_information: "Portfolio will include 15-20 core investments, not 25"
    incomplete_information: "No mention of reserve strategy for follow-on rounds"
    additional_information: |
      The fund reserves 50% of capital for follow-on investments in top performers.
      Average initial check: $400K. Reserve per company: $300-500K.
    affected_sections:
      - "Portfolio Construction"
      - "Fund Strategy & Thesis"
    sources:
      - "data/Avalanche-deck.pdf"
    narrative_shaping_comments:
      - "Frame reserve strategy as deliberate capital deployment discipline"
      - "Compare concentration to industry norms (seed funds typically 30-40 companies)"
      - "Connect to ownership targets (8-12% initial, 10-15% after reserves)"

Version Management Options

Critical Design Decision: Should corrections modify the existing version or create a new version?

Option 1: output_mode: "new_version" (Recommended for most cases)

Creates a new version directory with corrected content, preserving the original.

How it works:

source_version: "v0.0.3"
output_mode: "new_version"

Behavior:

  1. Reads all artifacts from output/Avalanche-v0.0.3/
  2. Applies corrections to sections
  3. Creates output/Avalanche-v0.0.4/ with:
    • Corrected section files (2-sections/)
    • Updated final draft (4-final-draft.md)
    • Copied artifacts from v0.0.3 (state.json, research, validation)
    • New corrections-log.json documenting changes
    • Updated state.json with correction metadata
  4. Increments version: v0.0.3 → v0.0.4
  5. Updates output/versions.json
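Steps 3-4 above (copy artifacts, increment the version tag) might be sketched as follows; the directory layout matches the examples in this document, and the function names are illustrative:

```python
import re
import shutil
from pathlib import Path

def next_version(tag: str) -> str:
    """Increment the patch component: v0.0.3 -> v0.0.4."""
    major, minor, patch = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)", tag).groups()
    return f"v{major}.{minor}.{int(patch) + 1}"

def create_new_version(company: str, source_tag: str,
                       output_root: Path = Path("output")) -> Path:
    """Copy all artifacts from the source version into the next version dir."""
    dst = output_root / f"{company}-{next_version(source_tag)}"
    shutil.copytree(output_root / f"{company}-{source_tag}", dst)
    return dst
```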

Use when:

  • You want to preserve the original memo for comparison
  • Corrections might substantially change the narrative/recommendation
  • You want an audit trail of what changed between versions
  • Multiple stakeholders reviewing different versions
  • Experimenting with different correction approaches

Example: Fund size correction ($50M → $10M)

  • Creates v0.0.4 with corrected fund size
  • Original v0.0.3 remains unchanged
  • Can export both versions to compare side-by-side
  • If correction is wrong, v0.0.3 is still available

Option 2: output_mode: "in_place" (Use with caution)

Overwrites the source version’s artifacts directly. Original content is lost.

How it works:

source_version: "v0.0.3"
output_mode: "in_place"

Behavior:

  1. Reads all artifacts from output/Avalanche-v0.0.3/
  2. Applies corrections to sections
  3. Overwrites files in output/Avalanche-v0.0.3/:
    • Replaces section files in 2-sections/
    • Replaces 4-final-draft.md
    • Adds corrections-log.json
    • Updates state.json with correction metadata
  4. Version tag remains v0.0.3
  5. No new version created

Use when:

  • Minor corrections (typos, small factual updates)
  • You don’t need to preserve the original
  • Disk space is limited
  • Corrections are unambiguously correct
  • Internal draft that hasn’t been shared

Example: Typo correction (“Managign Partner” → “Managing Partner”)

  • Fixes typo directly in v0.0.3
  • No need to create v0.0.4 for a typo
  • Original v0.0.3 is overwritten

⚠️ Warning: This is destructive. Use --preview flag first to verify changes.


Version Specification Options

Option A: Version Tag (Recommended)

source_version: "v0.0.3"
  • Agent resolves to output/Avalanche-v0.0.3/
  • Validates version exists
  • Works with version history

Option B: Full Path

source_version: "output/Avalanche-v0.0.3"
  • Direct path to artifact directory
  • Useful if directory isn’t in standard location
  • Bypasses version resolution

Option C: Latest (via CLI, not YAML)

# Use --source-version latest flag
python rewrite-key-info.py "Avalanche" \
  --corrections data/Avalanche-corrections.yaml \
  --source-version latest
  • Agent finds latest version automatically
  • Useful for quick iterations
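The three options above could share a single resolver; a sketch, assuming the `output/{Company}-{tag}` layout used throughout this document:

```python
from pathlib import Path

def resolve_source_dir(company: str, source_version: str,
                       output_root: Path = Path("output")) -> Path:
    """Resolve a source_version value (tag, full path, or 'latest') to a directory."""
    if "/" in source_version:                      # Option B: full path
        return Path(source_version)
    if source_version == "latest":                 # Option C: newest version dir
        candidates = sorted(output_root.glob(f"{company}-v*"))
        if not candidates:
            raise FileNotFoundError(f"no versions found for {company}")
        return candidates[-1]  # naive lexicographic sort; parse tags in real code
    return output_root / f"{company}-{source_version}"  # Option A: version tag
```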

Version Comparison & Audit Trail

With new_version mode, the system creates a comparison log:

File: output/Avalanche-v0.0.4/corrections-log.json

{
  "source_version": "v0.0.3",
  "output_version": "v0.0.4",
  "output_mode": "new_version",
  "corrections_applied": 4,
  "sections_modified": 7,
  "timestamp": "2025-11-20T15:30:00Z",
  "corrections_file": "data/Avalanche-corrections.yaml",
  "changes": [
    {
      "correction_type": "inaccurate",
      "sections_affected": ["Fund Strategy & Thesis", "Portfolio Construction", "Fee Structure & Economics", "Executive Summary"],
      "instances_corrected": 11,
      "summary": "Corrected fund size from $50M to $10M"
    },
    {
      "correction_type": "incomplete",
      "sections_affected": ["GP Background & Track Record", "Executive Summary"],
      "facts_added": 5,
      "summary": "Added Katelyn's Pearson Ventures track record (18% IRR)"
    },
    {
      "correction_type": "narrative",
      "sections_affected": ["Investment Thesis"],
      "summary": "Reduced promotional language, added competitive comparisons"
    },
    {
      "correction_type": "mixed",
      "sections_affected": ["Portfolio Construction", "Fund Strategy & Thesis"],
      "instances_corrected": 6,
      "facts_added": 3,
      "summary": "Corrected portfolio size and added reserve strategy"
    }
  ],
  "narrative_impact": "Substantial - fund size correction changes check size strategy, portfolio construction, and economics sections. May affect overall recommendation.",
  "recommendation_changed": false,
  "recommendation_note": "Recommendation remains COMMIT but with updated rationale reflecting smaller fund size"
}
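Assembling a log in the shape shown above is straightforward; a sketch using the same field names (the function name is illustrative):

```python
from datetime import datetime, timezone

def build_corrections_log(source_version: str, output_version: str,
                          output_mode: str, changes: list[dict],
                          corrections_file: str) -> dict:
    """Assemble a corrections-log payload with a UTC timestamp."""
    return {
        "source_version": source_version,
        "output_version": output_version,
        "output_mode": output_mode,
        "corrections_applied": len(changes),
        "sections_modified": len({s for c in changes
                                  for s in c.get("sections_affected", [])}),
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "corrections_file": corrections_file,
        "changes": changes,
    }
```

The result would be serialized with `json.dumps(log, indent=2)` into the output version directory.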

Comparison Command (Future Enhancement):

# Compare two versions
python compare-versions.py Avalanche v0.0.3 v0.0.4

# Output:
# Differences between v0.0.3 and v0.0.4:
#   7 sections modified
#   11 factual corrections
#   3 narrative improvements
#   Recommendation: COMMIT (unchanged)
#   Key changes:
#     - Fund size: $50M → $10M
#     - Portfolio: 25 → 15-20 investments
#     - Added reserve strategy details

Impact on Research Data

Important: Corrections do NOT re-run research or regenerate from scratch.

What happens to research artifacts:

New Version Mode:

  • 1-research.json and 1-research.md are copied from source version
  • Optionally updated if correction fundamentally conflicts (see “Research Conflicts” below)
  • 0-deck-analysis.json is copied unchanged

In-Place Mode:

  • Research artifacts remain unchanged
  • Only section files and final draft are modified

Research Conflicts (Optional --update-research flag):

If correction contradicts research data, the agent can optionally update research:

# In corrections YAML
corrections:
  - type: "inaccurate"
    inaccurate_information: "Fund size $50M"
    correct_information: "Fund size $10M"
    update_research: true  # Optional: update research artifacts

Behavior with update_research: true:

  1. Agent detects conflict: research mentions “$50M” but correction says “$10M”
  2. Updates 1-research.json to reflect $10M
  3. Updates 1-research.md narrative
  4. Logs research update in corrections-log.json

Default behavior (update_research: false or omitted):

  • Research artifacts unchanged
  • Only sections and final draft corrected
  • Potential discrepancy logged in corrections-log.json

Why this matters: If research says “$50M” but memo says “$10M”, future regenerations might reintroduce the error. Updating research ensures consistency.
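Detecting the conflict described above (research still containing the outdated value) could be a recursive scan of 1-research.json; a minimal sketch with an illustrative function name:

```python
def find_research_conflicts(research_json: dict, incorrect_value: str) -> list[str]:
    """Return dotted paths in the research data that still contain an outdated value."""
    conflicts = []

    def walk(node, path):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, f"{path}.{k}" if path else k)
        elif isinstance(node, list):
            for i, v in enumerate(node):
                walk(v, f"{path}[{i}]")
        elif isinstance(node, str) and incorrect_value.lower() in node.lower():
            conflicts.append(path)

    walk(research_json, "")
    return conflicts
```

With `update_research: false`, these paths would only be logged in corrections-log.json; with `true`, they identify the fields to rewrite.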


Correction Types

Type 1: inaccurate - Factual errors that must be corrected

  • Required fields: inaccurate_information, correct_information, affected_sections
  • Optional: sources, narrative_shaping_comments

Type 2: incomplete - Missing information that should be added

  • Required fields: incomplete_information, additional_information, affected_sections
  • Optional: sources, narrative_shaping_comments

Type 3: narrative - Tone/framing improvements without factual changes

  • Required fields: section, narrative_shaping_comments
  • Optional: sources (for competitive research, benchmarking)

Type 4: mixed - Combination of inaccurate + incomplete

  • Required fields: All of the above
  • Most comprehensive correction type
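The required-field rules above map directly onto a simple schema check; a sketch of what `validate_corrections_schema()` might do per correction object:

```python
REQUIRED_FIELDS = {
    "inaccurate": ["inaccurate_information", "correct_information", "affected_sections"],
    "incomplete": ["incomplete_information", "additional_information", "affected_sections"],
    "narrative":  ["section", "narrative_shaping_comments"],
    "mixed":      ["inaccurate_information", "correct_information",
                   "incomplete_information", "additional_information",
                   "affected_sections"],
}

def validate_correction(corr: dict) -> list[str]:
    """Return the missing required fields for a single correction object."""
    ctype = corr.get("type")
    if ctype not in REQUIRED_FIELDS:
        return [f"unknown correction type: {ctype!r}"]
    return [f for f in REQUIRED_FIELDS[ctype] if f not in corr]
```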

Workflow with YAML Corrections

Step 1: User Creates Correction File

# Copy template
cp templates/corrections-template.yaml data/Avalanche-corrections.yaml

# Edit with corrections
# User fills in specific corrections based on feedback

Step 2: Agent Parses YAML

import yaml  # requires PyYAML
from pathlib import Path
from typing import List

# CorrectionObject: dataclass defined alongside this loader in the
# key_info_rewrite agent module
def load_corrections_yaml(corrections_file: Path) -> List[CorrectionObject]:
    """Load and validate corrections YAML file."""
    with open(corrections_file) as f:
        data = yaml.safe_load(f)

    # Validate schema
    validate_corrections_schema(data)

    # Parse into CorrectionObject list
    corrections = []
    for corr in data['corrections']:
        corrections.append(CorrectionObject(
            type=corr['type'],
            inaccurate_info=corr.get('inaccurate_information'),
            correct_info=corr.get('correct_information'),
            incomplete_info=corr.get('incomplete_information'),
            additional_info=corr.get('additional_information'),
            affected_sections=corr.get('affected_sections', []),
            sources=corr.get('sources', []),
            narrative_comments=corr.get('narrative_shaping_comments', [])
        ))

    return corrections

Step 3: Source Verification (Optional)

def verify_corrections_with_sources(
    corrections: List[CorrectionObject],
    use_sonar_pro: bool = True
) -> List[VerificationResult]:
    """
    Use Perplexity Sonar Pro to verify corrections against provided sources.

    For each correction with sources:
    1. Fetch source content (if URL)
    2. Use Sonar Pro to verify correctness
    3. Return confidence score + evidence
    """

    if not use_sonar_pro:
        # Skip verification: trust every correction as-is
        return [VerificationResult(verified=True, confidence=1.0)
                for _ in corrections]

    results = []
    for correction in corrections:
        if not correction.sources:
            results.append(VerificationResult(verified=True, confidence=0.8,
                note="No sources provided, assuming user is correct"))
            continue

        # Build verification prompt
        prompt = f"""Verify this correction using the provided sources:

CLAIMED INACCURATE INFO: {correction.inaccurate_info}
CLAIMED CORRECT INFO: {correction.correct_info}

SOURCES TO VERIFY:
{chr(10).join(correction.sources)}

TASK:
1. Check if the correction is accurate according to sources
2. Return confidence score (0.0-1.0)
3. Provide evidence from sources

Return JSON:
{{
    "verified": true/false,
    "confidence": 0.95,
    "evidence": "Quote or summary from sources",
    "concerns": "Any potential issues"
}}
"""

        # Call Sonar Pro
        response = perplexity_client.chat.completions.create(
            model="sonar-pro",
            messages=[{"role": "user", "content": prompt}]
        )

        result = parse_verification_result(response)
        results.append(result)

    return results

Step 4: Apply Corrections Section-by-Section

def apply_correction_to_section(
    section_file: Path,
    correction: CorrectionObject,
    company_name: str
) -> str:
    """Apply single correction to section with narrative guidance."""

    with open(section_file) as f:
        original_content = f.read()

    # Build correction prompt with narrative guidance
    correction_prompt = f"""You are correcting an investment memo section for {company_name}.

CORRECTION TYPE: {correction.type}

{"INACCURATE INFORMATION: " + correction.inaccurate_info if correction.inaccurate_info else ""}
{"CORRECT INFORMATION: " + correction.correct_info if correction.correct_info else ""}
{"INCOMPLETE - MISSING: " + correction.incomplete_info if correction.incomplete_info else ""}
{"ADDITIONAL INFORMATION: " + correction.additional_info if correction.additional_info else ""}

NARRATIVE SHAPING GUIDANCE:
{chr(10).join(f"• {comment}" for comment in correction.narrative_comments)}

SOURCES FOR REFERENCE:
{chr(10).join(correction.sources)}

CURRENT SECTION CONTENT:
{original_content}

TASK:
1. Apply factual corrections (inaccurate → correct)
2. Add missing information (incomplete → additional)
3. Follow narrative shaping guidance for tone and framing
4. Preserve ALL existing citations
5. Add NEW citations for newly added facts (use sources provided)
6. Maintain formatting and structure

Return ONLY the corrected section content with citations.
"""

    # Call Claude for correction
    response = anthropic_client.invoke(correction_prompt)
    corrected_content = response.content

    return corrected_content

Step 5: CLI Usage

# Basic usage: apply corrections from YAML
# (source_version and output_mode specified in YAML)
python rewrite-key-info.py --corrections data/Avalanche-corrections.yaml

# With source verification (uses Sonar Pro to verify corrections)
python rewrite-key-info.py \
  --corrections data/Avalanche-corrections.yaml \
  --verify-sources

# Preview mode (show what would change without saving)
python rewrite-key-info.py \
  --corrections data/Avalanche-corrections.yaml \
  --preview

# Override YAML output mode (force in-place even if YAML says new_version)
python rewrite-key-info.py \
  --corrections data/Avalanche-corrections.yaml \
  --output-mode in_place

# Override source version (use latest instead of YAML-specified version)
python rewrite-key-info.py \
  --corrections data/Avalanche-corrections.yaml \
  --source-version latest

# Direct path to artifact directory (bypasses company resolution)
python rewrite-key-info.py \
  --corrections data/Avalanche-corrections.yaml \
  --source-path output/Avalanche-v0.0.3

CLI Flag Priority:

  1. CLI flags override YAML settings
  2. YAML settings override defaults
  3. Defaults: output_mode: "new_version", source_version: "latest"
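The precedence rule above (CLI flags over YAML over defaults) can be expressed as a small merge; a sketch, with the function name chosen for illustration:

```python
DEFAULTS = {"output_mode": "new_version", "source_version": "latest"}

def resolve_settings(yaml_config: dict, cli_args: dict) -> dict:
    """Merge settings with precedence: CLI flags > YAML > defaults."""
    settings = dict(DEFAULTS)
    settings.update({k: v for k, v in yaml_config.items() if v is not None})
    settings.update({k: v for k, v in cli_args.items() if v is not None})
    return settings
```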

Step 6: Output

Example 1: New Version Mode

📋 Loaded corrections: data/Avalanche-corrections.yaml
  Company: Avalanche
  Source version: v0.0.3
  Output mode: new_version → v0.0.4
  Corrections: 4

🔍 Verifying corrections with sources...
  ✓ Correction 1: Verified (confidence: 0.95) - Fund size $10M confirmed
  ✓ Correction 2: Verified (confidence: 0.92) - Katelyn's track record confirmed
  ✓ Correction 3: No verification needed (narrative only)
  ✓ Correction 4: Verified (confidence: 0.88) - Portfolio construction confirmed

📝 Applying corrections...
  Correction 1 (inaccurate):
    ✓ Fund Strategy & Thesis (3 instances corrected)
    ✓ Portfolio Construction (2 instances corrected)
    ✓ Fee Structure & Economics (4 instances corrected)
    ✓ Executive Summary (2 instances corrected)

  Correction 2 (incomplete):
    ✓ GP Background & Track Record (added Pearson metrics)
    ✓ Executive Summary (added track record summary)

  Correction 3 (narrative):
    ✓ Investment Thesis (toned down promotional language, added comparisons)

  Correction 4 (mixed):
    ✓ Portfolio Construction (corrected count, added reserve strategy)
    ✓ Fund Strategy & Thesis (added reserve discussion)

📦 Creating new version: v0.0.4
  ✓ Copied artifacts from v0.0.3
  ✓ Applied corrections to 7 sections
  ✓ Updated state.json with correction metadata
  ✓ Created corrections-log.json

✅ Reassembled final draft: output/Avalanche-v0.0.4/4-final-draft.md

📊 Correction Summary:
  Source version: v0.0.3
  Output version: v0.0.4 (NEW)
  Total corrections: 4
  Sections modified: 7/10
  Instances corrected: 15
  Citations added: 8
  Narrative improvements: 1 section

📝 Correction log saved: output/Avalanche-v0.0.4/corrections-log.json

Next steps:
  1. Review corrections: output/Avalanche-v0.0.4/2-sections/
  2. View final draft: output/Avalanche-v0.0.4/4-final-draft.md
  3. Compare versions: diff output/Avalanche-v0.0.3/4-final-draft.md output/Avalanche-v0.0.4/4-final-draft.md
  4. Export to HTML: python export-branded.py output/Avalanche-v0.0.4/4-final-draft.md

Example 2: In-Place Mode

📋 Loaded corrections: data/Avalanche-corrections.yaml
  Company: Avalanche
  Source version: v0.0.3
  Output mode: in_place (⚠️ will overwrite v0.0.3)
  Corrections: 1

⚠️  WARNING: In-place mode will overwrite existing artifacts.
    Use --preview to see changes before applying.
    Original content will be lost. Continue? [y/N]: y

📝 Applying corrections...
  Correction 1 (inaccurate):
    ✓ GP Background & Track Record (1 instance corrected)

✅ Updated final draft: output/Avalanche-v0.0.3/4-final-draft.md

📊 Correction Summary:
  Version: v0.0.3 (MODIFIED IN-PLACE)
  Total corrections: 1
  Sections modified: 1/10
  Instances corrected: 1

📝 Correction log saved: output/Avalanche-v0.0.3/corrections-log.json

Next steps:
  1. Review corrections: output/Avalanche-v0.0.3/2-sections/
  2. View final draft: output/Avalanche-v0.0.3/4-final-draft.md
  3. Export to HTML: python export-branded.py output/Avalanche-v0.0.3/4-final-draft.md

Benefits of YAML Approach

1. Comprehensive Corrections

  • Single file can fix multiple issues across entire memo
  • Supports fact corrections, additions, and narrative guidance
  • Clear categorization of correction types

2. Source Integration

  • Reference authoritative sources for verification
  • Automatically verify corrections with Sonar Pro
  • Add citations to newly added facts

3. Narrative Control

  • Shape tone and framing with explicit guidance
  • Control not just the facts but how they are presented
  • Maintain analytical rigor over promotional tone

4. Audit Trail

  • Correction YAML files tracked in version control
  • corrections-log.json records what was changed
  • Easy to understand what was corrected and why

5. Reusability

  • Save correction templates for common issues
  • Apply same corrections to multiple memo versions
  • Share correction patterns across projects

6. Batch Efficiency

  • Fix 10+ issues in one run
  • Fewer API calls than iterative corrections
  • Consistent application across all sections

Architecture Design

New Agent: src/agents/key_info_rewrite.py

Agent Function:

def key_information_rewrite_agent(state: MemoState) -> dict:
    """
    Correct crucial information that affects multiple sections.

    Args:
        state: Must contain:
            - correction_instruction: str
              Example: "The fund size is $10M, not $50M"
            - company_name: str
            - latest_output_dir: Path (optional, auto-detected if not provided)

    Process:
        1. Load final draft from latest version
        2. Analyze correction to identify affected sections
        3. For each affected section:
            a. Load section file from 2-sections/
            b. Apply correction via LLM
            c. Preserve citations and formatting
            d. Save corrected section
        4. Reassemble final draft
        5. Update metadata

    Returns:
        {
            "sections_corrected": int,
            "instances_found": int,
            "files_updated": List[str],
            "messages": List[str]
        }
    """

Correction Analysis Algorithm

Phase 1: Parse Correction Instruction

def analyze_correction(instruction: str, company_name: str) -> CorrectionAnalysis:
    """
    Use LLM to understand correction and identify search terms.

    Returns:
        CorrectionAnalysis:
            - incorrect_info: str ("$50M")
            - correct_info: str ("$10M")
            - semantic_variations: List[str] (["fifty million", "Fund II size", "10M fund"])
            - affected_section_types: List[str] (["Fund Strategy", "Economics", "Portfolio"])
    """

    analysis_prompt = f"""Analyze this correction instruction for {company_name}:

INSTRUCTION: {instruction}

TASK: Extract structured information:
1. What information is INCORRECT?
2. What is the CORRECT information?
3. What semantic variations might appear? (paraphrases, related concepts)
4. Which section types are likely affected?

Return JSON:
{{
    "incorrect_info": "exact text",
    "correct_info": "exact text",
    "semantic_variations": ["variant1", "variant2"],
    "affected_section_types": ["section name 1", "section name 2"]
}}
"""

    # Call Claude for analysis
    response = anthropic_client.invoke(analysis_prompt)
    return CorrectionAnalysis.parse(response.content)

Phase 2: Identify Affected Sections

def identify_affected_sections(
    correction_analysis: CorrectionAnalysis,
    artifact_dir: Path
) -> List[SectionInfo]:
    """
    Scan all section files to find which ones contain the error.

    Returns:
        List of SectionInfo:
            - section_name: str
            - section_file: Path
            - instances_found: int
            - sample_text: str (preview of error)
    """

    affected_sections = []
    sections_dir = artifact_dir / "2-sections"

    for section_file in sections_dir.glob("*.md"):
        with open(section_file) as f:
            content = f.read()

        # Check for exact match
        exact_count = content.count(correction_analysis.incorrect_info)

        # Check for semantic variations
        variation_count = 0
        for variation in correction_analysis.semantic_variations:
            variation_count += content.lower().count(variation.lower())

        total_instances = exact_count + variation_count

        if total_instances > 0:
            affected_sections.append(SectionInfo(
                section_name=extract_section_name(section_file),
                section_file=section_file,
                instances_found=total_instances,
                sample_text=extract_sample(content, correction_analysis.incorrect_info)
            ))

    return affected_sections

Phase 3: Apply Correction to Each Section

def correct_section(
    section_file: Path,
    correction_analysis: CorrectionAnalysis,
    other_sections_context: str,
    company_name: str
) -> str:
    """
    Use LLM to apply correction while preserving formatting and citations.
    """

    with open(section_file) as f:
        original_content = f.read()

    correction_prompt = f"""You are correcting a factual error in an investment memo section.

COMPANY: {company_name}

CORRECTION REQUIRED:
  Incorrect: {correction_analysis.incorrect_info}
  Correct: {correction_analysis.correct_info}

CONTEXT FROM OTHER SECTIONS:
{other_sections_context}

CURRENT SECTION CONTENT:
{original_content}

TASK:
1. Find ALL instances of the incorrect information (including paraphrases)
2. Replace with the correct information
3. Ensure consistency throughout the section
4. Update any dependent claims (e.g., if fund size changes, check sizes may change)
5. Preserve ALL citations - do not remove or modify them
6. Preserve all formatting (headers, lists, emphasis)
7. Do NOT change other content unrelated to the correction

CRITICAL:
- If a claim becomes unsupported after correction, flag it with [NEEDS CITATION]
- Maintain the analytical tone and depth
- Return ONLY the corrected section content

CORRECTED SECTION:
"""

    # Call Claude
    response = anthropic_client.invoke(correction_prompt)
    corrected_content = response.content

    # Save corrected section
    with open(section_file, "w") as f:
        f.write(corrected_content)

    return corrected_content

Phase 4: Reassemble Final Draft

def reassemble_after_correction(artifact_dir: Path) -> Path:
    """Reassemble 4-final-draft.md after corrections."""

    # Same logic as Feature #1 reassembly
    content = ""

    # Load header
    header_file = artifact_dir / "header.md"
    if header_file.exists():
        with open(header_file) as f:
            content = f.read() + "\n"

    # Load all sections in order
    sections_dir = artifact_dir / "2-sections"
    for section_file in sorted(sections_dir.glob("*.md")):
        with open(section_file) as f:
            content += f.read() + "\n\n"

    # Save final draft
    final_draft = artifact_dir / "4-final-draft.md"
    with open(final_draft, "w") as f:
        f.write(content.strip())

    return final_draft

CLI Interface

Standalone Script: rewrite-key-info.py

Usage:

# Activate venv first
source .venv/bin/activate

# Basic correction
python rewrite-key-info.py "Avalanche" \
  --correction "The fund size is $10M, not $50M"

# Specify version
python rewrite-key-info.py "Avalanche" \
  --correction "Katelyn Donnelly is Managing Partner, not Partner" \
  --version v0.0.1

# Direct path
python rewrite-key-info.py output/Avalanche-v0.0.1 \
  --correction "Company founded in 2019, not 2020"

# Preview mode (don't save)
python rewrite-key-info.py "Avalanche" \
  --correction "Series A, not Series B" \
  --preview

# Update research data too (deep mode)
python rewrite-key-info.py "Avalanche" \
  --correction "Fund size is $10M" \
  --update-research

Output Example:

🔍 Analyzing correction...
  Incorrect: "$50M"
  Correct: "$10M"
  Semantic variations: "fifty million", "Fund II target", "target size"

🔎 Scanning sections...
  ✓ Found errors in 7/10 sections:
    • Fund Strategy & Thesis (3 instances)
    • Portfolio Construction (2 instances)
    • Fee Structure & Economics (4 instances)
    • Value Add & Differentiation (1 instance)
    • Track Record Analysis (2 instances)
    • Risks & Mitigations (1 instance)
    • Executive Summary (2 instances)

📝 Applying corrections...
  ✓ Corrected: Fund Strategy & Thesis
  ✓ Corrected: Portfolio Construction
  ✓ Corrected: Fee Structure & Economics
  ✓ Corrected: Value Add & Differentiation
  ✓ Corrected: Track Record Analysis
  ✓ Corrected: Risks & Mitigations
  ✓ Corrected: Executive Summary

✅ Reassembled final draft

📊 Correction Summary:
  Sections modified: 7/10
  Total instances corrected: 15
  Files updated:
    • 2-sections/03-fund-strategy--thesis.md
    • 2-sections/04-portfolio-construction.md
    • 2-sections/07-fee-structure--economics.md
    • 2-sections/05-value-add--differentiation.md
    • 2-sections/06-track-record-analysis.md
    • 2-sections/08-risks--mitigations.md
    • 2-sections/01-executive-summary.md
    • 4-final-draft.md

Next steps:
  1. Review corrections in: output/Avalanche-v0.0.1/
  2. Export to HTML: python export-branded.py output/Avalanche-v0.0.1/4-final-draft.md
  3. Create new version: python -m src.main "Avalanche" --version-only

Implementation Steps (YAML-Based)

Step 0: Create YAML Template

New File: templates/corrections-template.yaml

Content:

# Investment Memo Correction Template
# Copy to data/{CompanyName}-corrections.yaml and fill in corrections

company: "CompanyName"

# VERSION MANAGEMENT (required)
source_version: "v0.0.3"  # Which version to use as source
# Alternatives:
#   source_version: "latest"  # Use latest version
#   source_version: "output/CompanyName-v0.0.3"  # Full path

output_mode: "new_version"  # "new_version" or "in_place"
# new_version: Creates v0.0.4 from v0.0.3 (preserves original)
# in_place: Overwrites v0.0.3 directly (DESTRUCTIVE - use with caution)

date_created: "YYYY-MM-DD"

corrections:
  # Example 1: Inaccurate information
  - type: "inaccurate"
    inaccurate_information: |
      Describe what's incorrect in the memo
    correct_information: |
      Provide the correct information
    affected_sections:
      - "Section Name 1"
      - "Section Name 2"
    sources:
      - "https://source-url.com"
      - "data/document.pdf"
    narrative_shaping_comments:
      - "Guidance on how to frame this correction"
      - "Additional context or emphasis"

  # Example 2: Incomplete information
  - type: "incomplete"
    incomplete_information: |
      Describe what's missing
    additional_information: |
      Provide the missing information
    affected_sections:
      - "Section Name"
    sources:
      - "https://source-url.com"
    narrative_shaping_comments:
      - "How to integrate this information"

  # Example 3: Narrative shaping only
  - type: "narrative"
    section: "Section Name"
    narrative_shaping_comments:
      - "Remove promotional language"
      - "Add balanced risk discussion"
      - "Quantify vague claims"
    sources:
      - "https://competitor-comparison.com"

  # Example 4: Mixed correction
  - type: "mixed"
    inaccurate_information: "What's wrong"
    correct_information: "What's correct"
    incomplete_information: "What's missing"
    additional_information: "What to add"
    affected_sections:
      - "Section 1"
      - "Section 2"
    sources:
      - "https://source.com"
    narrative_shaping_comments:
      - "How to present this holistically"

Step 1: Create YAML Parser & Schema

New File: src/corrections.py

Implement:

from dataclasses import dataclass
from typing import List, Optional
from pathlib import Path
import yaml

@dataclass
class CorrectionObject:
    """Represents a single correction from YAML."""
    type: str  # "inaccurate", "incomplete", "narrative", "mixed"
    inaccurate_info: Optional[str] = None
    correct_info: Optional[str] = None
    incomplete_info: Optional[str] = None
    additional_info: Optional[str] = None
    affected_sections: Optional[List[str]] = None
    section: Optional[str] = None  # For narrative-only corrections
    sources: Optional[List[str]] = None
    narrative_comments: Optional[List[str]] = None

    def __post_init__(self):
        if self.affected_sections is None:
            self.affected_sections = []
        if self.sources is None:
            self.sources = []
        if self.narrative_comments is None:
            self.narrative_comments = []

def load_corrections_yaml(corrections_file: Path) -> dict:
    """Load and validate corrections YAML file."""
    with open(corrections_file) as f:
        data = yaml.safe_load(f)

    # Validate schema
    validate_corrections_schema(data)

    return data

def validate_corrections_schema(data: dict) -> None:
    """Validate YAML structure and required fields."""
    required_top = ["company", "corrections"]
    for field in required_top:
        if field not in data:
            raise ValueError(f"Missing required field: {field}")

    for i, corr in enumerate(data["corrections"]):
        if "type" not in corr:
            raise ValueError(f"Correction {i+1}: Missing 'type' field")

        corr_type = corr["type"]

        if corr_type == "inaccurate":
            required = ["inaccurate_information", "correct_information", "affected_sections"]
            for field in required:
                if field not in corr:
                    raise ValueError(f"Correction {i+1} (inaccurate): Missing '{field}'")

        elif corr_type == "incomplete":
            required = ["incomplete_information", "additional_information", "affected_sections"]
            for field in required:
                if field not in corr:
                    raise ValueError(f"Correction {i+1} (incomplete): Missing '{field}'")

        elif corr_type == "narrative":
            required = ["section", "narrative_shaping_comments"]
            for field in required:
                if field not in corr:
                    raise ValueError(f"Correction {i+1} (narrative): Missing '{field}'")

        elif corr_type == "mixed":
            required = ["affected_sections"]
            for field in required:
                if field not in corr:
                    raise ValueError(f"Correction {i+1} (mixed): Missing '{field}'")

        else:
            raise ValueError(f"Correction {i+1}: Invalid type '{corr_type}'")

def parse_corrections(data: dict) -> List[CorrectionObject]:
    """Parse validated YAML into CorrectionObject list."""
    corrections = []
    for corr in data["corrections"]:
        corrections.append(CorrectionObject(
            type=corr["type"],
            inaccurate_info=corr.get("inaccurate_information"),
            correct_info=corr.get("correct_information"),
            incomplete_info=corr.get("incomplete_information"),
            additional_info=corr.get("additional_information"),
            affected_sections=corr.get("affected_sections", []),
            section=corr.get("section"),
            sources=corr.get("sources", []),
            narrative_comments=corr.get("narrative_shaping_comments", [])
        ))
    return corrections

Testing:

def test_load_corrections_yaml(tmp_path):
    yaml_content = """
company: "TestCo"
corrections:
  - type: "inaccurate"
    inaccurate_information: "Wrong info"
    correct_information: "Right info"
    affected_sections: ["Team"]
"""
    corrections_file = tmp_path / "test-corrections.yaml"
    corrections_file.write_text(yaml_content)

    data = load_corrections_yaml(corrections_file)
    corrections = parse_corrections(data)

    assert data["company"] == "TestCo"
    assert len(corrections) == 1
    assert corrections[0].type == "inaccurate"

Step 2: Create CLI Script with YAML Support

New File: rewrite-key-info.py

Structure:

#!/usr/bin/env python3
"""
Correct crucial information in investment memos using YAML correction files.

USAGE:
    python rewrite-key-info.py "Company" --corrections data/Company-corrections.yaml
    python rewrite-key-info.py "Company" --corrections data/Company-corrections.yaml --verify-sources
"""

import argparse
import sys
from pathlib import Path
from rich.console import Console
from rich.panel import Panel
from src.corrections import load_corrections_yaml, parse_corrections
from src.agents.key_info_rewrite import apply_corrections_to_memo
from src.utils import get_latest_output_dir

def main():
    parser = argparse.ArgumentParser(
        description="Apply YAML-based corrections to investment memos"
    )
    parser.add_argument("target", help="Company name or path to artifact directory")
    parser.add_argument("--corrections", required=True, help="Path to corrections YAML file")
    parser.add_argument("--version", help="Specific version (default: latest)")
    parser.add_argument("--verify-sources", action="store_true",
                       help="Verify corrections with Perplexity Sonar Pro")
    parser.add_argument("--preview", action="store_true", help="Preview without saving")

    args = parser.parse_args()

    console = Console()

    # Load corrections YAML
    corrections_file = Path(args.corrections)
    if not corrections_file.exists():
        console.print(f"[red]Error: Corrections file not found:[/red] {corrections_file}")
        sys.exit(1)

    console.print(f"[bold]Loading corrections:[/bold] {corrections_file}")
    data = load_corrections_yaml(corrections_file)
    corrections = parse_corrections(data)

    console.print(f"  Company: {data['company']}")
    console.print(f"  Corrections: {len(corrections)}")

    # Determine artifact directory
    # ... (similar to improve-section.py)

    # Apply corrections
    result = apply_corrections_to_memo(
        artifact_dir=artifact_dir,
        corrections=corrections,
        verify_sources=args.verify_sources,
        preview=args.preview,
        console=console
    )

    # Display summary
    # ...

Step 3: State Schema Updates

Update: src/state.py

Add Field:

class MemoState(TypedDict):
    # ... existing fields ...

    # NEW: For key information corrections
    correction_instruction: NotRequired[str]
    correction_metadata: NotRequired[Dict[str, Any]]  # Track what was corrected

Step 4: Workflow Integration (Optional)

Update: src/workflow.py

Add Conditional Node:

def build_workflow():
    workflow = StateGraph(MemoState)

    # ... existing nodes ...

    # NEW: Optional correction node
    workflow.add_node("correct_key_info", key_information_rewrite_agent)

    # Conditional routing
    def should_correct(state: MemoState) -> str:
        if state.get("correction_instruction"):
            return "correct_key_info"
        return "continue"

    workflow.add_conditional_edges(
        "validate",
        should_correct,
        {
            "correct_key_info": "correct_key_info",
            "continue": "finalize"
        }
    )
    workflow.add_edge("correct_key_info", "finalize")

CLI Support:

# Run memo generation with correction
python -m src.main "Avalanche" --correct "Fund size is $10M, not $50M"

Step 5: Handle Edge Cases

Scenarios:

  1. No instances found: Warn user, don’t modify anything
  2. Conflicting citations: Flag sections that need manual review
  3. Dependent claims: Identify claims that may be affected
  4. Research data conflicts: Warn if correction contradicts research

Code:

def validate_correction_safety(
    correction_analysis: CorrectionAnalysis,
    affected_sections: List[SectionInfo],
    research_data: dict
) -> List[str]:
    """Check for potential issues before applying correction."""

    warnings = []

    # No instances found
    if not affected_sections:
        warnings.append("⚠️  No instances of incorrect information found")

    # Check research data conflicts
    research_text = str(research_data)
    if correction_analysis.incorrect_info in research_text:
        warnings.append(
            "⚠️  Research data contains the incorrect information. "
            "Consider using --update-research flag."
        )

    # Check for many instances (may indicate systemic issue)
    total_instances = sum(s.instances_found for s in affected_sections)
    if total_instances > 20:
        warnings.append(
            f"⚠️  Found {total_instances} instances across {len(affected_sections)} sections. "
            "This may indicate a deeper issue. Review carefully after correction."
        )

    return warnings

Step 6: Research Data Updates (--update-research)

If Flag Set:

def update_research_data(
    artifact_dir: Path,
    correction_analysis: CorrectionAnalysis
) -> None:
    """Update research.json with corrected information."""

    research_file = artifact_dir / "1-research.json"
    if not research_file.exists():
        return

    with open(research_file) as f:
        research_data = json.load(f)

    # Apply correction to research data fields
    research_json = json.dumps(research_data)
    corrected_json = research_json.replace(
        correction_analysis.incorrect_info,
        correction_analysis.correct_info
    )
    research_data = json.loads(corrected_json)

    # Save updated research
    with open(research_file, "w") as f:
        json.dump(research_data, f, indent=2)

    # Also update 1-research.md
    research_md = artifact_dir / "1-research.md"
    if research_md.exists():
        with open(research_md) as f:
            content = f.read()

        corrected_content = content.replace(
            correction_analysis.incorrect_info,
            correction_analysis.correct_info
        )

        with open(research_md, "w") as f:
            f.write(corrected_content)

Step 7: Testing & Validation

Test Suite:

  1. ✅ Simple correction (fund size)
  2. ✅ Complex correction (person title + role)
  3. ✅ Date correction with timeline impact
  4. ✅ Multiple semantic variations
  5. ✅ Correction with citation conflicts
  6. ✅ No instances found (error case)
  7. ✅ Preview mode
  8. ✅ Research data update

Manual Testing Checklist:

  • Run on Avalanche $50M → $10M
  • Verify all 7 sections corrected
  • Check citations preserved
  • Verify formatting maintained
  • Review reassembled final draft
  • Export to HTML and verify
  • Test with --update-research flag
  • Test with --preview flag

Step 8: Documentation

Update Files:

  1. CLAUDE.md: Add Key Information Rewrite section
  2. README.md: Move to “Completed” ✅
  3. Create docs/CORRECTIONS.md: Guide with examples
  4. Add examples to docs/EXAMPLES.md

Documentation Structure:

# Key Information Rewrite Guide

## When to Use

Use key information rewrite when:
- A crucial fact appears in multiple sections
- The error affects related claims (e.g., fund size affects check sizes)
- Manual editing would be error-prone

Do NOT use when:
- Error is in only one section (use improve-section.py instead)
- You want to rephrase content (use improve-section.py)
- You need to add new information (use improve-section.py or regenerate)

## Common Scenarios

### Fund Size Correction
...

### Person Title/Role Correction
...

### Date/Timeline Correction
...

### Investment Stage Correction
...

Implementation Roadmap

Step 1: Feature #1 - Sonar Pro Integration ✅ COMPLETED

Objective: Update improve-section.py to use Perplexity Sonar Pro for one-step improvements with citations

Tasks:

  • Replace Claude with Sonar Pro in improve_section_with_agent()
  • Update prompt to include citation instructions
  • Test on Avalanche Team section
  • Verify citations properly formatted
  • Compare quality to Claude-only approach

Deliverables:

  • ✅ Updated improve-section.py (commit: 6fbafe5)
  • ✅ Test results: Avalanche Team section (11 citations added)
  • ✅ Quality verified: Significant improvement with concrete details

Completion Date: 2025-11-20


Step 2: Feature #1 - Reassembly ✅ COMPLETED

Objective: Add ability to reassemble final draft after section improvement

Tasks:

  • Implement reassemble_final_draft() function
  • Make reassembly automatic after improvement (the planned --rebuild-final flag became unnecessary)
  • Test reassembly on improved sections
  • Verify formatting preserved

Deliverables:

  • ✅ Working reassembly feature (automatic after improvement)
  • ✅ Includes header.md (company trademark)
  • ✅ Verified on Avalanche final draft

Completion Date: 2025-11-20


Step 3: Feature #1 - Before/After Preview ⏳ PENDING

Objective: Show improvements before applying

Tasks:

  • Add --preview flag
  • Implement diff display
  • Add confirmation prompt
  • Show metrics (word count, citations, etc.)

Deliverables:

  • Preview mode implementation
  • User-friendly diff output

Step 4: Feature #1 - Documentation & Testing 🔄 IN PROGRESS

Objective: Document Feature #1 and complete testing

Tasks:

  • Test on 1 section (Avalanche Team) ✅
  • Test on 4 more sections from different memos
  • Handle edge cases (missing API key, invalid sections) ✅
  • Update README.md ✅
  • Update CLAUDE.md
  • Mark as complete in README “Remaining Enhancements” ✅

Deliverables:

  • 🔄 Test results: 1/5 memos tested
  • 🔄 Documentation: README done, CLAUDE in progress
  • ⏳ Feature marked complete

Step 5: Feature #2 - YAML Template & Parser

Objective: Create correction YAML template and parser

Tasks:

  • Create templates/corrections-template.yaml
  • Create src/corrections.py with CorrectionObject dataclass
  • Implement load_corrections_yaml()
  • Implement validate_corrections_schema()
  • Implement parse_corrections()
  • Write unit tests for YAML parsing

Deliverables:

  • Working YAML template
  • Validated YAML parser
  • Unit tests passing

Step 6: Feature #2 - Agent Core (YAML-Based)

Objective: Create key_info_rewrite agent with YAML corrections support

Tasks:

  • Create src/agents/key_info_rewrite.py
  • Implement apply_correction_to_section() with narrative guidance
  • Implement apply_corrections_to_memo() (batch processor)
  • Optional: Implement verify_corrections_with_sources() (Sonar Pro)
  • Implement reassemble_after_correction()
  • Handle all 4 correction types (inaccurate, incomplete, narrative, mixed)
  • Write unit tests

Deliverables:

  • Working agent module
  • Support for all correction types
  • Unit tests passing

Step 7: Feature #2 - CLI Script (YAML-Based)

Objective: Create standalone CLI for YAML-based corrections

Tasks:

  • Create rewrite-key-info.py
  • Implement --corrections flag (required, YAML path)
  • Implement --verify-sources flag (optional, uses Sonar Pro)
  • Add preview mode
  • Implement rich console output with progress
  • Save corrections-log.json for audit trail

Deliverables:

  • Working CLI script
  • Help documentation
  • Example YAML files

Step 8: Feature #2 - Testing & Validation

Objective: Comprehensive testing of correction feature

Tasks:

  • Test on Avalanche $50M → $10M
  • Test person title correction
  • Test date correction
  • Test with semantic variations
  • Test preview mode
  • Test research updates
  • Handle edge cases

Deliverables:

  • Test results for all scenarios
  • Edge case handling
  • Bug fixes

Step 9: Feature #2 - Workflow Integration (Optional)

Objective: Allow corrections during memo generation workflow

Tasks:

  • Update MemoState schema
  • Add conditional routing in workflow
  • Add --correct flag to main CLI
  • Test integrated workflow

Deliverables:

  • Workflow integration
  • Updated CLI interface

Step 10: Documentation & Examples

Objective: Complete documentation for both features

Tasks:

  • Create docs/CORRECTIONS.md guide
  • Add examples to docs/EXAMPLES.md
  • Update CLAUDE.md comprehensively
  • Update README.md
  • Mark both features complete ✅

Deliverables:

  • Complete documentation
  • Usage examples
  • Features marked complete in README

Success Criteria

Feature #1: Section Improvement

Must Have:

  • ✅ Uses Perplexity Sonar Pro (not Claude) - COMPLETED
  • ✅ Citations added during improvement (not after) - COMPLETED
  • ✅ Obsidian-style citation format - COMPLETED
  • ✅ Preserves artifact structure - COMPLETED
  • ✅ Can reassemble final draft - COMPLETED (automatic)
  • ✅ Error handling for missing API keys - COMPLETED

Nice to Have:

  • ⏳ Before/after preview mode - PENDING
  • ⏳ Word count and quality metrics - PENDING
  • ⏳ Comparison with original section - PENDING

Status: Core functionality COMPLETED ✅ | Enhancement features PENDING

Feature #2: Key Information Rewrite

Must Have:

  • ✅ Identifies all affected sections
  • ✅ Applies corrections consistently
  • ✅ Preserves citations and formatting
  • ✅ Reassembles final draft automatically
  • ✅ Shows summary of changes

Nice to Have:

  • ✅ Updates research data (--update-research)
  • ✅ Preview mode before applying
  • ✅ Semantic variation detection
  • ✅ Workflow integration

Technical Considerations

API Costs

Feature #1 (Sonar Pro per section):

  • Cost: ~$0.50-1.00 per section improvement
  • Context: ~5k chars in, ~7k chars out
  • Model: sonar-pro

Feature #2 (Corrections):

  • Analysis: 1 Claude call (~$0.01)
  • Per section: 1 Claude call (~$0.05)
  • Total for 7 sections: ~$0.36
  • Model: claude-sonnet-4-5

Comparison to Full Regeneration:

  • Full regeneration: 10 sections × $1.00 = $10.00
  • Section improvement: 1 section × $0.75 = $0.75 (13× cheaper)
  • Key correction: 7 sections × $0.05 = $0.35 (29× cheaper)
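A quick back-of-envelope calculation reproduces the comparison above. The per-unit prices are the rough estimates from this section, not measured API billing:

```python
# Rough per-unit cost estimates from this section (not measured billing)
FULL_REGEN_PER_SECTION = 1.00   # USD, full regeneration via Sonar Pro
IMPROVE_PER_SECTION = 0.75      # USD, single-section improvement
CORRECTION_PER_SECTION = 0.05   # USD, Claude correction call

full_regeneration = 10 * FULL_REGEN_PER_SECTION   # $10.00 for a 10-section memo
section_improvement = 1 * IMPROVE_PER_SECTION     # $0.75 for one section
key_correction = 7 * CORRECTION_PER_SECTION       # $0.35 for 7 sections

print(f"Improvement is ~{full_regeneration / section_improvement:.0f}x cheaper")
print(f"Correction is ~{full_regeneration / key_correction:.0f}x cheaper")
```

The savings scale with memo length: the larger the memo, the more a targeted correction undercuts full regeneration.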

Performance

Feature #1:

  • Time: ~30-60 seconds per section (Sonar Pro call)
  • Parallel: Not applicable (one section at a time)

Feature #2:

  • Analysis: ~5 seconds
  • Section scanning: ~1 second
  • Correction per section: ~10-15 seconds
  • Total for 7 sections: ~90 seconds (vs. 10+ minutes for full regeneration)

Rate Limits

Perplexity Sonar Pro:

  • Rate limit: 50 requests/minute
  • Constraint: None (processing one section at a time)

Anthropic Claude:

  • Rate limit: 50 requests/minute
  • Constraint: None for corrections (max ~10 sections)
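Even well under these limits, a small retry wrapper is cheap insurance against transient 429 responses. A minimal sketch (the callable, retry counts, and delays are illustrative, not the scripts' actual error handling):

```python
import time

def with_backoff(call, max_retries=4, base_delay=2.0):
    """Retry `call` with exponential backoff on failure.

    `call` is any zero-argument function that raises on error; for
    brevity this sketch treats every exception as retryable.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise                                # out of retries
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
```

Wrapping each per-section API call this way keeps a single rate-limit hiccup from failing an entire improvement run.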

Monitoring & Quality Assurance

Metrics to Track

Feature #1:

  • Sections improved per week
  • Average quality improvement (word count, citations added)
  • User satisfaction (manual review scores)
  • Time saved vs. full regeneration

Feature #2:

  • Corrections performed per week
  • Average sections affected per correction
  • Accuracy (manual review of corrections)
  • Time saved vs. manual editing

Quality Checks

Pre-Deployment:

  • Test both features on 5 real memos
  • Manual review of outputs
  • Verify citations preserved
  • Check formatting maintained

Post-Deployment:

  • Monitor error rates
  • Collect user feedback
  • Review edge cases
  • Iterate on prompts

Future Enhancements

Feature #1 Extensions

Batch Improvements:

# Improve multiple sections at once
python improve-section.py "Avalanche" --sections "Team,Market Context,Technology"

Comparative Mode:

# Compare section across versions
python improve-section.py "Avalanche" --compare v0.0.1 v0.0.2 --section "Team"

Auto-Improve:

# Automatically improve sections scoring < 7/10
python improve-section.py "Avalanche" --auto-improve --threshold 7

Feature #2 Extensions

Multiple Corrections:

# Apply multiple corrections at once
python rewrite-key-info.py "Avalanche" \
  --corrections corrections.json
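The corrections file format is not yet specified; one plausible shape, sketched with hypothetical field names and placeholder values:

```python
import json

# Hypothetical corrections.json entries -- field names and values are
# illustrative placeholders, not a finalized schema.
corrections = [
    {"find": "$50M Series B", "replace": "$42M Series B",
     "reason": "corrected per latest filing"},
    {"find": "150 employees", "replace": "175 employees",
     "reason": "updated headcount"},
]

payload = json.dumps(corrections, indent=2)
```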

Validation Mode:

# Validate consistency across sections
python rewrite-key-info.py "Avalanche" --validate
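A validation pass could work by extracting each fact type across sections and flagging disagreements. A minimal sketch for one fact type (the regex and function names are illustrative; a real --validate mode would cover many fact types):

```python
import re

# Find every dollar amount that follows "raised" in each section and
# flag the memo when sections disagree.
FUNDING = re.compile(r"raised\s+(\$[\d.]+[MB])")

def funding_claims(sections):
    """sections: {filename: text} -> {filename: [amounts found]}."""
    return {name: FUNDING.findall(text) for name, text in sections.items()}

def is_consistent(claims):
    amounts = {a for found in claims.values() for a in found}
    return len(amounts) <= 1  # more than one distinct value = mismatch
```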

Rollback Support:

# Undo last correction
python rewrite-key-info.py "Avalanche" --rollback

Related Documents

  • Multi-Agent-Orchestration-for-Investment-Memo-Generation.md - Main architecture
  • changelog/2025-11-20_01.md - Section-by-section processing refactor
  • CLAUDE.md - Developer guide
  • README.md - User guide

Changelog

2025-11-20: Document created with comprehensive implementation plan for both features