Improve Investment Memo Output
Improve the quality and depth of investment memos generated by the Investment Memo Orchestrator.
Path: issue-resolution/Improving-Memo-Output.md
Authors: Michael Staton, Tugce Ergul
Augmented with: Claude Code (Sonnet 4.5)
Tags: Workflow · Investment-Analysis · Content-Generation · Venture-Capital · AI-Assisted-Writing
Improving Memo Output: Section Improvement & Key Information Rewrite
Status: Feature #1 Implemented ✅ | Feature #2 Planned | Date: 2025-11-20 | Last Updated: 2025-11-20 | Author: AI Labs Team | Related: Multi-Agent-Orchestration-for-Investment-Memo-Generation.md
Implementation Status
Feature #1: Section Improvement with Sonar Pro ✅
Status: COMPLETED (Steps 1-2)
Completed Work:
- ✅ Step 1: Sonar Pro Integration (commit: 6fbafe5)
  - Replaced Claude with Perplexity Sonar Pro in improve-section.py
  - Added comprehensive citation instructions to prompt
  - Citations now added during improvement (one-step process)
- ✅ Step 2: Automatic Final Draft Reassembly (commit: 6fbafe5)
  - Implemented reassemble_final_draft() function
  - Automatic reassembly after section improvement
  - Includes header.md (company trademark) and all sections
- ✅ Testing: Successfully tested on Avalanche Team section
- 11 citations added automatically
- Section quality improved significantly
- Final draft reassembled correctly
Test Results (Avalanche Team Section, 2025-11-20):
Input: output/Avalanche-v0.0.1/2-sections/04-team.md (existing weak section)
Command: python improve-section.py "Avalanche" "Team" --version v0.0.1
API: Perplexity Sonar Pro
Time: ~60 seconds
Results:
✓ Citations added: 11
✓ Obsidian-style formatting: Correct
✓ Section saved: 2-sections/04-team.md
✓ Final draft reassembled: 4-final-draft.md
✓ Quality improvement: Significant (specific metrics, company names, concrete details)
Pending Work:
- ⏳ Step 3: Before/After Preview Mode (not yet implemented)
- 🔄 Step 4: Documentation & Testing (IN PROGRESS)
Documentation:
- ✅ README.md: Section improvement usage documented
- 🔄 CLAUDE.md: Update in progress
- 🔄 This spec: Update in progress
Feature #2: Key Information Rewrite Agent 📋
Status: PLANNED (Steps 5-10)
Next Steps:
- Create src/agents/key_info_rewrite.py
- Create rewrite-key-info.py CLI script
- Test on Avalanche $50M → $10M correction
Executive Summary
This document specifies two complementary features for improving memo quality without regenerating entire memos:
- Section Improvement: Enhance individual sections with better research and citations using Perplexity Sonar Pro
- Key Information Rewrite: Correct crucial facts that appear across multiple sections (e.g., fund size, dates, names)
Both features leverage the section-by-section architecture introduced in the 2025-11-20 refactor, allowing targeted improvements while preserving the artifact trail.
Problem Statement
Current Limitations
Issue #1: No Targeted Section Improvements
- When one section is weak, users must regenerate the entire memo
- Full regeneration is expensive (10 LLM calls + research)
- Good sections may degrade during regeneration
- No way to iteratively improve specific sections
Issue #2: No Global Fact Correction
- Factual errors often appear in multiple sections
- Example: Avalanche memo states “$50M fund” in 7 different sections, but actual size is “$10M”
- Manually editing each section is error-prone
- Citations may reference the wrong information
Requirements
Feature #1 Requirements:
- Improve a single section without touching others
- Use Perplexity Sonar Pro for real-time research
- Add citations automatically during improvement
- Preserve existing artifact structure
- Allow reassembly of final draft
Feature #2 Requirements:
- Identify all sections affected by a correction
- Apply corrections consistently across sections
- Preserve citations and formatting
- Update research data if needed
- Reassemble final draft automatically
Feature #1: Section Improvement with Sonar Pro
Current Implementation Review
Existing File: improve-section.py (created 2025-11-18)
Current Behavior:
- Loads artifacts (state, research, other sections)
- Uses Claude to improve section content
- Saves to 2-sections/ directory
- Missing: Citations must be added separately
What Exists:
def improve_section_with_agent(
section_name: str,
artifacts: dict,
artifact_dir: Path,
console: Console
) -> str:
"""Use agents to improve or create a specific section."""
# Uses ChatAnthropic (Claude)
# Does NOT add citations
# Requires separate citation enrichment step
What’s Needed:
- Replace Claude with Perplexity Sonar Pro
- Citations added during improvement (not after)
- One-step process instead of two-step
Target Architecture
Improved Function:
def improve_section_with_sonar_pro(
section_name: str,
artifacts: dict,
artifact_dir: Path,
console: Console
) -> str:
"""Use Perplexity Sonar Pro to improve section with citations."""
    import os

    from openai import OpenAI
# Initialize Perplexity client
perplexity_client = OpenAI(
api_key=os.getenv("PERPLEXITY_API_KEY"),
base_url="https://api.perplexity.ai"
)
    # Resolve this section's file name and number, e.g. via SECTION_MAP
    # (assumes SECTION_MAP maps section name → (section_num, section_file))
    section_num, section_file = SECTION_MAP[section_name]
    # Build comprehensive improvement prompt
prompt = build_improvement_prompt(
section_name=section_name,
existing_content=artifacts["sections"].get(section_file, ""),
company_name=artifacts["state"]["company_name"],
research_data=artifacts["research"],
other_sections=artifacts["sections"],
investment_type=artifacts["state"]["investment_type"],
memo_mode=artifacts["state"]["memo_mode"]
)
# Call Sonar Pro with improvement + citation instructions
response = perplexity_client.chat.completions.create(
model="sonar-pro",
messages=[{"role": "user", "content": prompt}]
)
improved_content = response.choices[0].message.content
# Save improved section
save_section_artifact(artifact_dir, section_num, section_name, improved_content)
return improved_content
Prompt Design
Sonar Pro Improvement Prompt Structure:
You are improving the '{section_name}' section for an investment memo about {company_name}.
INVESTMENT TYPE: {investment_type.upper()}
MEMO MODE: {memo_mode.upper()} ({'retrospective justification' if justify else 'prospective analysis'})
CURRENT SECTION CONTENT:
{existing_content}
RESEARCH DATA AVAILABLE:
{research_data_json}
CONTEXT FROM OTHER SECTIONS:
{other_sections_summary}
TASK: Significantly improve this section by:
1. Adding specific metrics and data from authoritative sources
2. Removing vague or speculative language ("could potentially", "might be", etc.)
3. Strengthening analysis with concrete evidence
4. Adding inline citations [^1], [^2], [^3] for ALL factual claims
5. Including a comprehensive Citations section at the end
REQUIREMENTS:
- Use Obsidian-style citations: [^1], [^2], etc.
- Place citations AFTER punctuation: "text. [^1]" not "text[^1]."
- Always include ONE SPACE before each citation: "text. [^1] [^2]"
- Use quality sources:
* Company websites, blogs, press releases
* TechCrunch, The Information, Sifted, Protocol, Axios
* Crunchbase, PitchBook (for funding data)
* SEC filings, investor letters
* Industry analyst reports (Gartner, CB Insights, McKinsey)
* Bloomberg, Reuters, WSJ, FT (for news)
- Match the analytical tone of professional VC memos
- Be specific, not promotional or dismissive
- For {memo_mode} mode: {'justify the investment decision' if justify else 'objectively assess'}
CITATION FORMAT:
[^1]: YYYY, MMM DD. [Source Title](https://full-url-here.com). Publisher Name. Published: YYYY-MM-DD | Updated: YYYY-MM-DD
IMPROVED SECTION CONTENT:
Key Differences from Citation Enrichment Agent:
- Citation Enrichment: Preserves narrative, only adds citations
- Section Improvement: Rewrites for quality AND adds citations
- Both use same citation format (Obsidian-style)
CLI Interface
Usage:
# Activate venv first (recommended)
source .venv/bin/activate
# Basic usage: improve section
python improve-section.py "Avalanche" "Team"
# Specify version
python improve-section.py "Avalanche" "Team" --version v0.0.1
# With final draft reassembly
python improve-section.py "Avalanche" "Team" --rebuild-final
# Direct path to artifact directory
python improve-section.py output/Avalanche-v0.0.1 "Market Context"
New Flags:
- --rebuild-final: Reassemble 4-final-draft.md after improvement
- --preview: Show before/after comparison without saving
Output:
✓ Loading artifacts from: output/Avalanche-v0.0.1/
✓ Loaded state.json
✓ Loaded research data
✓ Loaded 10 existing sections
🔧 Improving section: Team
Using Perplexity Sonar Pro for real-time research...
✓ Section improved with 8 new citations added
✓ Saved to: output/Avalanche-v0.0.1/2-sections/04-team.md
📊 Changes Summary:
- Original length: 850 words
- Improved length: 1,200 words
- Citations added: 8
- Vague claims removed: 5
- Specific metrics added: 12
✓ Reassembled final draft: 4-final-draft.md
Next steps:
1. Review improved section in: output/Avalanche-v0.0.1/2-sections/
2. Export to HTML: python export-branded.py output/Avalanche-v0.0.1/4-final-draft.md
Implementation Steps
Step 1: Update improve-section.py for Sonar Pro
Files Modified:
improve-section.py
Changes:
- Replace improve_section_with_agent() with improve_section_with_sonar_pro()
- Import OpenAI client for Perplexity
- Update prompt to include citation instructions
- Test with PERPLEXITY_API_KEY
Testing:
# Test on weak section
python improve-section.py "Avalanche" "Team" --version v0.0.1
# Verify:
# - Section has inline citations [^1], [^2]
# - Citations section at end with URLs
# - Content quality improved
# - Vague language removed
Step 2: Add Reassembly Feature
Changes:
- Add --rebuild-final flag
- Implement reassemble_final_draft() function:
  - Load header.md if exists
  - Load all sections from 2-sections/ in order
  - Concatenate with proper spacing
  - Save as 4-final-draft.md
Code:
def reassemble_final_draft(artifact_dir: Path, console: Console):
"""Reassemble 4-final-draft.md from section files."""
console.print("\n[bold]Reassembling final draft...[/bold]")
# Load header if exists
header_file = artifact_dir / "header.md"
if header_file.exists():
with open(header_file) as f:
content = f.read() + "\n"
else:
content = ""
# Load sections in order
sections_dir = artifact_dir / "2-sections"
section_files = sorted(sections_dir.glob("*.md"))
for section_file in section_files:
with open(section_file) as f:
content += f.read() + "\n\n"
# Save final draft
final_draft = artifact_dir / "4-final-draft.md"
with open(final_draft, "w") as f:
f.write(content.strip())
console.print(f"[green]✓ Final draft reassembled:[/green] {final_draft}")
Step 3: Add Before/After Comparison
Changes:
- Add --preview flag
- Show diff before saving
- Require confirmation
Output Example:
📊 Section Improvement Preview:
BEFORE (850 words):
"The team has extensive experience in the industry..."
AFTER (1,200 words):
"The founding team brings 40+ years of combined experience. [^1]
CEO Jane Doe previously scaled Acme Corp from $5M to $150M ARR
over 6 years (2015-2021). [^2] CTO John Smith led engineering at..."
Changes:
✓ Removed 5 vague claims
✓ Added 12 specific metrics
✓ Added 8 citations
✓ Increased depth by 41%
Save improved section? [y/N]:
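The preview comparison itself can be produced with the standard library. A minimal sketch, using difflib for the diff and word counts for the length stats; the build_preview helper and its exact output format are illustrative, not part of the current implementation:

```python
import difflib


def build_preview(original: str, improved: str, context: int = 2) -> str:
    """Return word-count stats plus a unified diff for a before/after preview.

    Sketch only: the real --preview flag may render this with rich
    formatting, but a unified diff captures the comparison.
    """
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        improved.splitlines(keepends=True),
        fromfile="before",
        tofile="after",
        n=context,
    )
    stats = (
        f"Original length: {len(original.split())} words\n"
        f"Improved length: {len(improved.split())} words\n"
    )
    return stats + "".join(diff)
```

The same diff lines could feed the "Save improved section? [y/N]" confirmation prompt before anything is written to disk.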
Step 4: Error Handling & Edge Cases
Handle:
- Missing PERPLEXITY_API_KEY
- Invalid section names
- Missing artifact directories
- Network errors during API calls
- Malformed citations in response
Code:
def validate_environment():
"""Check required environment variables."""
if not os.getenv("PERPLEXITY_API_KEY"):
console.print("[red]Error: PERPLEXITY_API_KEY not set[/red]")
console.print("[yellow]Set it in .env file or export it[/yellow]")
sys.exit(1)
def validate_section_name(section_name: str) -> bool:
"""Validate section name against known sections."""
if section_name not in SECTION_MAP:
console.print(f"[red]Error: Unknown section '{section_name}'[/red]")
console.print("\n[yellow]Available sections:[/yellow]")
for name in sorted(SECTION_MAP.keys()):
console.print(f" • {name}")
return False
return True
Step 5: Documentation & Testing
Update Files:
- CLAUDE.md: Add Section Improvement section
- README.md: Add to “Remaining Enhancements” → “Completed”
- Create examples in docs/EXAMPLES.md
Test Cases:
- ✅ Improve existing weak section
- ✅ Create missing section from scratch
- ✅ Handle section with existing citations (preserve them)
- ✅ Error: Invalid section name
- ✅ Error: Missing artifacts
- ✅ Reassemble final draft after improvement
Feature #2: Key Information Rewrite Agent
Use Cases
Scenario 1: Fund Size Correction
- Error: Memo states “$50M fund” in 7 sections
- Correction: Actual size is “$10M”
- Impact: Affects deployment strategy, check sizes, portfolio construction, economics
Scenario 2: Person Title Correction
- Error: “Katelyn Donnelly, Partner at Avalanche”
- Correction: “Katelyn Donnelly, Managing Partner and Founder at Avalanche”
- Impact: Affects GP Background, Track Record, decision-making authority
Scenario 3: Date Correction
- Error: “Company founded in 2020”
- Correction: “Company founded in 2019”
- Impact: Affects traction timeline, milestones, growth metrics
Scenario 4: Investment Stage Correction
- Error: “Series B company”
- Correction: “Series A company”
- Impact: Affects valuation expectations, metrics benchmarks, competitive positioning
YAML-Based Correction System
Rationale
Why YAML over CLI flags?
The original design used simple CLI corrections: --correction "Fund size is $10M, not $50M"
Problems with CLI approach:
- Can only handle one correction at a time
- No way to provide source verification
- Cannot distinguish between inaccurate, incomplete, and narrative guidance
- Difficult to track/audit corrections
- Not reusable across memo versions
YAML Template Benefits:
- Structured corrections: Explicit categories (inaccurate vs incomplete vs narrative)
- Batch corrections: Multiple corrections in one file
- Source references: Authoritative sources for verification
- Auditable: Corrections file becomes part of project history
- Reusable: Save correction templates for common issues
- Version control: Track changes to correction guidance over time
- Rich guidance: Narrative shaping comments guide tone and framing
YAML Template Structure
Template Location: data/{CompanyName}-corrections.yaml
Schema:
# Correction template for investment memo improvements
company: "Avalanche"
# VERSION MANAGEMENT
source_version: "v0.0.3" # Which version to correct (required, can be path or version tag)
# source_version: "output/Avalanche-v0.0.3" # Alternative: full path
output_mode: "new_version" # "new_version" or "in_place"
# output_mode: "in_place" # Overwrites source version artifacts
date_created: "2025-11-20"
corrections:
# Correction Object 1: Inaccurate Information
- type: "inaccurate"
inaccurate_information: |
The memo states that Avalanche VC Fund II is raising $50M, appearing
in multiple sections (Fund Strategy, Economics, Portfolio Construction).
correct_information: |
Avalanche VC Fund II is raising $10M, not $50M. The fund targets
$10M with a hard cap at $12M.
affected_sections:
- "Fund Strategy & Thesis"
- "Portfolio Construction"
- "Fee Structure & Economics"
- "Executive Summary"
sources:
- "https://avalanche.vc/fund-ii"
- "data/Avalanche-v0.0.1/0-deck-analysis.json"
narrative_shaping_comments:
- "Emphasize that the $10M fund size is intentional for boutique, high-touch approach"
- "Connect fund size to check size strategy ($250K-$500K initial)"
- "Frame smaller fund as competitive advantage for emerging EdTech companies"
# Correction Object 2: Incomplete Information
- type: "incomplete"
incomplete_information: |
The Team section mentions Katelyn Donnelly but doesn't specify her
previous role at Pearson Ventures or the fund's performance metrics.
additional_information: |
Katelyn Donnelly was Managing Director at Pearson Ventures, where she
oversaw a $65M fund that delivered an 18% IRR. She's also a Kauffman
Fellow (Class 21) and was featured on Forbes 30 Under 30 in 2014.
affected_sections:
- "GP Background & Track Record"
- "Executive Summary"
sources:
- "https://www.linkedin.com/in/katelyndonnelly/"
- "https://avalanche.vc/team"
narrative_shaping_comments:
- "Highlight the 18% IRR as significantly above industry average"
- "Connect Pearson Ventures experience to EdTech sector expertise"
- "Emphasize operational experience (co-founded Delivery Associates, $40M revenue)"
# Correction Object 3: Narrative Shaping Only
- type: "narrative"
section: "Investment Thesis"
narrative_shaping_comments:
- "Reduce promotional language about 'revolutionary' and 'game-changing'"
- "Add more balanced risk discussion alongside opportunity"
- "Include specific competitive comparisons (Reach Capital, Learn Capital)"
- "Quantify claims wherever possible (e.g., 'market leader' → 'top 3 in sector')"
sources:
- "https://www.crunchbase.com/organization/reach-capital"
- "https://www.crunchbase.com/organization/learn-capital"
# Correction Object 4: Multiple Facts + Narrative
- type: "mixed"
inaccurate_information: "Portfolio construction assumes 25 investments"
correct_information: "Portfolio will include 15-20 core investments, not 25"
incomplete_information: "No mention of reserve strategy for follow-on rounds"
additional_information: |
The fund reserves 50% of capital for follow-on investments in top performers.
Average initial check: $400K. Reserve per company: $300-500K.
affected_sections:
- "Portfolio Construction"
- "Fund Strategy & Thesis"
sources:
- "data/Avalanche-deck.pdf"
narrative_shaping_comments:
- "Frame reserve strategy as deliberate capital deployment discipline"
- "Compare concentration to industry norms (seed funds typically 30-40 companies)"
- "Connect to ownership targets (8-12% initial, 10-15% after reserves)"
Version Management Options
Critical Design Decision: Should corrections modify the existing version or create a new version?
Option 1: output_mode: "new_version" (Recommended for most cases)
Creates a new version directory with corrected content, preserving the original.
How it works:
source_version: "v0.0.3"
output_mode: "new_version"
Behavior:
- Reads all artifacts from output/Avalanche-v0.0.3/
- Applies corrections to sections
- Creates output/Avalanche-v0.0.4/ with:
  - Corrected section files (2-sections/)
  - Updated final draft (4-final-draft.md)
  - Copied artifacts from v0.0.3 (state.json, research, validation)
  - New corrections-log.json documenting changes
  - Updated state.json with correction metadata
- Increments version: v0.0.3 → v0.0.4
- Updates output/versions.json
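The version increment step can be sketched as a small helper; the function name is hypothetical and it assumes tags always follow the vMAJOR.MINOR.PATCH convention used in this document's examples:

```python
def next_version(tag: str) -> str:
    """Increment the patch component of a 'vMAJOR.MINOR.PATCH' tag.

    Sketch of the new_version increment step, e.g. v0.0.3 → v0.0.4.
    """
    if not tag.startswith("v"):
        raise ValueError(f"Expected tag like 'v0.0.3', got {tag!r}")
    major, minor, patch = (int(part) for part in tag[1:].split("."))
    return f"v{major}.{minor}.{patch + 1}"
```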
Use when:
- You want to preserve the original memo for comparison
- Corrections might substantially change the narrative/recommendation
- You want an audit trail of what changed between versions
- Multiple stakeholders reviewing different versions
- Experimenting with different correction approaches
Example: Fund size correction ($50M → $10M)
- Creates v0.0.4 with corrected fund size
- Original v0.0.3 remains unchanged
- Can export both versions to compare side-by-side
- If correction is wrong, v0.0.3 is still available
Option 2: output_mode: "in_place" (Use with caution)
Overwrites the source version’s artifacts directly. Original content is lost.
How it works:
source_version: "v0.0.3"
output_mode: "in_place"
Behavior:
- Reads all artifacts from output/Avalanche-v0.0.3/
- Applies corrections to sections
- Overwrites files in output/Avalanche-v0.0.3/:
  - Replaces section files in 2-sections/
  - Replaces 4-final-draft.md
  - Adds corrections-log.json
  - Updates state.json with correction metadata
- Version tag remains v0.0.3
- No new version created
Use when:
- Minor corrections (typos, small factual updates)
- You don’t need to preserve the original
- Disk space is limited
- Corrections are unambiguously correct
- Internal draft that hasn’t been shared
Example: Typo correction (“Managign Partner” → “Managing Partner”)
- Fixes typo directly in v0.0.3
- No need to create v0.0.4 for a typo
- Original v0.0.3 is overwritten
⚠️ Warning: This is destructive. Use --preview flag first to verify changes.
Version Specification Options
Option A: Version Tag (Recommended)
source_version: "v0.0.3"
- Agent resolves to output/Avalanche-v0.0.3/
- Validates version exists
- Works with version history
Option B: Full Path
source_version: "output/Avalanche-v0.0.3"
- Direct path to artifact directory
- Useful if directory isn’t in standard location
- Bypasses version resolution
Option C: Latest (via CLI, not YAML)
# Use --source-version latest flag
python rewrite-key-info.py "Avalanche" \
--corrections data/Avalanche-corrections.yaml \
--source-version latest
- Agent finds latest version automatically
- Useful for quick iterations
Version Comparison & Audit Trail
With new_version mode, the system creates a comparison log:
File: output/Avalanche-v0.0.4/corrections-log.json
{
"source_version": "v0.0.3",
"output_version": "v0.0.4",
"output_mode": "new_version",
"corrections_applied": 4,
"sections_modified": 7,
"timestamp": "2025-11-20T15:30:00Z",
"corrections_file": "data/Avalanche-corrections.yaml",
"changes": [
{
"correction_type": "inaccurate",
"sections_affected": ["Fund Strategy & Thesis", "Portfolio Construction", "Fee Structure & Economics", "Executive Summary"],
"instances_corrected": 11,
"summary": "Corrected fund size from $50M to $10M"
},
{
"correction_type": "incomplete",
"sections_affected": ["GP Background & Track Record", "Executive Summary"],
"facts_added": 5,
"summary": "Added Katelyn's Pearson Ventures track record (18% IRR)"
},
{
"correction_type": "narrative",
"sections_affected": ["Investment Thesis"],
"summary": "Reduced promotional language, added competitive comparisons"
},
{
"correction_type": "mixed",
"sections_affected": ["Portfolio Construction", "Fund Strategy & Thesis"],
"instances_corrected": 6,
"facts_added": 3,
"summary": "Corrected portfolio size and added reserve strategy"
}
],
"narrative_impact": "Substantial - fund size correction changes check size strategy, portfolio construction, and economics sections. May affect overall recommendation.",
"recommendation_changed": false,
"recommendation_note": "Recommendation remains COMMIT but with updated rationale reflecting smaller fund size"
}
Comparison Command (Future Enhancement):
# Compare two versions
python compare-versions.py Avalanche v0.0.3 v0.0.4
# Output:
# Differences between v0.0.3 and v0.0.4:
# 7 sections modified
# 11 factual corrections
# 3 narrative improvements
# Recommendation: COMMIT (unchanged)
# Key changes:
# - Fund size: $50M → $10M
# - Portfolio: 25 → 15-20 investments
# - Added reserve strategy details
Impact on Research Data
Important: Corrections do NOT re-run research or regenerate from scratch.
What happens to research artifacts:
New Version Mode:
- 1-research.json and 1-research.md are copied from source version
- Optionally updated if correction fundamentally conflicts (see “Research Conflicts” below)
- 0-deck-analysis.json is copied unchanged
In-Place Mode:
- Research artifacts remain unchanged
- Only section files and final draft are modified
Research Conflicts (Optional --update-research flag):
If correction contradicts research data, the agent can optionally update research:
# In corrections YAML
corrections:
- type: "inaccurate"
inaccurate_information: "Fund size $50M"
correct_information: "Fund size $10M"
update_research: true # Optional: update research artifacts
Behavior with update_research: true:
- Agent detects conflict: research mentions “$50M” but correction says “$10M”
- Updates 1-research.json to reflect $10M
- Updates 1-research.md narrative
- Logs research update in corrections-log.json
Default behavior (update_research: false or omitted):
- Research artifacts unchanged
- Only sections and final draft corrected
- Potential discrepancy logged in corrections-log.json
Why this matters: If research says “$50M” but memo says “$10M”, future regenerations might reintroduce the error. Updating research ensures consistency.
Correction Types
Type 1: inaccurate - Factual errors that must be corrected
- Required fields: inaccurate_information, correct_information, affected_sections
- Optional: sources, narrative_shaping_comments
Type 2: incomplete - Missing information that should be added
- Required fields: incomplete_information, additional_information, affected_sections
- Optional: sources, narrative_shaping_comments
Type 3: narrative - Tone/framing improvements without factual changes
- Required fields: section, narrative_shaping_comments
- Optional: sources (for competitive research, benchmarking)
Type 4: mixed - Combination of inaccurate + incomplete
- Required fields: All of the above
- Most comprehensive correction type
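The code sketches in this document pass around a CorrectionObject; a minimal dataclass covering the required and optional fields above might look like the following. The exact class definition is an assumption (the field names mirror the YAML schema, shortened to match the parser sketch):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CorrectionObject:
    """One correction entry parsed from the corrections YAML file.

    Optional fields stay None/empty for types that do not use them,
    e.g. a 'narrative' correction carries no factual fields.
    """
    type: str  # "inaccurate" | "incomplete" | "narrative" | "mixed"
    inaccurate_info: Optional[str] = None
    correct_info: Optional[str] = None
    incomplete_info: Optional[str] = None
    additional_info: Optional[str] = None
    affected_sections: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)
    narrative_comments: List[str] = field(default_factory=list)
```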
Workflow with YAML Corrections
Step 1: User Creates Correction File
# Copy template
cp templates/corrections-template.yaml data/Avalanche-corrections.yaml
# Edit with corrections
# User fills in specific corrections based on feedback
Step 2: Agent Parses YAML
from pathlib import Path
from typing import List

import yaml


def load_corrections_yaml(corrections_file: Path) -> List[CorrectionObject]:
"""Load and validate corrections YAML file."""
with open(corrections_file) as f:
data = yaml.safe_load(f)
# Validate schema
validate_corrections_schema(data)
# Parse into CorrectionObject list
corrections = []
for corr in data['corrections']:
corrections.append(CorrectionObject(
type=corr['type'],
inaccurate_info=corr.get('inaccurate_information'),
correct_info=corr.get('correct_information'),
incomplete_info=corr.get('incomplete_information'),
additional_info=corr.get('additional_information'),
affected_sections=corr.get('affected_sections', []),
sources=corr.get('sources', []),
narrative_comments=corr.get('narrative_shaping_comments', [])
))
return corrections
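validate_corrections_schema() is referenced above but not defined. A minimal sketch derived from the required fields listed under “Correction Types”; the real validator may check more (e.g. legal output_mode values or source URL formats):

```python
# Required fields per correction type, per the "Correction Types" section
REQUIRED_BY_TYPE = {
    "inaccurate": {"inaccurate_information", "correct_information", "affected_sections"},
    "incomplete": {"incomplete_information", "additional_information", "affected_sections"},
    "narrative": {"section", "narrative_shaping_comments"},
    "mixed": {"inaccurate_information", "correct_information",
              "incomplete_information", "additional_information", "affected_sections"},
}


def validate_corrections_schema(data: dict) -> None:
    """Raise ValueError if the corrections YAML is missing required fields."""
    for key in ("company", "source_version", "corrections"):
        if key not in data:
            raise ValueError(f"Missing top-level key: {key}")
    for i, corr in enumerate(data["corrections"]):
        ctype = corr.get("type")
        if ctype not in REQUIRED_BY_TYPE:
            raise ValueError(f"Correction {i}: unknown type {ctype!r}")
        missing = REQUIRED_BY_TYPE[ctype] - corr.keys()
        if missing:
            raise ValueError(f"Correction {i} ({ctype}): missing fields {sorted(missing)}")
```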
Step 3: Source Verification (Optional)
def verify_corrections_with_sources(
corrections: List[CorrectionObject],
use_sonar_pro: bool = True
) -> List[VerificationResult]:
"""
Use Perplexity Sonar Pro to verify corrections against provided sources.
For each correction with sources:
1. Fetch source content (if URL)
2. Use Sonar Pro to verify correctness
3. Return confidence score + evidence
"""
    if not use_sonar_pro:
        # Verification skipped: return one trusting result per correction
        return [VerificationResult(verified=True, confidence=1.0) for _ in corrections]
results = []
for correction in corrections:
if not correction.sources:
results.append(VerificationResult(verified=True, confidence=0.8,
note="No sources provided, assuming user is correct"))
continue
# Build verification prompt
prompt = f"""Verify this correction using the provided sources:
CLAIMED INACCURATE INFO: {correction.inaccurate_info}
CLAIMED CORRECT INFO: {correction.correct_info}
SOURCES TO VERIFY:
{chr(10).join(correction.sources)}
TASK:
1. Check if the correction is accurate according to sources
2. Return confidence score (0.0-1.0)
3. Provide evidence from sources
Return JSON:
{{
"verified": true/false,
"confidence": 0.95,
"evidence": "Quote or summary from sources",
"concerns": "Any potential issues"
}}
"""
# Call Sonar Pro
response = perplexity_client.chat.completions.create(
model="sonar-pro",
messages=[{"role": "user", "content": prompt}]
)
result = parse_verification_result(response)
results.append(result)
return results
Step 4: Apply Corrections Section-by-Section
def apply_correction_to_section(
section_file: Path,
correction: CorrectionObject,
company_name: str
) -> str:
"""Apply single correction to section with narrative guidance."""
with open(section_file) as f:
original_content = f.read()
# Build correction prompt with narrative guidance
correction_prompt = f"""You are correcting an investment memo section for {company_name}.
CORRECTION TYPE: {correction.type}
{"INACCURATE INFORMATION: " + correction.inaccurate_info if correction.inaccurate_info else ""}
{"CORRECT INFORMATION: " + correction.correct_info if correction.correct_info else ""}
{"INCOMPLETE - MISSING: " + correction.incomplete_info if correction.incomplete_info else ""}
{"ADDITIONAL INFORMATION: " + correction.additional_info if correction.additional_info else ""}
NARRATIVE SHAPING GUIDANCE:
{chr(10).join(f"• {comment}" for comment in correction.narrative_comments)}
SOURCES FOR REFERENCE:
{chr(10).join(correction.sources)}
CURRENT SECTION CONTENT:
{original_content}
TASK:
1. Apply factual corrections (inaccurate → correct)
2. Add missing information (incomplete → additional)
3. Follow narrative shaping guidance for tone and framing
4. Preserve ALL existing citations
5. Add NEW citations for newly added facts (use sources provided)
6. Maintain formatting and structure
Return ONLY the corrected section content with citations.
"""
# Call Claude for correction
response = anthropic_client.invoke(correction_prompt)
corrected_content = response.content
return corrected_content
Step 5: CLI Usage
# Basic usage: apply corrections from YAML
# (source_version and output_mode specified in YAML)
python rewrite-key-info.py --corrections data/Avalanche-corrections.yaml
# With source verification (uses Sonar Pro to verify corrections)
python rewrite-key-info.py \
--corrections data/Avalanche-corrections.yaml \
--verify-sources
# Preview mode (show what would change without saving)
python rewrite-key-info.py \
--corrections data/Avalanche-corrections.yaml \
--preview
# Override YAML output mode (force in-place even if YAML says new_version)
python rewrite-key-info.py \
--corrections data/Avalanche-corrections.yaml \
--output-mode in_place
# Override source version (use latest instead of YAML-specified version)
python rewrite-key-info.py \
--corrections data/Avalanche-corrections.yaml \
--source-version latest
# Direct path to artifact directory (bypasses company resolution)
python rewrite-key-info.py \
--corrections data/Avalanche-corrections.yaml \
--source-path output/Avalanche-v0.0.3
CLI Flag Priority:
- CLI flags override YAML settings
- YAML settings override defaults
- Defaults: output_mode: "new_version", source_version: "latest"
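The priority rules above can be implemented with a small merge helper. A sketch assuming argparse flags default to None when not supplied on the command line; merge_settings is an illustrative name:

```python
import argparse


def merge_settings(cli_args: argparse.Namespace, yaml_data: dict) -> dict:
    """Resolve output_mode and source_version with the stated priority:
    CLI flags override YAML settings, which override defaults."""
    defaults = {"output_mode": "new_version", "source_version": "latest"}
    settings = dict(defaults)
    for key in settings:
        if yaml_data.get(key) is not None:
            settings[key] = yaml_data[key]          # YAML beats defaults
        cli_value = getattr(cli_args, key, None)
        if cli_value is not None:
            settings[key] = cli_value               # CLI beats YAML
    return settings
```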
Step 6: Output
Example 1: New Version Mode
📋 Loaded corrections: data/Avalanche-corrections.yaml
Company: Avalanche
Source version: v0.0.3
Output mode: new_version → v0.0.4
Corrections: 4
🔍 Verifying corrections with sources...
✓ Correction 1: Verified (confidence: 0.95) - Fund size $10M confirmed
✓ Correction 2: Verified (confidence: 0.92) - Katelyn's track record confirmed
✓ Correction 3: No verification needed (narrative only)
✓ Correction 4: Verified (confidence: 0.88) - Portfolio construction confirmed
📝 Applying corrections...
Correction 1 (inaccurate):
✓ Fund Strategy & Thesis (3 instances corrected)
✓ Portfolio Construction (2 instances corrected)
✓ Fee Structure & Economics (4 instances corrected)
✓ Executive Summary (2 instances corrected)
Correction 2 (incomplete):
✓ GP Background & Track Record (added Pearson metrics)
✓ Executive Summary (added track record summary)
Correction 3 (narrative):
✓ Investment Thesis (toned down promotional language, added comparisons)
Correction 4 (mixed):
✓ Portfolio Construction (corrected count, added reserve strategy)
✓ Fund Strategy & Thesis (added reserve discussion)
📦 Creating new version: v0.0.4
✓ Copied artifacts from v0.0.3
✓ Applied corrections to 7 sections
✓ Updated state.json with correction metadata
✓ Created corrections-log.json
✅ Reassembled final draft: output/Avalanche-v0.0.4/4-final-draft.md
📊 Correction Summary:
Source version: v0.0.3
Output version: v0.0.4 (NEW)
Total corrections: 4
Sections modified: 7/10
Instances corrected: 15
Citations added: 8
Narrative improvements: 1 section
📝 Correction log saved: output/Avalanche-v0.0.4/corrections-log.json
Next steps:
1. Review corrections: output/Avalanche-v0.0.4/2-sections/
2. View final draft: output/Avalanche-v0.0.4/4-final-draft.md
3. Compare versions: diff output/Avalanche-v0.0.3/4-final-draft.md output/Avalanche-v0.0.4/4-final-draft.md
4. Export to HTML: python export-branded.py output/Avalanche-v0.0.4/4-final-draft.md
Example 2: In-Place Mode
📋 Loaded corrections: data/Avalanche-corrections.yaml
Company: Avalanche
Source version: v0.0.3
Output mode: in_place (⚠️ will overwrite v0.0.3)
Corrections: 1
⚠️ WARNING: In-place mode will overwrite existing artifacts.
Use --preview to see changes before applying.
Original content will be lost. Continue? [y/N]: y
📝 Applying corrections...
Correction 1 (inaccurate):
✓ GP Background & Track Record (1 instance corrected)
✅ Updated final draft: output/Avalanche-v0.0.3/4-final-draft.md
📊 Correction Summary:
Version: v0.0.3 (MODIFIED IN-PLACE)
Total corrections: 1
Sections modified: 1/10
Instances corrected: 1
📝 Correction log saved: output/Avalanche-v0.0.3/corrections-log.json
Next steps:
1. Review corrections: output/Avalanche-v0.0.3/2-sections/
2. View final draft: output/Avalanche-v0.0.3/4-final-draft.md
3. Export to HTML: python export-branded.py output/Avalanche-v0.0.3/4-final-draft.md
Benefits of YAML Approach
1. Comprehensive Corrections
- Single file can fix multiple issues across entire memo
- Supports fact corrections, additions, and narrative guidance
- Clear categorization of correction types
2. Source Integration
- Reference authoritative sources for verification
- Automatically verify corrections with Sonar Pro
- Add citations to newly added facts
3. Narrative Control
- Shape tone and framing with explicit guidance
- Not just facts—control how facts are presented
- Maintain analytical rigor vs promotional tone
4. Audit Trail
- Correction YAML files tracked in version control
- corrections-log.json records what was changed
- Easy to understand what was corrected and why
5. Reusability
- Save correction templates for common issues
- Apply same corrections to multiple memo versions
- Share correction patterns across projects
6. Batch Efficiency
- Fix 10+ issues in one run
- Fewer API calls than iterative corrections
- Consistent application across all sections
Architecture Design
New Agent: src/agents/key_info_rewrite.py
Agent Function:
def key_information_rewrite_agent(state: MemoState) -> dict:
"""
Correct crucial information that affects multiple sections.
Args:
state: Must contain:
- correction_instruction: str
Example: "The fund size is $10M, not $50M"
- company_name: str
- latest_output_dir: Path (optional, auto-detected if not provided)
Process:
1. Load final draft from latest version
2. Analyze correction to identify affected sections
3. For each affected section:
a. Load section file from 2-sections/
b. Apply correction via LLM
c. Preserve citations and formatting
d. Save corrected section
4. Reassemble final draft
5. Update metadata
Returns:
{
"sections_corrected": int,
"instances_found": int,
"files_updated": List[str],
"messages": List[str]
}
"""
Correction Analysis Algorithm
Phase 1: Parse Correction Instruction
def analyze_correction(instruction: str, company_name: str) -> CorrectionAnalysis:
"""
Use LLM to understand correction and identify search terms.
Returns:
CorrectionAnalysis:
- incorrect_info: str ("$50M")
- correct_info: str ("$10M")
- semantic_variations: List[str] (["fifty million", "Fund II size", "10M fund"])
- affected_section_types: List[str] (["Fund Strategy", "Economics", "Portfolio"])
"""
analysis_prompt = f"""Analyze this correction instruction for {company_name}:
INSTRUCTION: {instruction}
TASK: Extract structured information:
1. What information is INCORRECT?
2. What is the CORRECT information?
3. What semantic variations might appear? (paraphrases, related concepts)
4. Which section types are likely affected?
Return JSON:
{{
"incorrect_info": "exact text",
"correct_info": "exact text",
"semantic_variations": ["variant1", "variant2"],
"affected_section_types": ["section name 1", "section name 2"]
}}
"""
# Call Claude for analysis
response = anthropic_client.invoke(analysis_prompt)
return CorrectionAnalysis.parse(response.content)
Phase 2: Identify Affected Sections
def identify_affected_sections(
correction_analysis: CorrectionAnalysis,
artifact_dir: Path
) -> List[SectionInfo]:
"""
Scan all section files to find which ones contain the error.
Returns:
List of SectionInfo:
- section_name: str
- section_file: Path
- instances_found: int
- sample_text: str (preview of error)
"""
affected_sections = []
sections_dir = artifact_dir / "2-sections"
for section_file in sections_dir.glob("*.md"):
with open(section_file) as f:
content = f.read()
# Check for exact match
exact_count = content.count(correction_analysis.incorrect_info)
# Check for semantic variations
variation_count = 0
for variation in correction_analysis.semantic_variations:
variation_count += content.lower().count(variation.lower())
total_instances = exact_count + variation_count
if total_instances > 0:
affected_sections.append(SectionInfo(
section_name=extract_section_name(section_file),
section_file=section_file,
instances_found=total_instances,
sample_text=extract_sample(content, correction_analysis.incorrect_info)
))
return affected_sections
Phase 3: Apply Correction to Each Section
def correct_section(
section_file: Path,
correction_analysis: CorrectionAnalysis,
other_sections_context: str,
company_name: str
) -> str:
"""
Use LLM to apply correction while preserving formatting and citations.
"""
with open(section_file) as f:
original_content = f.read()
correction_prompt = f"""You are correcting a factual error in an investment memo section.
COMPANY: {company_name}
CORRECTION REQUIRED:
Incorrect: {correction_analysis.incorrect_info}
Correct: {correction_analysis.correct_info}
CONTEXT FROM OTHER SECTIONS:
{other_sections_context}
CURRENT SECTION CONTENT:
{original_content}
TASK:
1. Find ALL instances of the incorrect information (including paraphrases)
2. Replace with the correct information
3. Ensure consistency throughout the section
4. Update any dependent claims (e.g., if fund size changes, check sizes may change)
5. Preserve ALL citations - do not remove or modify them
6. Preserve all formatting (headers, lists, emphasis)
7. Do NOT change other content unrelated to the correction
CRITICAL:
- If a claim becomes unsupported after correction, flag it with [NEEDS CITATION]
- Maintain the analytical tone and depth
- Return ONLY the corrected section content
CORRECTED SECTION:
"""
# Call Claude
response = anthropic_client.invoke(correction_prompt)
corrected_content = response.content
# Save corrected section
with open(section_file, "w") as f:
f.write(corrected_content)
return corrected_content
Phase 4: Reassemble Final Draft
def reassemble_after_correction(artifact_dir: Path) -> Path:
"""Reassemble 4-final-draft.md after corrections."""
# Same logic as Feature #1 reassembly
content = ""
# Load header
header_file = artifact_dir / "header.md"
if header_file.exists():
with open(header_file) as f:
content = f.read() + "\n"
# Load all sections in order
sections_dir = artifact_dir / "2-sections"
for section_file in sorted(sections_dir.glob("*.md")):
with open(section_file) as f:
content += f.read() + "\n\n"
# Save final draft
final_draft = artifact_dir / "4-final-draft.md"
with open(final_draft, "w") as f:
f.write(content.strip())
return final_draft
CLI Interface
Standalone Script: rewrite-key-info.py
Usage:
# Activate venv first
source .venv/bin/activate
# Basic correction
python rewrite-key-info.py "Avalanche" \
--correction "The fund size is $10M, not $50M"
# Specify version
python rewrite-key-info.py "Avalanche" \
--correction "Katelyn Donnelly is Managing Partner, not Partner" \
--version v0.0.1
# Direct path
python rewrite-key-info.py output/Avalanche-v0.0.1 \
--correction "Company founded in 2019, not 2020"
# Preview mode (don't save)
python rewrite-key-info.py "Avalanche" \
--correction "Series A, not Series B" \
--preview
# Update research data too (deep mode)
python rewrite-key-info.py "Avalanche" \
--correction "Fund size is $10M" \
--update-research
Output Example:
🔍 Analyzing correction...
Incorrect: "$50M"
Correct: "$10M"
Semantic variations: "fifty million", "Fund II target", "target size"
🔎 Scanning sections...
✓ Found errors in 7/10 sections:
• Fund Strategy & Thesis (3 instances)
• Portfolio Construction (2 instances)
• Fee Structure & Economics (4 instances)
• Value Add & Differentiation (1 instance)
• Track Record Analysis (2 instances)
• Risks & Mitigations (1 instance)
• Executive Summary (2 instances)
📝 Applying corrections...
✓ Corrected: Fund Strategy & Thesis
✓ Corrected: Portfolio Construction
✓ Corrected: Fee Structure & Economics
✓ Corrected: Value Add & Differentiation
✓ Corrected: Track Record Analysis
✓ Corrected: Risks & Mitigations
✓ Corrected: Executive Summary
✅ Reassembled final draft
📊 Correction Summary:
Sections modified: 7/10
Total instances corrected: 15
Files updated:
• 2-sections/03-fund-strategy--thesis.md
• 2-sections/04-portfolio-construction.md
• 2-sections/07-fee-structure--economics.md
• 2-sections/05-value-add--differentiation.md
• 2-sections/06-track-record-analysis.md
• 2-sections/08-risks--mitigations.md
• 2-sections/01-executive-summary.md
• 4-final-draft.md
Next steps:
1. Review corrections in: output/Avalanche-v0.0.1/
2. Export to HTML: python export-branded.py output/Avalanche-v0.0.1/4-final-draft.md
3. Create new version: python -m src.main "Avalanche" --version-only
Implementation Steps (YAML-Based)
Step 0: Create YAML Template
New File: templates/corrections-template.yaml
Content:
# Investment Memo Correction Template
# Copy to data/{CompanyName}-corrections.yaml and fill in corrections
company: "CompanyName"
# VERSION MANAGEMENT (required)
source_version: "v0.0.3" # Which version to use as source
# Alternatives:
# source_version: "latest" # Use latest version
# source_version: "output/CompanyName-v0.0.3" # Full path
output_mode: "new_version" # "new_version" or "in_place"
# new_version: Creates v0.0.4 from v0.0.3 (preserves original)
# in_place: Overwrites v0.0.3 directly (DESTRUCTIVE - use with caution)
date_created: "YYYY-MM-DD"
corrections:
# Example 1: Inaccurate information
- type: "inaccurate"
inaccurate_information: |
Describe what's incorrect in the memo
correct_information: |
Provide the correct information
affected_sections:
- "Section Name 1"
- "Section Name 2"
sources:
- "https://source-url.com"
- "data/document.pdf"
narrative_shaping_comments:
- "Guidance on how to frame this correction"
- "Additional context or emphasis"
# Example 2: Incomplete information
- type: "incomplete"
incomplete_information: |
Describe what's missing
additional_information: |
Provide the missing information
affected_sections:
- "Section Name"
sources:
- "https://source-url.com"
narrative_shaping_comments:
- "How to integrate this information"
# Example 3: Narrative shaping only
- type: "narrative"
section: "Section Name"
narrative_shaping_comments:
- "Remove promotional language"
- "Add balanced risk discussion"
- "Quantify vague claims"
sources:
- "https://competitor-comparison.com"
# Example 4: Mixed correction
- type: "mixed"
inaccurate_information: "What's wrong"
correct_information: "What's correct"
incomplete_information: "What's missing"
additional_information: "What to add"
affected_sections:
- "Section 1"
- "Section 2"
sources:
- "https://source.com"
narrative_shaping_comments:
- "How to present this holistically"
Step 1: Create YAML Parser & Schema
New File: src/corrections.py
Implement:
from dataclasses import dataclass
from typing import List, Optional
from pathlib import Path
import yaml
@dataclass
class CorrectionObject:
"""Represents a single correction from YAML."""
type: str # "inaccurate", "incomplete", "narrative", "mixed"
inaccurate_info: Optional[str] = None
correct_info: Optional[str] = None
incomplete_info: Optional[str] = None
additional_info: Optional[str] = None
affected_sections: Optional[List[str]] = None
section: Optional[str] = None # For narrative-only corrections
sources: Optional[List[str]] = None
narrative_comments: Optional[List[str]] = None
def __post_init__(self):
if self.affected_sections is None:
self.affected_sections = []
if self.sources is None:
self.sources = []
if self.narrative_comments is None:
self.narrative_comments = []
def load_corrections_yaml(corrections_file: Path) -> dict:
"""Load and validate corrections YAML file."""
with open(corrections_file) as f:
data = yaml.safe_load(f)
# Validate schema
validate_corrections_schema(data)
return data
def validate_corrections_schema(data: dict) -> None:
"""Validate YAML structure and required fields."""
required_top = ["company", "corrections"]
for field in required_top:
if field not in data:
raise ValueError(f"Missing required field: {field}")
for i, corr in enumerate(data["corrections"]):
if "type" not in corr:
raise ValueError(f"Correction {i+1}: Missing 'type' field")
corr_type = corr["type"]
if corr_type == "inaccurate":
required = ["inaccurate_information", "correct_information", "affected_sections"]
for field in required:
if field not in corr:
raise ValueError(f"Correction {i+1} (inaccurate): Missing '{field}'")
elif corr_type == "incomplete":
required = ["incomplete_information", "additional_information", "affected_sections"]
for field in required:
if field not in corr:
raise ValueError(f"Correction {i+1} (incomplete): Missing '{field}'")
elif corr_type == "narrative":
required = ["section", "narrative_shaping_comments"]
for field in required:
if field not in corr:
raise ValueError(f"Correction {i+1} (narrative): Missing '{field}'")
elif corr_type == "mixed":
required = ["affected_sections"]
for field in required:
if field not in corr:
raise ValueError(f"Correction {i+1} (mixed): Missing '{field}'")
else:
raise ValueError(f"Correction {i+1}: Invalid type '{corr_type}'")
def parse_corrections(data: dict) -> List[CorrectionObject]:
"""Parse validated YAML into CorrectionObject list."""
corrections = []
for corr in data["corrections"]:
corrections.append(CorrectionObject(
type=corr["type"],
inaccurate_info=corr.get("inaccurate_information"),
correct_info=corr.get("correct_information"),
incomplete_info=corr.get("incomplete_information"),
additional_info=corr.get("additional_information"),
affected_sections=corr.get("affected_sections", []),
section=corr.get("section"),
sources=corr.get("sources", []),
narrative_comments=corr.get("narrative_shaping_comments", [])
))
return corrections
Testing:
def test_load_corrections_yaml(tmp_path):
    yaml_content = """
company: "TestCo"
corrections:
  - type: "inaccurate"
    inaccurate_information: "Wrong info"
    correct_information: "Right info"
    affected_sections: ["Team"]
"""
    corrections_file = tmp_path / "TestCo-corrections.yaml"
    corrections_file.write_text(yaml_content)
    data = load_corrections_yaml(corrections_file)
    corrections = parse_corrections(data)
    assert data["company"] == "TestCo"
    assert len(corrections) == 1
    assert corrections[0].type == "inaccurate"
    assert corrections[0].affected_sections == ["Team"]
Step 2: Create CLI Script with YAML Support
New File: rewrite-key-info.py
Structure:
#!/usr/bin/env python3
"""
Correct crucial information in investment memos using YAML correction files.
USAGE:
python rewrite-key-info.py "Company" --corrections data/Company-corrections.yaml
python rewrite-key-info.py "Company" --corrections data/Company-corrections.yaml --verify-sources
"""
import argparse
import sys
from pathlib import Path
from rich.console import Console
from rich.panel import Panel
from src.corrections import load_corrections_yaml, parse_corrections
from src.agents.key_info_rewrite import apply_corrections_to_memo
from src.utils import get_latest_output_dir
def main():
parser = argparse.ArgumentParser(
description="Apply YAML-based corrections to investment memos"
)
parser.add_argument("target", help="Company name or path to artifact directory")
parser.add_argument("--corrections", required=True, help="Path to corrections YAML file")
parser.add_argument("--version", help="Specific version (default: latest)")
parser.add_argument("--verify-sources", action="store_true",
help="Verify corrections with Perplexity Sonar Pro")
parser.add_argument("--preview", action="store_true", help="Preview without saving")
args = parser.parse_args()
console = Console()
# Load corrections YAML
corrections_file = Path(args.corrections)
if not corrections_file.exists():
console.print(f"[red]Error: Corrections file not found:[/red] {corrections_file}")
sys.exit(1)
console.print(f"[bold]Loading corrections:[/bold] {corrections_file}")
data = load_corrections_yaml(corrections_file)
corrections = parse_corrections(data)
console.print(f" Company: {data['company']}")
console.print(f" Corrections: {len(corrections)}")
# Determine artifact directory
# ... (similar to improve-section.py)
# Apply corrections
result = apply_corrections_to_memo(
artifact_dir=artifact_dir,
corrections=corrections,
verify_sources=args.verify_sources,
preview=args.preview,
console=console
)
# Display summary
# ...
Step 3: State Schema Updates
Update: src/state.py
Add Field:
class MemoState(TypedDict):
# ... existing fields ...
# NEW: For key information corrections
correction_instruction: NotRequired[str]
correction_metadata: NotRequired[Dict[str, Any]] # Track what was corrected
Step 4: Workflow Integration (Optional)
Update: src/workflow.py
Add Conditional Node:
def build_workflow():
workflow = StateGraph(MemoState)
# ... existing nodes ...
# NEW: Optional correction node
workflow.add_node("correct_key_info", key_information_rewrite_agent)
# Conditional routing
def should_correct(state: MemoState) -> str:
if state.get("correction_instruction"):
return "correct_key_info"
return "continue"
workflow.add_conditional_edges(
"validate",
should_correct,
{
"correct_key_info": "finalize",
"continue": "finalize"
}
)
CLI Support:
# Run memo generation with correction
python -m src.main "Avalanche" --correct "Fund size is $10M, not $50M"
Step 5: Handle Edge Cases
Scenarios:
- No instances found: Warn user, don’t modify anything
- Conflicting citations: Flag sections that need manual review
- Dependent claims: Identify claims that may be affected
- Research data conflicts: Warn if correction contradicts research
Code:
def validate_correction_safety(
correction_analysis: CorrectionAnalysis,
affected_sections: List[SectionInfo],
research_data: dict
) -> List[str]:
"""Check for potential issues before applying correction."""
warnings = []
# No instances found
if not affected_sections:
warnings.append("⚠️ No instances of incorrect information found")
# Check research data conflicts
research_text = str(research_data)
if correction_analysis.incorrect_info in research_text:
warnings.append(
"⚠️ Research data contains the incorrect information. "
"Consider using --update-research flag."
)
# Check for many instances (may indicate systemic issue)
total_instances = sum(s.instances_found for s in affected_sections)
if total_instances > 20:
warnings.append(
f"⚠️ Found {total_instances} instances across {len(affected_sections)} sections. "
"This may indicate a deeper issue. Review carefully after correction."
)
return warnings
Step 6: Research Data Updates (--update-research)
If Flag Set:
def update_research_data(
artifact_dir: Path,
correction_analysis: CorrectionAnalysis
) -> None:
"""Update research.json with corrected information."""
research_file = artifact_dir / "1-research.json"
if not research_file.exists():
return
with open(research_file) as f:
research_data = json.load(f)
# Apply correction via string replacement on the serialized JSON.
# Caveat: values containing quotes or backslashes are escaped by json.dumps
# and will not match a verbatim replace; those need field-level handling.
research_json = json.dumps(research_data)
corrected_json = research_json.replace(
correction_analysis.incorrect_info,
correction_analysis.correct_info
)
research_data = json.loads(corrected_json)
# Save updated research
with open(research_file, "w") as f:
json.dump(research_data, f, indent=2)
# Also update 1-research.md
research_md = artifact_dir / "1-research.md"
if research_md.exists():
with open(research_md) as f:
content = f.read()
corrected_content = content.replace(
correction_analysis.incorrect_info,
correction_analysis.correct_info
)
with open(research_md, "w") as f:
f.write(corrected_content)
Step 7: Testing & Validation
Test Suite:
- ✅ Simple correction (fund size)
- ✅ Complex correction (person title + role)
- ✅ Date correction with timeline impact
- ✅ Multiple semantic variations
- ✅ Correction with citation conflicts
- ✅ No instances found (error case)
- ✅ Preview mode
- ✅ Research data update
Manual Testing Checklist:
- Run on Avalanche $50M → $10M
- Verify all 7 sections corrected
- Check citations preserved
- Verify formatting maintained
- Review reassembled final draft
- Export to HTML and verify
- Test with --update-research flag
- Test with --preview flag
Step 8: Documentation
Update Files:
- CLAUDE.md: Add Key Information Rewrite section
- README.md: Move to "Completed" ✅
- Create docs/CORRECTIONS.md: Guide with examples
- Add examples to docs/EXAMPLES.md
Documentation Structure:
# Key Information Rewrite Guide
## When to Use
Use key information rewrite when:
- A crucial fact appears in multiple sections
- The error affects related claims (e.g., fund size affects check sizes)
- Manual editing would be error-prone
Do NOT use when:
- Error is in only one section (use improve-section.py instead)
- You want to rephrase content (use improve-section.py)
- You need to add new information (use improve-section.py or regenerate)
## Common Scenarios
### Fund Size Correction
...
### Person Title/Role Correction
...
### Date/Timeline Correction
...
### Investment Stage Correction
...
Implementation Roadmap
Step 1: Feature #1 - Sonar Pro Integration ✅ COMPLETED
Objective: Update improve-section.py to use Perplexity Sonar Pro for one-step improvements with citations
Tasks:
- Replace Claude with Sonar Pro in improve_section_with_agent()
- Update prompt to include citation instructions
- Test on Avalanche Team section
- Verify citations properly formatted
- Compare quality to Claude-only approach
Deliverables:
- ✅ Updated improve-section.py (commit: 6fbafe5)
- ✅ Quality verified: Significant improvement with concrete details
Completion Date: 2025-11-20
Step 2: Feature #1 - Reassembly ✅ COMPLETED
Objective: Add ability to reassemble final draft after section improvement
Tasks:
- Implement reassemble_final_draft() function
- Add --rebuild-final flag (dropped: reassembly is automatic, no flag needed)
- Verify formatting preserved
Deliverables:
- ✅ Working reassembly feature (automatic after improvement)
- ✅ Includes header.md (company trademark)
- ✅ Verified on Avalanche final draft
Completion Date: 2025-11-20
Step 3: Feature #1 - Before/After Preview ⏳ PENDING
Objective: Show improvements before applying
Tasks:
- Add --preview flag
- Implement diff display
- Add confirmation prompt
- Show metrics (word count, citations, etc.)
Deliverables:
- Preview mode implementation
- User-friendly diff output
Step 4: Feature #1 - Documentation & Testing 🔄 IN PROGRESS
Objective: Document Feature #1 and complete testing
Tasks:
- Test on 1 section (Avalanche Team) ✅
- Test on 4 more sections from different memos
- Handle edge cases (missing API key, invalid sections) ✅
- Update README.md ✅
- Update CLAUDE.md
- Mark as complete in README “Remaining Enhancements” ✅
Deliverables:
- 🔄 Test results: 1/5 memos tested
- 🔄 Documentation: README done, CLAUDE in progress
- ⏳ Feature marked complete
Step 5: Feature #2 - YAML Template & Parser
Objective: Create correction YAML template and parser
Tasks:
- Create templates/corrections-template.yaml
- Create src/corrections.py with CorrectionObject dataclass
- Implement load_corrections_yaml()
- Implement validate_corrections_schema()
- Implement parse_corrections()
- Write unit tests for YAML parsing
Deliverables:
- Working YAML template
- Validated YAML parser
- Unit tests passing
Step 6: Feature #2 - Agent Core (YAML-Based)
Objective: Create key_info_rewrite agent with YAML corrections support
Tasks:
- Create src/agents/key_info_rewrite.py
- Implement apply_correction_to_section() with narrative guidance
- Implement apply_corrections_to_memo() (batch processor)
- Optional: Implement verify_corrections_with_sources() (Sonar Pro)
- Implement reassemble_after_correction()
- Handle all 4 correction types (inaccurate, incomplete, narrative, mixed)
- Write unit tests
Deliverables:
- Working agent module
- Support for all correction types
- Unit tests passing
Step 7: Feature #2 - CLI Script (YAML-Based)
Objective: Create standalone CLI for YAML-based corrections
Tasks:
- Create rewrite-key-info.py
- Implement --corrections flag (required, YAML path)
- Implement --verify-sources flag (optional, uses Sonar Pro)
- Add preview mode
- Implement rich console output with progress
- Save corrections-log.json for audit trail
Deliverables:
- Working CLI script
- Help documentation
- Example YAML files
Step 8: Feature #2 - Testing & Validation
Objective: Comprehensive testing of correction feature
Tasks:
- Test on Avalanche $50M → $10M
- Test person title correction
- Test date correction
- Test with semantic variations
- Test preview mode
- Test research updates
- Handle edge cases
Deliverables:
- Test results for all scenarios
- Edge case handling
- Bug fixes
Step 9: Feature #2 - Workflow Integration (Optional)
Objective: Allow corrections during memo generation workflow
Tasks:
- Update MemoState schema
- Add conditional routing in workflow
- Add --correct flag to main CLI
- Test integrated workflow
Deliverables:
- Workflow integration
- Updated CLI interface
Step 10: Documentation & Examples
Objective: Complete documentation for both features
Tasks:
- Create docs/CORRECTIONS.md guide
- Add examples to docs/EXAMPLES.md
- Update CLAUDE.md comprehensively
- Update README.md
- Mark both features complete ✅
Deliverables:
- Complete documentation
- Usage examples
- Features marked complete in README
Success Criteria
Feature #1: Section Improvement
Must Have:
- ✅ Uses Perplexity Sonar Pro (not Claude) - COMPLETED
- ✅ Citations added during improvement (not after) - COMPLETED
- ✅ Obsidian-style citation format - COMPLETED
- ✅ Preserves artifact structure - COMPLETED
- ✅ Can reassemble final draft - COMPLETED (automatic)
- ✅ Error handling for missing API keys - COMPLETED
Nice to Have:
- ⏳ Before/after preview mode - PENDING
- ⏳ Word count and quality metrics - PENDING
- ⏳ Comparison with original section - PENDING
Status: Core functionality COMPLETED ✅ | Enhancement features PENDING
Feature #2: Key Information Rewrite
Must Have:
- ✅ Identifies all affected sections
- ✅ Applies corrections consistently
- ✅ Preserves citations and formatting
- ✅ Reassembles final draft automatically
- ✅ Shows summary of changes
Nice to Have:
- ✅ Updates research data (--update-research)
- ✅ Preview mode before applying
- ✅ Semantic variation detection
- ✅ Workflow integration
Technical Considerations
API Costs
Feature #1 (Sonar Pro per section):
- Cost: ~$0.50-1.00 per section improvement
- Context: ~5k chars in, ~7k chars out
- Model: sonar-pro
Feature #2 (Corrections):
- Analysis: 1 Claude call (~$0.01)
- Per section: 1 Claude call (~$0.05)
- Total for 7 sections: ~$0.36
- Model: claude-sonnet-4-5
Comparison to Full Regeneration:
- Full regeneration: 10 sections × $1.00 = $10.00
- Section improvement: 1 section × $0.75 = $0.75 (13× cheaper)
- Key correction: 7 sections × $0.05 = $0.35 (29× cheaper)
Performance
Feature #1:
- Time: ~30-60 seconds per section (Sonar Pro call)
- Parallel: Not applicable (one section at a time)
Feature #2:
- Analysis: ~5 seconds
- Section scanning: ~1 second
- Correction per section: ~10-15 seconds
- Total for 7 sections: ~90 seconds (vs. 10+ minutes for full regeneration)
Rate Limits
Perplexity Sonar Pro:
- Rate limit: 50 requests/minute
- Constraint: None (processing one section at a time)
Anthropic Claude:
- Rate limit: 50 requests/minute
- Constraint: None for corrections (max ~10 sections)
Monitoring & Quality Assurance
Metrics to Track
Feature #1:
- Sections improved per week
- Average quality improvement (word count, citations added)
- User satisfaction (manual review scores)
- Time saved vs. full regeneration
Feature #2:
- Corrections performed per week
- Average sections affected per correction
- Accuracy (manual review of corrections)
- Time saved vs. manual editing
Quality Checks
Pre-Deployment:
- Test both features on 5 real memos
- Manual review of outputs
- Verify citations preserved
- Check formatting maintained
Post-Deployment:
- Monitor error rates
- Collect user feedback
- Review edge cases
- Iterate on prompts
Future Enhancements
Feature #1 Extensions
Batch Improvements:
# Improve multiple sections at once
python improve-section.py "Avalanche" --sections "Team,Market Context,Technology"
Comparative Mode:
# Compare section across versions
python improve-section.py "Avalanche" --compare v0.0.1 v0.0.2 --section "Team"
Auto-Improve:
# Automatically improve sections scoring < 7/10
python improve-section.py "Avalanche" --auto-improve --threshold 7
Feature #2 Extensions
Multiple Corrections:
# Apply multiple corrections at once
python rewrite-key-info.py "Avalanche" \
--corrections corrections.json
Validation Mode:
# Validate consistency across sections
python rewrite-key-info.py "Avalanche" --validate
Rollback Support:
# Undo last correction
python rewrite-key-info.py "Avalanche" --rollback
Related Documentation
- Multi-Agent-Orchestration-for-Investment-Memo-Generation.md - Main architecture
- changelog/2025-11-20_01.md - Section-by-section processing refactor
- CLAUDE.md - Developer guide
- README.md - User guide
Changelog
2025-11-20: Document created with comprehensive implementation plan for both features