← Corpus / investment-memo-orchestrator / spec
Introducing a Table Generator Agent
A post-writer enrichment agent that identifies tabular data opportunities in memo sections and generates markdown tables with overflow anchor linking, configurable column schemas, and firm-specific customization.
- Path
- specs/Introducing-a-Table-Generator-Agent.md
- Authors
- Michael Staton, AI Labs Team
- Augmented with
- Claude Code (Opus 4.6)
- Tags
- Agent-Design · Tables · Data-Visualization · Markdown · Enrichment
Introducing a Table Generator Agent
Status: Draft (v0.2.0 — supersedes Table-Generator-Agent-Spec.md) Date: 2026-03-10 Last Updated: 2026-03-10 Author: AI Labs Team Related: Table-Generator-Agent-Spec.md (original spec), Introducing-a-Competitive-Landscape-Research-and-Evaluation-System.md, Format-Memo-According-to-Template-Input.md Supersedes: context-v/Table-Generator-Agent-Spec.md (original spec remains for reference)
Executive Summary
The table generator agent scans written memo sections and structured data from upstream agents to identify content that would be more effectively communicated in tabular format. It generates markdown tables, inserts them into relevant sections, and handles overflow data through anchor links to detail sections.
This is a revised specification that incorporates design decisions from architecture discussions, including:
- Configurable table schemas defined in outline YAML or
templates/table-schemas/ - Anchor link overflow pattern for data that doesn’t fit in table cells (e.g., long investor lists)
- Integration with the Competitive Landscape system for the competitive comparison table
- Firm-specific customization with system defaults as fallback
Goals
- Enhance readability by converting inline data series into scannable tables
- Leverage structured data from upstream agents (competitive landscape, deck analysis, dataroom)
- Handle data overflow gracefully via anchor links to detail sections
- Support firm customization of table schemas through outline YAML or template files
- Complement, don’t replace narrative prose (tables are additive in default mode)
Pipeline Position
dataroom → deck → research → section_research → competitive_researcher
→ competitive_evaluator → cite → cleanup_research → writer
→ inject_deck_images → enrich_trademark → enrich_socials → enrich_links
→ TABLE_GENERATOR → enrich_visualizations → toc → revise_summaries
→ cleanup_sections → assemble_citations → validate_citations
→ fact_check → validate → scorecard
Runs after: enrich_links (entity links are already in place)
Runs before: enrich_visualizations and toc (tables inform what visualizations might help; TOC needs final headers)
Why This Position?
- After writer: Tables are generated from written content + structured data, not the other way around
- After link enrichment: Entity links (LinkedIn, company URLs) are already in place, so tables can include linked names
- Before citation assembly: Tables may reference cited data; citation renumbering happens later
- Before TOC: If tables add new subsection headers (e.g., ”### Investor Details”), the TOC needs to capture them
Table Types and Priority
Priority 1: Competitive Comparison Table
The most requested and highest-value table. Fed directly by the Competitive Landscape system.
Data source: state["competitive_landscape"]["evaluated_competitors"]
Default columns:
| Column | Source | Notes |
|---|---|---|
| Company | name | Bold the subject company row |
| Founded | founded | Year only |
| Funding | funding_total | Formatted: $50M |
| Stage | funding_stage | Seed, Series A, etc. |
| Notable Investors | notable_investors | Max 2 inline, anchor link for full list |
| Key Differentiator | key_differentiator | Brief phrase |
| Overlap | classification | Direct / Indirect |
Example output:
### Competitive Landscape
> _Use Markdown syntax to link out to external sources where appropriate. For example, if a company has a website, link to it. Investors, key customers, etc will all have websites that can be linked._
| Company | Founded | Funding | Stage | Notable Investors | Differentiator | Overlap |
|---------|---------|---------|-------|-------------------|----------------|---------|
| **Metabologic** | 2024 | $1.8M | Pre-Seed | Nucleus Capital | AI-designed enzymes | — |
| [Sweet Defeat](https://www.sweetdefeat.com/) | 2016 | $5M | Seed | [Full list](#sweet-defeat-investors) | Sugar-blocking lozenges | Direct |
| [SENS.life](https://sens.life/) | 2019 | $3M | Seed | [Full list](#sens-investors) | Enzymatic sugar blockers | Direct |
| [Pendulum](https://www.pendulum.health/) | 2019 | $100M | Series C | [Full list](#pendulum-investors) | Probiotic-based gut health | Indirect |
#### Investor Details {#investor-details}
##### Sweet Defeat Investors {#sweet-defeat-investors}
[Y Combinator](https://www.ycombinator.com/), [First Round Capital](https://www.frc.com/), [500 Startups](https://500.co/)
##### SENS Investors {#sens-investors}
[IndieBio](https://www.indiebio.vc/), [SOSV](https://sosv.com/), [European Angels Fund](https://www.europeanangelsfund.com/)
##### Pendulum Investors {#pendulum-investors}
[Sequoia Capital](https://www.sequoiacap.com/), [8VC](https://8vc.com/), [Meritech Capital](https://www.meritech.com/), [True Ventures](https://www.trueventures.com/), ...
Priority 2: Funding History Table
Data source: Deck analysis, research data, section prose Target section: Funding & Terms
| Round | Date | Amount | Pre-Money | Lead Investor | Participants |
|---|
Priority 3: Team Credentials Table
Data source: Deck analysis, section prose, socials enrichment Target section: Team / Organization
| Role | Name | Prior Experience | Notable Achievement |
|---|
Priority 4: Market Sizing Table
Data source: Research data, deck analysis Target section: Market Context / Opening
| Market Segment | Size | Growth | Source |
|---|
Priority 5: Key Customers / Traction Table
Data source: Deck analysis, dataroom, research Target section: Traction & Milestones
| Customer | Contract Value | Use Case | Status |
|---|
Priority 6: Cap Table (if available)
Data source: Dataroom analysis Target section: Funding & Terms
| Shareholder | Shares | Percentage | Type |
|---|
Table Schema Configuration
Option 1: Defined in Outline YAML (Preferred)
Table schemas can be defined directly in the outline YAML, keeping content structure and table definitions together.
# In templates/outlines/direct-investment.yaml
tables:
competitive_comparison:
target_section: "03-market-context.md" # Or "06-opportunity.md"
placement: "after_prose" # "after_prose", "before_prose", "replace_list"
columns:
- name: "Company"
source_field: "name"
bold_subject_company: true
- name: "Founded"
source_field: "founded"
align: "center"
- name: "Funding"
source_field: "funding_total"
align: "right"
- name: "Stage"
source_field: "funding_stage"
- name: "Notable Investors"
source_field: "notable_investors"
overflow:
max_inline: 2
anchor_pattern: "{company}-investors"
detail_header: "Investor Details"
- name: "Differentiator"
source_field: "key_differentiator"
- name: "Overlap"
source_field: "classification"
min_rows: 3 # Don't generate table if fewer than 3 competitors
funding_history:
target_section: "07-funding-terms.md"
placement: "after_prose"
columns:
- name: "Round"
source_field: "round"
- name: "Date"
source_field: "date"
- name: "Amount"
source_field: "amount"
align: "right"
- name: "Pre-Post Money"
source_field: "pre_post_money"
align: "right"
- name: "Valuation"
source_field: "valuation"
align: "right"
- name: "Stage"
source_field: "funding_stage"
- name: "Structure"
source_field: "structure" # SAFE, Convertible Note, Priced Round
- name: "Lead Investor"
source_field: "lead"
- name: "Participants"
source_field: "participants"
overflow:
max_inline: 3
anchor_pattern: "{round}-participants"
Option 2: Standalone Table Schema Templates
For firms that want to customize tables independently of the outline, schemas can live in templates/table-schemas/.
templates/
├── table-schemas/
│ ├── default/ # System defaults (ships with repo)
│ │ ├── competitive-comparison.yaml
│ │ ├── funding-history.yaml
│ │ ├── team-credentials.yaml
│ │ ├── market-sizing.yaml
│ │ └── cap-table.yaml
│ └── custom/
│ └── hypernova/ # Firm-specific overrides
│ └── competitive-comparison.yaml
Resolution Order
- Check outline YAML for
tables:section → use if present - Check
templates/table-schemas/custom/{firm}/→ use if present - Fall back to
templates/table-schemas/default/→ system defaults
Anchor Link Overflow Pattern
Problem
Some data doesn’t fit in a table cell. An investor list might include 20+ names. A customer list might have detailed use cases. Cramming this into a table cell destroys readability, especially in portrait-layout PDFs.
Solution: Inline Summary + Anchor Link to Detail
For columns marked with overflow configuration:
- Show the N most notable items inline in the cell
- Add an anchor link
[Full list](#anchor-id)to a detail section - Generate the detail section below the table (or at end of the memo section)
Implementation
def format_overflow_cell(
items: list[str],
max_inline: int,
anchor_id: str,
anchor_label: str = "Full list"
) -> str:
"""Format a cell with overflow items."""
if len(items) <= max_inline:
return ", ".join(items)
inline = ", ".join(items[:max_inline])
return f"{inline}, [{anchor_label}](#{anchor_id})"
def generate_overflow_details(
overflow_data: dict[str, list[str]],
detail_header: str
) -> str:
"""Generate the detail section for overflow data."""
lines = [f"\n#### {detail_header}\n"]
for anchor_id, items in overflow_data.items():
display_name = anchor_id.replace("-", " ").title()
lines.append(f"##### {display_name} {{#{anchor_id}}}")
lines.append(", ".join(items))
lines.append("")
return "\n".join(lines)
Export Compatibility
- HTML: Anchor links work natively with
idattributes - PDF (WeasyPrint): Internal anchor links work in generated PDFs
- DocX (Pandoc): Pandoc converts markdown anchor links to Word bookmarks
Table Placement and Prose Handling
Default Mode: Additive
Tables are inserted after the relevant prose. The prose is NOT modified. This means some data appears both in prose and in the table — this is intentional for skimmable documents.
Concise Mode (Future)
When content_mode: "concise" is set in the outline, the table generator also trims the prose:
- Remove inline enumerations that the table now handles (e.g., “competitors include A ($X), B ($Y), C ($Z)”)
- Replace with a brief reference: “Key competitors are summarized below.”
- Keep narrative analysis that adds context beyond the table
Note: Concise mode is a future enhancement. The initial implementation should be additive only.
Placement Rules
- After the prose that references the data, not before
- Before the next
###subsection header - One blank line above and below the table
- If a section has multiple table opportunities, order them by relevance to the surrounding text
Detection Patterns
The table generator uses two approaches:
1. Structured Data Consumption (Primary)
Consume pre-structured data from upstream agents:
state["competitive_landscape"]→ competitive comparison tablestate["deck_analysis"]→ funding history, team credentials, market sizingstate["dataroom_analysis"]→ cap table, financialsstate["research"]→ market data comparisons
This is the primary approach. Most tables should come from structured data, not prose parsing.
2. Prose Pattern Detection (Secondary)
Scan section content for patterns that indicate tabular data embedded in prose:
Temporal series: Numbers associated with time periods
r'\$[\d.]+[KMB]?\s+(?:in\s+)?\d{4}' # "$1.2M in 2022"
Entity comparisons: Multiple entities with same attributes
r'([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:raised|has|with)\s+\$[\d.]+[KMB]?'
Repeated list structures: Bullet lists where each item has consistent data fields
For edge cases where regex is insufficient, use an LLM call to identify and extract tabular data patterns.
Agent Architecture
Input
def table_generator(state: MemoState) -> dict:
"""
Generate markdown tables from structured data and prose patterns.
Insert tables into relevant sections with overflow anchor links.
"""
Processing Steps
- Load table schemas (outline YAML → custom templates → defaults)
- Collect structured data from state (competitive landscape, deck, dataroom, research)
- Match data to table types based on available data and schema definitions
- Scan prose for additional tabular patterns not covered by structured data
- Generate tables with proper formatting, overflow handling, and anchor links
- Insert tables into section files at optimal positions
- Generate overflow detail sections for columns with anchor links
- Save table artifacts to output directory
Output
return {
"tables_generated": {
"tables": [
{
"id": "competitive-comparison",
"type": "competitive_comparison",
"inserted_in": "03-market-context.md",
"rows": 5,
"columns": 7,
"overflow_anchors": ["sweet-defeat-investors", "sens-investors"]
},
# ...
],
"total_tables": 4,
"sections_updated": ["03-market-context.md", "04-organization.md", "07-funding-terms.md"]
},
"messages": ["Table generation complete: 4 tables inserted into 3 sections"]
}
Output Artifacts
Table Files
Each table is saved individually for debugging and reuse:
output/{Company}-v0.0.x/
├── 2-tables/ # Table artifacts
│ ├── tables-manifest.json # Index of all tables
│ ├── competitive-comparison.md # Individual table files
│ ├── funding-history.md
│ ├── team-credentials.md
│ └── market-sizing.md
Tables Manifest
{
"generated_at": "2026-03-10T10:30:00Z",
"schema_source": "outline_yaml",
"tables": [
{
"id": "competitive-comparison",
"file": "competitive-comparison.md",
"type": "competitive_comparison",
"data_source": "competitive_landscape",
"target_section": "03-market-context.md",
"rows": 5,
"columns": ["Company", "Founded", "Funding", "Stage", "Notable Investors", "Differentiator", "Overlap"],
"overflow_columns": ["Notable Investors"],
"inserted": true
}
]
}
Formatting Standards
Markdown Syntax
| Header 1 | Header 2 | Header 3 |
|:---------|:--------:|---------:|
| Left | Center | Right |
Number Formatting
- Currency:
$1.2M,$500K,$4.4T - Percentages:
45%,12.5% - Large numbers:
4M users,1.2B requests - Dates:
Q1 2023,2024,Mar 2024
Visual Hierarchy
- Bold the subject company row in comparison tables
- Right-align numeric columns
- Use consistent decimal precision within columns
- Include units in headers, not cells (e.g., “Revenue ($M)” not “$1M, $2M”)
Missing Values
- Use
—(em dash) for missing cells, never leave blank - Use
N/Aonly when the field is known to not apply
Edge Cases
Insufficient Data
If fewer rows than min_rows (default: 3), skip table generation for that type. Keep data in prose only.
Conflicting Sources
If multiple sources give different numbers for the same metric, use the most recent source with citation. Note the discrepancy in the table artifact file.
Already Tabular
If data is already in a table in a section (from deck screenshots or prior processing), do not create a duplicate. Check for existing markdown table syntax before inserting.
Very Wide Tables
If a table exceeds 7 columns, consider splitting into two tables or moving lower-priority columns to an overflow detail section.
Interaction with Other Agents
Competitive Landscape System
The competitive comparison table is the primary consumer of competitive landscape data. The table generator formats it; the competitive researcher and evaluator produce it.
Citation Assembly
Tables may contain data that needs citations. The citation assembly agent (which runs later) should handle citations within table cells. Table cells with [^N] references are valid markdown.
TOC Generator
If tables add new subsection headers (e.g., ”#### Investor Details”), the TOC generator should capture these. Since the table generator runs before TOC, this works naturally.
Revise Summary Sections
The summary revision agent runs after TOC and should be aware of tables when summarizing. Tables add information density that the summary should reflect.
Implementation Priority
- Competitive comparison table: Highest value, most requested by clients
- Funding history table: Common data, straightforward extraction
- Team credentials table: Enhances readability of team section
- Market sizing table: TAM/SAM/SOM data benefits from tabular presentation
- Schema configuration system: Allow firm customization
- Prose detection patterns: Catch data not in structured sources
Success Metrics
| Metric | Target |
|---|---|
| Tables per memo | 3-6 for a complete memo |
| Competitive table present | 100% of memos with 3+ competitors |
| Overflow anchors working in HTML export | 100% |
| Overflow anchors working in PDF export | 95%+ |
| No duplicate data display (table + identical prose) in concise mode | 100% |
| Firm-specific schema loaded when available | 100% |