Fix one YAML Issue at a Time
Systematic approach to cleaning URL properties in YAML frontmatter
- Path: prompts/data-integrity/Fix-one-YAML-Issue-at-a-Time--alt.md
- Authors: Michael Staton
- Augmented with: Windsurf Cascade on Claude 3.5 Sonnet
- Tags: YAML-Validation · Error-Handling · Build-Scripts · URL-Processing
Executive Summary
Purpose
Create a focused, single-purpose script to detect and clean URL properties in YAML frontmatter, addressing one specific issue: quote characters before URLs.
Scope
- Target Directory: site/src/content/tooling (~700 markdown files)
- Operation: Detection only (non-destructive)
- Focus: URL properties with quote characters before the “h” character in “http”
Technical Details
URL Property Definition
A URL property is any YAML frontmatter property that:
- Contains ‘http://’ or ‘https://’
- SHOULD hold the URL as a bare string (no quotes, no block scalar syntax). This pass looks only for quote-character abnormalities, and only for quotes BEFORE the “h”
- Must be a single contiguous string without interruptions
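To make the definition concrete, here is a minimal sketch of how candidate URL property lines could be pulled out of a file's frontmatter. The function names `extractFrontmatterLines` and `findUrlPropertyLines` are hypothetical, and the sketch assumes frontmatter is delimited by two lines containing only `---`.

```javascript
// Hypothetical helper: returns frontmatter lines with their 1-based line numbers.
// Assumption: frontmatter sits between the first line and the next line that is exactly `---`.
function extractFrontmatterLines(markdown) {
  const lines = markdown.split(/\r?\n/);
  if (lines[0].trim() !== "---") return [];
  const closing = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (closing === -1) return [];
  // Index 0 of the slice is the second line of the file, hence `i + 2`.
  return lines.slice(1, closing).map((text, i) => ({ line_number: i + 2, text }));
}

// Hypothetical helper: keeps only frontmatter lines that contain http:// or https://.
function findUrlPropertyLines(markdown) {
  return extractFrontmatterLines(markdown).filter(({ text }) => /https?:\/\//.test(text));
}
```

URLs in the markdown body are ignored on purpose, since only frontmatter properties are in scope.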
Detection Pattern
// Detect any non-space character before http(s)
/^([^:\n]+):\s*([^\s].*?)(https?:\/\/.*?)$/
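As a rough illustration of how this pattern behaves, the sketch below applies it to single frontmatter lines. The helper name `detectPrefixBeforeUrl` and the sample URLs are assumptions made for the example, not part of the existing scripts.

```javascript
// Pattern copied from above: key, colon, a non-space prefix, then the URL to end of line.
const URL_WITH_PREFIX = /^([^:\n]+):\s*([^\s].*?)(https?:\/\/.*?)$/;

// Hypothetical helper: returns null for lines the pattern does not flag,
// otherwise the key, the characters found before the URL, and the rest of the line.
function detectPrefixBeforeUrl(line) {
  const match = URL_WITH_PREFIX.exec(line);
  if (!match) return null;
  const [, key, prefix, url] = match;
  // Note: `url` still carries any trailing quote, because the pattern runs to end of line.
  return { key: key.trim(), prefix, url };
}

console.log(detectPrefixBeforeUrl("url: https://example.com"));
// → null (bare URL, nothing before the "h")
console.log(detectPrefixBeforeUrl('url: "https://example.com"'));
// → { key: 'url', prefix: '"', url: 'https://example.com"' }
```

The pattern flags any non-space prefix, not only quotes, so a line such as `note: See https://example.com` also matches. A follow-up check on `prefix` is needed to isolate quote characters.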
Report Structure
const report_data = {
  content: {
    summary: {
      total_files: 0,
      files_with_issues: 0
    },
    details: {
      yaml_lines_with_urls_that_have_quote_characters_at_start_of_value: []
    }
  }
};
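One possible way to fill this structure during a run is sketched below. `createReportData` and `recordFileResult` are hypothetical names, and the shape of each detail entry (`{ file, issues }`) is an assumption, since the template does not specify it.

```javascript
// Hypothetical factory so every run starts from a fresh copy of the report shape.
function createReportData() {
  return {
    content: {
      summary: { total_files: 0, files_with_issues: 0 },
      details: {
        yaml_lines_with_urls_that_have_quote_characters_at_start_of_value: [],
      },
    },
  };
}

// Hypothetical accumulator: call once per processed file.
// Assumption: `issues` is an array of { line_number, line } objects for that file.
function recordFileResult(report, filePath, issues) {
  report.content.summary.total_files += 1;
  if (issues.length === 0) return report;
  report.content.summary.files_with_issues += 1;
  report.content.details
    .yaml_lines_with_urls_that_have_quote_characters_at_start_of_value
    .push({ file: filePath, issues });
  return report;
}
```

Counting every file, clean or not, keeps `total_files` meaningful as a denominator for the summary statistics.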
Implementation Steps
- Report Template Setup
  - Location: site/scripts/tidy-up/tidy-one-property/assure-clean-url-properties/reportQuoteCharactersOfAnyType.cjs
  - Output: site/src/content/changelog--content/reports/2025-03-19_unclean-url-report_01.md
  - Format: As specified in reportTemplateForUncleanURLs. The template may contain typos or non-working JavaScript, but it should convey the intended logic.
- URL Detection Implementation
  - File: site/scripts/tidy-up/tidy-one-property/assure-clean-url-properties/detectUncleanURLs.cjs
  - Source: Extract from getKnownErrorsAndFixes.cjs
  - Key Focus: Comprehensive URL pattern detection
- Helper Function Setup
  - File: site/scripts/tidy-up/tidy-one-property/tidyOneAtaTimeUtils.cjs
  - Process: One file at a time
  - Memory: Report data accumulation per file
- Configuration Integration
  - Source: site/scripts/build-scripts/getUserOptions.cjs
  - Principle: DRY and Single Source of Truth
  - Note: No redundant code creation
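The helper step (one file at a time, with report data accumulated per file) might be sketched along these lines. The names `listMarkdownFiles` and `processOneAtATime` and the two callbacks are hypothetical, and the real directory path would presumably come from getUserOptions.cjs, whose interface is not shown here.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical walker: lists markdown files with plain fs calls (no glob library),
// returning paths in a stable, sorted order.
function listMarkdownFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listMarkdownFiles(fullPath));
    else if (entry.isFile() && entry.name.endsWith(".md")) files.push(fullPath);
  }
  return files.sort();
}

// Reads, evaluates, and records one file completely before touching the next.
// Assumption: `evaluateFile(text, filePath)` returns that file's issues and
// `onResult(filePath, issues)` stores them in the report.
function processOneAtATime(dir, evaluateFile, onResult) {
  for (const filePath of listMarkdownFiles(dir)) {
    const text = fs.readFileSync(filePath, "utf8"); // read-only: nothing is written back
    const issues = evaluateFile(text, filePath);
    onResult(filePath, issues);
  }
}
```

The synchronous loop is a deliberate choice in this sketch: it guarantees that evaluation and reporting for one file finish before the next file is opened.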
Processing Requirements
- File Processing
  - Process one file at a time
  - Complete evaluation before moving to next file
  - Immediate report data accumulation
  - No glob patterns or bulk processing
- URL Property Handling
  - Check for any non-space characters before ‘http’
  - Identify all quote variations (single, double, nested)
  - Flag block scalar syntax if present
- Report Generation
  - Accumulate data during processing
  - Generate final report in markdown
  - Include file paths and specific issues
  - Provide summary statistics
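The URL property handling and report generation requirements above could be sketched as follows. `classifyUrlValue` and `renderReport` are hypothetical names, the category labels are invented for illustration, and the detail entry shape (`{ file, issues: [{ line_number, line }] }`) is an assumption.

```javascript
// Hypothetical classifier for the value part of a `key: value` frontmatter line.
function classifyUrlValue(value) {
  const trimmed = value.trim();
  // Block scalar header: `|` or `>` plus optional indent digit and chomping sign.
  if (/^[|>](?:[1-9][+-]?|[+-][1-9]?)?$/.test(trimmed)) return "block_scalar";
  const urlStart = trimmed.search(/https?:\/\//);
  if (urlStart === -1) return "no_url";
  const prefix = trimmed.slice(0, urlStart);
  if (prefix === "") return "clean";
  if (!/^["']+$/.test(prefix)) return "other_prefix";
  if (prefix.length > 1) return "nested_quotes";
  return prefix === '"' ? "double_quote" : "single_quote";
}

// Hypothetical renderer: turns accumulated report data into a markdown string.
function renderReport(report) {
  const { summary, details } = report.content;
  const entries = details.yaml_lines_with_urls_that_have_quote_characters_at_start_of_value;
  const lines = [
    "# Unclean URL Report",
    "",
    "## Summary",
    `- Total files processed: ${summary.total_files}`,
    `- Files with issues: ${summary.files_with_issues}`,
    "",
    "## Details",
  ];
  for (const entry of entries) {
    lines.push("", `### ${entry.file}`);
    for (const issue of entry.issues) {
      lines.push(`- Line ${issue.line_number}: \`${issue.line}\``);
    }
  }
  return lines.join("\n") + "\n";
}
```

A block scalar header appears on the key line while the URL sits on the following line, so a caller would need to look ahead one line before treating a `block_scalar` result as a URL property.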
Future Considerations
- Validation Phase (To Be Developed)
  - Verification of cleaned URLs
  - Format compliance checking
  - Link validity testing
- Extended Functionality (Future)
  - Block scalar syntax removal
  - Quote character cleanup
  - URL validation
Notes
- This is a detection-only phase
- No file modifications in this pass
- Focus on accurate detection and reporting