cite-wide/workflow/2026-05-01
Add Claude to Perplexed Plugin
/Users/mpstaton/code/lossless-monorepo/perplexed
{
  "model": "claude-3-7-sonnet-20250219",
  "system": "Use citations to back up your answer.",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "document",
          "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": "..."
          },
          "citations": {"enabled": true}
        },
        {"type": "text", "text": "What does the document say about..."}
      ]
    }
  ]
}
URL Validity Checker.
Sends URLs to a service that checks whether they resolve and returns the result.
Populates a list in the modal with: line number, an anchor link long enough to show most of the title, an anchor link to the publisher/source organization, and a checkbox to remove the citation.
The user can either remove the citation or send it to a search service that takes the title and other details and finds THE SAME ARTICLE that was referenced (NO HALLUCINATION ALLOWED; a wrong replacement is worse than none).
Found articles come back in a list with links, with the replace checkbox auto-set to true. The user may click to confirm, or deny the replacement. Unchecking replace should check a second checkbox representing remove/delete.
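The validity check itself could be sketched as one async pass over the extracted links. This is a sketch under assumptions: `CitationLink`, `checkUrls`, and the status heuristic are hypothetical names, not existing plugin code, and a real service might add timeouts or rate limiting.

```typescript
interface CitationLink {
  line: number;      // line number in the markdown file
  url: string;       // the cited URL
  valid?: boolean;   // result of the check
  remove?: boolean;  // "remove" checkbox state in the modal
}

// Treat 2xx/3xx as valid; 4xx/5xx (or a network error) as broken.
function isValidStatus(status: number): boolean {
  return status >= 200 && status < 400;
}

async function checkUrls(links: CitationLink[]): Promise<CitationLink[]> {
  return Promise.all(
    links.map(async (link) => {
      try {
        // HEAD is cheaper; some servers reject it, so fall back to GET.
        let res = await fetch(link.url, { method: "HEAD" });
        if (res.status === 405) res = await fetch(link.url, { method: "GET" });
        return { ...link, valid: isValidStatus(res.status) };
      } catch {
        return { ...link, valid: false }; // DNS failure, timeout, etc.
      }
    })
  );
}
```

The modal can then populate its rows straight from the returned array, leaving `remove` unchecked until the user decides.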
LM research response parsing by LLM provider, transformed into our preferred format.
The file we are working with now is: /Users/mpstaton/content-md/lossless/Tooling/Software Development/Databases/ChromaDB.md
I have been doing the transformations from the LM response to our preferred format by hand, but it's getting arduous and boring, so I figured I'd enhance the plugin I already use… At least I will be less bored.
Rather than write it all here, see this blueprint: /Users/mpstaton/code/lossless-monorepo/cite-wide/context-v/blueprints/Parse-Common-Citation-Formats.md
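A parser registry keyed by provider could be the skeleton for this. The real format rules live in the blueprint above; the `ParsedCitation` shape and the example line format registered for Perplexity below are assumptions for illustration, not the actual formats.

```typescript
interface ParsedCitation {
  id: string;     // e.g. "fhz60g"
  title: string;
  url: string;
}

type Parser = (raw: string) => ParsedCitation[];

const parsers: Record<string, Parser> = {};

function registerParser(provider: string, parse: Parser): void {
  parsers[provider.toLowerCase()] = parse;
}

function parseResponse(provider: string, raw: string): ParsedCitation[] {
  const parse = parsers[provider.toLowerCase()];
  if (!parse) throw new Error(`No parser registered for ${provider}`);
  return parse(raw);
}

// Assumed example format: lines like "[1] Some Title - https://example.com/x".
// The real per-provider rules come from Parse-Common-Citation-Formats.md.
registerParser("perplexity", (raw) => {
  const out: ParsedCitation[] = [];
  const re = /^\[(\d+)\]\s+(.+?)\s+-\s+(https?:\/\/\S+)$/gm;
  for (const m of raw.matchAll(re)) {
    out.push({ id: m[1], title: m[2], url: m[3] });
  }
  return out;
});
```

Each provider (Perplexity, Google, Claude) would get its own `registerParser` call, so adding a provider never touches the dispatch code.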
Dedupe citations by URL.
In our workflow, it is common to ask Perplexity a question, then copy the response into our markdown file. Later, we might ask another question and copy that response into the same file. We might also ask Google and Claude, copying and pasting more content.
We often start content we do not finish, so pasting research is considered part of the content-development process.
If a later response cites the same URL as an earlier one, we should dedupe it, and so on for every subsequent paste.
This will require traversing the AST (or the raw file), grepping all matching URLs, and ensuring that only ONE instance of each URL exists in the reference section at the bottom of the file, with the correct syntax: `[^{citation_id}]:`
We need to use whatever conventions we already have to expose a function/command: Deduplicate by URL.
This command will need to launch our modal shell. The modal should list all potential duplicates in the order they appear in the file, list every line they appear on, make each line number an anchor link to that location in the markdown file if possible, and check the entire list by default. The check is a true/false boolean for whether the duplicates should be deduped. The user can uncheck one or more if they suspect an error or don't fully understand the inline versus reference-section locations.
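The duplicate scan feeding that modal could be a single pass over the file's lines. A minimal sketch, assuming reference definitions shaped like `[^id]: ... (https://example.com/...)`; the `DuplicateGroup` shape and function name are hypothetical.

```typescript
interface DuplicateGroup {
  url: string;
  lines: number[];   // 1-based line numbers where the URL appears
  ids: string[];     // citation ids found at those lines
  dedupe: boolean;   // modal checkbox, checked by default
}

function findDuplicateCitations(markdown: string): DuplicateGroup[] {
  const byUrl = new Map<string, DuplicateGroup>();
  markdown.split("\n").forEach((text, i) => {
    // Match a reference definition and capture its id and first URL.
    const m = text.match(/^\[\^([^\]]+)\]:.*?\((https?:\/\/[^)\s]+)\)/);
    if (!m) return;
    const [, id, url] = m;
    const group = byUrl.get(url) ?? { url, lines: [], ids: [], dedupe: true };
    group.lines.push(i + 1);
    group.ids.push(id);
    byUrl.set(url, group);
  });
  // Only URLs cited more than once are candidates for deduplication.
  return [...byUrl.values()].filter((g) => g.lines.length > 1);
}
```

A full implementation would also collect inline `[^id]` usages so the surviving reference keeps every inline citation pointing at one id.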
Evaluate how to use bases for canonical source citations
Some, but not all, of the sources in any one markdown file should be canonicalized to a base source saved locally in bases, and eventually backed up to the "Cloud".
This is particularly interesting because there is hype now around "Second Brains" and "Company Brains" that can provide RAG/KAG context to inform LLM and agent behavior.
Consider cleaning unused packages.
Misc Fixes
Fix the date insertion and the redundant publisher name in the syntax of reference definitions
Current:
[^fhz60g]: Jul 2025. "[Pinecone vs Weaviate vs ChromaDB: Which Vector Database Should You Use for Scalable AI Search? | AGIX Technologies](https://agixtech.com/pinecone-vs-weaviate-vs-chromadb-vector-database-comparison/)". AGIX Technologies. [AGIX Technologies](https://agixtech.com).
Desired:
[^fhz60g]: 2025, Jul. "[Pinecone vs Weaviate vs ChromaDB: Which Vector Database Should You Use for Scalable AI Search?](https://agixtech.com/pinecone-vs-weaviate-vs-chromadb-vector-database-comparison/)". [AGIX Technologies](https://agixtech.com).
Observe: the year comes before the month, the month stays abbreviated, and the publisher name is not repeated in the citation (this removed two redundant instances of the publisher name).
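This rewrite could be a best-effort regex pass over reference definitions. The sketch below assumes the current `Mon YYYY` format shown above and leaves any line it does not recognize untouched; `reformatReference` is a hypothetical name, not existing plugin code.

```typescript
// Rewrite: [^id]: Mon YYYY. "[Title | Publisher](url)". Publisher. [Publisher](purl).
// into:    [^id]: YYYY, Mon. "[Title](url)". [Publisher](purl).
function reformatReference(line: string): string {
  const re =
    /^\[\^([^\]]+)\]:\s+([A-Z][a-z]{2})\s+(\d{4})\.\s+"\[(.+?)\]\((\S+?)\)"\.\s+(.+?)\.\s+\[(.+?)\]\((\S+?)\)\.\s*$/;
  const m = line.match(re);
  if (!m) return line; // leave lines we don't recognize untouched
  const [, id, month, year, title, url, , publisher, pubUrl] = m;
  // Drop a "| Publisher"-style suffix from the title if one was concatenated on.
  const cleanTitle = title.split("|")[0].trim();
  return `[^${id}]: ${year}, ${month}. "[${cleanTitle}](${url})". [${publisher}](${pubUrl}).`;
}
```

Running it on the "Current" example above produces exactly the "Desired" line.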
Tidy titles - remove pipe characters and other artifacts that make titles hard to read.
Because of the simplicity of Open Graph, publishers often concatenate extra marketing language, append their company name, or include symbols and even glyph icons in the title. This makes the title hard to read and understand. It's just unpolished.
[!THOUGHT] A script would likely end up fudging the title; it requires enough contextual understanding to separate the actual title from the publisher name, the URL, or any other miscellaneous text that got concatenated onto it.
- The agent should isolate the key phrase that is the title from the publisher name and url.
- The agent should remove pipe characters and other artifacts that make titles hard to read.
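Per the thought above, any script-side pass should stay conservative: strip a trailing separator segment only when it exactly matches the known publisher name, and leave everything ambiguous for the agent to judge. A minimal sketch, where `tidyTitle` is a hypothetical helper:

```typescript
// Strip a trailing " | Publisher", " - Publisher", or " – Publisher" suffix
// only when it matches the publisher we already resolved for this citation.
function tidyTitle(title: string, publisher: string): string {
  let out = title.trim();
  const suffixes = [` | ${publisher}`, ` - ${publisher}`, ` – ${publisher}`];
  for (const s of suffixes) {
    if (out.toLowerCase().endsWith(s.toLowerCase())) {
      out = out.slice(0, out.length - s.length).trimEnd();
    }
  }
  return out;
}
```

Anything this function does not confidently match (glyph icons, marketing taglines, unfamiliar separators) stays in place and gets routed to the agent instead.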