Blueprint — Add a New Image-Generation Provider (case study: Ideogram)
- Path: blueprints/Add-New-Image-API-to-Providers.md
- Authors: Michael Staton
- Augmented with: Claude Code (Opus 4.7, 1M context)
This blueprint describes how to add a new image-generation provider to image-gin alongside the existing Recraft integration. It uses Ideogram v3 as the worked example. The same shape applies to any future generation API (e.g., Flux, SDXL, Imagen).
Image search providers (e.g., Magnific) follow a similar but distinct shape; see src/services/magnificService.ts and src/modals/MagnificModal.ts. This blueprint focuses on generation, which is closer in shape to Recraft.
Pre-existing state (already in the repo)
The user has already done the credential prep:
- IDEOGRAM_API_KEY added to the project .env
- data.json extended with: "ideogram": { "enabled": true, "apiKey": "<key>" }

Note: data.json is the runtime-loaded settings file. The DEFAULT_SETTINGS constant in src/settings/settings.ts is the source of truth for shape: loadSettings() shallow-merges loaded data over defaults. Any field present at runtime must also exist in DEFAULT_SETTINGS; otherwise it has no type and no UI.
Recraft is not being deprecated. Ideogram is purely additive. The two providers coexist with separate commands, separate modals, and separate settings sections. Nothing in the Recraft path is touched by this work.
Evaluation — how Recraft uses the frontmatter today
Tracing the Recraft path end-to-end (CurrentFileModal.ts + recraftImageService.ts + settings.ts):
| Concern | Where it lives | Per-file? |
|---|---|---|
| Subject-matter prompt | Frontmatter key image_prompt (configurable via imagePromptKey) | ✅ |
| Visual style (broad) | Settings — style.presetStyle.base (e.g. digital_illustration) | ❌ |
| Visual style (narrow) | Settings — style.presetStyle.substyle (e.g. graphic_intensity) | ❌ |
| Brand style (Recraft custom) | Settings — style.customStyleId or first entry of imageStylesJSON | ❌ |
| Sizes to render | Modal toggles (selected from settings.imageSizes presets) | per-call |
| Resulting image path | Written back to frontmatter under each ImageSize.yamlKey (e.g. banner_image) | ✅ |
| Updated prompt | Written back to frontmatter under image_prompt | ✅ |
Net characterization: Recraft’s brand style is held entirely in settings — typically as a Recraft style_id that the user trained on Recraft’s own platform. The frontmatter contributes one piece of information: the per-file subject-matter prompt. Recraft’s “brand voice” lives server-side, identified by a UUID.
This works because Recraft’s API treats style as a first-class server-side parameter (you train styles in their UI, reference them by ID at generation time).
How Ideogram differs — and what that means for the frontmatter/settings split
Ideogram has no concept of a server-side custom style. Its style controls are:
- style_type: one of five enum values (AUTO | GENERAL | REALISTIC | DESIGN | FICTION); coarse
- magic_prompt: AUTO | ON | OFF; lets Ideogram rewrite your prompt
- negative_prompt: free-text exclusions
- prompt: the actual creative direction, which carries all brand/style/theme information not captured by the enums above
This shifts the burden: with Recraft, “make it look like our brand” is a style_id. With Ideogram, “make it look like our brand” is prompt engineering — the user (or the plugin) must wrap the per-file subject-matter prompt with brand-style template text. The astro-knots reference comments this directly:
/** Required. The text prompt. Wrap with your brand style template before calling. */
prompt: string;
So the design question for Ideogram is: where does that wrapping template live, and what gets to override what per-file?
Proposed split
| Concern | Belongs in | Key / Field | Rationale |
|---|---|---|---|
| Subject matter (the actual idea) | Frontmatter | image_prompt (reuse Recraft’s key) | Same per-file concept Recraft already uses; sharing the key means switching providers on the same note doesn’t require re-typing the prompt. |
| Brand prompt template (prefix/suffix) | Settings | ideogram.brandTemplate.{prefix,suffix} | Brand-wide constant. {prompt} placeholder gets substituted with the frontmatter prompt. |
| Default style_type | Settings | ideogram.defaults.styleType | Brand-wide constant; per-file override possible (see below). |
| Default rendering_speed | Settings | ideogram.defaults.renderingSpeed | Cost/quality tradeoff is a brand-wide policy, not a per-file decision. |
| Default magic_prompt | Settings | ideogram.defaults.magicPrompt | Whether Ideogram is allowed to rewrite prompts is a brand voice decision. |
| Brand-wide negative prompt | Settings | ideogram.brandTemplate.baseNegativePrompt | "no text, no watermarks, no signatures" applies everywhere. |
| Per-file style override | Frontmatter | image_style_type (optional) | One essay needs REALISTIC even though brand default is DESIGN. Read but never written by the plugin. |
| Per-file extra negative prompt | Frontmatter | image_negative_prompt (optional) | Appended to the brand-wide base. |
| Per-file seed (reproducibility) | Frontmatter | image_seed (optional) | Lets a note pin a successful generation. |
| Sizes / aspect ratio | Modal | toggles (existing ImageSize presets) | Per-call decision; no change from Recraft pattern. |
| Layerize-text post-processing | Settings + per-call toggle | ideogram.layerizeText default; modal toggle override | Brand-wide policy with one-off opt-out. |
Final-prompt assembly (service-layer responsibility)
finalPrompt = brandTemplate.prefix
+ (frontmatter.image_prompt or modal.imagePrompt)
+ brandTemplate.suffix
negativePrompt = brandTemplate.baseNegativePrompt
+ (frontmatter.image_negative_prompt ?? '')
styleType = frontmatter.image_style_type ?? settings.ideogram.defaults.styleType
If the prefix/suffix contain a literal {prompt} token, prefer placeholder-substitution over concatenation — gives the user explicit control over exactly where the subject matter slots into the template (e.g. “Editorial illustration in our house style: {prompt}, on a soft pastel background, viewed from above”).
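The assembly rules above can be sketched as a small pure function. This is a minimal sketch, not the plugin's implementation: the names `BrandTemplate`, `PromptInputs`, and `assemblePrompt` are hypothetical, joining parts with a single space (and negative prompts with a comma) is an assumption, and the suffix is ignored when the prefix carries a `{prompt}` token.

```typescript
// Hypothetical shapes for illustration; field names follow the proposed split.
interface BrandTemplate {
  prefix: string;
  suffix: string;
  baseNegativePrompt: string;
}

interface PromptInputs {
  imagePrompt: string;                // frontmatter image_prompt or the modal textarea
  extraNegativePrompt: string | null; // frontmatter image_negative_prompt, if present
}

interface AssembledPrompt {
  prompt: string;
  negativePrompt: string;
}

const PLACEHOLDER = '{prompt}';

// Trim each part, drop empty ones, and join with the given separator.
function joinNonEmpty(parts: string[], separator: string): string {
  return parts
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(separator);
}

function assemblePrompt(template: BrandTemplate, inputs: PromptInputs): AssembledPrompt {
  const subject = inputs.imagePrompt.trim();

  // Placeholder substitution wins over concatenation when the token is present.
  // Assumption: the suffix is ignored in that case.
  const prompt = template.prefix.includes(PLACEHOLDER)
    ? template.prefix.split(PLACEHOLDER).join(subject)
    : joinNonEmpty([template.prefix, subject, template.suffix], ' ');

  // Assumption: base and per-file negative prompts are comma-separated.
  const negativePrompt = joinNonEmpty(
    [template.baseNegativePrompt, inputs.extraNegativePrompt ?? ''],
    ', ',
  );

  return { prompt, negativePrompt };
}
```

Keeping this function free of settings and frontmatter lookups means the same code can back both the service call and a live preview in the modal.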
What deliberately does not go in the frontmatter
- API key. Settings only.
- num_images. Always 1 in this plugin (one image per size). If the user wants variants, they re-run with a different seed.
- magic_prompt at per-file level. Brand voice consistency would be undermined by per-file toggles. Settings only.
- The brand template itself. It's a brand-wide constant. Per-file overrides of the template would defeat its purpose.
This split is the central design decision in the Ideogram integration — it determines what the user has to think about at write-time vs. what they configure once.
Constraints inherited from this repo (re-read before coding)
These are codified in CLAUDE.md and apply uniformly to every provider:
- No new dependencies. Do not pnpm add anything to support a new provider. Propose first if absolutely needed.
- HTTP via Obsidian requestUrl, never fetch. Obsidian's environment lacks browser CORS/CSP semantics and FormData is unavailable. The Ideogram reference (astro-knots) is Node 22 build-time code and uses fetch + FormData freely; that code is a contract reference, not a paste source.
- Multipart bodies are hand-built. See src/services/imagekitService.ts for the established pattern (boundary string + manually concatenated Uint8Array parts). Ideogram's /v1/ideogram-v3/generate and /layerize-text are both multipart endpoints, so they will need this treatment.
- TypeScript is very strict. exactOptionalPropertyTypes, noUncheckedIndexedAccess, noUnusedLocals, noUnusedParameters, useUnknownInCatchVariables. Prefer null over undefined for optional state. tsc --noEmit runs before every pnpm build and will fail the build on any of these.
- No YAML library. If you need to read/write YAML keys, use app.fileManager.processFrontMatter (Obsidian's own emitter); see CurrentFileModal.updateFrontmatter. For ad-hoc parsing, extend src/utils/yamlFrontmatter.ts; do not add js-yaml.
- Logger, not console.*. Use logger from src/utils/logger. The plugin pipes it to .obsidian/plugins/image-gin-plugin/log.json.
- Settings are a single ImageGinSettings object. All provider-specific settings live in a nested sub-object (recraft* fields are flat for legacy reasons; new providers should be nested, like imageKit, magnific, imageCache).
- isDesktopOnly: true. Node Buffer, fs, require() are available. The Recraft service already uses Buffer.from(...) and require('fs') for absolute-path saves; Ideogram can do the same.
- main.ts stays thin. Three things only: load settings, register the settings tab, register commands. All real logic in src/services/ and src/modals/.
Architecture mapping: Ideogram reference → image-gin layers
The astro-knots Ideogram connector (api-connectors/ideogram.ts) is a single file that mixes API I/O, error type, and a convenience pipeline. In image-gin’s layered architecture this splits across three files:
| External reference (one file) | image-gin equivalent | Layer |
|---|---|---|
| requireKey(), IdeogramApiError | IdeogramService constructor + thrown Error | service |
| generate(), layerizeText() | IdeogramService.generateImage, .layerizeText | service |
| downloadEphemeralImage() | inline in service (use requestUrl, get arrayBuffer) | service |
| generateAndLayerize() pipeline | composed in modal callback (or service helper) | service/modal |
| Type exports (AspectRatio, etc.) | re-exported from service file | service |
Note the body-format mismatch:
- Reference uses FormData for both endpoints → won't work in Obsidian.
- Replacement: hand-built multipart body. /generate is text-only fields, so a JSON-equivalent path is possible, but the Ideogram API spec is multipart, so build multipart. Reuse the boundary helper from imagekitService.ts. Lift it into src/utils/multipart.ts if a second caller now exists.
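For orientation, a hand-built multipart body might look like the sketch below. It is not the helper in imagekitService.ts, which this blueprint does not reproduce; `MultipartField` and `buildMultipartBody` are hypothetical names. The sketch assumes field names and filenames contain no quotes or line breaks, and it takes the boundary as a parameter so the output is deterministic.

```typescript
// Hypothetical field shapes; text fields carry strings, file fields carry raw bytes.
interface TextField {
  kind: 'text';
  name: string;
  value: string;
}

interface FileField {
  kind: 'file';
  name: string;
  filename: string;
  contentType: string;
  data: Uint8Array;
}

type MultipartField = TextField | FileField;

interface MultipartBody {
  contentType: string; // value for the request's Content-Type header
  body: Uint8Array;    // concatenated bytes of every part plus the closing boundary
}

function buildMultipartBody(fields: MultipartField[], boundary: string): MultipartBody {
  const encoder = new TextEncoder();
  const chunks: Uint8Array[] = [];

  for (const field of fields) {
    // Each part starts with the boundary line and its headers, then a blank line.
    const head =
      field.kind === 'text'
        ? `--${boundary}\r\nContent-Disposition: form-data; name="${field.name}"\r\n\r\n`
        : `--${boundary}\r\nContent-Disposition: form-data; name="${field.name}"; filename="${field.filename}"\r\nContent-Type: ${field.contentType}\r\n\r\n`;
    chunks.push(encoder.encode(head));
    chunks.push(field.kind === 'text' ? encoder.encode(field.value) : field.data);
    chunks.push(encoder.encode('\r\n'));
  }
  // The closing boundary carries two trailing dashes.
  chunks.push(encoder.encode(`--${boundary}--\r\n`));

  // Concatenate all chunks into one contiguous byte array.
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const body = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    body.set(chunk, offset);
    offset += chunk.length;
  }

  return { contentType: `multipart/form-data; boundary=${boundary}`, body };
}
```

In the service, the returned bytes and `contentType` would be handed to requestUrl along with the Api-Key header; a production boundary should be random enough not to collide with the payload.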
Step-by-step plan
Phase 1 — Settings shape
File: src/settings/settings.ts
- Add Ideogram type unions and a nested settings interface that reflects the frontmatter/settings split decided above:

      export type IdeogramRenderingSpeed = 'FLASH' | 'TURBO' | 'DEFAULT' | 'QUALITY';
      export type IdeogramStyleType = 'AUTO' | 'GENERAL' | 'REALISTIC' | 'DESIGN' | 'FICTION';
      export type IdeogramMagicPrompt = 'AUTO' | 'ON' | 'OFF';

      export interface IdeogramBrandTemplate {
        /** Inserted before the per-file prompt. May contain `{prompt}` to control insertion point. */
        prefix: string;
        /** Inserted after the per-file prompt (ignored if `prefix` contains `{prompt}`). */
        suffix: string;
        /** Always-applied negative prompt; per-file `image_negative_prompt` is appended. */
        baseNegativePrompt: string;
      }

      export interface IdeogramDefaults {
        renderingSpeed: IdeogramRenderingSpeed;
        styleType: IdeogramStyleType;
        magicPrompt: IdeogramMagicPrompt;
      }

      export interface IdeogramSettings {
        enabled: boolean;
        apiKey: string;
        brandTemplate: IdeogramBrandTemplate;
        defaults: IdeogramDefaults;
        /** Run Layerize Text after generate, by default. Modal can override per-call. */
        layerizeText: boolean;
      }

- Add ideogram: IdeogramSettings to ImageGinSettings.

- Extend DEFAULT_SETTINGS with sensible defaults:

      ideogram: {
        enabled: false,
        apiKey: '',
        brandTemplate: {
          prefix: '',
          suffix: '',
          baseNegativePrompt: 'no text, no watermarks, no signatures, no captions',
        },
        defaults: {
          renderingSpeed: 'DEFAULT',
          styleType: 'GENERAL',
          magicPrompt: 'AUTO',
        },
        layerizeText: false,
      }

  The user's existing data.json entry ({ enabled: true, apiKey: '<key>' }) is merged over this, and the intent is for the nested defaults to fill in the missing fields. Caveat: loadSettings() currently does a one-level spread ({ ...DEFAULT_SETTINGS, ...loadedSettings }), so a partial ideogram: { enabled, apiKey } in data.json will overwrite the default ideogram object wholesale, dropping brandTemplate / defaults / layerizeText. Either deep-merge in loadSettings() (preferred; generalize it for all nested provider blocks) or backfill missing nested keys explicitly during the Ideogram-specific load path. Pick deep-merge; it's the right fix and benefits Magnific and ImageKit too.

- Add an "Ideogram Image Generation" section to ImageGinSettingTab.display(). Layout (top-to-bottom):
  - Toggle: "Enable Ideogram Integration" (gates everything below)
  - Text: API Key
  - Brand Template subsection:
    - Multi-line textarea: "Prompt prefix" (with helper text: "Use {prompt} to control where the per-file prompt is inserted; otherwise it's appended.")
    - Multi-line textarea: "Prompt suffix"
    - Multi-line textarea: "Base negative prompt"
  - Defaults subsection:
    - Dropdown: Rendering speed (FLASH | TURBO | DEFAULT | QUALITY)
    - Dropdown: Style type (AUTO | GENERAL | REALISTIC | DESIGN | FICTION)
    - Dropdown: Magic prompt (AUTO | ON | OFF)
  - Toggle: "Layerize text after generate" (default off)

  Place this section under the Recraft block, before ImageKit (mirrors the order of provider modality: generate → CDN-upload → search).

Why the gating toggle even though the user has enabled: true: settings persistence treats enabled as a soft switch users may flip from the UI without removing the key. The command callback should refuse to open the modal when !settings.ideogram.enabled (see Phase 4; same pattern as MagnificModal).
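The deep-merge option preferred for loadSettings() could look roughly like this. It is a minimal sketch under stated assumptions: `deepMerge`, `isPlainObject`, and `DEFAULTS_SKETCH` are hypothetical names, the defaults object is trimmed down for illustration, and arrays are replaced wholesale rather than merged element by element.

```typescript
type PlainObject = Record<string, unknown>;

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively lays `loaded` over `defaults`. Nested plain objects are merged
// key by key; everything else (primitives, arrays, null) replaces the default.
function deepMerge<T extends PlainObject>(defaults: T, loaded: unknown): T {
  const result: PlainObject = { ...defaults };
  if (!isPlainObject(loaded)) return result as T;

  for (const [key, value] of Object.entries(loaded)) {
    const base = result[key];
    result[key] = isPlainObject(base) && isPlainObject(value) ? deepMerge(base, value) : value;
  }
  return result as T;
}

// Trimmed-down stand-in for DEFAULT_SETTINGS (illustrative only).
const DEFAULTS_SKETCH = {
  imagePromptKey: 'image_prompt',
  ideogram: {
    enabled: false,
    apiKey: '',
    brandTemplate: { prefix: '', suffix: '', baseNegativePrompt: 'no text' },
    layerizeText: false,
  },
};

// A partial data.json entry no longer wipes out the nested defaults.
const mergedSketch = deepMerge(DEFAULTS_SKETCH, { ideogram: { enabled: true, apiKey: 'k' } });
```

With the one-level spread, `mergedSketch.ideogram.brandTemplate` would be undefined at runtime; with the recursive merge it keeps the default while `enabled` and `apiKey` come from the loaded data.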
Phase 1.5 — Frontmatter contract
The Ideogram modal reads the following frontmatter keys from the active file:
| Key | Required | Type | Effect |
|---|---|---|---|
| image_prompt | yes | string | Subject-matter prompt. Reused from Recraft (same setting key: imagePromptKey). |
| image_negative_prompt | no | string | Appended to brandTemplate.baseNegativePrompt. |
| image_style_type | no | enum | Overrides defaults.styleType. Validated against the enum; invalid values fall back to default with a Notice. |
| image_seed | no | number | Pins the seed for reproducibility. |
The plugin writes only:
- image_prompt (when the user edits it in the modal and toggles "Write prompt to frontmatter" on; same as Recraft)
- The image path keys (e.g. banner_image); same as Recraft
It does not write image_negative_prompt, image_style_type, or image_seed. Those are read-only inputs the user manages by hand. (Writing them back would silently re-render the modal’s default checkboxes as “user-set,” which is a bad UX trap.)
Reuse the existing imagePromptKey setting (don't introduce a separate ideogramPromptKey). The whole point of sharing the key is that switching providers on a note doesn't require re-typing.
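The image_style_type validation in the contract table can be sketched as a pure resolver. `resolveStyleType` and `ResolvedStyleType` are hypothetical names, and accepting lower-case values is an assumption; the `usedFallback` flag is what the modal would check before showing a Notice.

```typescript
const IDEOGRAM_STYLE_TYPES = ['AUTO', 'GENERAL', 'REALISTIC', 'DESIGN', 'FICTION'] as const;
type IdeogramStyleType = (typeof IDEOGRAM_STYLE_TYPES)[number];

interface ResolvedStyleType {
  value: IdeogramStyleType;
  /** True only when a frontmatter value was present but invalid. */
  usedFallback: boolean;
}

function resolveStyleType(raw: unknown, fallback: IdeogramStyleType): ResolvedStyleType {
  // An absent key is not an error: silently use the settings default.
  if (raw === undefined || raw === null || raw === '') {
    return { value: fallback, usedFallback: false };
  }
  if (typeof raw === 'string') {
    // Assumption: matching is case-insensitive and ignores surrounding whitespace.
    const candidate = raw.trim().toUpperCase();
    const match = IDEOGRAM_STYLE_TYPES.find((styleType) => styleType === candidate);
    if (match !== undefined) {
      return { value: match, usedFallback: false };
    }
  }
  // Present but unusable (wrong type or unknown value): fall back and flag it.
  return { value: fallback, usedFallback: true };
}
```

Because frontmatter values arrive untyped, the parameter is `unknown` rather than `string`, which also keeps the function honest under the repo's strict compiler settings.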
Phase 2 — Service
New file: src/services/ideogramService.ts
Mirror RecraftImageService (≈ 250 lines) — same constructor signature (settings: ImageGinSettings, vault: Vault), same saveImage / getImagePath methods (lift these to a small shared util only if a third provider arrives — premature abstraction warning).
Method surface:
export class IdeogramService {
  constructor(settings: ImageGinSettings, vault: Vault);
  async generateImage(opts: IdeogramGenerateOptions): Promise<GeneratedImage>;
  async layerizeText(image: ArrayBuffer, opts?: LayerizeOptions): Promise<GeneratedImage>;
  async generateAndLayerize(opts: IdeogramGenerateOptions): Promise<GeneratedImage>;
  async saveImage(image: GeneratedImage, filePath: string): Promise<TFile | null>;
  getImagePath(baseName: string, width: number, height: number, timestamp: number): string;
}
Reuse the GeneratedImage interface from recraftImageService.ts — re-export it from a shared place (src/services/types.ts) when adding the second provider, rather than duplicating. Keep the change footprint small: move the type, update Recraft’s import, done. Do not refactor Recraft’s logic during this work.
Implementation notes specific to Ideogram:
- Endpoint constants:
  - https://api.ideogram.ai/v1/ideogram-v3/generate
  - https://api.ideogram.ai/v1/ideogram-v3/layerize-text

- Auth header: Api-Key: <key> (note: capital A, not Authorization: Bearer).

- Multipart body: hand-build per imagekitService.ts. Fields for /generate: prompt, aspect_ratio, rendering_speed, style_type, optional magic_prompt, negative_prompt, seed, num_images. Fields for /layerize-text: image (file part with Content-Type: image/png), optional prompt, seed.

- Aspect ratio vs. width/height: Ideogram uses string aspect ratios (1x1, 16x9, 2x3, …) instead of explicit pixel sizes. Map the existing ImageSize presets to the closest Ideogram ratio. Add a small lookup helper inside the service:

      function pickAspectRatio(width: number, height: number): IdeogramAspectRatio {
        // closest-match by ratio; fallback to '1x1'
      }

  Document the lookup behavior: image-gin users define sizes in pixels, and Ideogram will not render arbitrary pixel sizes. This is a real semantic gap; expose the resolved aspect ratio in the modal preview if practical.

- Response handling: Ideogram returns { data: [{ url, prompt, resolution, seed, ... }] }. Download data[0].url immediately (URLs are ephemeral S3; same caveat as the reference). Use requestUrl({ url, method: 'GET' }) then Buffer.from(response.arrayBuffer).toString('base64'), exactly the pattern in recraftImageService.generateImage:158-170.

- Errors: throw plain Error with the endpoint, status, and body. Do not create a custom IdeogramApiError class unless the modal needs to branch on it (the reference defines one for build-time CI logging; image-gin runs interactively and uses Notice + logger).
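One way to fill in the pickAspectRatio helper from the notes above is a nearest-ratio search in log space, so that a size twice as wide and a size twice as tall are treated as equally far from square. This is a sketch, not the plugin's implementation: the `CANDIDATE_RATIOS` list is an assumption and should be checked against the Ideogram API contract reference before use.

```typescript
// Assumed candidate list; verify against the Ideogram contract before relying on it.
const CANDIDATE_RATIOS = ['1x1', '16x9', '9x16', '3x2', '2x3', '4x3', '3x4'] as const;
type IdeogramAspectRatio = (typeof CANDIDATE_RATIOS)[number];

// Converts a 'WxH' string into its numeric width/height ratio.
function ratioValue(ratio: IdeogramAspectRatio): number {
  const [w, h] = ratio.split('x').map(Number);
  if (w === undefined || h === undefined || !(w > 0) || !(h > 0)) return 1;
  return w / h;
}

function pickAspectRatio(width: number, height: number): IdeogramAspectRatio {
  // Non-positive or NaN dimensions fall back to square.
  if (!(width > 0) || !(height > 0)) return '1x1';

  const target = Math.log(width / height);
  let best: IdeogramAspectRatio = '1x1';
  let bestDistance = Number.POSITIVE_INFINITY;

  for (const candidate of CANDIDATE_RATIOS) {
    // Distance in log space keeps landscape and portrait symmetric.
    const distance = Math.abs(target - Math.log(ratioValue(candidate)));
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}
```

For example, a 1200×630 preset has no exact match in this list and resolves to 16x9, which is the kind of substitution worth surfacing in the modal preview.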
Phase 3 — Modal
New file: src/modals/IdeogramModal.ts
Reuse the CurrentFileModal shape — copy it, swap RecraftImageService for IdeogramService, replace the style section with Ideogram-specific UI. Do not add a provider selector to CurrentFileModal; keeping the two modals separate validates the integration end-to-end without churning the Recraft UI. A unified-provider modal can be considered later, after the user has lived with both.
Modal sections (top-to-bottom):
- Image Prompt: same textarea as Recraft, pre-filled from image_prompt frontmatter. (Subject matter only; the brand template is invisible here. The user is editing the slot, not the wrapping.)
- Resolved Prompt Preview (new): read-only display showing the fully assembled prompt (prefix + image_prompt + suffix, with {prompt} substitution). Critical because the brand template is hidden in settings; without a preview, the user has no idea what's actually being sent. Update live as they edit the textarea.
- Image Sizes: same toggles as Recraft.
- Per-call Overrides (collapsible, defaults from settings):
  - Style type dropdown: initial value comes from frontmatter.image_style_type ?? settings.defaults.styleType
  - Rendering speed dropdown: initial value from settings.defaults.renderingSpeed
  - Magic prompt dropdown: initial value from settings.defaults.magicPrompt
  - Negative prompt textarea: initial value is the assembled negative prompt (base + frontmatter.image_negative_prompt); user can further append/edit per-call
  - Toggle: "Layerize text after generate"; initial value from settings.layerizeText
- Frontmatter Options: "Write prompt to frontmatter" toggle (same as Recraft). Writes only image_prompt and the image path keys; never the override fields.
- Generate Button.
Modal-level state holds the resolved (post-override) values. The service receives only resolved values — it doesn’t re-read settings or frontmatter. This keeps the service stateless w.r.t. UI choices and makes per-call overrides trivially correct.
Frontmatter writes use this.app.fileManager.processFrontMatter — same as Recraft path. The image path key (banner_image, etc.) is already provider-agnostic; no change needed to ImageSize.yamlKey.
Phase 4 — Command registration
File: main.ts
Add a fourth (or fifth) command, gated on enabled, mirroring the Magnific command at main.ts:65-75:
this.addCommand({
  id: 'generate-images-ideogram',
  name: 'Generate Images (Ideogram)',
  callback: () => {
    if (this.settings.ideogram.enabled) {
      new IdeogramModal(this.app, this).open();
    } else {
      new Notice('Ideogram integration is not enabled. Enable it in settings.');
    }
  }
});
Keep main.ts thin — no logic beyond settings/setting-tab/commands.
Phase 5 — Verify
- pnpm build: must pass tsc --noEmit with zero errors. The strict-TS settings will catch most integration mistakes (unused imports, any leaks, missing optional-property handling). If the build fails on noUncheckedIndexedAccess for data[0], that is a real bug; guard it the way RecraftImageService.generateImage:150 does.
- Live-test in a vault. Symlink the repo into <vault>/.obsidian/plugins/, restart Obsidian, run the new command on a file with an image_prompt frontmatter key.
- Verify: image is downloaded, base64-decoded, written to imageOutputFolder, and the banner_image (or matching yamlKey) frontmatter is updated. Check .obsidian/plugins/image-gin-plugin/log.json for the request/response trace.
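The data[0] guard mentioned in the first check might be written along these lines. This is a sketch and not the code in RecraftImageService: `firstImage` and `IdeogramImageResult` are hypothetical names, and the response is treated as `unknown` so the compiler forces every access to be checked. It also follows the plain-Error convention (endpoint, status, body) from Phase 2.

```typescript
interface IdeogramImageResult {
  url: string;
  seed: number | null;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Extracts the first image from a parsed response body, throwing a plain Error
// that names the endpoint, status, and body when anything is missing.
function firstImage(endpoint: string, status: number, body: unknown): IdeogramImageResult {
  if (status < 200 || status >= 300) {
    throw new Error(`Ideogram ${endpoint} failed with status ${status}: ${JSON.stringify(body)}`);
  }

  const data = isRecord(body) ? body['data'] : undefined;
  // Under noUncheckedIndexedAccess the first element may be undefined, so check it.
  const first: unknown = Array.isArray(data) ? data[0] : undefined;
  if (!isRecord(first)) {
    throw new Error(`Ideogram ${endpoint} returned no image (status ${status}): ${JSON.stringify(body)}`);
  }

  const url = first['url'];
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error(`Ideogram ${endpoint} returned an image without a url: ${JSON.stringify(body)}`);
  }

  const seed = first['seed'];
  return { url, seed: typeof seed === 'number' ? seed : null };
}
```

Returning the seed alongside the URL makes it easy to log the value a user would copy into image_seed to pin a successful generation.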
Out of scope for this blueprint
- Refactoring Recraft to share more code with Ideogram. Two providers is not enough signal to abstract. Wait for the third.
- Adding layerize-text as a standalone command. This blueprint ships it as a post-generate toggle on the Ideogram modal. A standalone command on top of an arbitrary local image is a separate feature.
- Migration of legacy flat recraft* settings into recraft: { ... }. Out of scope; would break the existing data.json shape and require a migration shim like the freepik → magnific one in main.ts:18-22.
- Provider-agnostic command (Generate Images) with internal routing. Raised in Phase 3 above as a possible later unified-provider modal; deferred.
Files touched (summary)
New:
- src/services/ideogramService.ts
- src/modals/IdeogramModal.ts
Modified:
- src/settings/settings.ts: add IdeogramSettings, extend ImageGinSettings, extend DEFAULT_SETTINGS, add settings-tab UI block
- main.ts: register new command
- src/services/types.ts (only if extracting GeneratedImage): new shared types module
Reference, not modified:
- src/services/recraftImageService.ts: pattern source
- src/services/imagekitService.ts: multipart-body pattern source
- src/modals/CurrentFileModal.ts: modal-shape source
- /Users/mpstaton/code/lossless-monorepo/astro-knots/sites/fullstack-vc/src/utils/api-connectors/ideogram.ts: Ideogram API contract reference (request/response shapes, field names, parameter enums). Do not paste; the Obsidian environment is incompatible with fetch/FormData.