ANNÁVE PDF Engine: Built for My Own Products First

Inputs map to one JSON document standard (the AST); YAML customizes output; new formats and products plug in at the parser or JSON layer.

Every product I ship eventually needs PDF — a report, an in-app export, the public converter, structured data rendered as pages. I built one pipeline so the same layout rules apply whether the input is a file upload, CLI stdin, or records from an app.

ANNÁVE PDF Engine is that pipeline: one document model, one layout pass, one renderer. The open-source Go project is annavetech/annave-pdf-engine-golang. Full reference: docs.annave.tech/pdf-engine. This note covers where it shows up in my stack.

Pipeline

In the Go engine, every path converges on the same stages after the AST exists:

```text
Normalize → Sanitize HTML → Parse → Validate → Layout → Paginate → Render → PDF bytes
```

Parse includes a dedicated JSON path: when format=json is set, or when the input is auto-detected as JSON because it starts with {, the parser maps document.v1 JSON into the AST. In Go you can also call RunFromDoc when the AST is already built; that entry point starts at Validate. HTTP, CLI, and file I/O stay outside the core.
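The way both entry points share one tail can be sketched in Go roughly as follows. This is a toy illustration, not the engine's code: the Doc type, the lowercase stage functions, and the two-blocks-per-page rule are all hypothetical stand-ins, and the layout and render stages are left out. Only the stage order and the idea of an entry point that begins at validation come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Doc is a hypothetical stand-in for the engine's AST root.
type Doc struct {
	Children []string // one entry per block, simplified to plain text
}

// parse is a toy parser: every non-empty line becomes one block.
func parse(input string) *Doc {
	doc := &Doc{}
	for _, line := range strings.Split(input, "\n") {
		if s := strings.TrimSpace(line); s != "" {
			doc.Children = append(doc.Children, s)
		}
	}
	return doc
}

// validate rejects documents the later stages should never see.
func validate(doc *Doc) error {
	if doc == nil || len(doc.Children) == 0 {
		return errors.New("empty document")
	}
	return nil
}

// paginate splits blocks into pages of at most perPage blocks.
func paginate(doc *Doc, perPage int) [][]string {
	var pages [][]string
	for i := 0; i < len(doc.Children); i += perPage {
		end := i + perPage
		if end > len(doc.Children) {
			end = len(doc.Children)
		}
		pages = append(pages, doc.Children[i:end])
	}
	return pages
}

// runFromDoc is the shared tail: it starts at validation, so callers
// that already hold a tree skip normalize and parse entirely.
func runFromDoc(doc *Doc) ([][]string, error) {
	if err := validate(doc); err != nil {
		return nil, err
	}
	return paginate(doc, 2), nil
}

// run is the raw-input entry: normalize line endings, parse, then
// hand over to the same tail as runFromDoc.
func run(input string) ([][]string, error) {
	normalized := strings.ReplaceAll(input, "\r\n", "\n")
	return runFromDoc(parse(normalized))
}

func main() {
	pages, err := run("Title\r\nFirst paragraph\r\nSecond paragraph\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(pages), "pages")
}
```

The point of the shape is that run adds steps in front of runFromDoc rather than duplicating what comes after it, so a fix to validation or pagination reaches every input path.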

The AST and the document standard

AST here means the engine’s internal tree of blocks: headings, paragraphs, tables, lists, code, images, and inline spans for bold, links, and similar markup. It is not PDF syntax and not HTML. It is my document standard — the shape every path must end up in before layout runs.

In theory any input format can plug in: Markdown, DOCX, CSV, YAML, images such as PNG or JPG, or data from your own app. Each parser implements port.DocumentParser in internal/parser/, reads its format, and maps it into the AST. I do not mean “convert to PDF” at this step — I mean normalize foreign structure into one predictable tree. When you serialize that tree, it is JSON with type: "document", version, and a children array — the same schema whether the source was a file upload or generated in code. Schema: `document.v1.schema.json`.

Product integrations usually send JSON that matches the schema with format=json on POST /convert, or call RunFromDoc in Go with a *ast.DocumentNode you built yourself. Your domain model emits blocks, not PDF coordinates.

```json
{
  "type": "document",
  "version": "1",
  "children": [
    { "type": "heading", "level": 1, "text": "Report title" },
    { "type": "paragraph", "text": "Summary." },
    {
      "type": "table",
      "headers": ["Date", "Value"],
      "rows": [["2026-05-01", "42"]]
    }
  ]
}
```

Once the document is in this form, layout, pagination, and render do not care whether the source was a DOCX upload, Markdown, or a Swift struct mapped field by field.
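Mapping a domain model into blocks might look like the sketch below. The Node struct covers only the fields that appear in the JSON example; the real document.v1 schema may define more node types and fields. The Reading record and the buildReport helper are hypothetical names made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node covers only the fields used in the JSON example in this post.
// It is an assumption, not a copy of the engine's schema types.
type Node struct {
	Type     string     `json:"type"`
	Version  string     `json:"version,omitempty"`
	Level    int        `json:"level,omitempty"`
	Text     string     `json:"text,omitempty"`
	Headers  []string   `json:"headers,omitempty"`
	Rows     [][]string `json:"rows,omitempty"`
	Children []Node     `json:"children,omitempty"`
}

// Reading is a hypothetical domain record.
type Reading struct {
	Date  string
	Value int
}

// buildReport maps domain data to blocks. Note that nothing here
// mentions pages, coordinates, or fonts.
func buildReport(title, summary string, readings []Reading) Node {
	rows := make([][]string, 0, len(readings))
	for _, r := range readings {
		rows = append(rows, []string{r.Date, fmt.Sprint(r.Value)})
	}
	return Node{
		Type:    "document",
		Version: "1",
		Children: []Node{
			{Type: "heading", Level: 1, Text: title},
			{Type: "paragraph", Text: summary},
			{Type: "table", Headers: []string{"Date", "Value"}, Rows: rows},
		},
	}
}

func main() {
	doc := buildReport("Report title", "Summary.",
		[]Reading{{Date: "2026-05-01", Value: 42}})
	out, err := json.MarshalIndent(doc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The output of this sketch is what a product integration would send with format=json; cell values are emitted as strings to match the example above.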

Customization without forking layout code

Most visual and operational tuning is YAML, embedded at build time — no need to patch the renderer for a new margin or font size.

| File | What you change |
| --- | --- |
| `config/style.yaml` | Page size, margins, font sizes, line heights, colors |
| `config/limits.yaml` | Max upload size, max nodes, max pages |
| `config/messages.yaml` | Error codes and user-facing messages |

Edit YAML, go build, deploy — that is a customized engine binary for your product or tenant. Per request, the CLI and HTTP API accept a --style / style JSON override on top of those defaults. Configuration is documented on docs.annave.tech/pdf-engine.
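Layering a per-request override on top of build-time defaults can be sketched as below. The Style fields and JSON keys are hypothetical (the real keys are whatever config/style.yaml defines), and the defaults are hardcoded here in place of embedded YAML. The sketch relies on a documented property of encoding/json: unmarshalling into an already populated struct only replaces the fields present in the JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Style uses hypothetical field names; the real keys live in
// config/style.yaml.
type Style struct {
	PageSize   string  `json:"pageSize"`
	MarginMM   float64 `json:"marginMm"`
	FontSizePT float64 `json:"fontSizePt"`
}

// defaults stands in for values embedded from YAML at build time.
var defaults = Style{PageSize: "A4", MarginMM: 20, FontSizePT: 11}

// applyOverride works on a copy of the defaults (base is passed by
// value) and lets the JSON override replace only the fields it names.
func applyOverride(base Style, override []byte) (Style, error) {
	if len(override) == 0 {
		return base, nil
	}
	if err := json.Unmarshal(override, &base); err != nil {
		return Style{}, fmt.Errorf("invalid style override: %w", err)
	}
	return base, nil
}

func main() {
	s, err := applyOverride(defaults, []byte(`{"marginMm": 12}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", s)
}
```

Because the override is applied to a copy, one request's style never leaks into the next.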

Adding a format

A new format is a new parser, not a new PDF writer.

  1. Implement DocumentParser in internal/parser/ with CanParse and Parse returning *ast.DocumentNode.
  2. Register the format string and file extension in internal/parser/registry.go.
  3. Add a fixture test; update config/messages.yaml if the unsupported-format list changes.

Nothing after parse changes: validate, layout, paginate, and render stay as they are. That is why adding CSV or DOCX was incremental work, not a second engine.
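As a rough illustration of the parser step, here is a CSV parser written against local stand-in types. The DocumentParser interface below only approximates port.DocumentParser: the method names CanParse and Parse come from the steps above, but their exact signatures are not shown in this post and are assumed here. DocumentNode and TableNode are simplified stand-ins for the engine's ast package.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"strings"
)

// TableNode and DocumentNode are local stand-ins for the engine's
// ast types, reduced to what this sketch needs.
type TableNode struct {
	Headers []string
	Rows    [][]string
}

type DocumentNode struct {
	Tables []TableNode
}

// DocumentParser approximates port.DocumentParser. The signatures
// are assumptions made for this sketch.
type DocumentParser interface {
	CanParse(format string) bool
	Parse(input []byte) (*DocumentNode, error)
}

// CSVParser maps a CSV file to a single table block.
type CSVParser struct{}

func (CSVParser) CanParse(format string) bool {
	return strings.EqualFold(format, "csv")
}

// Parse treats the first record as headers and the rest as rows.
func (CSVParser) Parse(input []byte) (*DocumentNode, error) {
	records, err := csv.NewReader(strings.NewReader(string(input))).ReadAll()
	if err != nil {
		return nil, fmt.Errorf("csv: %w", err)
	}
	if len(records) == 0 {
		return nil, errors.New("csv: no rows")
	}
	table := TableNode{Headers: records[0], Rows: records[1:]}
	return &DocumentNode{Tables: []TableNode{table}}, nil
}

// Compile-time check that CSVParser satisfies the interface.
var _ DocumentParser = CSVParser{}

func main() {
	doc, err := CSVParser{}.Parse([]byte("Date,Value\n2026-05-01,42\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc.Tables[0].Headers, len(doc.Tables[0].Rows))
}
```

The parser knows nothing about pages or fonts. It returns a tree and stops, which is what keeps a new format from touching layout code.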

Integrating with any product

Three common patterns:

  • HTTP: POST /convert with a file, raw body, or JSON document; get PDF bytes back. The Angular tool and any backend can call the same endpoint. See the integration guide.
  • Go library: import annave.tech/pdf-engine, run pipeline.Run(input, format) or RunFromDoc(doc) when the AST is already built.
  • JSON in: POST document.v1 JSON with format=json, or build nodes in Go and call RunFromDoc.

PasspoPet uses Swift on device, but the stages match: map pet profile records into the same block types, paginate, render. Different language; same contract. Self-host the Go binary or embed the package in your service.

Step-by-step parser template: docs.annave.tech/pdf-engine/architecture.

Where I use it

ANNÁVE PDF

The web tool at annave.tech/tools/pdf is the most visible surface. Preview runs the TypeScript layout pipeline in the browser for fast feedback while you edit. Export sends the file or text to the Go engine via POST /convert; the server returns PDF bytes and does not keep the upload. Same engine as annave pdf convert and self-hosted deploys. Upload path: How ANNÁVE PDF conversion works.

CV

The CV at annave.tech/cv is a structured document: sections, roles, skills blocks — the same class of problem as “export my profile as PDF”. Download PDF uses the browser print path on that page today. The Go engine is the server-side path for the same document-and-pagination rules when I need PDF without a one-off layout fork per repo.

PasspoPet

PasspoPet is a separate consumer product: iOS, local-first. v1 needs a pet profile PDF for travel and vet visits. That export is not the Go binary on the phone; it is a Swift implementation that follows the same pipeline idea as the Go engine: normalize input, map records into blocks, paginate, render. One design language; platform-appropriate code.

Why one engine instead of many

Whatever I ship next, I assume PDF will show up again — compliance export, tables, optional sections that disappear when data is missing, footers on page two. Owning the pipeline means I fix pagination once and keep table rules consistent between a CSV upload and a structured export, instead of rewriting layout in each codebase.

Limits

Wide tables on A4 still need judgment. Complex marketing HTML will not get browser layout: the engine parses structure, not flexbox. Image-heavy DOCX deserves a real test before you promise fidelity. Domain models should not carry PDF coordinates; empty sections should be omitted, not rendered as blank headings. Style belongs in config, not hardcoded in each app.
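The empty-section rule is easiest to enforce where blocks are emitted. A minimal sketch, with a simplified Block type and a hypothetical section helper (neither is from the engine):

```go
package main

import (
	"fmt"
	"strings"
)

// Block is a simplified stand-in for an AST block.
type Block struct {
	Type string
	Text string
}

// section returns a heading followed by one paragraph per non-blank
// item, or no blocks at all when every item is blank. The heading is
// only emitted once there is something to put under it.
func section(title string, items []string) []Block {
	var body []Block
	for _, item := range items {
		if text := strings.TrimSpace(item); text != "" {
			body = append(body, Block{Type: "paragraph", Text: text})
		}
	}
	if len(body) == 0 {
		return nil
	}
	return append([]Block{{Type: "heading", Text: title}}, body...)
}

func main() {
	var doc []Block
	doc = append(doc, section("Vaccinations", nil)...)
	doc = append(doc, section("Notes", []string{"  ", "Microchip verified"})...)
	fmt.Println(len(doc), "blocks")
}
```

Deciding this at the mapping step means layout never has to guess whether a heading with nothing under it was intentional.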

Go deeper

New format or product integration: contact@annave.tech.