April 9, 2026

Token Optimization: How Markdown Saves You Money on AI API Calls

Sarah Chen

ML Engineer

5 min read

If you are calling the OpenAI, Anthropic, or Google AI APIs, you are paying per token. Input tokens, output tokens, every one of them costs money. And if you are feeding documents into those APIs, the format of your input directly affects your bill.

Raw PDF text, HTML, and plain copy-paste all waste tokens on noise — repeated headers, broken whitespace, verbose markup tags. Markdown eliminates that noise while preserving the structure your model needs. The result: same information, fewer tokens, lower cost.

What Eats Your Tokens

To understand the savings, you need to understand what wastes tokens in the first place.

Raw PDF Text Extraction

When you extract text from a PDF, you get everything the PDF renderer sees:

  • Page headers and footers repeated on every page. A 40-page report includes "Acme Corp — Confidential — Q3 2026" forty times.
  • Page numbers. "Page 1 of 40" through "Page 40 of 40" — pure waste.
  • Broken line wraps. PDFs wrap text at fixed visual boundaries, producing fragments like "The com-\npany achieved" instead of "The company achieved".
  • Column artifacts. Multi-column layouts produce interleaved text from left and right columns.
  • Table destruction. A structured table becomes a flat string of numbers with no alignment.

All of these problems produce tokens that carry no useful information but still cost money.
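If you are stuck with raw extractions, some of this noise can be repaired mechanically. A minimal sketch that rejoins hyphenated line wraps — the regex is a heuristic and will occasionally merge words that were genuinely hyphenated:

```python
import re

def fix_line_wraps(text: str) -> str:
    """Rejoin words that a PDF renderer split across line breaks."""
    # "com-\npany" -> "company": a word character, a hyphen, a newline,
    # then another word character means the wrap broke a word in half
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

print(fix_line_wraps("The com-\npany achieved"))  # The company achieved
```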

HTML

HTML is even worse for token efficiency. Consider a simple heading:

<div class="section-header mt-8 mb-4">
  <h2 class="text-2xl font-bold text-gray-900">
    Quarterly Results
  </h2>
</div>

In Markdown, the same heading is:

## Quarterly Results

A few tokens versus dozens. Multiply that across every element on a page, and HTML typically uses 3-5x more tokens than Markdown for the same content. The CSS classes, div wrappers, attributes, and closing tags all tokenize into billable units.

Markdown

Markdown's syntax is minimal by design. A heading is ##. Bold is **. A table uses | pipes. The overhead is roughly 5-10% beyond the raw text content, and that small overhead carries critical structural information.

Real Numbers

Here is a comparison from a 15-page corporate earnings report, tokenized with the GPT-4 tokenizer (cl100k_base):

| Input Format | Token Count | Structural Info | Waste |
|---|---|---|---|
| Raw PDF copy-paste | ~9,800 | None | ~40% |
| Raw HTML (from web version) | ~28,500 | Full | ~75% |
| Cleaned HTML (tags stripped) | ~7,200 | None | ~20% |
| Markdown | ~5,600 | Full | <10% |

Markdown delivers the best of both worlds: structural preservation with minimal token overhead.

What This Costs

Current API pricing (as of spring 2026) for popular models:

| Model | Input Cost (per 1M tokens) |
|---|---|
| GPT-5 | $1.75 |
| GPT-4o | $2.50 |
| Claude Sonnet 4.6 | $3.00 |
| Claude Haiku 4.5 | $0.80 |
| Gemini 3.1 Pro | $2.00 |
| Gemini 2.5 Flash | $0.15 |
| Grok 4.1 | $0.20 |

These numbers look small per token. They add up fast at scale.

Say you are processing 1,000 documents per month through a RAG pipeline, averaging 10,000 tokens each in raw PDF form. That is 10M input tokens per month. With GPT-4o:

  • Raw PDF text: 10M tokens × $2.50/1M = $25/month
  • Markdown: 5.6M tokens × $2.50/1M = $14/month

You save roughly $11/month — or 44%. Not dramatic for a small operation, but scale it to 50,000 documents per month and you are saving $550/month just on input tokens.
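The arithmetic above is simple enough to wire into a quick calculator. The prices mirror the table earlier in the article and are assumptions that will drift over time:

```python
def monthly_cost(docs: int, tokens_per_doc: float, price_per_m: float) -> float:
    """Input-token cost in dollars for one month of processing."""
    return docs * tokens_per_doc * price_per_m / 1_000_000

# 1,000 docs/month through GPT-4o at $2.50 per 1M input tokens
raw = monthly_cost(1_000, 10_000, 2.50)  # raw PDF text
md = monthly_cost(1_000, 5_600, 2.50)    # same docs as Markdown

print(raw, md, raw - md)  # 25.0 14.0 11.0
```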

For Claude Sonnet 4.6 at $3/1M, the same 50,000 documents:

  • Raw: $1,500/month
  • Markdown: $840/month
  • Savings: $660/month

And this is only input tokens. Better-structured input also tends to produce more concise output — the model does not need to restate or clarify ambiguous data — saving output tokens too.

Beyond Cost: Context Window Efficiency

Token savings are not just about money. They are about fitting more content into the model's context window.

Every model has a limit. Even if you are not paying per token (using a flat-rate subscription), you still cannot exceed the context window. Fewer tokens per document means:

  • More documents per conversation. Compare three reports instead of two.
  • More room for instructions. A detailed system prompt with formatting rules, persona, and constraints can easily consume 2,000-3,000 tokens. If your document wastes 4,000 tokens on noise, your instructions compete with your data for space.
  • Longer conversations. Multi-turn interactions accumulate tokens. Starting with a leaner document means the conversation lasts longer before hitting the limit.
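A crude way to guard against blowing the window is to estimate token usage before sending. The four-characters-per-token rule below is a rough heuristic for English text, not a real tokenizer, and the window sizes are illustrative:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose
    return len(text) // 4

def fits_context(system_prompt: str, document: str,
                 context_window: int = 128_000,
                 reserved_for_output: int = 4_000) -> bool:
    """Check whether prompt + document leave room for the reply."""
    budget = context_window - reserved_for_output
    return estimate_tokens(system_prompt) + estimate_tokens(document) <= budget

print(fits_context("You are a financial analyst.", "x" * 40_000))  # True
```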

Practical Implementation

Single Document Conversion

Convert your document before sending it to the API:

import httpx

# Convert PDF to Markdown
with open("report.pdf", "rb") as f:
    response = httpx.post(
        "https://mdstill.com/api/convert",
        files={"file": ("report.pdf", f, "application/pdf")}
    )
response.raise_for_status()  # fail fast if the conversion errored
markdown = response.text

# Send clean Markdown to your LLM
# (using your preferred SDK)

Batch Processing for RAG Pipelines

For ingestion pipelines that process many documents:

from pathlib import Path
import httpx

docs_dir = Path("./documents")
output_dir = Path("./markdown")
output_dir.mkdir(exist_ok=True)

for pdf in docs_dir.glob("*.pdf"):
    with open(pdf, "rb") as f:
        resp = httpx.post(
            "https://mdstill.com/api/convert",
            files={"file": (pdf.name, f, "application/pdf")}
        )
    resp.raise_for_status()  # don't write partial output on failure
    (output_dir / f"{pdf.stem}.md").write_text(resp.text, encoding="utf-8")

Convert once, use many times. The Markdown output is stored locally and can be re-embedded, re-chunked, or re-sent to different models without repeating the conversion.

Pre-Processing Tips

After conversion, you can further optimize tokens:

  1. Remove boilerplate. If every document has the same 200-word legal disclaimer, strip it before sending to the API.
  2. Split by sections. Use Markdown headings as natural chunk boundaries. Send only the relevant sections instead of the full document.
  3. Summarize first, detail second. For very long documents, first ask the model to summarize the Markdown, then drill into specific sections in follow-up messages.
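Tips 1 and 2 are a few lines of Python each. The regex below assumes ##-level section headings, and the disclaimer string is a hypothetical placeholder — adapt both to your documents:

```python
import re

# Example boilerplate to strip; replace with your documents' actual text
DISCLAIMER = "This document is provided for informational purposes only."

def strip_boilerplate(md: str) -> str:
    return md.replace(DISCLAIMER, "").strip()

def split_sections(md: str) -> list[str]:
    # Split at lines that begin a ## heading, keeping each heading
    # attached to the body that follows it
    parts = re.split(r"(?m)^(?=## )", md)
    return [p.strip() for p in parts if p.strip()]

doc = "## Revenue\nUp 12%.\n## Costs\nFlat."
print(split_sections(doc))  # ['## Revenue\nUp 12%.', '## Costs\nFlat.']
```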

When Not to Optimize

Token optimization matters when you are operating at scale or when your documents are bumping against context window limits. For a one-off question about a short document, the overhead of any format is negligible.

The sweet spot is automated pipelines — RAG ingestion, batch analysis, document processing workflows — where you are converting hundreds or thousands of documents. That is where format choice directly affects your monthly bill and your output quality.

Start Saving

Drop a document on mdstill and compare the token count against your current raw extraction. The savings scale linearly with volume — the more documents you process, the more you save.

#tokens #ai #api #cost #markdown #optimization #chatgpt #claude
