API Documentation

Convert documents to clean Markdown programmatically. One POST request, one file, Markdown back. Works in any language that can speak HTTP.

Quickstart

Your first conversion in 30 seconds

No signup required for the free anonymous tier. Just send a file:

bash
curl -X POST https://mdstill.com/api/convert \
  -F "file=@document.pdf"

You get JSON back with the Markdown and some metadata:

json
{
  "markdown": "# Document Title\n\nConverted content here...",
  "metadata": {
    "filename": "document.pdf",
    "format": ".pdf",
    "converter": "fast",
    "size_bytes": 245760,
    "conversion_time_sec": 0.42,
    "markdown_length": 8320,
    "token_count": 2080
  }
}

That is the whole API. Everything below is detail: authentication for higher limits, code samples in other languages, error handling, and rate limits.

Authentication

Authentication is optional. Anonymous requests work but share a small per-IP daily quota. Sign up for a free account and generate an API key in your Dashboard to get a higher per-account limit.

Pass your key in the Authorization header:

http
Authorization: Bearer mdr_your_api_key_here

API usage counts toward the same daily quota as the web interface. You can generate up to 5 API keys per account and revoke unused keys from the Dashboard.
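Because the key is optional, clients typically build the Authorization header conditionally. A minimal sketch in Python (the helper name auth_headers is illustrative, not part of any official SDK):

```python
def auth_headers(api_key=None):
    """Build request headers for /api/convert.

    With no key this returns an empty dict and the request falls through
    to the anonymous per-IP quota; with a key it adds the Bearer header.
    """
    if api_key is None:
        return {}
    return {"Authorization": f"Bearer {api_key}"}
```

Pass the result as the headers argument of your HTTP client's POST call.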

Base URL

text
https://mdstill.com

Endpoint

POST
/api/convert

Convert a document to Markdown

Request

Send a multipart/form-data request with the file attached.

file
file, required

The document to convert. Supported formats: PDF, DOCX, DOC, PPTX, XLSX, XLS, HTML, HTM, EPUB, CSV, JSON, XML, ZIP, RTF, ODT, Pages, Numbers, Keynote.

output
string, optional

Response format. markdown (default) returns just the Markdown and metadata. structured adds a structure object with semantic sections, document outline, and token-counted chunks ready for RAG ingestion.

chunk_tokens
integer, optional

Max tokens per chunk (soft limit). Range: 100 – 4000, default 500. Atomic content blocks (tables, code blocks) are never split mid-element, so individual chunks may exceed this value. Only used when output=structured.

Response format

By default the API returns JSON. You can also request the raw Markdown file directly using the Accept header.

Accept: application/json
Default. Returns JSON with markdown and metadata fields.

Accept: text/markdown
Returns the .md file directly as a download. No metadata.

File response (Accept: text/markdown):

http
HTTP/1.1 200 OK
Content-Type: text/markdown; charset=utf-8
Content-Disposition: attachment; filename="report.md"

# Document Title

Converted content here...

Code samples

Every sample below does the same thing: upload document.pdf, parse the JSON response, and save the Markdown. Replace mdr_your_api_key with your real key, or drop the Authorization header for anonymous usage.

Simplest form — returns JSON:

bash
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@document.pdf"

Download as a .md file directly:

bash
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -H "Accept: text/markdown" \
  -F "file=@report.pdf" \
  -o report.md

Or extract markdown from JSON with jq:

bash
curl -s -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@report.pdf" \
  | jq -r '.markdown' > report.md
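The same flow in Python, using the third-party requests library. This is a sketch; convert_file and save_markdown are illustrative helper names, not part of an official SDK:

```python
import requests

API_URL = "https://mdstill.com/api/convert"

def convert_file(path, api_key=None):
    """POST a document to /api/convert and return the parsed JSON body."""
    # Without a key the request uses the anonymous per-IP quota.
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    with open(path, "rb") as f:
        resp = requests.post(API_URL, headers=headers, files={"file": f})
    resp.raise_for_status()
    return resp.json()

def save_markdown(payload, out_path):
    """Write the markdown field of a conversion response to disk."""
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(payload["markdown"])
    return out_path

# save_markdown(convert_file("report.pdf", "mdr_your_api_key"), "report.md")
```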

Structured / RAG-ready output

Chunks with metadata for vector databases

Add output=structured to get the Markdown plus a structure object with semantic sections, a document outline, and token-counted chunks. Each chunk includes a heading path and content type labels, ready to drop into LangChain, LlamaIndex, or any RAG pipeline.

bash
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@report.pdf" \
  -F "output=structured" \
  -F "chunk_tokens=500"

The response includes everything from the standard response, plus a structure field:

json
{
  "markdown": "# Introduction\n\nThis report covers...",
  "metadata": {
    "filename": "report.pdf",
    "format": ".pdf",
    "converter": "fast",
    "size_bytes": 245760,
    "conversion_time_sec": 0.42,
    "markdown_length": 8320,
    "token_count": 2080
  },
  "structure": {
    "sections": [
      {"heading": "Introduction", "level": 1, "content": "This report covers...", "tokens": 342},
      {"heading": "Methods", "level": 1, "content": "We used...", "tokens": 518}
    ],
    "headings": ["Introduction", "Methods", "Results", "Discussion"],
    "total_tokens": 2080,
    "max_chunk_tokens": 500,
    "chunks": [
      {
        "id": 0,
        "text": "# Introduction\n\nThis report covers...",
        "tokens": 342,
        "heading_path": ["Introduction"],
        "content_types": ["paragraph"]
      },
      {
        "id": 1,
        "text": "# Methods\n\nWe used a mixed-methods approach...",
        "tokens": 487,
        "heading_path": ["Methods"],
        "content_types": ["paragraph", "table"]
      }
    ]
  }
}

Chunk fields

id
integer

Sequential chunk index, starting from 0.

text
string

The chunk content as Markdown text.

tokens
integer

Exact token count (tiktoken cl100k_base, compatible with GPT-4 and Claude).

heading_path
string[]

Breadcrumb of parent headings, e.g. ["Chapter 1", "Methods", "Data Collection"]. Empty for documents without headings (most PDFs).

content_types
string[]

Types of content in this chunk: paragraph, table, list, code.

How chunking works

Convert and get chunks ready for a vector database:

python
import requests

with open("report.pdf", "rb") as f:
    resp = requests.post(
        "https://mdstill.com/api/convert",
        headers={"Authorization": "Bearer mdr_your_api_key"},
        files={"file": f},
        data={"output": "structured", "chunk_tokens": "500"},
    )

data = resp.json()
chunks = data["structure"]["chunks"]

# Each chunk is ready for embedding
for chunk in chunks:
    print(f"Chunk {chunk['id']}: {chunk['tokens']} tokens")
    print(f"  Path: {' > '.join(chunk['heading_path']) or '(root)'}")
    print(f"  Types: {chunk['content_types']}")
    # embed(chunk["text"])  # your embedding call here

Error codes

All errors return a JSON body with a detail field explaining what went wrong:

json
{
  "detail": "File too large (45MB). Max: 20MB"
}
400 Bad Request
Unsupported format, malformed filename, or file exceeds your plan's size limit. The detail field says which. Check the supported formats list and your plan limits.

408 Request Timeout
Conversion took longer than the server timeout. Usually means the file is very large or structurally complex (deeply nested tables, thousands of pages). Try splitting the document.

413 Payload Too Large
Upload exceeded the absolute body-size ceiling before even reaching the converter. This is a hard cap, independent of plan. Split the file and convert the pieces separately.

429 Too Many Requests
Daily quota exhausted. The detail field shows current usage (used/limit). Quotas reset at 00:00 UTC. Sign up or upgrade for a higher limit.

500 Internal Server Error
The converter hit an unexpected error on this specific file, usually a corrupted or unusual input. Retry once; if it still fails, the file format is likely outside what we handle.

503 Service Unavailable
Server temporarily overloaded or a dependency is degraded. Retry with exponential backoff (1s, 2s, 4s).

There is currently no 401 Unauthorized on the convert endpoint — an invalid or missing API key simply falls through to the anonymous tier and its lower quota.
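A 503 is worth retrying with the backoff schedule suggested above; a 429 will not recover until the daily reset, so it is raised immediately rather than retried. A minimal sketch using the third-party requests library (convert_with_retry and backoff_delays are illustrative names, not part of any SDK):

```python
import time
import requests

API_URL = "https://mdstill.com/api/convert"

def backoff_delays(attempts, base=1.0):
    """Exponential schedule: 1s, 2s, 4s, ... for `attempts` retries."""
    return [base * 2 ** i for i in range(attempts)]

def convert_with_retry(path, api_key=None, attempts=3):
    """POST a file, retrying only 503 with exponential backoff.

    429 means the daily quota is exhausted (resets at 00:00 UTC), so
    retrying within seconds cannot help; raise_for_status surfaces it.
    """
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    for delay in backoff_delays(attempts):
        with open(path, "rb") as f:
            resp = requests.post(API_URL, headers=headers, files={"file": f})
        if resp.status_code != 503:
            resp.raise_for_status()
            return resp.json()
        time.sleep(delay)
    resp.raise_for_status()  # still 503 after all retries
```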

Rate limits

Limits are daily, per plan. Anonymous requests count against a per-IP quota; authenticated requests count against your account's quota regardless of which API key was used.

Supported formats

.pdf (PDF)
.docx (DOCX)
.doc (DOC)
.pptx (PowerPoint)
.xlsx (XLSX)
.xls (XLS)
.html (HTML)
.htm (HTM)
.epub (EPUB)
.csv (CSV)
.json (JSON)
.xml (XML)
.zip (ZIP archive)
.rtf (RTF)
.odt (ODT)
.pages (Pages)
.numbers (Numbers)
.key (Keynote)

Notes