API Documentation
Convert documents to clean Markdown programmatically. One POST request, one file, Markdown back. Works in any language that can speak HTTP.
Quickstart
Your first conversion in 30 seconds
No signup required for the free anonymous tier. Just send a file:
curl -X POST https://mdstill.com/api/convert \
  -F "file=@document.pdf"
You get JSON back with the Markdown and some metadata:
{
  "markdown": "# Document Title\n\nConverted content here...",
  "metadata": {
    "filename": "document.pdf",
    "format": ".pdf",
    "converter": "fast",
    "size_bytes": 245760,
    "conversion_time_sec": 0.42,
    "markdown_length": 8320,
    "token_count": 2080
  }
}
That is the whole API. Everything below is details: authentication for higher limits, other languages, error handling, rate limits.
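The response shape above can be loaded into typed Python objects before anything else touches it. This is a minimal sketch rather than an official client: the names ConversionMetadata and parse_conversion are made up for illustration, and the field list is copied from the sample response shown above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ConversionMetadata:
    """Hypothetical container mirroring the sample "metadata" object."""
    filename: str
    format: str
    converter: str
    size_bytes: int
    conversion_time_sec: float
    markdown_length: int
    token_count: int


def parse_conversion(payload: dict) -> Tuple[str, ConversionMetadata]:
    """Split a decoded /api/convert JSON body into Markdown and typed metadata."""
    meta = payload["metadata"]
    # Pick fields by name so extra keys in the response do not break parsing.
    known = {name: meta[name] for name in ConversionMetadata.__dataclass_fields__}
    return payload["markdown"], ConversionMetadata(**known)


# The sample response from the quickstart, as a Python dict.
sample = {
    "markdown": "# Document Title\n\nConverted content here...",
    "metadata": {
        "filename": "document.pdf",
        "format": ".pdf",
        "converter": "fast",
        "size_bytes": 245760,
        "conversion_time_sec": 0.42,
        "markdown_length": 8320,
        "token_count": 2080,
    },
}

markdown, meta = parse_conversion(sample)
print(meta.token_count, meta.markdown_length / meta.token_count)  # 2080 4.0
```

A missing field raises KeyError at parse time, which is usually easier to debug than a failure further down the pipeline.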
Authentication
Authentication is optional. Anonymous requests work but share a small per-IP daily quota. Sign up for a free account and generate an API key in your Dashboard to get a higher per-account limit.
Pass your key in the Authorization header:
Authorization: Bearer mdr_your_api_key_here
API usage counts toward the same daily quota as the web interface. You can generate up to 5 API keys per account and revoke unused keys from the Dashboard.
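Because the key is optional, client code can build the header conditionally. The sketch below assumes the key lives in an environment variable; the name MDSTILL_API_KEY and the helper build_headers are illustrative choices, not something the API defines.

```python
import os
from typing import Dict, Optional

# Hypothetical environment variable name; pick whatever fits your deployment.
API_KEY_ENV = "MDSTILL_API_KEY"


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers: a Bearer token if a key is available, else none."""
    key = api_key if api_key is not None else os.environ.get(API_KEY_ENV, "")
    key = key.strip()
    if not key:
        # No key: the request goes out anonymously on the per-IP quota.
        return {}
    return {"Authorization": f"Bearer {key}"}


print(build_headers("mdr_your_api_key_here"))
# {'Authorization': 'Bearer mdr_your_api_key_here'}
```

Pass the result as headers= to whichever HTTP client you use. Keeping the key out of source code also makes it easier to revoke and rotate from the Dashboard.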
Base URL
https://mdstill.com
Endpoint
POST /api/convert
Convert a document to Markdown
Request
Send a multipart/form-data request with the file attached.
file: The document to convert. Supported formats: PDF, DOCX, DOC, PPTX, XLSX, XLS, HTML, HTM, EPUB, CSV, JSON, XML, ZIP, RTF, ODT, Pages, Numbers, Keynote.
output: Response format. markdown (default) returns just the Markdown and metadata. structured adds a structure object with semantic sections, document outline, and token-counted chunks ready for RAG ingestion.
chunk_tokens: Max tokens per chunk (soft limit). Range: 100 – 4000, default 500. Atomic content blocks (tables, code blocks) are never split mid-element, so individual chunks may exceed this value. Only used when output=structured.
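The optional fields can be validated on the client before uploading. The helper below is a sketch with a hypothetical name (build_form_fields). It enforces the documented range locally, which is a design choice and not a statement about how the server reacts to out-of-range values.

```python
from typing import Dict

MIN_CHUNK_TOKENS = 100   # documented lower bound
MAX_CHUNK_TOKENS = 4000  # documented upper bound


def build_form_fields(output: str = "markdown", chunk_tokens: int = 500) -> Dict[str, str]:
    """Build the non-file form fields for a /api/convert request."""
    if output not in ("markdown", "structured"):
        raise ValueError(f"output must be 'markdown' or 'structured', got {output!r}")
    if output == "markdown":
        # chunk_tokens is only used with output=structured, so leave it out.
        return {"output": "markdown"}
    if not MIN_CHUNK_TOKENS <= chunk_tokens <= MAX_CHUNK_TOKENS:
        raise ValueError(
            f"chunk_tokens must be between {MIN_CHUNK_TOKENS} and {MAX_CHUNK_TOKENS}"
        )
    return {"output": "structured", "chunk_tokens": str(chunk_tokens)}


print(build_form_fields("structured", 800))
# {'output': 'structured', 'chunk_tokens': '800'}
```

With requests, the returned dict goes in data= next to files={"file": f}.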
Response format
By default the API returns JSON. You can also request the raw Markdown file directly using the Accept header.
Accept: application/json
Default. Returns JSON with markdown and metadata fields.
Accept: text/markdown
Returns the .md file directly as a download. No metadata.
File response (Accept: text/markdown):
HTTP/1.1 200 OK
Content-Type: text/markdown; charset=utf-8
Content-Disposition: attachment; filename="report.md"
# Document Title
Converted content here...
Code samples
Every sample below does the same thing: upload document.pdf, parse the JSON response, and save the Markdown. Replace mdr_your_api_key with your real key, or drop the Authorization header for anonymous usage.
Simplest form — returns JSON:
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@document.pdf"
Download as a .md file directly:
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -H "Accept: text/markdown" \
  -F "file=@report.pdf" \
  -o report.md
Or extract markdown from JSON with jq:
curl -s -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@report.pdf" \
  | jq -r '.markdown' > report.md
Python, using the requests library (third-party; install with pip install requests):
import requests

with open("document.pdf", "rb") as f:
    response = requests.post(
        "https://mdstill.com/api/convert",
        headers={"Authorization": "Bearer mdr_your_api_key"},
        files={"file": f},
        timeout=60,
    )
response.raise_for_status()

data = response.json()
markdown = data["markdown"]
meta = data["metadata"]
print(f"Converted in {meta['conversion_time_sec']}s, {meta['token_count']} tokens")

# Write UTF-8 explicitly so non-ASCII text survives on every platform.
with open("document.md", "w", encoding="utf-8") as f:
    f.write(markdown)
In the browser, upload a file directly from an <input type="file">:
async function convertFile(file) {
  const form = new FormData();
  form.append("file", file);

  const response = await fetch("https://mdstill.com/api/convert", {
    method: "POST",
    headers: {
      Authorization: "Bearer mdr_your_api_key",
    },
    body: form,
  });

  if (!response.ok) {
    const err = await response.json().catch(() => ({}));
    throw new Error(err.detail || `HTTP ${response.status}`);
  }

  const { markdown, metadata } = await response.json();
  console.log(`Converted in ${metadata.conversion_time_sec}s`);
  return markdown;
}
Node 18+ has fetch, Blob, and FormData built in, so no dependencies are needed:
import { readFile } from "node:fs/promises";

const buffer = await readFile("document.pdf");
const form = new FormData();
form.append("file", new Blob([buffer]), "document.pdf");

const response = await fetch("https://mdstill.com/api/convert", {
  method: "POST",
  headers: { Authorization: "Bearer mdr_your_api_key" },
  body: form,
});

if (!response.ok) {
  const err = await response.json().catch(() => ({}));
  throw new Error(err.detail || `HTTP ${response.status}`);
}

const { markdown, metadata } = await response.json();
console.log(`${metadata.markdown_length} chars, ~${metadata.token_count} tokens`);
package main
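
import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"mime/multipart"
	"net/http"
	"os"
)

type Response struct {
	Markdown string `json:"markdown"`
	Metadata struct {
		ConversionTimeSec float64 `json:"conversion_time_sec"`
		TokenCount        int     `json:"token_count"`
	} `json:"metadata"`
}

func main() {
	file, err := os.Open("document.pdf")
	if err != nil {
		panic(err)
	}
	defer file.Close()

	// Build the multipart body with the document under the "file" field.
	body := &bytes.Buffer{}
	writer := multipart.NewWriter(body)
	part, err := writer.CreateFormFile("file", "document.pdf")
	if err != nil {
		panic(err)
	}
	if _, err := io.Copy(part, file); err != nil {
		panic(err)
	}
	// Close writes the trailing boundary; the body is incomplete without it.
	if err := writer.Close(); err != nil {
		panic(err)
	}

	req, err := http.NewRequest("POST", "https://mdstill.com/api/convert", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer mdr_your_api_key")
	req.Header.Set("Content-Type", writer.FormDataContentType())

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Surface API errors instead of decoding an error body as a result.
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(resp.Body)
		panic(fmt.Sprintf("API error: HTTP %d: %s", resp.StatusCode, msg))
	}

	var result Response
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		panic(err)
	}
	fmt.Printf("Converted in %.2fs, ~%d tokens\n",
		result.Metadata.ConversionTimeSec, result.Metadata.TokenCount)
}
<?php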
$ch = curl_init("https://mdstill.com/api/convert");
curl_setopt_array($ch, [
    CURLOPT_POST => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER => [
        "Authorization: Bearer mdr_your_api_key",
    ],
    CURLOPT_POSTFIELDS => [
        "file" => new CURLFile("document.pdf"),
    ],
]);

$response = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

if ($status !== 200) {
    throw new RuntimeException("API error: HTTP $status");
}

$data = json_decode($response, true);
echo "Converted in {$data['metadata']['conversion_time_sec']}s\n";
file_put_contents("document.md", $data["markdown"]);
Convert every PDF in a directory. Drop-in pattern for shell scripts:
mkdir -p output
for file in documents/*.pdf; do
  echo "Converting $file..."
  curl -s -X POST https://mdstill.com/api/convert \
    -H "Authorization: Bearer mdr_your_api_key" \
    -F "file=@$file" \
    | jq -r '.markdown' > "output/$(basename "$file" .pdf).md"
done
Structured / RAG-ready output
Chunks with metadata for vector databases
Add output=structured to get the Markdown plus a structure object with semantic sections, a document outline, and token-counted chunks. Each chunk includes a heading path and content type labels, ready to drop into LangChain, LlamaIndex, or any RAG pipeline.
curl -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@report.pdf" \
  -F "output=structured" \
  -F "chunk_tokens=500"
The response includes everything from the standard response, plus a structure field:
{
  "markdown": "# Introduction\n\nThis report covers...",
  "metadata": {
    "filename": "report.pdf",
    "format": ".pdf",
    "converter": "fast",
    "size_bytes": 245760,
    "conversion_time_sec": 0.42,
    "markdown_length": 8320,
    "token_count": 2080
  },
  "structure": {
    "sections": [
      {"heading": "Introduction", "level": 1, "content": "This report covers...", "tokens": 342},
      {"heading": "Methods", "level": 1, "content": "We used...", "tokens": 518}
    ],
    "headings": ["Introduction", "Methods", "Results", "Discussion"],
    "total_tokens": 2080,
    "max_chunk_tokens": 500,
    "chunks": [
      {
        "id": 0,
        "text": "# Introduction\n\nThis report covers...",
        "tokens": 342,
        "heading_path": ["Introduction"],
        "content_types": ["paragraph"]
      },
      {
        "id": 1,
        "text": "# Methods\n\nWe used a mixed-methods approach...",
        "tokens": 487,
        "heading_path": ["Methods"],
        "content_types": ["paragraph", "table"]
      }
    ]
  }
}
Chunk fields
id: Sequential chunk index, starting from 0.
text: The chunk content as Markdown text.
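tokens: Exact token count under tiktoken's cl100k_base encoding (the GPT-4 tokenizer). Models that use a different tokenizer, such as Claude, will count somewhat differently, so treat the value as an estimate for them.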
heading_path: Breadcrumb of parent headings, e.g. ["Chapter 1", "Methods", "Data Collection"]. Empty for documents without headings (most PDFs).
content_types: Types of content in this chunk: paragraph, table, list, code.
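The fields above are enough for simple filtering and labelling before embedding. The helpers below are a small sketch with hypothetical names (breadcrumb, chunks_with, oversized), run against the two sample chunks from the response shown earlier.

```python
from typing import Dict, List


def breadcrumb(chunk: Dict, separator: str = " > ") -> str:
    """Join heading_path into a label; empty paths become "(root)"."""
    return separator.join(chunk["heading_path"]) or "(root)"


def chunks_with(chunks: List[Dict], content_type: str) -> List[Dict]:
    """Keep only chunks that contain the given content type."""
    return [c for c in chunks if content_type in c["content_types"]]


def oversized(chunks: List[Dict], max_chunk_tokens: int) -> List[int]:
    """IDs of chunks above the soft limit (atomic blocks are never split)."""
    return [c["id"] for c in chunks if c["tokens"] > max_chunk_tokens]


# The two chunks from the sample structured response.
chunks = [
    {"id": 0, "text": "# Introduction\n\nThis report covers...", "tokens": 342,
     "heading_path": ["Introduction"], "content_types": ["paragraph"]},
    {"id": 1, "text": "# Methods\n\nWe used a mixed-methods approach...", "tokens": 487,
     "heading_path": ["Methods"], "content_types": ["paragraph", "table"]},
]

print([c["id"] for c in chunks_with(chunks, "table")])  # [1]
print(breadcrumb(chunks[0]))                            # Introduction
print(oversized(chunks, 500))                           # []
```

Checking for oversized chunks is useful when your embedding model has a hard input limit, since chunk_tokens is only a soft limit.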
How chunking works
- Markdown is parsed into an AST of atomic blocks: paragraphs, tables, lists, code blocks, headings.
- Atomic blocks are never split mid-element. A 2000-token table stays as one chunk even if chunk_tokens=500.
- Each heading starts a new chunk. Adjacent small blocks are merged until they hit the max.
- Overlap: the last paragraph of the previous chunk is repeated at the start of the next for retrieval context.
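To make the four rules concrete, here is a toy re-implementation in Python. It is an illustration only, not the service's code. The Block type and chunk_blocks function are hypothetical, token counts are supplied by the caller, and two details are assumptions: overlap is applied only when a chunk closes because of size (not at a heading), and only when the repeated paragraph plus the next block still fit under the limit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    kind: str    # "heading", "paragraph", "table", "list" or "code"
    text: str
    tokens: int  # token count, supplied by the caller in this sketch


def chunk_blocks(blocks: List[Block], max_tokens: int = 500) -> List[List[Block]]:
    """Group atomic blocks into chunks; blocks themselves are never split."""
    chunks: List[List[Block]] = []
    current: List[Block] = []
    for block in blocks:
        size = sum(b.tokens for b in current)
        starts_section = block.kind == "heading"
        overflows = size + block.tokens > max_tokens
        if current and (starts_section or overflows):
            chunks.append(current)
            carry: List[Block] = []
            if not starts_section:
                # Assumed overlap rule: repeat the previous chunk's last
                # paragraph if it fits together with the incoming block.
                paragraphs = [b for b in current if b.kind == "paragraph"]
                if paragraphs and paragraphs[-1].tokens + block.tokens <= max_tokens:
                    carry = [paragraphs[-1]]
            current = carry
        current.append(block)
    if current:
        chunks.append(current)
    return chunks


blocks = [
    Block("heading", "# Intro", 5),
    Block("paragraph", "p1", 200),
    Block("paragraph", "p2", 200),
    Block("paragraph", "p3", 200),
    Block("table", "big table", 2000),
    Block("heading", "# Methods", 5),
    Block("paragraph", "p4", 100),
]
result = chunk_blocks(blocks, max_tokens=500)
print([sum(b.tokens for b in chunk) for chunk in result])
# [405, 400, 2000, 105]
```

The second chunk starts with a repeat of p2 (the overlap), and the 2000-token table ends up alone in a chunk that exceeds the limit, which matches the soft-limit behavior described above.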
Convert and get chunks ready for a vector database:
import requests

with open("report.pdf", "rb") as f:
    resp = requests.post(
        "https://mdstill.com/api/convert",
        headers={"Authorization": "Bearer mdr_your_api_key"},
        files={"file": f},
        data={"output": "structured", "chunk_tokens": "500"},
        timeout=60,
    )
resp.raise_for_status()

data = resp.json()
chunks = data["structure"]["chunks"]

# Each chunk is ready for embedding
for chunk in chunks:
    print(f"Chunk {chunk['id']}: {chunk['tokens']} tokens")
    print(f"  Path: {' > '.join(chunk['heading_path']) or '(root)'}")
    print(f"  Types: {chunk['content_types']}")
    # embed(chunk["text"])  # your embedding call here
Extract just the chunks array:
curl -s -X POST https://mdstill.com/api/convert \
  -H "Authorization: Bearer mdr_your_api_key" \
  -F "file=@report.pdf" \
  -F "output=structured" \
  -F "chunk_tokens=500" \
  | jq '.structure.chunks' > chunks.json
import { readFile } from "node:fs/promises";

const buffer = await readFile("report.pdf");
const form = new FormData();
form.append("file", new Blob([buffer]), "report.pdf");
form.append("output", "structured");
form.append("chunk_tokens", "500");

const resp = await fetch("https://mdstill.com/api/convert", {
  method: "POST",
  headers: { Authorization: "Bearer mdr_your_api_key" },
  body: form,
});

const { structure } = await resp.json();
console.log(`${structure.chunks.length} chunks, ${structure.total_tokens} tokens`);

// Feed chunks into your vector store.
// vectorStore is a placeholder for your own client; it is not defined here.
for (const chunk of structure.chunks) {
  await vectorStore.upsert({
    id: `report-${chunk.id}`,
    text: chunk.text,
    metadata: {
      heading_path: chunk.heading_path,
      content_types: chunk.content_types,
      tokens: chunk.tokens,
    },
  });
}
Error codes
All errors return a JSON body with a detail field explaining what went wrong:
{
"detail": "File too large (45MB). Max: 20MB"
}

| Code | Meaning | When it happens & how to fix |
|---|---|---|
| 400 | Bad Request | Unsupported format, malformed filename, or file exceeds your plan's size limit. The detail field says which. Check the supported formats list and your plan limits. |
| 408 | Request Timeout | Conversion took longer than the server timeout. Usually means the file is very large or structurally complex (deeply nested tables, thousands of pages). Try splitting the document. |
| 413 | Payload Too Large | Upload exceeded the absolute body-size ceiling before even reaching the converter. Hard cap independent of plan. Split the file and convert pieces separately. |
| 429 | Too Many Requests | Daily quota exhausted. The detail field shows current usage (used/limit). Quotas reset at 00:00 UTC. Sign up or upgrade for a higher limit. |
| 500 | Internal Server Error | The converter hit an unexpected error on this specific file. Usually a corrupted or unusual input. Retry once; if it still fails, the file format is likely outside what we handle. |
| 503 | Service Unavailable | Server temporarily overloaded or a dependency is degraded. Retry with exponential backoff (1s, 2s, 4s). |
There is currently no 401 Unauthorized on the convert endpoint — an invalid or missing API key simply falls through to the anonymous tier and its lower quota.
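The retry advice in the table can be captured in one small decision function. This is a sketch of one possible client policy, not a prescribed one: retry_delay is a hypothetical name, the cap of three backoff attempts follows the "1s, 2s, 4s" example, and the one-second wait before the single 500 retry is an assumption.

```python
from typing import Optional


def retry_delay(status: int, attempt: int, base: float = 1.0,
                max_backoff_attempts: int = 3) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None to stop.

    503 -> exponential backoff: 1s, 2s, 4s, then give up.
    500 -> retry once (the wait length is an assumption of this sketch).
    Anything else (400, 408, 413, 429, ...) -> do not retry automatically;
    those need a different file, a smaller file, or a quota reset.
    """
    if status == 503 and attempt <= max_backoff_attempts:
        return base * 2 ** (attempt - 1)
    if status == 500 and attempt == 1:
        return base
    return None


print([retry_delay(503, n) for n in (1, 2, 3, 4)])  # [1.0, 2.0, 4.0, None]
print(retry_delay(500, 1), retry_delay(500, 2))     # 1.0 None
print(retry_delay(429, 1))                          # None
```

A caller would loop over attempts, sleep for the returned delay with time.sleep, and re-send the request until the function returns None.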
Rate limits
Limits are daily, per plan. Anonymous requests count against a per-IP quota; authenticated requests count against your account's quota regardless of which API key was used.
- Quotas reset at 00:00 UTC every day. The counter is based on successful conversions; failed requests (4xx/5xx) do not consume your quota.
- Web interface and API share the same daily quota per account.
- When you exceed the limit, the API returns 429 Too Many Requests with a body like {"detail": "Daily fast conversion limit reached (50/50). Upgrade your plan for higher limits."}.
- There is no Retry-After header yet; assume the next window opens at the next UTC midnight.
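Since there is no Retry-After header, a client that hits 429 has to work out the wait itself. The sketch below computes the time to the next UTC midnight and pulls the used/limit pair out of the detail message. Both function names are hypothetical, and the regular expression relies on the "(50/50)" shape from the example message, which may not be a stable contract.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Matches the "(used/limit)" fragment seen in the example 429 message.
USAGE_PATTERN = re.compile(r"\((\d+)/(\d+)\)")


def parse_usage(detail: str) -> Optional[Tuple[int, int]]:
    """Return (used, limit) from a 429 detail string, or None if absent."""
    match = USAGE_PATTERN.search(detail)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def seconds_until_reset(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now_utc).total_seconds()


detail = "Daily fast conversion limit reached (50/50). Upgrade your plan for higher limits."
print(parse_usage(detail))  # (50, 50)
print(seconds_until_reset(datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)))  # 3600.0
```

In production code, call seconds_until_reset(datetime.now(timezone.utc)) and pause the batch job for that long instead of retrying in a loop.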
Supported formats
.pdf
PDF
.docx
DOCX
.doc
DOC
.pptx
PowerPoint
.xlsx
XLSX
.xls
XLS
.html
HTML
.htm
HTM
.epub
EPUB
.csv
CSV
.json
JSON
.xml
XML
.zip
ZIP archive
.rtf
RTF
.odt
ODT
.pages
Pages
.numbers
Numbers
.key
Keynote
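A batch job can skip unsupported files before spending a request on them. The check below is a sketch: is_supported is a hypothetical helper, the extension set is transcribed from the formats named in this documentation, and it assumes the file extension is a good enough proxy for what the server will accept.

```python
from pathlib import Path

# Extensions transcribed from the supported formats named in this document.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf", ".docx", ".doc", ".pptx", ".xlsx", ".xls", ".html", ".htm",
    ".epub", ".csv", ".json", ".xml", ".zip", ".rtf", ".odt",
    ".pages", ".numbers", ".key",
})


def is_supported(path: str) -> bool:
    """True if the file's extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("Report.PDF"))   # True
print(is_supported("notes.txt"))    # False
print(is_supported("slides.key"))   # True
```

Filtering locally also keeps avoidable 400 responses out of your logs.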
Notes
- Files are processed in memory and deleted immediately after conversion. Nothing is stored on our servers.
- The API supports both plain Markdown and structured RAG-ready output with semantic chunking. Use output=structured for the latter.
- Track your API key usage in the Dashboard. Each key shows total conversions and last activity.
- You can generate up to 5 API keys per account. Revoke unused keys from the Dashboard.