/document-to-json

Extract structured JSON from any document.

Define your schema, send your file — get clean JSON back.

# Extract structured JSON from any document
curl -X POST https://api.documentai.dev/document-to-json/v1 \
  -H "x-api-key: YOUR_API_KEY" \
  -F "url=https://example.com/invoice.pdf" \
  -F 'schema={
    "company": "string",
    "paid": "boolean",
    "items": [{"name": "string", "price": "number"}]
  }'

Reference

Documentation

Endpoint

POST https://api.documentai.dev/document-to-json/v1

Authentication

Header	Required	Description
x-api-key	Required	Your API key. Get one by signing up.

Request Body

Upload a file, provide a public URL, or send plain text.

Field	Type	Required	Description
url	string	One of three	Publicly accessible URL to a document. Use a signed URL for protected files.
file	file	One of three	Any supported document, as multipart/form-data. Max 30 MB.
text	string	One of three	Raw unstructured text to extract from.
schema	object	Required	JSON Schema describing the desired output structure (sent as a JSON string).
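
The quick-start request sends a `url`; the other two input modes might look like the sketch below. It relies only on the field names from the table above (`file`, `text`, `schema`). The wrapper function names, the `DOCUMENTAI_API_KEY` environment variable, and the `invoice.pdf` and `schema.json` file names are hypothetical. curl's `<` prefix reads the schema from disk, and `--form-string` sends the text literally so a leading `@` or `<` is not interpreted.

```shell
#!/usr/bin/env bash
# Sketch only: function names, the env var, and file names are placeholders.
API_URL="https://api.documentai.dev/document-to-json/v1"

# Upload a local document as multipart/form-data (the "file" field).
extract_from_file() {
  local path="$1" schema_file="$2"
  curl -sS -X POST "$API_URL" \
    -H "x-api-key: ${DOCUMENTAI_API_KEY}" \
    -F "file=@${path}" \
    -F "schema=<${schema_file}"
}

# Send raw text instead of a document (the "text" field).
extract_from_text() {
  local text="$1" schema_file="$2"
  curl -sS -X POST "$API_URL" \
    -H "x-api-key: ${DOCUMENTAI_API_KEY}" \
    --form-string "text=${text}" \
    -F "schema=<${schema_file}"
}
```

For example, `extract_from_file invoice.pdf schema.json` would upload the PDF together with the schema stored in `schema.json`.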

Accepted File Types

· PDF
· Word (.doc, .docx)
· Images (PNG, JPG, WebP, GIF, BMP)
· PowerPoint (.ppt, .pptx)
· Excel (.xls, .xlsx, .csv)
· HTML
· RTF / TXT
· OpenDocument (ODT, ODS, ODP)
· Code files
· Plain text

JSON Schema Examples

Define the structure you want — from flat key-value pairs to deeply nested schemas.

{
  "type": "object",
  "properties": {
    "company": { "type": "string" },
    "total": { "type": "number" },
    "paid": { "type": "boolean" }
  }
}

Tip: use the description keyword to guide the extraction. Add it to any field to tell the AI what it should contain, or at the root of your schema to give overall extraction instructions. This improves accuracy on ambiguous documents.
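
As an illustration of the tip above, the sketch below writes a schema file that carries a root-level `description` plus per-field descriptions. The file name `schema.json` and the wording of each description are made up for the example; how much they help will depend on the document.

```shell
#!/usr/bin/env bash
# Illustrative schema: the description texts are examples, not required wording.
cat > schema.json <<'EOF'
{
  "type": "object",
  "description": "Extract totals from a supplier invoice. Ignore quotes and estimates.",
  "properties": {
    "company": {
      "type": "string",
      "description": "Legal name of the company issuing the invoice, not the customer"
    },
    "total": {
      "type": "number",
      "description": "Grand total including tax, as a plain number without a currency symbol"
    },
    "paid": { "type": "boolean" }
  }
}
EOF
```

The resulting file can be passed to curl as `-F "schema=<schema.json"`.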

Supported keywords

type, properties, required, items, description, enum, format, minimum, maximum, minItems, maxItems, additionalProperties, prefixItems
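
A nested schema combining several of these keywords could look like the following sketch. The field names (`currency`, `issued_on`, `items`) and the file name `invoice-schema.json` are hypothetical, and this page does not state how strictly the API enforces each keyword, so treat the constraints as hints rather than guarantees.

```shell
#!/usr/bin/env bash
# Hypothetical invoice schema using required, enum, format, minimum, minItems,
# items, and additionalProperties from the supported keyword list.
cat > invoice-schema.json <<'EOF'
{
  "type": "object",
  "required": ["company", "items"],
  "additionalProperties": false,
  "properties": {
    "company": { "type": "string" },
    "currency": { "type": "string", "enum": ["USD", "EUR", "GBP"] },
    "issued_on": { "type": "string", "format": "date" },
    "items": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["name", "price"],
        "properties": {
          "name": { "type": "string" },
          "price": { "type": "number", "minimum": 0 }
        }
      }
    }
  }
}
EOF
```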

Example Response

200 OK

{
  "status": "success",
  "data": {
    "company": "Acme Corp",
    "total": 1249.5,
    "paid": true
  }
}
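
One way to consume this response from a script is sketched below. It assumes `jq` is installed and relies only on the `status` and `data` fields shown above; the shape of an error response is not documented on this page, so anything other than `"success"` is simply reported as a failure. The function name `handle_response` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: requires jq. Prints the extracted data as compact JSON on success.
handle_response() {
  local body="$1" status
  status="$(printf '%s' "$body" | jq -r '.status' 2>/dev/null)"
  if [ "$status" = "success" ]; then
    printf '%s' "$body" | jq -c '.data'
  else
    # Unknown or missing status: surface the raw body for debugging.
    printf 'extraction failed: %s\n' "$body" >&2
    return 1
  fi
}
```

A caller could wrap a request as `handle_response "$(curl -sS ...)"` and branch on the exit code.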

Limits

Max Input

Limit	Value
Max file size	30 MB
Max pages per document	1,000
Max input tokens	1,000,000
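
A client can check the file-size limit before uploading. The sketch below assumes "30 MB" means 30,000,000 bytes, which is the more conservative reading; the page does not say whether decimal or binary megabytes apply. The function name and the `MAX_UPLOAD_BYTES` variable are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: 30 MB is treated as 30 * 1000 * 1000 bytes (conservative reading).
MAX_UPLOAD_BYTES="${MAX_UPLOAD_BYTES:-$((30 * 1000 * 1000))}"

# Returns 0 if the file fits the limit, 1 if it is too large, 2 if it is missing.
within_size_limit() {
  local path="$1" size
  [ -f "$path" ] || return 2
  size=$(( $(wc -c < "$path") ))
  [ "$size" -le "$MAX_UPLOAD_BYTES" ]
}
```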

Max Output

The API allocates output capacity based on your input size, so larger documents can produce richer extractions.

For URL and file inputs, the maximum output is 2,000 tokens per input page.

For text inputs, the maximum output is twice the input token count.

The minimum is always 2,000 tokens, and the absolute maximum is 60,000 tokens per request.
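
The output-budget rules above can be expressed as a small calculation. This is a sketch of the stated arithmetic only, not the service's actual implementation; the function name `max_output_tokens` is hypothetical.

```shell
#!/usr/bin/env bash
# Usage: max_output_tokens <file|url|text> <pages-or-input-tokens>
max_output_tokens() {
  local kind="$1" n="$2" budget
  case "$kind" in
    file|url) budget=$(( n * 2000 )) ;;  # 2,000 tokens per input page
    text)     budget=$(( n * 2 )) ;;     # twice the input token count
    *) echo "unknown input kind: $kind" >&2; return 1 ;;
  esac
  # Clamp to the documented floor and per-request ceiling.
  if [ "$budget" -lt 2000 ]; then budget=2000; fi
  if [ "$budget" -gt 60000 ]; then budget=60000; fi
  echo "$budget"
}
```

Under these rules a 3-page PDF gets 6,000 output tokens, a 40-page PDF is capped at 60,000, and a 500-token text input is raised to the 2,000-token minimum.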

Cost

· File inputs: 1 credit per page.
· Text inputs: 1 credit per 1,000 tokens.
· Only input is billed — output size does not affect cost.
· Failed requests are not charged.
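
A rough credit estimate based on the rules above is sketched below. Two points are assumptions not stated on this page: URL inputs are billed per page like uploaded files, and a partial block of 1,000 text tokens rounds up to a full credit. The function name `estimate_credits` is hypothetical.

```shell
#!/usr/bin/env bash
# Usage: estimate_credits <file|url|text> <pages-or-input-tokens>
estimate_credits() {
  local kind="$1" n="$2"
  case "$kind" in
    file|url) echo "$n" ;;                   # 1 credit per page (url: assumed same as file)
    text) echo $(( (n + 999) / 1000 )) ;;    # assumption: partial thousands round up
    *) echo "unknown input kind: $kind" >&2; return 1 ;;
  esac
}
```

With these assumptions a 12-page document costs 12 credits and a 2,500-token text input costs 3.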

Privacy

This endpoint is stateless — we don't store any trace of the documents you send. Your data is processed and returned immediately. Only /store-and-search-documents retains documents.