DocumentAI.dev
API

/document-to-json

Extract structured JSON from any document.

Define your schema, send your file — get clean JSON back.

# Extract structured JSON from any document
curl -X POST https://api.documentai.dev/document-to-json/v1 \
  -H "x-api-key: YOUR_API_KEY" \
  -F "url=https://example.com/invoice.pdf" \
  -F 'schema={
    "company": "string",
    "paid": "boolean",
    "items": [{"name": "string", "price": "number"}]
  }'
Reference

Documentation

Endpoint

POST https://api.documentai.dev/document-to-json/v1

Authentication

Header       Required    Description
x-api-key    Required    Your API key. Get one by signing up.

Request Body

Upload a file, provide a public URL, or send plain text.

Field     Type      Required
url       string    One of three
file      file      One of three
text      string    One of three
schema    object    Required
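
As a rough illustration of the fields above, the sketch below assembles a request in Python. The helper names `build_form_fields` and `send_request` are hypothetical and not part of any official client. It assumes exactly one input field is sent per request (file uploads are left out for brevity) and that `schema` travels as a JSON-encoded form field, as in the curl example. Sending relies on the third-party `requests` package, and the 60-second timeout is an arbitrary choice.

```python
import json

ENDPOINT = "https://api.documentai.dev/document-to-json/v1"


def build_form_fields(schema, url=None, text=None):
    """Return the form fields for one request (hypothetical helper)."""
    # Assumption: exactly one input field is expected per request.
    if (url is None) == (text is None):
        raise ValueError("provide exactly one of url or text")
    fields = {"schema": json.dumps(schema)}
    if url is not None:
        fields["url"] = url
    else:
        fields["text"] = text
    return fields


def send_request(api_key, fields):
    """POST the fields as multipart/form-data, mirroring curl -F."""
    import requests  # third-party; imported lazily so the builder works without it

    response = requests.post(
        ENDPOINT,
        headers={"x-api-key": api_key},
        # (None, value) tuples make requests send plain form parts, not file uploads.
        files={name: (None, value) for name, value in fields.items()},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()


fields = build_form_fields(
    {"type": "object", "properties": {"company": {"type": "string"}}},
    url="https://example.com/invoice.pdf",
)
```

Calling `send_request("YOUR_API_KEY", fields)` would then issue the actual request.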

Accepted File Types

  • PDF
  • Word (.doc, .docx)
  • Images (PNG, JPG, WebP, GIF, BMP)
  • PowerPoint (.ppt, .pptx)
  • Excel (.xls, .xlsx, .csv)
  • HTML
  • RTF / TXT
  • OpenDocument (ODT, ODS, ODP)
  • Code files
  • Plain text

JSON Schema Examples

Define the structure you want — from flat key-value pairs to deeply nested schemas.

{
  "type": "object",
  "properties": {
    "company": { "type": "string" },
    "total": { "type": "number" },
    "paid": { "type": "boolean" }
  }
}

Tip: use the description keyword to guide the extraction. Add it to any field to tell the AI what it should contain, or at the root of your schema to give overall extraction instructions. This improves accuracy on ambiguous documents.
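
To make the tip concrete, here is a minimal Python sketch that adds `description` keywords to a schema before it is sent. The helper `add_descriptions` is hypothetical, and the description texts are illustrative only.

```python
import copy
import json


def add_descriptions(schema, root=None, fields=None):
    """Return a copy of schema with description keywords added."""
    result = copy.deepcopy(schema)
    if root is not None:
        # Root-level description: overall extraction instructions.
        result["description"] = root
    for name, text in (fields or {}).items():
        # Field-level description: what this field should contain.
        result["properties"][name]["description"] = text
    return result


base = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "total": {"type": "number"},
        "paid": {"type": "boolean"},
    },
}

described = add_descriptions(
    base,
    root="Extract billing details from a supplier invoice.",
    fields={"total": "Grand total including tax, as a plain number."},
)
schema_field = json.dumps(described)  # value for the schema form field
```

The original `base` dictionary is left untouched, so the same schema can be reused with different instructions per document type.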

Supported keywords

type, properties, required, items, description, enum, format, minimum, maximum, minItems, maxItems, additionalProperties, prefixItems
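
One way to use this list is a local pre-flight check that flags schema keywords outside it. The sketch below is a hypothetical helper, not part of the API; how the service treats unlisted keywords is not stated here, so the function only reports them.

```python
SUPPORTED = {
    "type", "properties", "required", "items", "description", "enum",
    "format", "minimum", "maximum", "minItems", "maxItems",
    "additionalProperties", "prefixItems",
}


def unsupported_keywords(schema):
    """Return a sorted list of keywords in schema that are not in SUPPORTED."""
    found = set()

    def walk(node):
        if not isinstance(node, dict):
            return  # e.g. additionalProperties: false carries no keywords
        for key, value in node.items():
            if key not in SUPPORTED:
                found.add(key)
            elif key == "properties" and isinstance(value, dict):
                # Keys under "properties" are field names, not keywords.
                for subschema in value.values():
                    walk(subschema)
            elif key in ("items", "additionalProperties"):
                walk(value)
            elif key == "prefixItems" and isinstance(value, list):
                for subschema in value:
                    walk(subschema)

    walk(schema)
    return sorted(found)


example = {
    "type": "object",
    "properties": {
        "items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"name": {"type": "string", "pattern": "^[A-Z]"}},
            },
        },
        "paid": {"type": "boolean", "default": False},
    },
}
print(unsupported_keywords(example))  # ['default', 'pattern']
```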

Example Response

200 OK
{
  "status": "success",
  "data": {
    "company": "Acme Corp",
    "total": 1249.5,
    "paid": true
  }
}
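
A client would typically check `status` before reading `data`. The Python sketch below does that for the response shown above; `extract_data` is a hypothetical helper, and treating every status other than "success" as a failure is an assumption, since the error format is not documented here.

```python
import json


def extract_data(body):
    """Return the "data" object from a response body, or raise on any other status."""
    payload = json.loads(body)
    # Assumption: any status other than "success" signals a failed extraction.
    if payload.get("status") != "success":
        raise RuntimeError(f"unexpected status: {payload.get('status')!r}")
    return payload["data"]


sample = (
    '{"status": "success", '
    '"data": {"company": "Acme Corp", "total": 1249.5, "paid": true}}'
)
data = extract_data(sample)
print(data["company"], data["total"], data["paid"])  # Acme Corp 1249.5 True
```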

Limits

Max Input

Limit                     Value
Max file size             30 MB
Max pages per document    1,000
Max input tokens          1,000,000
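
These limits can be checked locally before uploading. The sketch below is a hypothetical pre-flight helper. It assumes 1 MB means 1,000,000 bytes, which is the more conservative reading, and that the caller already knows the page and token counts, since the tokenizer used by the service is not specified here.

```python
MAX_FILE_BYTES = 30 * 1000 * 1000  # assumption: 1 MB = 1,000,000 bytes
MAX_PAGES = 1_000
MAX_INPUT_TOKENS = 1_000_000


def limit_violations(file_bytes=None, pages=None, tokens=None):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if file_bytes is not None and file_bytes > MAX_FILE_BYTES:
        problems.append("file larger than 30 MB")
    if pages is not None and pages > MAX_PAGES:
        problems.append("more than 1,000 pages")
    if tokens is not None and tokens > MAX_INPUT_TOKENS:
        problems.append("more than 1,000,000 input tokens")
    return problems


print(limit_violations(file_bytes=45_000_000, pages=120))  # ['file larger than 30 MB']
```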

Max Output

The API allocates output capacity based on your input size, so larger documents can produce richer extractions.

If input is a URL or a file, max output is 2,000 tokens per input page.

If input is text, max output is twice the input token count.

The minimum is always 2,000 tokens, and the absolute maximum is 60,000 tokens per request.
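
The output rules above reduce to a small calculation. The Python sketch below is one reading of them, with the hypothetical function name `max_output_tokens`; it assumes the floor and the cap apply to both input kinds.

```python
MIN_OUTPUT = 2_000
MAX_OUTPUT = 60_000
TOKENS_PER_PAGE = 2_000


def max_output_tokens(pages=None, text_tokens=None):
    """Estimate the output token budget for a URL/file input or a text input."""
    if (pages is None) == (text_tokens is None):
        raise ValueError("provide exactly one of pages or text_tokens")
    if pages is not None:
        raw = TOKENS_PER_PAGE * pages  # URL or file input
    else:
        raw = 2 * text_tokens  # text input
    # Clamp to the documented floor and ceiling.
    return max(MIN_OUTPUT, min(raw, MAX_OUTPUT))


print(max_output_tokens(pages=3))             # 6000
print(max_output_tokens(pages=45))            # 60000 (capped)
print(max_output_tokens(text_tokens=400))     # 2000 (floor)
print(max_output_tokens(text_tokens=12_000))  # 24000
```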

Cost

  • File inputs: 1 credit per page.
  • Text inputs: 1 credit per 1,000 tokens.
  • Only input is billed; output size does not affect cost.
  • Failed requests are not charged.
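
For budgeting, the pricing rules can be sketched as a small estimator. The function name `estimate_credits` is hypothetical. Two assumptions are made because the rules above do not settle them: partial blocks of 1,000 tokens are rounded up, and inputs fetched from a URL are billed per page like uploaded files.

```python
def estimate_credits(pages=None, text_tokens=None):
    """Estimate credits for one successful request."""
    if (pages is None) == (text_tokens is None):
        raise ValueError("provide exactly one of pages or text_tokens")
    if pages is not None:
        return pages  # 1 credit per page
    # Assumption: partial thousands round up; the rounding rule is not documented.
    return (text_tokens + 999) // 1000


print(estimate_credits(pages=12))           # 12
print(estimate_credits(text_tokens=2_500))  # 3 under the round-up assumption
```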