DocumentAI.dev

API

/store-and-search-documents

Store and semantically search your documents.

Upload your files with metadata, then query them by meaning and filters.

# 1. Store a document with metadata filters in a dataspace
curl -X POST https://api.documentai.dev/store-and-search-documents/v1/dataspaces/:dataspaceId/documents \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@company-wiki.pdf" \
  -F 'filters={"string_1": "hr-policies", "number_1": 2025}'

# 2. Search your documents by meaning + filters
curl -X POST https://api.documentai.dev/store-and-search-documents/v1/dataspaces/:dataspaceId/documents/search \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the remote work policy?",
    "filters": [{"field": "string_1", "operator": "==", "value": "hr-policies"}],
    "limit": 5
  }'

We handle document conversion, markdown extraction, chunking, embedding and vector databases for you.

Use our search back-end to build knowledge bases, RAG pipelines, internal search tools, support bots, and more.

Features

Why This API

Cross-Language Search

Search in any language and find results in any other: store a document in German and query it in English. The API supports more than 100 languages.

Semantic Understanding

Powered by embeddings, the search engine matches by meaning, not literal text. Synonyms, paraphrases and related concepts are all captured.

Metadata Filters

Attach up to 7 typed filter fields per document. Combine semantic search with precise metadata conditions to narrow results before ranking.

No Infrastructure to Manage

A single POST stores your document. Conversion, chunking, embedding and indexing all happen automatically behind one endpoint.

Dataspaces

Organize documents into isolated dataspaces. Keep different clients, projects or environments fully separated — search never leaks across boundaries.

All File Types Supported

PDF, Word, Excel, PowerPoint, images, HTML, code files and more. Send any document and we extract the content for you.

Quick Start

Getting Started

Get your API key

Sign up for free to receive your API key.
The free plan includes 30 one-time credits and 60 document storage slots to get you started.

Create a dataspace

A dataspace is an isolated container for your documents.
Create one with a simple POST /dataspaces.
Create multiple dataspaces if you want to isolate data (e.g., for different environments, projects, or clients).

curl -X POST https://api.documentai.dev/store-and-search-documents/v1/dataspaces \
  -H "x-api-key: YOUR_API_KEY"

Store & search

Upload a file or plain text to your dataspace via our API.
Then search by meaning with a single query.
Results are ranked by semantic relevance and returned with a similarity score.

Reference

Documentation

Endpoint Base URL

https://api.documentai.dev/store-and-search-documents/v1

Authentication

Header	Required	Description
x-api-key	Required	Your API key. Get one by signing up.
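
As a rough illustration of the base URL and the x-api-key header working together, the sketch below builds (but does not send) a request with Python's standard library. The helper name build_request is hypothetical and not part of the API; only the base URL, the header name and the endpoint paths come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.documentai.dev/store-and-search-documents/v1"


def build_request(method, path, api_key, json_body=None):
    """Return an unsent urllib Request carrying the x-api-key header.

    Hypothetical helper: `json_body`, when given, is serialized and sent
    as application/json (the format the search and update endpoints use).
    """
    headers = {"x-api-key": api_key}
    data = None
    if json_body is not None:
        data = json.dumps(json_body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Create a dataspace (no request body is needed for this endpoint).
req = build_request("POST", "/dataspaces", "YOUR_API_KEY")
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers before spending any credits.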

Endpoints

Method	Endpoint	Description
Documents
POST	/dataspaces/:id/documents	Store a new document
POST	/dataspaces/:id/documents/search	Semantic search over documents in a dataspace
GET	/dataspaces/:id/documents	List documents (paginated)
GET	/dataspaces/:id/documents/:docId	Retrieve a specific document
PATCH	/dataspaces/:id/documents/:docId	Update document metadata filters only
DELETE	/dataspaces/:id/documents/:docId	Delete a document

Dataspaces
POST	/dataspaces	Create a new dataspace
GET	/dataspaces	List dataspaces (paginated)
GET	/dataspaces/:id	Retrieve a dataspace
DELETE	/dataspaces/:id	Delete a dataspace (and all documents it contains)

Store Document

POST /dataspaces/:id/documents

Upload a document to have it stored and searchable.
You can provide a file, a public URL, or raw text.
You can also attach optional metadata filters.

Request Body

Send data as multipart/form-data.

Field	Type	Required	Description
url	string	One of three	Publicly accessible URL to a document.
file	file	One of three	Any supported document, max 30 MB.
text	string	One of three	Plain text content.
filters	object	Optional	JSON-encoded object, sent as a string form field, defining metadata filters for this document.
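
The field table above can be sketched as a small multipart encoder. This is a minimal illustration, not the official client: build_store_body is a hypothetical name, "One of three" is read here as "exactly one of url, file, text", and the file part is labeled application/octet-stream as a generic assumption.

```python
import json
import uuid


def build_store_body(*, url=None, text=None, file=None, filters=None):
    """Encode a Store Document request as multipart/form-data.

    `file` is a (filename, bytes) tuple. Returns (body_bytes, content_type).
    Assumption: exactly one of url, text, file must be provided.
    """
    sources = [s for s in (url, text, file) if s is not None]
    if len(sources) != 1:
        raise ValueError("provide exactly one of url, file, or text")

    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name, value):
        # A plain form field: headers, blank line, value, CRLF.
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode("utf-8") + value.encode("utf-8") + b"\r\n")

    if url is not None:
        add_field("url", url)
    if text is not None:
        add_field("text", text)
    if file is not None:
        filename, content = file
        head = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(head.encode("utf-8") + content + b"\r\n")
    if filters is not None:
        # The filters object travels as a JSON string inside the form.
        add_field("filters", json.dumps(filters))

    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned content type, including its boundary, goes into the Content-Type header alongside x-api-key when the body is posted to /dataspaces/:id/documents.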

Accepted File Types

· PDF
· Word (.doc, .docx)
· Images (PNG, JPG, WebP, GIF, BMP)
· PowerPoint (.ppt, .pptx)
· Excel (.xls, .xlsx, .csv)
· HTML
· RTF / TXT
· OpenDocument (ODT, ODS, ODP)
· Code files
· Plain text

Filter Fields for Metadata

Key names are fixed

You can attach up to 7 distinct filter fields per document.
These must be formatted as a flat JSON object in the filters field.
You must use these exact keys:

number_1, number_2, number_3
string_1, string_2, string_3, string_4

Example:

{
  "string_1": "hr-policies",
  "string_2": "internal",
  "string_3": "onboarding",
  "string_4": "v2",
  "number_1": 2025,
  "number_2": -3.14159,
  "number_3": 42
}
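
Because the key names are fixed, it can help to check a filters object on the client before uploading. The sketch below is one possible pre-flight check; check_document_filters is a hypothetical helper, and the type checks assume that number_* keys hold JSON numbers and string_* keys hold strings, which the key names suggest but this page does not spell out.

```python
NUMBER_KEYS = {"number_1", "number_2", "number_3"}
STRING_KEYS = {"string_1", "string_2", "string_3", "string_4"}


def check_document_filters(filters):
    """Return a list of problems; an empty list means the object looks valid."""
    problems = []
    for key, value in filters.items():
        if key in NUMBER_KEYS:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} must be a number")
        elif key in STRING_KEYS:
            if not isinstance(value, str):
                problems.append(f"{key} must be a string")
        else:
            problems.append(f"{key} is not an allowed key")
    return problems
```

Running it on the example object above returns an empty list, while a custom key such as "category" is reported as not allowed.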

Document Fields

Every stored document contains the following fields.

Field	Type	Description
id	string	Unique document identifier.
text	string	Plain text content of the document. Always present.
markdown	string	Structured markdown extraction. Populated for file and URL inputs. Empty string for text inputs.
filters	object	Metadata filters attached to the document.
created_at	string	ISO 8601 creation timestamp.

Example Responses

File or URL input

200 OK

{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Text input

200 OK

{
  "status": "success",
  "data": {
    "id": "Kp9nWxYz5TmR3qL7",
    "markdown": "",
    "text": "Employees are allowed to work remotely 3 days a week.\nA monthly stipend of $200 is provided for home office equipment.",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:05:00Z"
  }
}

Search Documents

POST /dataspaces/:id/documents/search

Perform a semantic vector search across all documents in a dataspace.
You can also apply complex metadata filters to narrow down the results before ranking.

Request Body

Send data as application/json.

Field	Type	Required	Description
query	string	Required	The search string. Used to find the most relevant documents in your dataspace.
limit	number	Optional	Number of results to return. Default is 20. Max is 200.
filters	array	Optional	Array of filter rule objects to apply before searching. Max 7 rules.

Filter Rules

Each rule in the filters array must be an object with three properties: field, operator, and value.
Operators can be combined across rules, subject to the limits listed below.

Field Type	Supported Operators
number_1, number_2, number_3	==, !=, >, >=, <, <=, in, not-in
string_1, string_2, string_3, string_4	==, !=, in, not-in

Limits:

· You can use at most 7 filter rules in a request.
· You can only use one in or not-in operator per search query.
· You can only use one != or not-in filter per search query.
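
The operator table and the three limits above can be expressed as a client-side check. This is a sketch under the rules as stated on this page; check_filter_rules is a hypothetical helper and the server's actual validation messages may differ.

```python
NUMBER_FIELDS = {"number_1", "number_2", "number_3"}
STRING_FIELDS = {"string_1", "string_2", "string_3", "string_4"}
NUMBER_OPS = {"==", "!=", ">", ">=", "<", "<=", "in", "not-in"}
STRING_OPS = {"==", "!=", "in", "not-in"}


def check_filter_rules(rules):
    """Return a list of problems found in a search `filters` array."""
    problems = []
    if len(rules) > 7:
        problems.append("more than 7 filter rules")

    for rule in rules:
        field, op = rule.get("field"), rule.get("operator")
        if field in NUMBER_FIELDS:
            allowed = NUMBER_OPS
        elif field in STRING_FIELDS:
            allowed = STRING_OPS
        else:
            problems.append(f"unknown field {field!r}")
            continue
        if op not in allowed:
            problems.append(f"operator {op!r} not supported on {field}")

    ops = [rule.get("operator") for rule in rules]
    # not-in counts toward both of the "only one" limits.
    if sum(op in ("in", "not-in") for op in ops) > 1:
        problems.append("more than one in / not-in rule")
    if sum(op in ("!=", "not-in") for op in ops) > 1:
        problems.append("more than one != / not-in rule")
    return problems
```

Note that a single not-in rule uses up both allowances, so it cannot be combined with another in, not-in or != rule in the same request.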

Request Examples

Basic query

{
  "query": "What is the remote work policy?"
}

With limit

{
  "query": "What is the remote work policy?",
  "limit": 5
}

Single filter

{
  "query": "How do I cancel my subscription?",
  "limit": 5,
  "filters": [
    { "field": "string_1", "operator": "==", "value": "support" }
  ]
}

Multiple filters

{
  "query": "GDPR compliance requirements for user data",
  "limit": 10,
  "filters": [
    { "field": "string_1", "operator": "==", "value": "legal" },
    { "field": "number_1", "operator": ">=", "value": 2024 },
    { "field": "string_2", "operator": "!=", "value": "draft" }
  ]
}

Range + in operator

{
  "query": "pricing model",
  "limit": 20,
  "filters": [
    { "field": "number_2", "operator": ">=", "value": 3.14 },
    { "field": "number_2", "operator": "<", "value": 3.15 },
    { "field": "string_4", "operator": "in", "value": ["finance", "sales", "marketing"] }
  ]
}

Example Response

200 OK

{
  "status": "success",
  "data": [
    {
      "score": 0.892,
      "document": {
        "id": "8f3kLmNpQ2xR4vW1",
        "text": "Employees are allowed to work remotely 3 days a week...",
        "markdown": "## Remote Work\nEmployees are allowed to work **remotely** 3 days a week...",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2025
        }
      }
    },
    {
      "score": 0.814,
      "document": {
        "id": "Yt7nBcDe9FgH2jK5",
        "text": "A monthly stipend of $200 is provided for home office equipment...",
        "markdown": "",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2025
        }
      }
    },
    {
      "score": 0.743,
      "document": {
        "id": "Zw6mXsAp3RqV8uT4",
        "text": "Remote employees must be available during core hours 10am-4pm...",
        "markdown": "## Availability\nRemote employees must be available during core hours **10am-4pm**...",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2024
        }
      }
    }
  ]
}

The markdown field is populated for documents stored from files or URLs. For documents stored from text input, it is an empty string.
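
Since markdown is an empty string for text inputs, a consumer that wants the richest available content can fall back to text. The sketch below shows one way to do that on a parsed search response; readable_hits and its min_score cut-off are illustrative client-side choices, not API features.

```python
def readable_hits(response, min_score=0.0):
    """Return (score, content) pairs from a parsed search response.

    Prefers the markdown extraction and falls back to plain text when
    markdown is empty. `min_score` is a hypothetical client-side threshold.
    """
    hits = []
    for hit in response["data"]:
        if hit["score"] < min_score:
            continue
        doc = hit["document"]
        hits.append((hit["score"], doc["markdown"] or doc["text"]))
    return hits
```

This is a convenient shape for feeding the top results into a RAG prompt, where markdown headings help preserve document structure.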

Other Endpoints

Detailed specifications for all remaining CRUD and management endpoints.

Pagination

Both GET /dataspaces and GET /dataspaces/:id/documents support cursor-based pagination.

Query Parameter	Default	Description
limit	20	Number of items to return. Max 200.
cursor	—	ID of the last item from the previous page. Use `next_cursor` from the previous response.

Every paginated response includes a pagination object:

{
  "status": "success",
  "data": [ ... ],
  "pagination": {
    "has_more": true,
    "next_cursor": "8f3kLmNpQ2xR4vW1"
  }
}
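
Walking every page follows directly from has_more and next_cursor. The generator below is a minimal sketch: fetch_page is a caller-supplied function (hypothetical, not part of the API) that performs the GET with the given limit and cursor and returns the parsed JSON response.

```python
def iter_items(fetch_page, limit=20):
    """Yield every item from a cursor-paginated list endpoint.

    `fetch_page(limit=..., cursor=...)` must return the parsed response,
    i.e. a dict with "data" and "pagination" keys as shown above.
    """
    cursor = None  # No cursor on the first request.
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            break
        cursor = pagination["next_cursor"]
```

The loop stops on has_more rather than on a missing next_cursor, because the List Dataspaces example on this page returns a next_cursor value even when has_more is false.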

List Documents

GET /dataspaces/:id/documents

Returns a paginated list of documents in a dataspace, ordered by creation date (newest first).

200 OK

{
  "status": "success",
  "data": [
    {
      "id": "8f3kLmNpQ2xR4vW1",
      "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
      "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
      "filters": {
        "string_1": "hr-policies",
        "number_1": 2025
      },
      "created_at": "2026-04-30T12:00:00Z"
    },
    {
      "id": "Kp9nWxYz5TmR3qL7",
      "markdown": "",
      "text": "A monthly stipend of $200 is provided for home office equipment...",
      "filters": {
        "string_1": "hr-policies",
        "number_1": 2025
      },
      "created_at": "2026-04-30T11:45:00Z"
    }
  ],
  "pagination": {
    "has_more": true,
    "next_cursor": "Kp9nWxYz5TmR3qL7"
  }
}

Get Document

GET /dataspaces/:id/documents/:docId

Retrieve a single document by its ID.

200 OK

{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Update Document

PATCH /dataspaces/:id/documents/:docId

Update only the metadata filters of an existing document.
Send data as application/json with a filters object.

· Partial merge: Only the keys you include are updated. Existing filters you don't mention are left unchanged.
· Delete a filter: Send null for a key to remove it entirely.
· Content is immutable: You cannot update the text, file, or markdown of a document. Delete and re-store it instead.

Request body

{
  "filters": {
    "string_1": "updated-category",
    "number_1": 2026,
    "string_2": null
  }
}

Response

200 OK

{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "updated-category",
      "number_1": 2026
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Note that string_2 is absent from the response because it was set to null in the request. Filter updates take effect immediately, starting with the next search.
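
The partial-merge and null-deletion rules can be mirrored locally to predict the result of a PATCH before sending it. This is a sketch of the semantics described above, not the server's code; apply_filter_patch is a hypothetical helper.

```python
def apply_filter_patch(current, patch):
    """Return the filters a document would have after a PATCH.

    Keys present in `patch` overwrite existing values, keys set to None
    (JSON null) are removed, and keys not mentioned are left unchanged.
    """
    merged = dict(current)  # Copy so the caller's dict is not mutated.
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

For instance, applying the request body above to a document whose filters also contained a string_3 value would keep string_3 untouched while string_2 disappears.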

Delete Document

DELETE /dataspaces/:id/documents/:docId

Permanently delete a document and all its associated search data.

200 OK

{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "deleted": true
  }
}

Create Dataspace

POST /dataspaces

Create a new, empty dataspace. No request body is required.

200 OK

{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "count": 0,
    "created_at": "2026-04-30T14:00:00Z"
  }
}

List Dataspaces

GET /dataspaces

Returns a paginated list of your dataspaces, ordered by creation date (newest first).
Supports the same ?limit and ?cursor query parameters described under Pagination.

200 OK

{
  "status": "success",
  "data": [
    {
      "id": "Xr9pLmWq4TnK2vY8",
      "count": 47,
      "created_at": "2026-04-30T14:00:00Z"
    },
    {
      "id": "Bc3nYhTw7KpR5jM1",
      "count": 12,
      "created_at": "2026-04-28T09:30:00Z"
    }
  ],
  "pagination": {
    "has_more": false,
    "next_cursor": "Bc3nYhTw7KpR5jM1"
  }
}

Get Dataspace

GET /dataspaces/:id

Retrieve a single dataspace. The count field reflects the number of documents currently stored.

200 OK

{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "count": 47,
    "created_at": "2026-04-30T14:00:00Z"
  }
}

Delete Dataspace

DELETE /dataspaces/:id

Permanently delete a dataspace. All documents inside it are permanently destroyed.

200 OK

{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "deleted": true,
    "documents_destroyed": 47
  }
}

Error Responses

All error responses follow a consistent envelope:

{
  "status": "error",
  "error": {
    "code": "error_code",
    "message": "Human-readable description."
  }
}
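
Because success and error responses share a status field, a client can unwrap both in one place. The sketch below assumes only the two envelope shapes shown on this page; ApiError and unwrap are hypothetical names, and no specific error codes are assumed.

```python
import json


class ApiError(Exception):
    """Raised when the response envelope has status "error"."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(raw_body):
    """Parse a JSON response body and return its "data" payload.

    Raises ApiError carrying the envelope's code and message on failure.
    """
    payload = json.loads(raw_body)
    if payload.get("status") == "error":
        err = payload.get("error", {})
        raise ApiError(err.get("code"), err.get("message"))
    return payload["data"]
```

Callers can then branch on the code attribute of the exception instead of inspecting raw JSON at every call site.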

Limits

Limit	Value
Max input file size	30 MB
Max pages per document	1,000
Filter fields per document	string_1, string_2, string_3, string_4, number_1, number_2, number_3
Max filter rules per search	7

Document storage limit: The total number of documents you can store across all dataspaces depends on your plan — 60 on the Free plan, 50,000 on Basic, 100,000 on Plus, and 200,000 on Max.

Cost

· Storing files: 1 credit per page.
· Storing text: 1 credit per 1,000 tokens.
· Searching: 0.2 credits per search query.
· All other endpoints (list, get, update, delete, and all dataspace management) are free.
· Failed requests are not charged.
· Storage is bounded by your plan's max_documents limit.
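
For budgeting, the per-operation prices above can be combined into a rough estimate. The function below is a sketch only: estimate_credits is a hypothetical helper, and it assumes text tokens are billed pro rata (this page does not say how partial thousands are rounded), so treat the result as an approximation.

```python
from decimal import Decimal


def estimate_credits(pages=0, text_tokens=0, searches=0):
    """Estimate credits for a workload using the prices listed above.

    Assumption: text is billed pro rata at 1 credit per 1,000 tokens;
    the real rounding rule is not documented on this page.
    """
    file_cost = Decimal(pages)                          # 1 credit per page
    text_cost = Decimal(text_tokens) / Decimal(1000)    # 1 credit per 1,000 tokens
    search_cost = Decimal("0.2") * searches             # 0.2 credits per query
    return file_cost + text_cost + search_cost
```

Under these assumptions, storing a 20-page file and running 50 searches comes to 30 credits, which matches the one-time allowance of the free plan.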