DocumentAI.dev
API

/store-and-search-documents

Store and semantically search your documents.

Upload your files with metadata, then query them by meaning and filters.

# 1. Store a document with metadata filters in a dataspace
curl -X POST https://api.documentai.dev/store-and-search-documents/v1/dataspaces/:dataspaceId/documents \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@company-wiki.pdf" \
  -F 'filters={"string_1": "hr-policies", "number_1": 2025}'

# 2. Search your documents by meaning + filters
curl -X POST https://api.documentai.dev/store-and-search-documents/v1/dataspaces/:dataspaceId/documents/search \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the remote work policy?",
    "filters": [{"field": "string_1", "operator": "==", "value": "hr-policies"}],
    "limit": 5
  }'

We handle document conversion, markdown extraction, chunking, embedding and vector databases for you.

Use our search back-end to build your knowledge bases, RAG pipelines, internal search tools, support bots, and more.

Features

Why This API

Cross-Language Search

Search in any language and find results in any other: store a document in German and query it in English. The API supports more than 100 languages.

Semantic Understanding

Powered by embeddings, the search engine matches by meaning, not literal text. Synonyms, paraphrases and related concepts are all captured.

Metadata Filters

Attach up to 7 typed filter fields per document. Combine semantic search with precise metadata conditions to narrow results before ranking.

No Infrastructure to Manage

A single POST stores your document. Conversion, chunking, embedding and indexing all happen automatically behind one endpoint.

Dataspaces

Organize documents into isolated dataspaces. Keep different clients, projects or environments fully separated — search never leaks across boundaries.

All File Types Supported

PDF, Word, Excel, PowerPoint, images, HTML, code files and more. Send any document, we extract the content for you.

Quick Start

Getting Started

1. Get your API key

Sign up for free to receive your API key.
The free plan includes 100 credits per month and 200 document storage slots to get you started.

2. Create a dataspace

A dataspace is an isolated container for your documents.
Create one with a simple POST /dataspaces.
Create multiple dataspaces if you want to isolate data (e.g., for different environments, projects, or clients).

3. Store & search

Upload a file or plain text to your dataspace via our API.
Then search by meaning with a single query.
Results are ranked by semantic relevance and returned with a similarity score.

Reference

Documentation

Endpoint Base URL

https://api.documentai.dev/store-and-search-documents/v1

Authentication

Header | Required | Description
x-api-key | Required | Your API key. Get one by signing up.
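
As a rough illustration of the authentication header, the sketch below builds (but does not send) a request carrying x-api-key. The helper name build_request is hypothetical, and the choice of Python's standard urllib is ours; only the base URL, the header name, and the JSON content type come from this page.

```python
import json
import urllib.request

BASE_URL = "https://api.documentai.dev/store-and-search-documents/v1"


def build_request(method, path, api_key, payload=None):
    """Build (but do not send) an authenticated request.

    build_request is a hypothetical helper, not part of any official SDK.
    """
    headers = {"x-api-key": api_key}
    data = None
    if payload is not None:
        # JSON endpoints (search, update) expect application/json.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


if __name__ == "__main__":
    req = build_request("GET", "/dataspaces", "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the call. Keeping construction separate from sending makes the request easy to inspect first.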

Endpoints

Method | Endpoint | Description
Documents
POST | /dataspaces/:id/documents | Store a new document
POST | /dataspaces/:id/documents/search | Semantic search over documents in a dataspace
GET | /dataspaces/:id/documents | List documents (paginated)
GET | /dataspaces/:id/documents/:docId | Retrieve a specific document
PATCH | /dataspaces/:id/documents/:docId | Update document metadata filters only
DELETE | /dataspaces/:id/documents/:docId | Delete a document
Dataspaces
POST | /dataspaces | Create a new dataspace
GET | /dataspaces | List dataspaces (paginated)
GET | /dataspaces/:id | Retrieve a dataspace
DELETE | /dataspaces/:id | Delete a dataspace (and all documents it contains)

Store Document

POST/dataspaces/:id/documents

Upload a document to store it and make it searchable.
You can provide a file, a public URL, or raw text.
You can also attach optional metadata filters.

Request Body

Send data as multipart/form-data.

Field | Type | Required
url | string | One of three
file | file | One of three
text | string | One of three
filters | object | Optional
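
For readers who are not using curl, here is a minimal sketch of how the multipart/form-data body could be assembled with the Python standard library. The function names are hypothetical. Reading "One of three" as "exactly one of file, url, text" is our assumption. Sending filters as a JSON string in a form field mirrors the curl example at the top of this page.

```python
import json
import uuid


def encode_multipart(fields, file=None):
    """Encode string fields (and optionally one file) as multipart/form-data.

    fields: dict mapping field name -> string value
    file:   optional (filename, content_bytes) tuple, sent as the "file" part
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}".encode())
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b"")
        lines.append(value.encode("utf-8"))
    if file is not None:
        filename, content = file
        lines.append(f"--{boundary}".encode())
        lines.append(
            f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode()
        )
        lines.append(b"Content-Type: application/octet-stream")
        lines.append(b"")
        lines.append(content)
    lines.append(f"--{boundary}--".encode())
    lines.append(b"")
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_store_form(text=None, url=None, file=None, filters=None):
    """Check the assumed 'exactly one source' rule, then encode the form."""
    provided = [x for x in (text, url, file) if x is not None]
    if len(provided) != 1:
        raise ValueError("provide exactly one of: file, url, text")
    fields = {}
    if text is not None:
        fields["text"] = text
    if url is not None:
        fields["url"] = url
    if filters is not None:
        # The filters object travels as a JSON string in a form field.
        fields["filters"] = json.dumps(filters)
    return encode_multipart(fields, file)
```

The returned body and content type would go into a POST to /dataspaces/:id/documents, together with the x-api-key header.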

Accepted File Types

  • PDF
  • Word (.doc, .docx)
  • Images (PNG, JPG, WebP, GIF, BMP)
  • PowerPoint (.ppt, .pptx)
  • Excel (.xls, .xlsx, .csv)
  • HTML
  • RTF / TXT
  • OpenDocument (ODT, ODS, ODP)
  • Code files
  • Plain text

Filter Fields for Metadata

Key names are fixed

You can attach up to 7 distinct filter fields per document.
These must be formatted as a flat JSON object in the filters field.
You must use these exact keys:

  • number_1, number_2, number_3
  • string_1, string_2, string_3, string_4

Example:

{
  "string_1": "hr-policies",
  "string_2": "internal",
  "string_3": "onboarding",
  "string_4": "v2",
  "number_1": 2025,
  "number_2": -3.14159,
  "number_3": 42
}
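
Because the key names are fixed, it can be handy to check a filters object on the client before uploading. The sketch below is one possible check, not the API's own validation. The function name is hypothetical, and the value-type rules (numbers for number_* keys, strings for string_* keys) are an assumption inferred from the key names.

```python
NUMBER_KEYS = {"number_1", "number_2", "number_3"}
STRING_KEYS = {"string_1", "string_2", "string_3", "string_4"}


def validate_document_filters(filters):
    """Raise ValueError if filters is not a flat object using the fixed keys.

    Client-side sketch only; the server's actual checks may differ.
    """
    if not isinstance(filters, dict):
        raise ValueError("filters must be a flat JSON object")
    for key, value in filters.items():
        if key in NUMBER_KEYS:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{key} must be a number")
        elif key in STRING_KEYS:
            if not isinstance(value, str):
                raise ValueError(f"{key} must be a string")
        else:
            raise ValueError(f"unknown filter key: {key}")
    return filters
```

Calling it on the example object above returns the object unchanged, while a key such as "category" or a nested value raises ValueError.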

Document Fields

Every stored document contains the following fields.

Field | Type
id | string
text | string
markdown | string
filters | object
created_at | string

Example Responses

File or URL input

200 OK
{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Text input

200 OK
{
  "status": "success",
  "data": {
    "id": "Kp9nWxYz5TmR3qL7",
    "markdown": "",
    "text": "Employees are allowed to work remotely 3 days a week.\nA monthly stipend of $200 is provided for home office equipment.",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:05:00Z"
  }
}

Search Documents

POST/dataspaces/:id/documents/search

Perform a semantic vector search across all documents in a dataspace.
You can also apply complex metadata filters to narrow down the results before ranking.

Request Body

Send data as application/json.

Field | Type | Required
query | string | Required
limit | number | Optional
filters | array | Optional

Filter Rules

Each rule in the filters array must be an object with three properties: field, operator, and value.
Operators can be combined freely, within the limits listed below.

Field Type | Supported Operators
number_1, number_2, number_3 | ==, !=, >, >=, <, <=, in, not-in
string_1, string_2, string_3, string_4 | ==, !=, in, not-in

Limits:

  • You can use at most 7 filter rules per request.
  • You can use only one in or not-in operator per search query.
  • You can use only one != or not-in operator per search query.
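
The rules and limits above can be checked locally before a search is sent. The following is a sketch of such a check under our reading of the limits. The function name is hypothetical, and the server remains the authority on what it accepts.

```python
NUMBER_FIELDS = {"number_1", "number_2", "number_3"}
STRING_FIELDS = {"string_1", "string_2", "string_3", "string_4"}
NUMBER_OPS = {"==", "!=", ">", ">=", "<", "<=", "in", "not-in"}
STRING_OPS = {"==", "!=", "in", "not-in"}
MAX_RULES = 7


def validate_search_filters(rules):
    """Raise ValueError if the rules break a documented limit."""
    if len(rules) > MAX_RULES:
        raise ValueError(f"at most {MAX_RULES} filter rules per request")
    for rule in rules:
        if set(rule) != {"field", "operator", "value"}:
            raise ValueError("each rule needs exactly field, operator, value")
        field, op = rule["field"], rule["operator"]
        if field in NUMBER_FIELDS:
            allowed = NUMBER_OPS
        elif field in STRING_FIELDS:
            allowed = STRING_OPS
        else:
            raise ValueError(f"unknown field: {field}")
        if op not in allowed:
            raise ValueError(f"operator {op} is not supported for {field}")
    ops = [rule["operator"] for rule in rules]
    # At most one rule may use in / not-in.
    if sum(op in ("in", "not-in") for op in ops) > 1:
        raise ValueError("only one in or not-in operator per query")
    # At most one rule may use != / not-in.
    if sum(op in ("!=", "not-in") for op in ops) > 1:
        raise ValueError("only one != or not-in operator per query")
    return rules
```

Under this reading, a query may combine one in rule with one != rule, but not a != rule with a not-in rule, since not-in counts toward both limits.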

Request Examples

Basic query

{
  "query": "What is the remote work policy?"
}

With limit

{
  "query": "What is the remote work policy?",
  "limit": 5
}

Single filter

{
  "query": "How do I cancel my subscription?",
  "limit": 5,
  "filters": [
    { "field": "string_1", "operator": "==", "value": "support" }
  ]
}

Multiple filters

{
  "query": "GDPR compliance requirements for user data",
  "limit": 10,
  "filters": [
    { "field": "string_1", "operator": "==", "value": "legal" },
    { "field": "number_1", "operator": ">=", "value": 2024 },
    { "field": "string_2", "operator": "!=", "value": "draft" }
  ]
}

Range + in operator

{
  "query": "pricing model",
  "limit": 20,
  "filters": [
    { "field": "number_2", "operator": ">=", "value": 3.14 },
    { "field": "number_2", "operator": "<", "value": 3.15 },
    { "field": "string_4", "operator": "in", "value": ["finance", "sales", "marketing"] }
  ]
}

Example Response

200 OK
{
  "status": "success",
  "data": [
    {
      "score": 0.892,
      "document": {
        "id": "8f3kLmNpQ2xR4vW1",
        "text": "Employees are allowed to work remotely 3 days a week...",
        "markdown": "## Remote Work\nEmployees are allowed to work **remotely** 3 days a week...",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2025
        }
      }
    },
    {
      "score": 0.814,
      "document": {
        "id": "Yt7nBcDe9FgH2jK5",
        "text": "A monthly stipend of $200 is provided for home office equipment...",
        "markdown": "",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2025
        }
      }
    },
    {
      "score": 0.743,
      "document": {
        "id": "Zw6mXsAp3RqV8uT4",
        "text": "Remote employees must be available during core hours 10am-4pm...",
        "markdown": "## Availability\nRemote employees must be available during core hours **10am-4pm**...",
        "filters": {
          "string_1": "hr-policies",
          "number_1": 2024
        }
      }
    }
  ]
}

The markdown field is populated for documents stored from files or URLs. For documents stored from text input, it is an empty string.
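
One way to consume a search response is sketched below: keep hits above a score threshold and fall back to text when markdown is empty. The function name and the min_score parameter are hypothetical conveniences, not API features.

```python
def top_hits(response, min_score=0.0):
    """Return (id, score, content) tuples for hits scoring at least min_score.

    Client-side sketch; the threshold is applied locally, not by the API.
    """
    if response.get("status") != "success":
        raise ValueError("search did not succeed")
    hits = []
    for item in response["data"]:
        if item["score"] >= min_score:
            doc = item["document"]
            # Documents stored from text input have an empty markdown field.
            content = doc["markdown"] or doc["text"]
            hits.append((doc["id"], item["score"], content))
    return hits
```

Applied to the example response with min_score=0.8, this keeps the first two results and uses the text field for the second one.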

Other Endpoints

Detailed specifications for all remaining CRUD and management endpoints.

Pagination

Both GET /dataspaces and GET /dataspaces/:id/documents support cursor-based pagination.

Query Parameter | Default
limit | 20
cursor | (none)

Every paginated response includes a pagination object:

{
  "status": "success",
  "data": [ ... ],
  "pagination": {
    "has_more": true,
    "next_cursor": "8f3kLmNpQ2xR4vW1"
  }
}
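
A cursor loop over a paginated endpoint might look like the sketch below. It assumes that next_cursor is passed back as the cursor query parameter and that has_more signals when to stop. fetch_page is a hypothetical callable you would supply to perform the actual GET.

```python
def iter_all(fetch_page, limit=20):
    """Yield every item from a cursor-paginated endpoint.

    fetch_page(limit=..., cursor=...) is a hypothetical callable that returns
    one parsed response with "data" and "pagination" keys.
    """
    cursor = None  # no cursor on the first request
    while True:
        page = fetch_page(limit=limit, cursor=cursor)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            break
        # Assumption: the next page is requested with ?cursor=<next_cursor>.
        cursor = pagination["next_cursor"]
```

Because it is a generator, callers can stop early without fetching the remaining pages.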

List Documents

GET/dataspaces/:id/documents

Returns a paginated list of documents in a dataspace, ordered by creation date (newest first).

200 OK
{
  "status": "success",
  "data": [
    {
      "id": "8f3kLmNpQ2xR4vW1",
      "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
      "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
      "filters": {
        "string_1": "hr-policies",
        "number_1": 2025
      },
      "created_at": "2026-04-30T12:00:00Z"
    },
    {
      "id": "Kp9nWxYz5TmR3qL7",
      "markdown": "",
      "text": "A monthly stipend of $200 is provided for home office equipment...",
      "filters": {
        "string_1": "hr-policies",
        "number_1": 2025
      },
      "created_at": "2026-04-30T11:45:00Z"
    }
  ],
  "pagination": {
    "has_more": true,
    "next_cursor": "Kp9nWxYz5TmR3qL7"
  }
}

Get Document

GET/dataspaces/:id/documents/:docId

Retrieve a single document by its ID.

200 OK
{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "hr-policies",
      "number_1": 2025
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Update Document

PATCH/dataspaces/:id/documents/:docId

Update only the metadata filters of an existing document.
Send data as application/json with a filters object.

  • Partial merge: Only the keys you include are updated. Existing filters you don't mention are left unchanged.
  • Delete a filter: Send null for a key to remove it entirely.
  • Content is immutable: You cannot update the text, file, or markdown of a document. Delete and re-store it instead.

Request body

{
  "filters": {
    "string_1": "updated-category",
    "number_1": 2026,
    "string_2": null
  }
}

Response

200 OK
{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "markdown": "# Company Wiki\n\n## Remote Work\nEmployees are allowed...",
    "text": "Company Wiki\nRemote Work\nEmployees are allowed...",
    "filters": {
      "string_1": "updated-category",
      "number_1": 2026
    },
    "created_at": "2026-04-30T12:00:00Z"
  }
}

Note that string_2 was removed from the response because it was set to null in the request. Filter updates take effect immediately, starting with the next search.
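
The partial-merge behavior can be modeled locally, for example to predict what a document's filters will look like after a PATCH. This is a sketch of the semantics as described here, not the server's implementation, and the function name is hypothetical.

```python
def apply_filter_patch(existing, patch):
    """Return the filters that would result from a partial-merge PATCH.

    Keys set to None (JSON null) are removed, other keys are overwritten,
    and keys not mentioned in the patch are left unchanged.
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in patch.items():
        if value is None:
            merged.pop(key, None)
        else:
            merged[key] = value
    return merged
```

Applying the request body above to a document whose filters include string_1, number_1 and string_2 yields only the updated string_1 and number_1, matching the example response.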

Delete Document

DELETE/dataspaces/:id/documents/:docId

Permanently delete a document and all its associated search data.

200 OK
{
  "status": "success",
  "data": {
    "id": "8f3kLmNpQ2xR4vW1",
    "deleted": true
  }
}

Create Dataspace

POST/dataspaces

Create a new, empty dataspace. No request body is required.

200 OK
{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "count": 0,
    "created_at": "2026-04-30T14:00:00Z"
  }
}

List Dataspaces

GET/dataspaces

Returns a paginated list of your dataspaces, ordered by creation date (newest first).
Supports the same ?limit and ?cursor query parameters.

200 OK
{
  "status": "success",
  "data": [
    {
      "id": "Xr9pLmWq4TnK2vY8",
      "count": 47,
      "created_at": "2026-04-30T14:00:00Z"
    },
    {
      "id": "Bc3nYhTw7KpR5jM1",
      "count": 12,
      "created_at": "2026-04-28T09:30:00Z"
    }
  ],
  "pagination": {
    "has_more": false,
    "next_cursor": "Bc3nYhTw7KpR5jM1"
  }
}

Get Dataspace

GET/dataspaces/:id

Retrieve a single dataspace. The count field reflects the number of documents currently stored.

200 OK
{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "count": 47,
    "created_at": "2026-04-30T14:00:00Z"
  }
}

Delete Dataspace

DELETE/dataspaces/:id

Permanently delete a dataspace. All documents inside it are permanently destroyed.

200 OK
{
  "status": "success",
  "data": {
    "id": "Xr9pLmWq4TnK2vY8",
    "deleted": true,
    "documents_destroyed": 47
  }
}

Error Responses

All error responses follow a consistent envelope:

{
  "status": "error",
  "error": {
    "code": "error_code",
    "message": "Human-readable description."
  }
}
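
Since success and error responses share the status field, a client can unwrap both with one small helper. The sketch below shows one way to do so. ApiError and unwrap are hypothetical names, and no specific error codes are assumed.

```python
class ApiError(Exception):
    """Raised when a response envelope has status "error"."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(envelope):
    """Return the data of a success envelope, or raise ApiError."""
    if envelope.get("status") == "error":
        err = envelope.get("error", {})
        raise ApiError(err.get("code", "unknown"), err.get("message", ""))
    return envelope.get("data")
```

Callers can then branch on the code attribute of the exception instead of inspecting raw JSON.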

Limits

Limit | Value
Max input file size | 30 MB
Max pages per document | 1,000
Filter fields per document | string_1, string_2, string_3, string_4, number_1, number_2, number_3
Max filter rules per search | 7

Document storage limit: The total number of documents you can store across all dataspaces depends on your plan — 200 on the Free plan, 50,000 on Basic, 100,000 on Plus, and 200,000 on Max.

Cost

  • Storing files: 1 credit per page.
  • Storing text: 1 credit per 1,000 tokens.
  • Searching: 0.2 credits per search query.
  • All other endpoints (list, get, update, delete, and all dataspace management) are free.
  • Failed requests are not charged.
  • Storage is bounded by your plan's max_documents limit.
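
As a back-of-the-envelope aid, the pricing above can be turned into a small estimator. How partial blocks of 1,000 tokens are rounded is not stated here, so this sketch prorates them linearly as an assumption. The function names are hypothetical, and the plan limits are copied from the Limits section.

```python
# Document storage slots per plan, as listed in the Limits section.
PLAN_MAX_DOCUMENTS = {"Free": 200, "Basic": 50_000, "Plus": 100_000, "Max": 200_000}


def estimate_credits(file_pages=0, text_tokens=0, searches=0):
    """Estimate credits for a workload.

    Assumption: text is prorated linearly (tokens / 1000); the actual
    rounding applied by the service is not documented on this page.
    """
    return file_pages * 1.0 + text_tokens / 1000 + searches * 0.2


def remaining_slots(plan, stored_documents):
    """Return how many more documents fit under the plan's storage limit."""
    return max(PLAN_MAX_DOCUMENTS[plan] - stored_documents, 0)
```

For instance, storing a 10-page PDF and 2,000 tokens of text and then running 5 searches comes to an estimated 13 credits under these assumptions.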