    Introduction

    Welcome to the documentAI API! documentAI provides a retrieval-augmented generation (RAG) service that generates high-quality content by combining large language models with an information retrieval system.

    Our API provides easy access to documentAI's state-of-the-art AI models through a simple REST interface. You can integrate documentAI directly into your application to add powerful natural language generation capabilities.

    This documentation provides complete reference material for using the documentAI API. We recommend reading the quick start guide to get up and running quickly. From there you can explore the available endpoints for generating text, managing knowledge sources, and more.

    Getting started
    Authentication

    To use the documentAI API, you'll need an API key. API keys can be created and managed from your documentAI console.

    When you create a new API key, its value is shown only once. Record it in a secure location; you'll need it to authenticate all API requests.

    You can create multiple API keys and revoke individual keys if needed. Having separate keys for development, staging, and production can be useful.

    Pass your API key in the X-API-KEY header. Keep the key confidential: anyone with your key can access your documentAI account.

    Every API request must include a valid API key. Requests with a missing or invalid key receive a 401 Unauthorized response.
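    A minimal sketch of an authenticated request using only the Python standard library. The curl examples on this page omit the host, so BASE_URL is a placeholder assumption. build_request and send are hypothetical helpers. build_request only constructs the request. send shows how a 401 from a missing or invalid key would surface.

```python
import json
import urllib.error
import urllib.request

# Placeholder host (assumption): this page does not state the API's base URL.
BASE_URL = "https://api.example.com"


def build_request(api_key, method, path, body=None):
    """Build, but do not send, a request that carries the X-API-KEY header."""
    headers = {"X-API-KEY": api_key}
    data = None
    if body is not None:
        # Assumption: request bodies are JSON, as in the curl examples.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode its JSON body, turning a 401 into a clear error."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise PermissionError("API key is missing or invalid") from err
        raise


# Usage (not executed here):
# send(build_request("YOUR_API_KEY", "GET", "/v1/collections/<collectionId>"))
```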

    Authenticating Requests
    curl -X GET /api/v1/hello \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Your API Key

    You must include a valid API key in the X-API-KEY header of every request.

    You can create and view your existing API keys in your Console.

    Collections

    A collection represents a set of documents that are used for retrieval-augmented generation. Collections allow you to organize and manage documents that are relevant to a particular topic or domain.

    Endpoints
    POST /v1/collections
    GET /v1/collections/:collectionId
    POST /v1/collections/:collectionId
    DELETE /v1/collections/:collectionId
    GET /v1/collections/:collectionId/documents

    Collection Object
    Attributes

    id
    string
    Unique identifier for the object.
    name
    string
    optional
    Name of the collection.
    documentCount
    integer
    Number of documents within the collection.
    created
    timestamp
    Date when the collection was created.
    updated
    timestamp
    Date when the collection was last updated, including when documents are added or removed.
    metadata
    dictionary
    A set of user-provided key-value pairs.

    Collection Object
    {
      "id": "aeae7c62-90a9-4793-9a34-af6c8972e0f1",
      "name": "My Collection",
      "documentCount": 500,
      "created": "2023-09-03T12:22:36.291Z",
      "updated": "2023-09-03T12:22:36.291Z",
      "metadata": {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
      }
    }
    
    Create Collection

    Collections can be created explicitly, or implicitly when a document is uploaded or a remote document is added. The name and metadata can be updated later.

    Attributes

    name
    string
    optional
    Name of the collection.
    metadata
    dictionary
    optional
    A set of user-provided key-value pairs.
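    Both attributes are optional, so an empty body is a valid request. A minimal sketch of serializing the body while omitting unset attributes. The helper name create_collection_payload is hypothetical. The dictionary check reflects the documented attribute type and is not a confirmed server-side rule.

```python
import json
from typing import Optional


def create_collection_payload(name: Optional[str] = None, metadata: Optional[dict] = None) -> str:
    """Serialize a Create Collection body, omitting optional attributes that are unset."""
    body = {}
    if name is not None:
        body["name"] = name
    if metadata is not None:
        # metadata is documented as a dictionary of key-value pairs.
        if not isinstance(metadata, dict):
            raise TypeError("metadata must be a dictionary of key-value pairs")
        body["metadata"] = metadata
    return json.dumps(body)
```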

    POST /v1/collections
    curl -X POST /v1/collections \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"name": "My Collection"}'
    
    Response
    {
      "id": "aeae7c62-90a9-4793-9a34-af6c8972e0f1",
      "name": "My Collection",
      "documentCount": 0,
      "created": "2023-09-03T12:22:36.291Z",
      "updated": "2023-09-03T12:22:36.291Z",
      "metadata": {}
    }
    
    Get Collection

    Retrieves collection metadata.

    Path Parameters

    collectionId
    string
    ID of the collection.

    GET /v1/collections/:collectionId
    curl -X GET /v1/collections/aeae7c62-90a9-4793-9a34-af6c8972e0f1 \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "id": "aeae7c62-90a9-4793-9a34-af6c8972e0f1",
      "name": "My Collection",
      "documentCount": 500,
      "created": "2023-09-03T12:22:36.291Z",
      "updated": "2023-09-03T12:22:36.291Z",
      "metadata": {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
      }
    }
    
    Update Collection

    Updates the specified collection by setting the values of the parameters passed. Any parameters not provided are left unchanged.

    Path Parameters

    collectionId
    string
    ID of the collection.

    Attributes

    name
    string
    optional
    Name of the collection.
    metadata
    dictionary
    optional
    A set of user-provided key-value pairs.

    POST /v1/collections/:collectionId
    curl -X POST /v1/collections/aeae7c62-90a9-4793-9a34-af6c8972e0f1 \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"name": "Updated Collection", "metadata": {"newKey": "value"}}'
    
    Response
    {
      "id": "aeae7c62-90a9-4793-9a34-af6c8972e0f1",
      "name": "Updated Collection",
      "documentCount": 500,
      "created": "2023-08-03T12:22:36.291Z",
      "updated": "2023-09-05T20:35:05.456Z",
      "metadata": {
        "newKey": "value"
      }
    }
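    This page does not say whether a new metadata dictionary is merged into the stored one or replaces it. The example response shows only the keys that were sent. A cautious client can merge the existing keys locally and send the full dictionary. This sketch assumes wholesale replacement. build_update_payload is a hypothetical helper that includes only the attributes being changed.

```python
import json


def build_update_payload(current, name=None, metadata_changes=None):
    """Build an Update Collection body containing only the attributes being changed.

    Assumption: the API may replace the stored metadata dictionary as a whole, so
    the current keys are merged with the changes locally before sending.
    """
    body = {}
    if name is not None:
        body["name"] = name
    if metadata_changes is not None:
        # Copy so the caller's collection object is left untouched.
        merged = dict(current.get("metadata") or {})
        merged.update(metadata_changes)
        body["metadata"] = merged
    return json.dumps(body)
```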
    
    Delete Collection

    Deletes a collection. All documents associated with it are also removed.

    Path Parameters

    collectionId
    string
    ID of the collection.

    DELETE /v1/collections/:collectionId
    curl -X DELETE /v1/collections/aeae7c62-90a9-4793-9a34-af6c8972e0f1 \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "collectionId": "aeae7c62-90a9-4793-9a34-af6c8972e0f1"
    }
    
    Get Collection Documents

    Retrieves the IDs of all documents in a collection.

    Path Parameters

    collectionId
    string
    ID of the collection.

    GET /v1/collections/:collectionId/documents
    curl -X GET /v1/collections/aeae7c62-90a9-4793-9a34-af6c8972e0f1/documents \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "id": "aeae7c62-90a9-4793-9a34-af6c8972e0f1",
      "documentCount": 3,
      "documents": [
        "913fd399-95f8-4383-866b-bccc03e406b9", 
        "5db7d87d-9bc5-4dce-b97d-44ba761bf6db",
        "4c2f405b-504f-4007-b1ca-c4fb122e2242"
      ]
    }
    
    Documents

    Documents form the core of a collection.

    Currently these are the supported formats:

    • PDF
    • Text
    • Web Pages
    • Word Documents
    • Excel Spreadsheets
    • CSV files
    Endpoints
    PUT /v1/collections/:collectionId/upload
    POST /v1/collections/:collectionId/document
    GET /v1/collections/:collectionId/crawl/:crawlId
    POST /v1/collections/:collectionId/crawl
    GET /v1/collections/:collectionId/documents/:documentId
    POST /v1/collections/:collectionId/documents/:documentId
    DELETE /v1/collections/:collectionId/documents/:documentId
    GET /v1/collections/:collectionId/upload/:uploadId

    Document Object
    Attributes
    collectionId
    string

    ID of the parent collection.

    documentId
    string

    ID of the document.

    metadata
    dictionary
    optional

    A set of user-provided key-value pairs.

    statusHistory
    list

    A chronological list of the document's status changes; each entry is a Status History object.

    status
    dictionary

    The current status of the document. See Status History.

    Status History
    date
    timestamp

    Time when the status was changed.

    message
    string
    optional

    Additional message for the status, used to display error messages.

    status
    string

    Document status, one of:

    QUEUED
    string

    The document is queued to be processed.

    UPLOADED
    string

    The document has been uploaded and is waiting to be processed.

    PROCESSING
    string

    The document is being processed.

    READY
    string

    The document has been processed and is searchable.

    ERROR
    string

    There was an issue processing the document.
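    A client usually needs two things from these statuses: the current entry in a history and whether processing has finished. This minimal sketch treats READY and ERROR as final. That is an assumption, because the page does not say whether a document can leave either state. It orders entries by their timestamps instead of relying on list position. The helper names are hypothetical.

```python
from datetime import datetime

# Statuses documented for Status History entries.
DOCUMENT_STATUSES = {"QUEUED", "UPLOADED", "PROCESSING", "READY", "ERROR"}
# Assumption: READY and ERROR are final states.
TERMINAL_STATUSES = {"READY", "ERROR"}


def parse_timestamp(value):
    """Parse timestamps such as 2023-09-03T12:22:36.291Z (3 or 6 fractional digits)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def latest_status(status_history):
    """Return the most recent Status History entry, ordered by its date."""
    if not status_history:
        raise ValueError("status history is empty")
    return max(status_history, key=lambda entry: parse_timestamp(entry["date"]))


def is_terminal(status_entry):
    """Return True if the entry's status is one after which processing has finished."""
    status = status_entry["status"]
    if status not in DOCUMENT_STATUSES:
        raise ValueError(f"unknown document status: {status}")
    return status in TERMINAL_STATUSES
```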

    Document Object
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
      "metadata": {
        "url": "https://documentai.dev/showcase/chat-bot"
      },
      "statusHistory": [
        {
          "date": "2023-09-03T12:22:36.291Z",
          "status": "QUEUED"
        },
        {
          "date": "2023-09-03T12:22:47.817Z",
          "status": "UPLOADED"
        },
        {
          "date": "2023-09-03T12:22:55.979Z",
          "status": "READY"
        }
      ],
      "status": {
        "date": "2023-09-03T12:22:55.979Z",
        "status": "READY"
      }
    }
    
    Upload Document

    Upload documents as multipart/form-data. The maximum file size is 50MB. Multiple documents can be uploaded in a single request.

    Path Parameters

    collectionId
    string
    ID of the collection.

    PUT /v1/collections/:collectionId/upload
    curl -X PUT /v1/collections/mycollection/upload \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         --form file=@may_report.pdf \
         --form file=@june_report.pdf
    
    Response
    {
      "collectionId": "mycollection",
      "uploadId": "6f207f16-c30b-47ef-9a58-efea9df9ae73"
    }
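    Without curl, the same multipart body can be built with the standard library. This minimal sketch repeats the form field name file for each document, as the curl example does, and rejects oversized files before sending. The page does not say whether "50MB" means 50 MiB or 50,000,000 bytes, so the limit below is an assumption. build_multipart is a hypothetical helper. Filenames containing quotes are not escaped.

```python
import mimetypes
import uuid

# Assumption: "50MB" is read as 50 MiB.
MAX_FILE_BYTES = 50 * 1024 * 1024


def build_multipart(files, max_bytes=MAX_FILE_BYTES):
    """Encode (filename, content_bytes) pairs as multipart/form-data parts named "file".

    Returns the request body and the Content-Type header value carrying the boundary.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        if len(content) > max_bytes:
            raise ValueError(f"{filename} exceeds the {max_bytes}-byte limit")
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            f"Content-Type: {content_type}\r\n"
            "\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

    The body and header can then be sent with urllib.request.Request(url, data=body, method="PUT", headers={"X-API-KEY": key, "Content-Type": content_type}).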
    
    Check Upload

    Retrieves the current status of an upload.

    Path Parameters
    collectionId
    string

    ID of the collection.

    uploadId
    string

    ID of the upload.

    GET /v1/collections/:collectionId/upload/:uploadId
    curl -X GET /v1/collections/mycollection/upload/6f207f16-c30b-47ef-9a58-efea9df9ae73 \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "collectionId": "mycollection",
      "uploadId": "6f207f16-c30b-47ef-9a58-efea9df9ae73",
      "documents": [
        {
          "documentId": "888329be-f438-42b8-abda-4f8ee59930dd",
          "status": {
            "date": "2023-09-03T12:22:55.979Z",
            "status": "READY"
          },
          "metadata": {
            "filename": "may_report.pdf"
          }
        },
        {
          "documentId": "9531b0e3-472c-464a-87ef-9b0934c038fb",
          "status": {
            "date": "2023-09-03T12:22:55.979Z",
            "status": "UPLOADED"
          },
          "metadata": {
            "filename": "june_report.pdf"
          }
        }
      ]
    }
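    Because documents in one upload finish at different times, a client typically polls this endpoint until every document reaches a final status. This sketch assumes READY and ERROR are final. fetch_upload is a hypothetical zero-argument callable that returns the parsed Check Upload JSON, so the loop can be wired to any HTTP client. The sleep and clock parameters can be replaced for testing.

```python
import time

# Assumption: documents do not leave these statuses once reached.
TERMINAL = {"READY", "ERROR"}


def wait_for_upload(fetch_upload, interval_seconds=2.0, timeout_seconds=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll Check Upload until every document is READY or ERROR.

    Returns a dict mapping each documentId to its final status.
    """
    deadline = clock() + timeout_seconds
    while True:
        upload = fetch_upload()
        statuses = {doc["documentId"]: doc["status"]["status"] for doc in upload["documents"]}
        if all(status in TERMINAL for status in statuses.values()):
            return statuses
        if clock() >= deadline:
            raise TimeoutError(f"upload still processing: {statuses}")
        sleep(interval_seconds)
```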
    
    Add Remote Document

    Fetches a remote document from a URL and queues it for processing.

    Path Parameters

    collectionId
    string
    ID of the collection.

    Attributes
    url
    string

    URL of the remote document.

    POST /v1/collections/:collectionId/document
    curl -X POST /v1/collections/mycollection/document \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"url": "https://documentai.dev/showcase/chat-bot"}'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
      "status": {
        "date": "2023-09-03T12:22:36.291790Z",
        "status": "QUEUED"
      }
    }
    
    Crawl Documents

    Crawls a website starting from a URL. Only URLs beneath the starting URL are crawled: given https://example.com/abc, only pages whose URLs start with https://example.com/abc, such as https://example.com/abc/def, are crawled.
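    The scope rule is described both as "children of the parent URL" and as "starts with". A literal string prefix would also match siblings such as https://example.com/abcdef. This minimal sketch uses the stricter reading: same scheme and host, and a path equal to or beneath the starting path, segment by segment. Whether the crawler behaves this way is an assumption. For the literal reading, use candidate_url.startswith(start_url). in_crawl_scope is a hypothetical helper.

```python
from urllib.parse import urlsplit


def in_crawl_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is the start page or lies beneath it."""
    start = urlsplit(start_url)
    candidate = urlsplit(candidate_url)
    # Scheme and host must match (compared case-insensitively).
    if (start.scheme.lower(), start.netloc.lower()) != (candidate.scheme.lower(), candidate.netloc.lower()):
        return False
    base = start.path.rstrip("/")
    # Accept the start page itself, or any path that continues it after a "/".
    return candidate.path.rstrip("/") == base or candidate.path.startswith(base + "/")
```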

    Path Parameters

    collectionId
    string
    ID of the collection.

    Attributes
    url
    string

    Starting URL to crawl.

    maxDepth
    integer
    optional

    Maximum depth to crawl.

    maxDocuments
    integer
    optional

    Maximum number of documents to process.

    Crawl Status
    QUEUED
    string

    The crawl job is queued.

    CRAWLING
    string

    The documents are being crawled.

    READY
    string

    The crawl job completed without errors.

    ERROR
    string

    The crawl job completed with errors. Check individual documents for more info.
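    Because an ERROR crawl status means you should check individual documents, a client can poll Check Crawl and then list the documents that failed. This minimal sketch reads the field layout from the Check Crawl example response. fetch_crawl is a hypothetical zero-argument callable returning that parsed JSON. The polling interval and limit are arbitrary choices.

```python
import time

# Crawl statuses after which the job no longer runs.
CRAWL_TERMINAL = {"READY", "ERROR"}


def wait_for_crawl(fetch_crawl, interval_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll Check Crawl until the job is READY or ERROR.

    Returns (final_status, failed_document_ids), where failed documents are
    those whose own status is ERROR.
    """
    for attempt in range(max_polls):
        crawl = fetch_crawl()
        status = crawl["status"]["status"]
        if status in CRAWL_TERMINAL:
            failed = [
                doc["documentId"]
                for doc in crawl.get("documents", [])
                if doc["status"]["status"] == "ERROR"
            ]
            return status, failed
        # Skip the pointless wait after the final poll.
        if attempt + 1 < max_polls:
            sleep(interval_seconds)
    raise TimeoutError(f"crawl job still running after {max_polls} polls")
```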

    POST /v1/collections/:collectionId/crawl
    curl -X POST /v1/collections/mycollection/crawl \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"url": "https://documentai.dev/docs", "maxDepth": 5, "maxDocuments": 100}'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "crawlId": "f08594bd-4486-4918-b237-0017b1fd2d6c",
      "status": {
        "date": "2023-09-03T12:22:36.291790Z",
        "status": "CRAWLING"
      }
    }
    
    Check Crawl

    Checks the status of a crawl job.

    Path Parameters

    collectionId
    string
    ID of the collection.
    crawlId
    string
    ID of the crawl job.

    GET /v1/collections/:collectionId/crawl/:crawlId
    curl -X GET /v1/collections/mycollection/crawl/f08594bd-4486-4918-b237-0017b1fd2d6c \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "crawlId": "f08594bd-4486-4918-b237-0017b1fd2d6c",
      "statusHistory": [
        {
          "date": "2023-09-03T12:22:36.291Z",
          "status": "QUEUED"
        },
        {
          "date": "2023-09-03T12:22:47.817Z",
          "status": "CRAWLING"
        },
        {
          "date": "2023-09-03T12:22:55.979Z",
          "status": "READY"
        }
      ],
      "status": {
        "date": "2023-09-03T12:22:55.979Z",
        "status": "READY"
      },
      "documents": [
        {
          "documentId": "888329be-f438-42b8-abda-4f8ee59930dd",
          "status": {
            "date": "2023-09-03T12:22:55.979Z",
            "status": "READY"
          },
          "metadata": {
            "url": "https://documentai.dev/docs/collections"
          }
        },
        {
          "documentId": "9531b0e3-472c-464a-87ef-9b0934c038fb",
          "status": {
            "date": "2023-09-03T12:22:55.979Z",
            "status": "UPLOADED"
          },
          "metadata": {
            "url": "https://documentai.dev/docs/documents"
          }
        }
      ]
    }
    
    Get Document

    Retrieves a document.

    Path Parameters
    collectionId
    string

    ID of the collection.

    documentId
    string

    ID of the document.

    GET /v1/collections/:collectionId/documents/:documentId
    curl -X GET /v1/collections/mycollection/documents/b0dd63b1-f78c-4989-94f9-f73768bb12dd \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
      "metadata": {
        "url": "https://documentai.dev/showcase/chat-bot"
      },
      "statusHistory": [
        {
          "date": "2023-09-03T12:22:36.291Z",
          "status": "QUEUED"
        },
        {
          "date": "2023-09-03T12:22:47.817Z",
          "status": "UPLOADED"
        },
        {
          "date": "2023-09-03T12:22:55.979Z",
          "status": "READY"
        }
      ],
      "status": {
        "date": "2023-09-03T12:22:55.979Z",
        "status": "READY"
      }
    }
    
    Update Document

    Updates document metadata. Any parameters not provided will be left unchanged.

    Path Parameters

    collectionId
    string
    ID of the collection.
    documentId
    string
    ID of the document.

    Attributes
    metadata
    dictionary
    optional

    A set of user-provided key-value pairs.

    POST /v1/collections/:collectionId/documents/:documentId
    curl -X POST /v1/collections/mycollection/documents/b0dd63b1-f78c-4989-94f9-f73768bb12dd \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"metadata": {"newKey": "value"}}'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
      "metadata": {
        "newKey": "value"
      },
      "status": {
        "date": "2023-09-03T12:22:36.291790Z",
        "status": "READY"
      },
      "created": "2023-08-03T12:22:36.291Z",
      "updated": "2023-09-05T20:35:05.456Z"
    }
    
    Delete Document

    Deletes a document and removes it from the knowledge base.

    Path Parameters
    collectionId
    string

    ID of the collection.

    documentId
    string

    ID of the document.

    DELETE /v1/collections/:collectionId/documents/:documentId
    curl -X DELETE /v1/collections/mycollection/documents/b0dd63b1-f78c-4989-94f9-f73768bb12dd \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
      "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd"
    }
    
    Chat

    Provides a conversational interface to a collection. Chats are created dynamically, so there is no separate endpoint for creating one, and each chat is scoped to a single collection.

    Endpoints
    POST /v1/collections/:collectionId/chat/:chatId
    GET /v1/collections/:collectionId/chat/:chatId

    Chat Object
    Attributes

    id
    string
    Unique identifier for the object.
    date
    timestamp
    Date of the message.
    content
    string
    The body of the message.
    context
    list
    List of retrieved context chunks; see Message Context.

    Message Context

    collectionId
    string
    ID of the collection.
    documentId
    string
    ID of the document.
    chunkId
    string
    ID of the chunk.
    content
    string
    The chunk content.

    metadata
    dictionary
    optional

    A set of user-provided key-value pairs.
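    Each context chunk records the collection and document it came from. A client can therefore list the source documents behind an answer, for example to display citations. This minimal sketch operates on the parsed message JSON. cited_documents is a hypothetical helper.

```python
def cited_documents(message):
    """Return distinct (collectionId, documentId) pairs from a message's context, in first-seen order."""
    # dict.fromkeys keeps insertion order and drops duplicate pairs.
    pairs = dict.fromkeys(
        (chunk["collectionId"], chunk["documentId"]) for chunk in message.get("context", [])
    )
    return list(pairs)
```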

    Sender Type
    ASSISTANT
    string

    The message was sent as a response from the system.

    USER
    string

    The message was sent by a user.

    Chat Object
    {
      "id": "553beb15-9f2b-4c97-857f-94963bbce84f",
      "date": "2023-09-03T12:24:09.026440Z",
      "content": "Here are some benefits of Jina:\n\n1. Scalability: Jina is designed to be highly scalable, allowing it to handle large-scale and distributed deployments. It can efficiently handle processing large volumes of data and handle heavy workloads.\n\n2. Modularity and Extensibility: Jina follows a modular architecture, making it easy to customize and extend its functionality according to specific requirements. Users can incorporate their own models, algorithms, and components into the framework, enabling flexibility and adaptability.\n\n3. Multi-modal Support: Jina supports processing and searching across various types of data, including images, text, audio, and video. This multi-modal support enables building applications that can handle diverse types of data and perform cross-modal search or retrieval tasks.\n\n4. Ease of Integration: Jina provides well-documented APIs and interfaces, making it simple to integrate with existing systems and frameworks. It supports integration with popular frameworks and tools, such as TensorFlow and PyTorch, facilitating seamless integration into existing workflows.\n\n5. Developer-Friendly: Jina offers a developer-friendly environment and toolset, providing easy-to-use interfaces, visualizations, and debugging tools. This makes it easier for developers to build and experiment with different models and configurations.\n\n6. Community and Support: Jina has an active and growing community of developers and users who contribute to its development and provide support. This community-driven ecosystem ensures access to resources, tutorials, and discussions, fostering collaboration and knowledge-sharing.\n\nThese benefits collectively make Jina a powerful framework for building scalable and modular search and retrieval systems across different modalities, with ease of integration and extensive community support.",
      "context": [
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54",
          "content": "jina-base-v1 consistently demonstrates perfor-\nmances akin to or better than gtr-t5-base, which\nwas trained specifically for retrieval tasks [Ni et al.,\n2022b]. However, it seldom matches the scores of\nsentence-t5-base, which was trained on sentence\nsimilarity tasks [Ni et al., 2022a].\nThe evaluation of model performances on re-\ntrieval tasks, presented in Table 8, reflects a similar\nrelationship among gtr-t5, sentence-t5, and JINA\nEMBEDDINGS. Here, gtr-t5 models, which have\nbeen specially trained on retrieval tasks, consis-\ntently score the highest for their respective sizes.\nJINA EMBEDDINGS models follow closely behind,\nwhereas sentence-t5 models trail significantly. The\nJINA EMBEDDINGS set’s capability to maintain\ncompetitive scores across these tasks underscores\nthe advantage of multi-task training.\nAs illustrated in Table 7, jina-large-v1 also\nachieves exceedingly high scores on reranking\ntasks, often outperforming larger models. Similarly,",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8",
          "content": "JINA EMBEDDINGS: A Novel Set of High-Performance Sentence\nEmbedding Models\nMichael Günther and Louis Milliken and Jonathan Geuter\nGeorgios Mastrapas and Bo Wang and Han Xiao\nJina AI\nOhlauer Str. 43, 10999 Berlin, Germany\n{michael.guenther,louis.milliken,jonathan.geuter,\ngeorgios.mastrapas,bo.wang,han.xiao}@jina.ai\nAbstract\nJINA EMBEDDINGS constitutes a set of high-\nperformance sentence embedding models adept\nat translating various textual inputs into numer-\nical representations, thereby capturing the se-\nmantic essence of the text. The models excel\nin applications such as dense retrieval and se-\nmantic textual similarity. This paper details the\ndevelopment of JINA EMBEDDINGS, starting\nwith the creation of high-quality pairwise and\ntriplet datasets. It underlines the crucial role of\ndata cleaning in dataset preparation, gives in-\ndepth insights into the model training process,\nand concludes with a comprehensive perfor-\nmance evaluation using the Massive Textual",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d",
          "content": "function for training sentence embedding models,\nand the impact of increasing parameters on per-\nformance. This paper addresses these challenges\nand makes substantial contributions in the field of\nsentence embeddings.\nWe introduce a novel dataset developed specif-\nically for training our sentence embedding mod-\nels. To sensitize our models to distinguish nega-\ntions of statements from conforming statements,\nwe designed a dataset specifically for this purpose\nand included it into the training data. Addition-\nally, we present JINA EMBEDDINGS, a set of high-\nperformance sentence embedding models trained\non our dataset. The JINA EMBEDDINGS set is ex-\npected to comprise five distinct models, ranging in\nsize from 35 million to 6 billion parameters. Three\nof those models are already trained and published. 1\nThe rest is expected to appear soon.\nThe models in the JINA EMBEDDINGS set\nemploy contrastive training on the T5 architec-\nture [Raffel et al., 2020]. It’s important to note",
          "metadata": {}
        }
      ]
    }
    
    Message Chat

    Sends a new message to the current chat.

    Path Parameters

    collectionId
    string
    ID of the collection.
    chatId
    string
    ID of the chat.

    Attributes

    message
    string
    The body of the message.
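    A minimal sketch of building a Message Chat request in Python. BASE_URL is a placeholder because the host is not stated on this page. The example uses the chat ID "mychat" and lists no create endpoint, so it appears that clients choose chat IDs; new_chat_id rests on that assumption. new_chat_id and build_message_request are hypothetical helpers. The path segments are percent-encoded in case an ID contains reserved characters.

```python
import json
import urllib.parse
import urllib.request
import uuid

# Placeholder host (assumption): this page does not state the API's base URL.
BASE_URL = "https://api.example.com"


def new_chat_id():
    """Generate a fresh chat ID (assumption: any unused ID starts a new chat)."""
    return str(uuid.uuid4())


def build_message_request(api_key, collection_id, chat_id, message):
    """Build, but do not send, a Message Chat request."""
    path = "/v1/collections/{}/chat/{}".format(
        urllib.parse.quote(collection_id, safe=""), urllib.parse.quote(chat_id, safe="")
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```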

    POST /v1/collections/:collectionId/chat/:chatId
    curl -X POST /v1/collections/mycollection/chat/mychat \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"message": "What are the benefits of Jina?"}'
    
    Response
    {
      "sender": "ASSISTANT",
      "message": {
        "id": "553beb15-9f2b-4c97-857f-94963bbce84f",
        "date": "2023-09-03T12:24:09.026440Z",
        "content": "Here are some benefits of Jina:\n\n1. Scalability: Jina is designed to be highly scalable, allowing it to handle large-scale and distributed deployments. It can efficiently handle processing large volumes of data and handle heavy workloads.\n\n2. Modularity and Extensibility: Jina follows a modular architecture, making it easy to customize and extend its functionality according to specific requirements. Users can incorporate their own models, algorithms, and components into the framework, enabling flexibility and adaptability.\n\n3. Multi-modal Support: Jina supports processing and searching across various types of data, including images, text, audio, and video. This multi-modal support enables building applications that can handle diverse types of data and perform cross-modal search or retrieval tasks.\n\n4. Ease of Integration: Jina provides well-documented APIs and interfaces, making it simple to integrate with existing systems and frameworks. It supports integration with popular frameworks and tools, such as TensorFlow and PyTorch, facilitating seamless integration into existing workflows.\n\n5. Developer-Friendly: Jina offers a developer-friendly environment and toolset, providing easy-to-use interfaces, visualizations, and debugging tools. This makes it easier for developers to build and experiment with different models and configurations.\n\n6. Community and Support: Jina has an active and growing community of developers and users who contribute to its development and provide support. This community-driven ecosystem ensures access to resources, tutorials, and discussions, fostering collaboration and knowledge-sharing.\n\nThese benefits collectively make Jina a powerful framework for building scalable and modular search and retrieval systems across different modalities, with ease of integration and extensive community support.",
        "context": [
          {
            "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
            "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
            "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54",
            "content": "jina-base-v1 consistently demonstrates perfor-\nmances akin to or better than gtr-t5-base, which\nwas trained specifically for retrieval tasks [Ni et al.,\n2022b]. However, it seldom matches the scores of\nsentence-t5-base, which was trained on sentence\nsimilarity tasks [Ni et al., 2022a].\nThe evaluation of model performances on re-\ntrieval tasks, presented in Table 8, reflects a similar\nrelationship among gtr-t5, sentence-t5, and JINA\nEMBEDDINGS. Here, gtr-t5 models, which have\nbeen specially trained on retrieval tasks, consis-\ntently score the highest for their respective sizes.\nJINA EMBEDDINGS models follow closely behind,\nwhereas sentence-t5 models trail significantly. The\nJINA EMBEDDINGS set’s capability to maintain\ncompetitive scores across these tasks underscores\nthe advantage of multi-task training.\nAs illustrated in Table 7, jina-large-v1 also\nachieves exceedingly high scores on reranking\ntasks, often outperforming larger models. Similarly,",
            "metadata": {}
          },
          {
            "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
            "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
            "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8",
            "content": "JINA EMBEDDINGS: A Novel Set of High-Performance Sentence\nEmbedding Models\nMichael Günther and Louis Milliken and Jonathan Geuter\nGeorgios Mastrapas and Bo Wang and Han Xiao\nJina AI\nOhlauer Str. 43, 10999 Berlin, Germany\n{michael.guenther,louis.milliken,jonathan.geuter,\ngeorgios.mastrapas,bo.wang,han.xiao}@jina.ai\nAbstract\nJINA EMBEDDINGS constitutes a set of high-\nperformance sentence embedding models adept\nat translating various textual inputs into numer-\nical representations, thereby capturing the se-\nmantic essence of the text. The models excel\nin applications such as dense retrieval and se-\nmantic textual similarity. This paper details the\ndevelopment of JINA EMBEDDINGS, starting\nwith the creation of high-quality pairwise and\ntriplet datasets. It underlines the crucial role of\ndata cleaning in dataset preparation, gives in-\ndepth insights into the model training process,\nand concludes with a comprehensive perfor-\nmance evaluation using the Massive Textual",
            "metadata": {}
          },
          {
            "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
            "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
            "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d",
            "content": "function for training sentence embedding models,\nand the impact of increasing parameters on per-\nformance. This paper addresses these challenges\nand makes substantial contributions in the field of\nsentence embeddings.\nWe introduce a novel dataset developed specif-\nically for training our sentence embedding mod-\nels. To sensitize our models to distinguish nega-\ntions of statements from conforming statements,\nwe designed a dataset specifically for this purpose\nand included it into the training data. Addition-\nally, we present JINA EMBEDDINGS, a set of high-\nperformance sentence embedding models trained\non our dataset. The JINA EMBEDDINGS set is ex-\npected to comprise five distinct models, ranging in\nsize from 35 million to 6 billion parameters. Three\nof those models are already trained and published. 1\nThe rest is expected to appear soon.\nThe models in the JINA EMBEDDINGS set\nemploy contrastive training on the T5 architec-\nture [Raffel et al., 2020]. It’s important to note",
            "metadata": {}
          }
        ]
      }
    }
    
    Retrieve Chat

    Retrieves all messages in a chat.

    Path Parameters

    collectionId
    string
    ID of the collection.
    chatId
    string
    ID of the chat.

    GET /v1/collections/:collectionId/chat/:chatId
    curl -X GET /v1/collections/mycollection/chat/mychat \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr'
    
    Response
    {
      "messages": [
        {
          "sender": "USER",
          "id": "d595de0e-4664-43a6-b03b-396bb05933a4",
          "date": "2023-09-03T11:24:09.026440Z",
          "content": "What are the benefits of Jina?",
          "context": []
        },
        {
          "sender": "ASSISTANT",
          "message": {
            "id": "553beb15-9f2b-4c97-857f-94963bbce84f",
            "date": "2023-09-03T12:24:09.026440Z",
            "content": "Here are some benefits of Jina:\n\n1. Scalability: Jina is designed to be highly scalable, allowing it to handle large-scale and distributed deployments. It can efficiently handle processing large volumes of data and handle heavy workloads.\n\n2. Modularity and Extensibility: Jina follows a modular architecture, making it easy to customize and extend its functionality according to specific requirements. Users can incorporate their own models, algorithms, and components into the framework, enabling flexibility and adaptability.\n\n3. Multi-modal Support: Jina supports processing and searching across various types of data, including images, text, audio, and video. This multi-modal support enables building applications that can handle diverse types of data and perform cross-modal search or retrieval tasks.\n\n4. Ease of Integration: Jina provides well-documented APIs and interfaces, making it simple to integrate with existing systems and frameworks. It supports integration with popular frameworks and tools, such as TensorFlow and PyTorch, facilitating seamless integration into existing workflows.\n\n5. Developer-Friendly: Jina offers a developer-friendly environment and toolset, providing easy-to-use interfaces, visualizations, and debugging tools. This makes it easier for developers to build and experiment with different models and configurations.\n\n6. Community and Support: Jina has an active and growing community of developers and users who contribute to its development and provide support. This community-driven ecosystem ensures access to resources, tutorials, and discussions, fostering collaboration and knowledge-sharing.\n\nThese benefits collectively make Jina a powerful framework for building scalable and modular search and retrieval systems across different modalities, with ease of integration and extensive community support.",
            "context": [
              {
                "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
                "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
                "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54",
                "content": "jina-base-v1 consistently demonstrates perfor-\nmances akin to or better than gtr-t5-base, which\nwas trained specifically for retrieval tasks [Ni et al.,\n2022b]. However, it seldom matches the scores of\nsentence-t5-base, which was trained on sentence\nsimilarity tasks [Ni et al., 2022a].\nThe evaluation of model performances on re-\ntrieval tasks, presented in Table 8, reflects a similar\nrelationship among gtr-t5, sentence-t5, and JINA\nEMBEDDINGS. Here, gtr-t5 models, which have\nbeen specially trained on retrieval tasks, consis-\ntently score the highest for their respective sizes.\nJINA EMBEDDINGS models follow closely behind,\nwhereas sentence-t5 models trail significantly. The\nJINA EMBEDDINGS set’s capability to maintain\ncompetitive scores across these tasks underscores\nthe advantage of multi-task training.\nAs illustrated in Table 7, jina-large-v1 also\nachieves exceedingly high scores on reranking\ntasks, often outperforming larger models. Similarly,",
                "metadata": {}
              },
              {
                "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
                "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
                "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8",
                "content": "JINA EMBEDDINGS: A Novel Set of High-Performance Sentence\nEmbedding Models\nMichael Günther and Louis Milliken and Jonathan Geuter\nGeorgios Mastrapas and Bo Wang and Han Xiao\nJina AI\nOhlauer Str. 43, 10999 Berlin, Germany\n{michael.guenther,louis.milliken,jonathan.geuter,\ngeorgios.mastrapas,bo.wang,han.xiao}@jina.ai\nAbstract\nJINA EMBEDDINGS constitutes a set of high-\nperformance sentence embedding models adept\nat translating various textual inputs into numer-\nical representations, thereby capturing the se-\nmantic essence of the text. The models excel\nin applications such as dense retrieval and se-\nmantic textual similarity. This paper details the\ndevelopment of JINA EMBEDDINGS, starting\nwith the creation of high-quality pairwise and\ntriplet datasets. It underlines the crucial role of\ndata cleaning in dataset preparation, gives in-\ndepth insights into the model training process,\nand concludes with a comprehensive perfor-\nmance evaluation using the Massive Textual",
                "metadata": {}
              },
              {
                "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
                "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
                "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d",
                "content": "function for training sentence embedding models,\nand the impact of increasing parameters on per-\nformance. This paper addresses these challenges\nand makes substantial contributions in the field of\nsentence embeddings.\nWe introduce a novel dataset developed specif-\nically for training our sentence embedding mod-\nels. To sensitize our models to distinguish nega-\ntions of statements from conforming statements,\nwe designed a dataset specifically for this purpose\nand included it into the training data. Addition-\nally, we present JINA EMBEDDINGS, a set of high-\nperformance sentence embedding models trained\non our dataset. The JINA EMBEDDINGS set is ex-\npected to comprise five distinct models, ranging in\nsize from 35 million to 6 billion parameters. Three\nof those models are already trained and published. 1\nThe rest is expected to appear soon.\nThe models in the JINA EMBEDDINGS set\nemploy contrastive training on the T5 architec-\nture [Raffel et al., 2020]. It’s important to note",
                "metadata": {}
              }
            ]
          }
        }
      ]
    }
    
    Query

    The query endpoint runs a semantic search over a collection and returns the matching chunks.

    Endpoints
    POST

    /v1/collections/:collectionId/query

    Query Object
    Attributes

    context
    list
    The matching chunks. Each item has the same shape as a Message Context object.

    QUERY OBJECT
    {
      "context": [
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54",
          "content": "jina-base-v1 consistently demonstrates perfor-\nmances akin to or better than gtr-t5-base, which\nwas trained specifically for retrieval tasks [Ni et al.,\n2022b]. However, it seldom matches the scores of\nsentence-t5-base, which was trained on sentence\nsimilarity tasks [Ni et al., 2022a].\nThe evaluation of model performances on re-\ntrieval tasks, presented in Table 8, reflects a similar\nrelationship among gtr-t5, sentence-t5, and JINA\nEMBEDDINGS. Here, gtr-t5 models, which have\nbeen specially trained on retrieval tasks, consis-\ntently score the highest for their respective sizes.\nJINA EMBEDDINGS models follow closely behind,\nwhereas sentence-t5 models trail significantly. The\nJINA EMBEDDINGS set’s capability to maintain\ncompetitive scores across these tasks underscores\nthe advantage of multi-task training.\nAs illustrated in Table 7, jina-large-v1 also\nachieves exceedingly high scores on reranking\ntasks, often outperforming larger models. Similarly,",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8",
          "content": "JINA EMBEDDINGS: A Novel Set of High-Performance Sentence\nEmbedding Models\nMichael Günther and Louis Milliken and Jonathan Geuter\nGeorgios Mastrapas and Bo Wang and Han Xiao\nJina AI\nOhlauer Str. 43, 10999 Berlin, Germany\n{michael.guenther,louis.milliken,jonathan.geuter,\ngeorgios.mastrapas,bo.wang,han.xiao}@jina.ai\nAbstract\nJINA EMBEDDINGS constitutes a set of high-\nperformance sentence embedding models adept\nat translating various textual inputs into numer-\nical representations, thereby capturing the se-\nmantic essence of the text. The models excel\nin applications such as dense retrieval and se-\nmantic textual similarity. This paper details the\ndevelopment of JINA EMBEDDINGS, starting\nwith the creation of high-quality pairwise and\ntriplet datasets. It underlines the crucial role of\ndata cleaning in dataset preparation, gives in-\ndepth insights into the model training process,\nand concludes with a comprehensive perfor-\nmance evaluation using the Massive Textual",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d",
          "content": "function for training sentence embedding models,\nand the impact of increasing parameters on per-\nformance. This paper addresses these challenges\nand makes substantial contributions in the field of\nsentence embeddings.\nWe introduce a novel dataset developed specif-\nically for training our sentence embedding mod-\nels. To sensitize our models to distinguish nega-\ntions of statements from conforming statements,\nwe designed a dataset specifically for this purpose\nand included it into the training data. Addition-\nally, we present JINA EMBEDDINGS, a set of high-\nperformance sentence embedding models trained\non our dataset. The JINA EMBEDDINGS set is ex-\npected to comprise five distinct models, ranging in\nsize from 35 million to 6 billion parameters. Three\nof those models are already trained and published. 1\nThe rest is expected to appear soon.\nThe models in the JINA EMBEDDINGS set\nemploy contrastive training on the T5 architec-\nture [Raffel et al., 2020]. It’s important to note",
          "metadata": {}
        }
      ]
    }
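The sketch below shows one way to load a Query Object into typed Python structures. The JSON field names come from the example above. The class names `ContextChunk` and `QueryResult`, the helper `parse_query_object`, and the assumption that `metadata` is an arbitrary JSON object are illustrative choices, not part of the documentAI API.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContextChunk:
    # One retrieved chunk; fields mirror the Query Object example.
    collection_id: str
    document_id: str
    chunk_id: str
    content: str
    # Assumption: metadata is a free-form JSON object (empty in the example).
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class QueryResult:
    context: list[ContextChunk]

    def document_ids(self) -> list[str]:
        # Distinct source documents, in order of first appearance.
        seen: dict[str, None] = {}
        for chunk in self.context:
            seen.setdefault(chunk.document_id, None)
        return list(seen)


def parse_query_object(raw: str) -> QueryResult:
    # Hypothetical helper: maps the camelCase JSON keys to Python fields.
    data = json.loads(raw)
    chunks = [
        ContextChunk(
            collection_id=item["collectionId"],
            document_id=item["documentId"],
            chunk_id=item["chunkId"],
            content=item["content"],
            metadata=item.get("metadata") or {},
        )
        for item in data.get("context", [])
    ]
    return QueryResult(context=chunks)
```

Applied to the example above, `parse_query_object(raw).document_ids()` would return a single ID, because all three chunks come from the same document.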
    
    Query Collection

    Queries a collection and returns only the matching chunks and their metadata, without generating a response.

    Path Parameters

    collectionId
    string
    ID of the collection.

    Attributes

    query
    string
    Query to search for.

    POST /v1/collections/:collectionId/query
    curl -X POST /v1/collections/mycollection/query \
         -H 'X-API-KEY: 46n1Zwy48X95mIfbOjIFO99Dg613KjRu8iFA4bAr' \
         -H 'Content-Type: application/json' \
         -d '{"query": "What are the benefits of Jina?"}'
    
    Response
    {
      "context": [
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54",
          "content": "jina-base-v1 consistently demonstrates perfor-\nmances akin to or better than gtr-t5-base, which\nwas trained specifically for retrieval tasks [Ni et al.,\n2022b]. However, it seldom matches the scores of\nsentence-t5-base, which was trained on sentence\nsimilarity tasks [Ni et al., 2022a].\nThe evaluation of model performances on re-\ntrieval tasks, presented in Table 8, reflects a similar\nrelationship among gtr-t5, sentence-t5, and JINA\nEMBEDDINGS. Here, gtr-t5 models, which have\nbeen specially trained on retrieval tasks, consis-\ntently score the highest for their respective sizes.\nJINA EMBEDDINGS models follow closely behind,\nwhereas sentence-t5 models trail significantly. The\nJINA EMBEDDINGS set’s capability to maintain\ncompetitive scores across these tasks underscores\nthe advantage of multi-task training.\nAs illustrated in Table 7, jina-large-v1 also\nachieves exceedingly high scores on reranking\ntasks, often outperforming larger models. Similarly,",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8",
          "content": "JINA EMBEDDINGS: A Novel Set of High-Performance Sentence\nEmbedding Models\nMichael Günther and Louis Milliken and Jonathan Geuter\nGeorgios Mastrapas and Bo Wang and Han Xiao\nJina AI\nOhlauer Str. 43, 10999 Berlin, Germany\n{michael.guenther,louis.milliken,jonathan.geuter,\ngeorgios.mastrapas,bo.wang,han.xiao}@jina.ai\nAbstract\nJINA EMBEDDINGS constitutes a set of high-\nperformance sentence embedding models adept\nat translating various textual inputs into numer-\nical representations, thereby capturing the se-\nmantic essence of the text. The models excel\nin applications such as dense retrieval and se-\nmantic textual similarity. This paper details the\ndevelopment of JINA EMBEDDINGS, starting\nwith the creation of high-quality pairwise and\ntriplet datasets. It underlines the crucial role of\ndata cleaning in dataset preparation, gives in-\ndepth insights into the model training process,\nand concludes with a comprehensive perfor-\nmance evaluation using the Massive Textual",
          "metadata": {}
        },
        {
          "collectionId": "d0a42f20-5338-41c4-8c54-36e4a9df4a3f",
          "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd",
          "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d",
          "content": "function for training sentence embedding models,\nand the impact of increasing parameters on per-\nformance. This paper addresses these challenges\nand makes substantial contributions in the field of\nsentence embeddings.\nWe introduce a novel dataset developed specif-\nically for training our sentence embedding mod-\nels. To sensitize our models to distinguish nega-\ntions of statements from conforming statements,\nwe designed a dataset specifically for this purpose\nand included it into the training data. Addition-\nally, we present JINA EMBEDDINGS, a set of high-\nperformance sentence embedding models trained\non our dataset. The JINA EMBEDDINGS set is ex-\npected to comprise five distinct models, ranging in\nsize from 35 million to 6 billion parameters. Three\nof those models are already trained and published. 1\nThe rest is expected to appear soon.\nThe models in the JINA EMBEDDINGS set\nemploy contrastive training on the T5 architec-\nture [Raffel et al., 2020]. It’s important to note",
          "metadata": {}
        }
      ]
    }
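As a rough equivalent of the curl request above, the following sketch uses only the Python standard library. The host is not given in this documentation, so `BASE_URL` is a hypothetical placeholder; the helper names `build_query_request` and `query_collection` are likewise illustrative. The path, the `X-API-KEY` header, and the `query` body field follow the reference above.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL: the documentation examples omit the host.
BASE_URL = "https://documentai.example.com"


def build_query_request(collection_id: str, query: str, api_key: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    # Percent-encode the collection ID so it is safe as a single path segment.
    path = f"/v1/collections/{urllib.parse.quote(collection_id, safe='')}/query"
    # json.dumps escapes quotes and newlines in the query text.
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "X-API-KEY": api_key,
            "Content-Type": "application/json",
        },
    )


def query_collection(collection_id: str, query: str, api_key: str,
                     base_url: str = BASE_URL) -> list[dict]:
    req = build_query_request(collection_id, query, api_key, base_url)
    # urlopen raises urllib.error.HTTPError for non-2xx statuses,
    # e.g. 401 Unauthorized when the API key is missing or invalid.
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # The response body is a Query Object; return its list of chunks.
    return payload.get("context", [])
```

Usage would look like `chunks = query_collection("mycollection", "What are the benefits of Jina?", api_key)`, after which each item in `chunks` has the fields shown in the response above. Building the request separately from sending it keeps the request shape easy to inspect without a network call.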