Keeping up with the vast and growing body of academic literature is a constant challenge for researchers. Sifting through volumes of papers to find relevant information and insights can be extremely time-consuming. This is where an AI-powered research assistant integrated with your personal literature collection can help. In this post, we'll explore how conversational interfaces and documentAI make it possible to build chatbots that understand, summarize, and extract key information from research papers. By uploading your reference papers, you instantly get a knowledgeable assistant ready to answer questions, make connections, and provide insights that boost your productivity and accelerate discovery.
A chat interface for a research assistant has several advantages: you can ask questions about your papers in natural language, follow up on earlier answers within the same conversation, and trace each response back to the source passages it was drawn from.
First, get your API key from the console. Authenticate requests by passing this key in the X-API-KEY header. Learn more about authentication here.
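For example, every call in this post follows the same shape; the endpoint path below is only a placeholder for the specific endpoints shown later:

curl https://api.documentai.dev/v1/[ENDPOINT PATH] \
  -H 'X-API-KEY: [YOUR API KEY]'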
There are two options for ingesting papers: you can either have documentAI download an external document from a URL, or you can upload files directly.
The easiest way to ingest papers is to have them downloaded from a URL. See the documentation for details.
curl -X POST https://api.documentai.dev/v1/collections/[COLLECTION ID]/document \
  -H 'X-API-KEY: [YOUR API KEY]' \
  -d '{"url": "[PAPER URL]"}'
Where [COLLECTION ID] is the ID of your collection, [YOUR API KEY] is the API key from the console, and [PAPER URL] is the URL of the paper to download.
Downloading is asynchronous: the response contains a documentId, which you can use to check the ingestion status as described in the documentation (a polling sketch follows the sample response below).
{ "collectionId": "[COLLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "status": { "date": "2023-09-03T12:22:36.291790Z", "status": "QUEUED" } }
If your papers are stored locally, you can also upload them directly using the upload API; the full reference is here.
curl -X PUT https://api.documentai.dev/v1/collections/[COLLECTION ID]/upload \
  -H 'X-API-KEY: [YOUR API KEY]' \
  --form file=@paper1.pdf \
  --form file=@paper2.docx
Where [COLLECTION ID] is the ID of your collection and [YOUR API KEY] is your API key; each --form field attaches one local file, and the @ prefix tells curl to send the file's contents rather than the literal filename.
You can upload multiple files in a single request. The upload is also asynchronous; the response contains an uploadId, which you can use to monitor the status of your upload job via the check upload API (a sample status call follows the response below).
{ "collectionId": "[COLLECTION ID]", "uploadId": "6f207f16-c30b-47ef-9a58-efea9df9ae73" }
After the download or upload completes, you can integrate with the chat API; the full reference is available here. The conversational interface gives researchers a natural way to ask questions about the content of their uploaded papers. The assistant can leverage the LLM's extensive scientific knowledge not just to retrieve information but to provide useful insights that connect concepts across papers. This helps the researcher quickly understand key aspects of papers, identify relationships between them, and derive higher-level insights from the collection. In effect, the assistant augments the researcher's own expertise with the knowledge encoded in the language model.
curl -X POST https://api.documentai.dev/v1/collections/[COLLECTION ID]/chat/[CHAT ID] \
  -H 'X-API-KEY: [YOUR API KEY]' \
  -d '{"message": "[YOUR MESSAGE]"}'
Where [COLLECTION ID] is the ID of your collection, [YOUR API KEY] is your API key, [CHAT ID] identifies the conversation, and [YOUR MESSAGE] is the question you want to ask.
{ "sender": "ASSISTANT", "message": { "id": "553beb15-9f2b-4c97-857f-94963bbce84f", "date": "2023-09-03T12:24:09.026440Z", "content": "Multi-head attention is a variant of the attention mechanism used in the Transformer model. It allows the model to jointly attend to different subspaces of the input information by using multiple sets of attention weights, known as \"attention heads\".\n\nIn the traditional attention mechanism, a single set of query, key, and value vectors is used to compute the attention weights. However, in multi-head attention, the model computes multiple sets of query, key, and value vectors, each corresponding to a different attention head.\n\nThe key idea behind multi-head attention is that different heads can learn to attend to different patterns and capture different types of information. Each attention head has its own set of learned parameters, which allows it to develop its own attention weights independently. This increases the model's capacity to capture diverse and complex patterns in the data.\n\nThe computation in multi-head attention is as follows:\n1. The original input is linearly transformed into separate query, key, and value vectors for each attention head.\n2. For each attention head, the attention weights are computed by taking the dot product of the query and key vectors and applying a softmax function to obtain a distribution over the key-value pairs. These weights determine the importance or relevance of different positions in the input sequence.\n3. The attention weights are then used to compute a weighted sum of the value vectors, producing the output representation for each attention head.\n4. The outputs from all attention heads are concatenated and linearly transformed to obtain the final output representation.\n\nBy using multiple attention heads, the model can effectively attend to different aspects of the input and capture more fine-grained relationships. This allows the model to better handle varied input patterns and improve its ability to learn complex dependencies in the data.\n\nOverall, multi-head attention enhances the expressiveness and representational power of the Transformer model, leading to improved performance on a wide range of natural language processing tasks.", "context": [ { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54", "content": "Figure 5: Many of the attention heads exhibit behaviour that seems related to the structure of the\nsentence. We give two such examples above, from two different heads from the encoder self-attention\nat layer 5 of 6. The heads clearly learned to perform different tasks.\n15", "metadata": {} }, { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8", "content": "i ∈ Rdmodel×dk , WK\ni ∈ Rdmodel×dk , WV\ni ∈ Rdmodel×dv\nand WO ∈ Rhdv×dmodel .\nIn this work we employ h = 8 parallel attention layers, or heads. For each of these we use\ndk = dv = dmodel/h = 64. Due to the reduced dimension of each head, the total computational cost\nis similar to that of single-head attention with full dimensionality.\n3.2.3 Applications of Attention in our Model\nThe Transformer uses multi-head attention in three different ways:\n• In \"encoder-decoder attention\" layers, the queries come from the previous decoder layer,\nand the memory keys and values come from the output of the encoder. 
This allows every\nposition in the decoder to attend over all positions in the input sequence. This mimics the\ntypical encoder-decoder attention mechanisms in sequence-to-sequence models such as\n[38, 2, 9].\n• The encoder contains self-attention layers. In a self-attention layer all of the keys, values", "metadata": {} }, { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d", "content": "linear projections to dk, dk and dv dimensions, respectively. On each of these projected versions of\nqueries, keys and values we then perform the attention function in parallel, yielding dv-dimensional\n4To illustrate why the dot products get large, assume that the components of q and k are independent random\nvariables with mean 0 and variance 1. Then their dot product, q · k =\n∑dk\ni=1 qiki, has mean 0 and variance dk.\n4\noutput values. These are concatenated and once again projected, resulting in the final values, as\ndepicted in Figure 2.\nMulti-head attention allows the model to jointly attend to information from different representation\nsubspaces at different positions. With a single attention head, averaging inhibits this.\nMultiHead(Q,K, V ) = Concat(head1, ...,headh)W\nO\nwhere headi = Attention(QWQ\ni ,KWK\ni , V WV\ni )\nWhere the projections are parameter matrices WQ\ni ∈ Rdmodel×dk , WK\ni ∈ Rdmodel×dk , WV\ni ∈ Rdmodel×dv\nand WO ∈ Rhdv×dmodel .", "metadata": {} } ] } }
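Because the response carries both the answer and the chunks it was grounded in, you can surface the sources right alongside the reply. A minimal sketch, assuming the response above has been saved to response.json and that jq is available:

# Print the assistant's answer, then the documents and chunks it cited.
jq -r '.message.content' response.json
jq -r '.message.context[] | "source: document \(.documentId), chunk \(.chunkId)"' response.json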
You can also retrieve the whole conversation using the get chat API.
curl -X GET https://api.documentai.dev/v1/collections/[COLLECTION ID]/chat/[CHAT ID] \
  -H 'X-API-KEY: [YOUR API KEY]'
Where [COLLECTION ID] is the ID of your collection, [YOUR API KEY] is your API key, and [CHAT ID] identifies the conversation to retrieve.
The response includes both your messages and the assistant's messages, along with the relevant context, so you can link each answer back to its sources.
{ "messages": [ { "sender": "USER", "id": "d595de0e-4664-43a6-b03b-396bb05933a4", "date": "2023-09-03T11:24:09.026440Z", "content": "What is multi head attention?", "context": [] }, { "sender": "ASSISTANT", "message": { "id": "553beb15-9f2b-4c97-857f-94963bbce84f", "date": "2023-09-03T12:24:09.026440Z", "content": "Multi-head attention is a variant of the attention mechanism used in the Transformer model. It allows the model to jointly attend to different subspaces of the input information by using multiple sets of attention weights, known as \"attention heads\".\n\nIn the traditional attention mechanism, a single set of query, key, and value vectors is used to compute the attention weights. However, in multi-head attention, the model computes multiple sets of query, key, and value vectors, each corresponding to a different attention head.\n\nThe key idea behind multi-head attention is that different heads can learn to attend to different patterns and capture different types of information. Each attention head has its own set of learned parameters, which allows it to develop its own attention weights independently. This increases the model's capacity to capture diverse and complex patterns in the data.\n\nThe computation in multi-head attention is as follows:\n1. The original input is linearly transformed into separate query, key, and value vectors for each attention head.\n2. For each attention head, the attention weights are computed by taking the dot product of the query and key vectors and applying a softmax function to obtain a distribution over the key-value pairs. These weights determine the importance or relevance of different positions in the input sequence.\n3. The attention weights are then used to compute a weighted sum of the value vectors, producing the output representation for each attention head.\n4. The outputs from all attention heads are concatenated and linearly transformed to obtain the final output representation.\n\nBy using multiple attention heads, the model can effectively attend to different aspects of the input and capture more fine-grained relationships. This allows the model to better handle varied input patterns and improve its ability to learn complex dependencies in the data.\n\nOverall, multi-head attention enhances the expressiveness and representational power of the Transformer model, leading to improved performance on a wide range of natural language processing tasks.", "context": [ { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "348a8973-4fc6-42ba-a557-0d89ea8aef54", "content": "Figure 5: Many of the attention heads exhibit behaviour that seems related to the structure of the\nsentence. We give two such examples above, from two different heads from the encoder self-attention\nat layer 5 of 6. The heads clearly learned to perform different tasks.\n15", "metadata": {} }, { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "bf3d056f-073b-46e2-b08b-7914fc511ba8", "content": "i ∈ Rdmodel×dk , WK\ni ∈ Rdmodel×dk , WV\ni ∈ Rdmodel×dv\nand WO ∈ Rhdv×dmodel .\nIn this work we employ h = 8 parallel attention layers, or heads. For each of these we use\ndk = dv = dmodel/h = 64. 
Due to the reduced dimension of each head, the total computational cost\nis similar to that of single-head attention with full dimensionality.\n3.2.3 Applications of Attention in our Model\nThe Transformer uses multi-head attention in three different ways:\n• In \"encoder-decoder attention\" layers, the queries come from the previous decoder layer,\nand the memory keys and values come from the output of the encoder. This allows every\nposition in the decoder to attend over all positions in the input sequence. This mimics the\ntypical encoder-decoder attention mechanisms in sequence-to-sequence models such as\n[38, 2, 9].\n• The encoder contains self-attention layers. In a self-attention layer all of the keys, values", "metadata": {} }, { "collectionId": "[COLECTION ID]", "documentId": "b0dd63b1-f78c-4989-94f9-f73768bb12dd", "chunkId": "858ec3d6-770f-40ac-a203-5dd219c1482d", "content": "linear projections to dk, dk and dv dimensions, respectively. On each of these projected versions of\nqueries, keys and values we then perform the attention function in parallel, yielding dv-dimensional\n4To illustrate why the dot products get large, assume that the components of q and k are independent random\nvariables with mean 0 and variance 1. Then their dot product, q · k =\n∑dk\ni=1 qiki, has mean 0 and variance dk.\n4\noutput values. These are concatenated and once again projected, resulting in the final values, as\ndepicted in Figure 2.\nMulti-head attention allows the model to jointly attend to information from different representation\nsubspaces at different positions. With a single attention head, averaging inhibits this.\nMultiHead(Q,K, V ) = Concat(head1, ...,headh)W\nO\nwhere headi = Attention(QWQ\ni ,KWK\ni , V WV\ni )\nWhere the projections are parameter matrices WQ\ni ∈ Rdmodel×dk , WK\ni ∈ Rdmodel×dk , WV\ni ∈ Rdmodel×dv\nand WO ∈ Rhdv×dmodel .", "metadata": {} } ] } } ] }
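To turn this into a readable transcript, you can print each turn with its sender. A small sketch, assuming the response above is saved to chat.json; note that user turns keep their content at the top level while assistant turns nest it under message, as in the sample:

# Print the conversation as "SENDER: text", handling both message shapes shown above.
jq -r '.messages[] | "\(.sender): \(.content // .message.content)"' chat.json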
Hopefully this post has demonstrated how AI techniques like document understanding and conversational interfaces can be combined into a knowledgeable research assistant. By integrating these capabilities with your personal literature collection, you get an assistant ready to boost your productivity and accelerate discovery. With just a few API calls to documentAI, you can upload papers and start querying them conversationally to uncover connections, derive insights, and deepen your understanding.