# DocumentSearchTools

## Overview

The `DocumentSearchTools` module lists, queries, and summarizes documents stored in a vector database. It uses the [IEmbeddingStorageService](/ai/components/built-in-services/iembeddingstorageservice.md) to generate Retrieval-Augmented Generation (RAG) content for the AI.

The preconfigured prompt is:

{% code overflow="wrap" %}

```ini
#
# DocumentSearchTools
#
[DocumentSearchTools]
Provides tools to search a set of documents to extract the content needed to complete your tasks.
Unless instructed otherwise, use this tool before other tools.

Instructions:
- When referring to a document, always use the full name including the path, i.e. "My documents/report.pdf"

The document name must include the path.

[DocumentSearchTools.query_all_documents]
Queries all documents to extract the most relevant content to answer the user’s question.

[DocumentSearchTools.query_all_documents.question]
Rewrite the user’s question to enhance the search using RAG and embeddings.

[DocumentSearchTools.list_all_documents]
Returns the names of all available documents.

[DocumentSearchTools.query_single_document]
Extracts the most relevant content from a specific document.

[DocumentSearchTools.query_single_document.document_name]
Exact name of the document.

[DocumentSearchTools.query_single_document.question]
Rewrite the user’s question to enhance the search using RAG and embeddings.

[DocumentSearchTools.read_documents_metadata]
Returns the metadata for multiple documents.
Metadata include the title, original file path, document size, number of pages, created date, last modified date, and others.

[DocumentSearchTools.read_documents_metadata.document_names]
Exact names of the documents.

[DocumentSearchTools.summarize_document]
Summarizes a specific document.

[DocumentSearchTools.summarize_document.document_name]
Exact name of the document.

```

{% endcode %}

The `DocumentSearchTools` class provides methods that the AI can call to search a virtually unlimited store of unstructured data: querying all documents or a single document, listing the available documents, reading document metadata, and summarizing a document.

## Using DocumentSearchTools

To enable [DocumentSearchTools](/ai/components/api/tools/wisej.ai.tools.documentsearchtools.md), add it to a SmartHub, SmartAdapter, SmartSession, or SmartPrompt.

```csharp
this.smartChatBoxAdapter
    .UseTools(
        new DocumentSearchTools("Financials"));
        
// or

this.smartChatBoxAdapter
    .UseTools(
        new DocumentSearchTools());
        
// or

this.smartChatBoxAdapter
    .UseTools(
        new DocumentSearchTools("Reports", (f) => f.Name.EndsWith(".pdf")));
```

When creating an instance of `DocumentSearchTools`, you can specify a collection name, which works like a folder. You can also provide a filter callback, which the tool uses to decide which documents it considers.

You can also set or override several properties that define the limits `DocumentSearchTools` applies when retrieving content from the vector storage, such as the number of chunks, the similarity threshold, and the size of the context. They are listed in the Properties section.
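As a minimal sketch of setting these limits, the snippet below uses an object initializer on the property names listed in the Properties table. It assumes the properties have public setters (the table describes them as read-write) and that the class lives in the `Wisej.AI.Tools` namespace, which is inferred from the API link rather than stated on this page. The values are illustrative only.

```csharp
using Wisej.AI.Tools; // assumed namespace, inferred from the API reference link

// Illustrative values, not recommendations.
var tools = new DocumentSearchTools("Financials")
{
    TopN = 5,                // at most 5 qualifying chunks in the RAG context (default 10)
    MinSimilarity = 0.4f,    // stricter than the 0.25f default
    MaxContextTokens = 2048  // smaller RAG context than the 4096 default
};

this.smartChatBoxAdapter.UseTools(tools);
```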

## Reranking

To rerank the chunks retrieved by the vector search, override the `RerankAsync` method. Your implementation can use any reranking approach you choose, so you can tune the ordering of the search results to your own relevance criteria.
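The signature of `RerankAsync` is not shown on this page, so the following is only a standalone sketch of one possible reranking approach that an override could delegate to. `KeywordReranker` and its `Rerank` method are hypothetical names, not part of Wisej.AI, and keyword overlap is a deliberately simple stand-in for a real reranking model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not a Wisej.AI class.
public static class KeywordReranker
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '?', '!' };

    // Splits text into a set of distinct lowercase terms.
    private static HashSet<string> Terms(string text) =>
        new HashSet<string>(
            text.ToLowerInvariant()
                .Split(Separators, StringSplitOptions.RemoveEmptyEntries));

    // Orders chunk texts by how many distinct question terms each one contains.
    // OrderByDescending is a stable sort, so ties keep the vector-search order.
    public static List<string> Rerank(string question, IEnumerable<string> chunks)
    {
        var queryTerms = Terms(question);
        return chunks
            .OrderByDescending(chunk => Terms(chunk).Count(t => queryTerms.Contains(t)))
            .ToList();
    }
}
```

Because the sort is stable, chunks with the same overlap count stay in the order the vector search returned them, so the reranker only moves a chunk when it has clear evidence to do so.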

## Properties

<table><thead><tr><th width="207" valign="top">Name</th><th>Description</th></tr></thead><tbody><tr><td valign="top">CollectionName</td><td>Read-write and overridable. Default is null.<br>The <code>CollectionName</code> parameter is utilized to organize documents within the vector storage database into distinct groups, analogous to virtual folders. This allows for structured management and retrieval of documents. When performing a vector query, it is restricted to operate within a specified collection. Should the <code>CollectionName</code> be left null, the system defaults to using the default collection space. This ensures that documents are always associated with a particular collection space, even if one is not explicitly defined.</td></tr><tr><td valign="top">TopN</td><td>Read-write and overridable. Default is 10.<br><code>TopN</code> specifies the maximum number of qualifying chunks that are utilized to construct the RAG (Retrieval-Augmented Generation) context. A chunk is deemed qualified if its similarity score exceeds the threshold defined by <code>MinSimilarity</code>. By setting the <code>TopN</code> value, you determine how many of these high-similarity chunks will be included in the context for further processing, allowing you to balance between performance and accuracy according to your application's requirements.</td></tr><tr><td valign="top">MaxClusters</td><td>Read-write and overridable. Default is 5.<br><code>MaxClusters</code> determines the upper limit on the number of clusters that can be generated by the summarization function, which utilizes the <a href="https://en.wikipedia.org/wiki/K-means_clustering">K-means clustering</a> algorithm.</td></tr><tr><td valign="top">MinSimilarity</td><td>Read-write and overridable. Default is 0.25f.<br>This setting defines the minimum similarity threshold used to filter and select qualified chunks from documents. 
By establishing this threshold, Wisej.AI can efficiently determine which segments of the document are closely aligned with the desired criteria, enhancing the accuracy and relevance of the selection process.</td></tr><tr><td valign="top">MaxContextTokens</td><td>Read-write and overridable. Default is 4096.<br>This setting specifies the maximum number of tokens that can be returned to the AI within the Retrieval-Augmented Generation (RAG) context string. By limiting the token count, you ensure that the context provided to the AI remains concise and manageable.</td></tr><tr><td valign="top">MaxDocumentsSearch</td><td>Read-write and overridable. Default is 100.<br>The <code>MaxDocumentsSearch</code> setting restricts the number of documents returned when conducting a generic search query. For example, if you search for documents related to quantum computing research, and the system identifies 10,000 potential matches, this setting will limit the retrieval to a manageable subset of those documents. By controlling the document count, you can ensure more efficient handling and analysis of search results, avoiding overwhelming the system with excessive data.</td></tr><tr><td valign="top">RerankingEnabled</td><td>Read-write and overridable. Default is false.<br>Enables reranking of the vector search results through the <a href="/pages/JunwAWvfkr2kWmWTnwSF">IRerankingService</a>.</td></tr></tbody></table>
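To make the interaction between `MinSimilarity` and `TopN` concrete, here is a small self-contained sketch of the selection rule described in the table: keep chunks whose similarity exceeds the threshold, then take the best `TopN`. `ScoredChunk` and `ChunkSelection` are hypothetical types written for this illustration; this is not the Wisej.AI implementation, only the rule as the table states it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical chunk type for illustration; not a Wisej.AI class.
public sealed class ScoredChunk
{
    public string Text { get; }
    public float Similarity { get; }

    public ScoredChunk(string text, float similarity)
    {
        Text = text;
        Similarity = similarity;
    }
}

public static class ChunkSelection
{
    // Keeps chunks whose similarity exceeds minSimilarity,
    // ordered best first and capped at topN.
    public static List<ScoredChunk> Select(
        IEnumerable<ScoredChunk> chunks, float minSimilarity = 0.25f, int topN = 10)
    {
        return chunks
            .Where(c => c.Similarity > minSimilarity)
            .OrderByDescending(c => c.Similarity)
            .Take(topN)
            .ToList();
    }
}
```

Raising the threshold removes weak matches before the cap is applied, while lowering `TopN` trims the list after filtering, so the two settings can be tuned independently.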

## Services

This tool relies on several services, many of which are pre-configured by default:

* ITokenizerService
* IEmbeddingStorageService
* IEmbeddingGenerationService
* IRerankingService

Please note that the default [`IEmbeddingStorageService`](/ai/components/api/services/iembeddingstorageservice.md) is [`FileSystemEmbeddingStorageService`](/ai/components/api/services/iembeddingstorageservice/wisej.ai.services.filesystemembeddingstorageservice.md), which is intended solely for testing. Before deploying this tool in an application, choose an appropriate vector storage solution and register the corresponding service, as demonstrated below:

```csharp
Application.Services
    .AddOrReplaceService<IEmbeddingStorageService>(
        new PineconeEmbeddingStorageService(endpoint));
```

We recommend registering your own `IDocumentConversionService` and using a professional library such as Aspose for document-to-text conversion. The built-in converter currently uses PdfPig and OpenXML, the same tools employed by Semantic Kernel. These tools have limitations and may not accurately process complex documents that contain tables and images.
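A custom converter could presumably be registered the same way as the storage service. The fragment below is a sketch under that assumption: `MyDocumentConversionService` is a hypothetical class that you would write to implement `IDocumentConversionService`, and the interface members it must provide are not described on this page.

```csharp
// MyDocumentConversionService is a hypothetical class implementing
// IDocumentConversionService; its members are not shown on this page.
// Assumes the same registration pattern used for IEmbeddingStorageService.
Application.Services
    .AddOrReplaceService<IDocumentConversionService>(
        new MyDocumentConversionService());
```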


