DocumentSearchTools

Wisej.AI.Tools.DocumentSearchTools

Namespace: Wisej.AI.Tools

Assembly: Wisej.AI (3.5.0.0)

Provides tools for searching and managing documents within a specified collection.

public class DocumentSearchTools : ToolsContainer

This class supports embedding questions, querying documents, listing documents, and summarizing document content. It uses the following services: ITokenizerService, IEmbeddingStorageService.

Constructors

DocumentSearchTools(collectionName, filter)

Initializes a new instance of the DocumentSearchTools class.

Parameters:

collectionName: The name of the document collection. Default is an empty string.

filter: A predicate to filter embedded documents. Default is null.

Throws:

Properties

CollectionName

String: Gets or sets the name of the document collection. (Default: null)

EmbeddingGenerationService

IEmbeddingGenerationService: Gets or sets the embedding generation service used for embedding questions.

EmbeddingStorageService

IEmbeddingStorageService: Gets or sets the embedding storage service used for storing and retrieving embedded documents.

MaxClusters

Int32: Gets or sets the maximum number of vector clusters to generate when performing summarization tasks. (Default: 5)

MaxContextTokens

Int32: Gets or sets the maximum number of context tokens. Content is truncated using the TokenizerService to fit within this limit. (Default: 4096)

MaxDocumentsSearch

Int32: Gets or sets the maximum number of documents that can be returned. (Default: 100)

MinSimilarity

Single: Gets or sets the minimum similarity threshold for document retrieval. (Default: 0.25)

RerankingEnabled

Boolean: Gets or sets a value indicating whether reranking is enabled. (Default: False)

RerankingService

IRerankingService: Gets or sets the reranking service.

TokenizerService

ITokenizerService: Gets or sets the tokenizer service used for truncating content to fit within the maximum context tokens.
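
The idea of fitting content into a token budget can be sketched as follows. This is only an illustration, not the Wisej.AI implementation: the class and method names are hypothetical, tokens are approximated by whitespace-separated words instead of a real ITokenizerService, and whole chunks are dropped rather than cut mid-text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of Wisej.AI.
public static class TruncationSketch
{
    // Keeps chunks in order until adding the next one would exceed the budget.
    // Assumption: one whitespace-separated word counts as one token.
    public static string TruncateToTokenBudget(IEnumerable<string> chunks, int maxContextTokens = 4096)
    {
        var kept = new List<string>();
        int used = 0;

        foreach (var chunk in chunks)
        {
            int count = chunk
                .Split(new[] { ' ', '\n', '\t' }, StringSplitOptions.RemoveEmptyEntries)
                .Length;

            // Stop at the first chunk that does not fit.
            if (used + count > maxContextTokens)
                break;

            kept.Add(chunk);
            used += count;
        }

        return string.Join("\n", kept);
    }
}
```

A real tokenizer will count tokens differently from this word-based approximation, so the sketch only shows the shape of the budget check.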

TopN

Int32: Gets or sets the number of top chunks to retrieve. (Default: 10)
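
One way MinSimilarity and TopN could work together during retrieval is sketched below. The sketch assumes cosine similarity as the metric, which this page does not state, and uses hypothetical names (RetrievalSketch, SelectChunks) and plain float arrays instead of the library's own types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not part of Wisej.AI.
public static class RetrievalSketch
{
    // Cosine similarity between two vectors of equal length.
    public static float CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // A zero vector has no direction; treat it as not similar.
        if (normA == 0 || normB == 0)
            return 0f;

        return (float)(dot / (Math.Sqrt(normA) * Math.Sqrt(normB)));
    }

    // Drops chunks below the threshold, then keeps the best topN by score.
    public static string[] SelectChunks(
        float[] questionVector,
        IEnumerable<(string Text, float[] Vector)> chunks,
        float minSimilarity = 0.25f,
        int topN = 10)
    {
        return chunks
            .Select(c => (c.Text, Score: CosineSimilarity(questionVector, c.Vector)))
            .Where(c => c.Score >= minSimilarity)
            .OrderByDescending(c => c.Score)
            .Take(topN)
            .Select(c => c.Text)
            .ToArray();
    }
}
```

The default argument values mirror the defaults listed above (0.25 and 10). Raising the threshold trades recall for precision, while lowering TopN shrinks the context passed on to the model.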

Methods

EmbedQuestionAsync(question)

Asynchronously generates an embedding for the specified question.

Parameters:

question: The question to be embedded. Must not be null or empty.

Returns: Task<Embedding>. A task that represents the asynchronous operation. The task result contains the generated Embedding for the question, or null if the input is invalid.

This method checks if the provided question is null or empty and returns null if so. Otherwise, it delegates the embedding generation to the EmbeddingGenerationService.
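
The guard-then-delegate behavior described above can be sketched as follows. This is a minimal illustration under assumptions: the embedding is represented as a float array, and the generation service is replaced by a hypothetical delegate, since the actual Embedding and IEmbeddingGenerationService members are not shown on this page.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in; not the Wisej.AI implementation.
public static class EmbedQuestionSketch
{
    // generateEmbedding stands in for the embedding generation service.
    public static async Task<float[]> EmbedQuestionAsync(
        string question,
        Func<string, Task<float[]>> generateEmbedding)
    {
        // Invalid input yields null instead of an exception.
        if (string.IsNullOrEmpty(question))
            return null;

        // Otherwise delegate the work to the service.
        return await generateEmbedding(question);
    }
}
```

Because an invalid question produces null, callers should check the result before using it.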

list_all_documents()

Lists all documents in the collection.

Returns: Task<String>. A task that represents the asynchronous operation. The task result contains a list of document names as a string.

query_all_documents(question)

Queries all documents based on the provided question.

Parameters:

question: The question to query documents for.

Returns: Task<String>. A task that represents the asynchronous operation. The task result contains the query results as a string.

This method retrieves all documents that match the embedded question.

query_single_document(document_name, question)

Queries a single document based on the provided document name and question.

Parameters:

document_name: The name of the document to query.

question: The question to query the document for.

Returns: Task<String>. A task that represents the asynchronous operation. The task result contains the query result as a string.

read_documents_metadata(document_names)

Reads metadata for the specified documents.

Parameters:

document_names: An array of document names to read metadata for.

Returns: Task<String>. A task that represents the asynchronous operation. The task result contains the metadata as a string.

RerankAsync(question, chunks)

Asynchronously reranks the provided text chunks based on their relevance to the given question.

Parameters:

question: The question used as the basis for reranking.

chunks: An array of text chunks to be reranked.

Returns: Task<String[]>. A task that represents the asynchronous operation. The task result contains an array of reranked text chunks.

This method is intended to be overridden in derived classes to implement custom reranking logic. The method should return the chunks array reordered by relevance to the question.
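
A custom reranking routine with the same shape (a question and an array of chunks in, a reordered array out) might look like the sketch below. It is a toy written as a standalone static method rather than an actual override, because the method's access modifiers are not shown on this page. The scoring rule, counting how many distinct question terms appear in each chunk, is an assumption chosen for brevity; a production reranker would more likely call a model.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; not part of Wisej.AI.
public static class RerankSketch
{
    public static Task<string[]> RerankAsync(string question, string[] chunks)
    {
        // Distinct lowercase terms from the question.
        var terms = question
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => t.ToLowerInvariant())
            .Distinct()
            .ToArray();

        // Score each chunk by the number of question terms it contains.
        // OrderByDescending is stable, so ties keep their original order.
        var reranked = chunks
            .OrderByDescending(chunk =>
            {
                var text = chunk.ToLowerInvariant();
                return terms.Count(t => text.Contains(t));
            })
            .ToArray();

        return Task.FromResult(reranked);
    }
}
```

The sketch returns every input chunk and only changes the order, which matches the description of returning the chunks array reordered by relevance.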

summarize_document(document_name)

Summarizes the content of a specified document.

Parameters:

document_name: The name of the document to summarize.

Returns: Task<String>. A task that represents the asynchronous operation. The task result contains the summary as a string.

Implements

Name
Description

Represents a container for tools, providing access to a hub, adapter, and a collection of parameters.
