DocumentTools
The DocumentTools functionality is closely related to DocumentSearchTools, with the key difference being that DocumentTools processes a single document at a time. Unlike DocumentSearchTools, which relies on a vector database to manage document embeddings, DocumentTools embeds the document dynamically during processing and retains it in memory.
The preconfigured prompt is this:
The DocumentTools class provides multiple methods that the AI can use to "read" a specific document, enhancing its flexibility and adaptability in processing and analyzing data.
When instantiating DocumentTools, you can specify either the file path or the stream of the document you want the AI to "read". In addition, you can configure several other properties that influence the functionality and behavior of the tools.
FilePath
Read-write and overridable. Default is null. Full path of the document to "read".
Stream
Read-write and overridable. Default is null. Stream of the document to "read". Alternative to FilePath.
FileType
Read-write and overridable. Default is null. File type of the document in the form of a file extension string, e.g. .pdf, .docx, .txt, etc. If not set, the file type is deduced from the file name.
TopN
Read-write and overridable. Default is 10. Maximum number of RAG content items (document chunks) to retrieve.
MaxClusters
Read-write and overridable. Default is 5. Determines the upper limit on the number of clusters that can be generated by the summarization function, which uses a clustering algorithm.
MinSimilarity
Read-write and overridable. Default is 0.25f. Defines the minimum similarity threshold used to filter and select qualifying chunks from the document. Wisej.AI uses this threshold to keep only the segments of the document that closely match the desired criteria, improving the accuracy and relevance of the selection.
MaxContextTokens
Read-write and overridable. Default is 4096. Specifies the maximum number of tokens that can be returned to the AI within the Retrieval-Augmented Generation (RAG) context string. Limiting the token count keeps the context provided to the AI concise and manageable.
RerankingEnabled
Read-write and overridable. Default is false. Enables reranking of the vector search results.
This tool relies on several services, many of which are pre-configured by default:
ITokenizerService
ITextSplitterService
IDocumentConversionService
IEmbeddingGenerationService
We recommend registering your own IDocumentConversionService and using a professional library such as Aspose for document-to-text conversion. The built-in converter currently uses PdfPig and OpenXML, the same tools employed by Semantic Kernel. However, these tools have limitations and may not accurately process complex documents containing tables and images.
To enable the use of DocumentTools, simply add it to a SmartHub, SmartAdapter, SmartSession or SmartPrompt.
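To make the TopN, MinSimilarity, and MaxContextTokens settings more concrete, the following is a conceptual sketch in Python of how such limits could interact when selecting document chunks. It is not Wisej.AI code and does not reflect its actual implementation: the function names, the whitespace-based token counter, the cosine similarity measure, and the choice to stop at the first chunk that exceeds the token budget are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def count_tokens(text):
    """Hypothetical stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_chunks(query_vec, chunks, top_n=10, min_similarity=0.25,
                  max_context_tokens=4096):
    """Pick chunks for the context string.

    chunks is a list of (text, embedding) pairs. The defaults mirror the
    documented TopN, MinSimilarity, and MaxContextTokens defaults.
    """
    # Score every chunk against the query.
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    # Drop chunks below the similarity threshold (MinSimilarity).
    qualified = [item for item in scored if item[0] >= min_similarity]
    # Best matches first.
    qualified.sort(key=lambda item: item[0], reverse=True)

    selected = []
    used_tokens = 0
    # Consider at most top_n chunks (TopN).
    for _score, text in qualified[:top_n]:
        cost = count_tokens(text)
        # Assumption: stop as soon as the budget (MaxContextTokens) would be exceeded.
        if used_tokens + cost > max_context_tokens:
            break
        selected.append(text)
        used_tokens += cost
    return selected


# Toy data: two-dimensional "embeddings" chosen by hand for illustration.
chunks = [
    ("Invoices are due in 30 days.", [1.0, 0.0]),
    ("Late payments incur a fee.", [0.8, 0.6]),
    ("The office is closed on Sundays.", [0.0, 1.0]),
]
query = [1.0, 0.0]

print(select_chunks(query, chunks, top_n=2))
print(select_chunks(query, chunks, top_n=2, max_context_tokens=8))
```

In this toy run, the third chunk is orthogonal to the query and falls below the similarity threshold, so it is never returned. Lowering the token budget to 8 leaves room for only the best-matching chunk, which shows how a small MaxContextTokens value can return fewer chunks than TopN allows.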