ITextSplitterService

Overview

The ITextSplitterService is used by Wisej.AI components and tools to divide large blocks of text into smaller chunks suitable for vectorization. At present, it is used primarily by the SmartHub.IngestDocument() methods.

Default Implementation

The default implementation of ITextSplitterService is the RecursiveCharacterTextSplitterService class. It derives from the TextSplitterServiceBase base class and is a slightly modified version of the implementation found in LangChain.

The RecursiveCharacterTextSplitter approach breaks large blocks of text into smaller chunks by recursively splitting the text at specified character boundaries. In LangChain's version, the separators are tried in order from coarsest to finest, so a piece is split on a finer separator only when it still exceeds the chunk size. This keeps paragraphs and lines together whenever they fit.

A key feature of this approach is its ability to create overlapping text segments: the end of one chunk is repeated at the start of the next. The overlap preserves context across chunk boundaries, which improves the accuracy and coherence of subsequent text processing.
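The split-then-merge idea described above can be sketched in a few lines of C#. This is an illustrative sketch only, not the Wisej.AI or LangChain source: the class name RecursiveSplitSketch and its methods are hypothetical, and it simplifies by joining merged pieces with a single space instead of restoring the original separators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Wisej.AI.
public static class RecursiveSplitSketch
{
    // Coarsest separator first, finest last.
    static readonly string[] Separators = { "\n\n", "\n", " " };

    public static List<string> Split(string text, int chunkSize = 1000, int chunkOverlap = 200)
    {
        if (chunkOverlap >= chunkSize)
            throw new ArgumentException("chunkOverlap must be smaller than chunkSize.");

        var pieces = new List<string>();
        Break(text, 0, chunkSize, pieces);
        return Merge(pieces, chunkSize, chunkOverlap);
    }

    // Recursively break text into pieces no longer than chunkSize,
    // moving to a finer separator only when a piece is still too long.
    static void Break(string text, int level, int chunkSize, List<string> pieces)
    {
        if (text.Length <= chunkSize)
        {
            if (text.Length > 0) pieces.Add(text);
            return;
        }
        if (level >= Separators.Length)
        {
            // No separator left: cut at fixed positions.
            for (int i = 0; i < text.Length; i += chunkSize)
                pieces.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
            return;
        }
        var parts = text.Split(new[] { Separators[level] }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var part in parts)
            Break(part, level + 1, chunkSize, pieces);
    }

    // Pack pieces into chunks of at most chunkSize characters. When a chunk
    // is emitted, its trailing pieces (up to chunkOverlap characters) are
    // carried over as the start of the next chunk.
    static List<string> Merge(List<string> pieces, int chunkSize, int chunkOverlap)
    {
        var chunks = new List<string>();
        var window = new List<string>(); // pieces in the current chunk
        int length = 0;                  // length of the window when joined with spaces

        foreach (var piece in pieces)
        {
            int added = piece.Length + (window.Count > 0 ? 1 : 0);
            if (window.Count > 0 && length + added > chunkSize)
            {
                chunks.Add(string.Join(" ", window));
                // Drop leading pieces until the remainder fits in the overlap
                // budget and leaves room for the incoming piece.
                while (window.Count > 0 &&
                       (length > chunkOverlap || length + 1 + piece.Length > chunkSize))
                {
                    length -= window[0].Length + (window.Count > 1 ? 1 : 0);
                    window.RemoveAt(0);
                }
            }
            length += piece.Length + (window.Count > 0 ? 1 : 0);
            window.Add(piece);
        }
        if (window.Count > 0) chunks.Add(string.Join(" ", window));
        return chunks;
    }
}
```

For example, RecursiveSplitSketch.Split("aaaa bbbb cccc dddd eeee", 14, 5) produces two chunks, "aaaa bbbb cccc" and "cccc dddd eeee": the piece "cccc" fits within the 5-character overlap budget, so it is repeated at the start of the second chunk.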

In the default implementation, chunkSize is set to 1000 characters and chunkOverlap is set to 200 characters. The text is split using the separators "\n\n", "\n", and " ". You can change these defaults to suit your needs, or replace the service entirely to take full control of its behavior.

// Reduce chunk size to 800 and overlap to 100.
Application.Services
    .AddOrReplaceService<ITextSplitterService>(
        new RecursiveCharacterTextSplitterService(800, 100));

If you choose to implement the ITextSplitterService yourself, you can derive your implementation from the TextSplitterServiceBase class and build on its built-in services. Alternatively, you are free to implement any kind of text chunking that your application requires.
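As one illustration of an alternative chunking strategy, the sketch below packs whole sentences into chunks up to a maximum length. It is a standalone, hypothetical helper (SentenceChunkerSketch is not a Wisej.AI type). The members of ITextSplitterService and TextSplitterServiceBase are not listed in this section, so wiring this logic into a service implementation is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical standalone chunker for illustration; not a Wisej.AI type.
public static class SentenceChunkerSketch
{
    public static List<string> Split(string text, int maxChars)
    {
        var chunks = new List<string>();
        var current = new StringBuilder();

        // Split after '.', '!' or '?' when followed by whitespace.
        foreach (var sentence in Regex.Split(text.Trim(), @"(?<=[.!?])\s+"))
        {
            if (sentence.Length == 0) continue;

            // Start a new chunk when the next sentence would not fit.
            if (current.Length > 0 && current.Length + 1 + sentence.Length > maxChars)
            {
                chunks.Add(current.ToString());
                current.Clear();
            }
            if (current.Length > 0) current.Append(' ');
            current.Append(sentence);
        }
        if (current.Length > 0) chunks.Add(current.ToString());
        return chunks;
    }
}
```

Note that a single sentence longer than maxChars is kept whole and becomes its own oversized chunk. A production splitter would likely need a fallback for that case.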
