TextSplitterServiceBase
Wisej.AI.Services.TextSplitterServiceBase
Namespace: Wisej.AI.Services
Assembly: Wisej.AI (3.5.0.0)
Provides the base functionality for text splitting services, allowing subclasses to define custom splitting logic.
public class TextSplitterServiceBase : ITextSplitterService
This abstract class splits text into chunks of a defined size, with a defined overlap between consecutive chunks. It is a foundation for text processing tasks that require segmentation, such as natural language processing or data analysis. The class is initialized with parameters that determine the chunk size and the overlap between chunks. It also takes a function that calculates the length of a text, which can be replaced to customize how length is measured.
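To illustrate the length function mentioned above, here is a minimal sketch of two interchangeable length delegates. It assumes the function has the shape `Func<string, int>`, which is inferred from the `str => str.Length` argument in the usage example on this page. The `LengthFunctions` class and its member names are hypothetical and not part of Wisej.AI.

```csharp
using System;

// Hypothetical helper class, not part of Wisej.AI.
static class LengthFunctions
{
    // Measures length in characters (UTF-16 code units),
    // equivalent to the str => str.Length delegate.
    public static readonly Func<string, int> ByCharacters =
        text => text.Length;

    // Measures length in whitespace-separated words, a rough
    // stand-in for token-based measurement.
    public static readonly Func<string, int> ByWords =
        text => text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

Switching the delegate changes what "chunk size" means: with `ByCharacters` a size of 100 is 100 characters, while with `ByWords` it would be 100 words.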
Properties
ChunkOverlap

Int32: Gets the overlap size between chunks.
ChunkSize

Int32: Gets the defined chunk size used for splitting text.
Methods
JoinDocs(docs, separator)

Joins a list of document strings using the specified separator and returns null if the resulting string is empty.
Returns: String. The joined string or null if the result is empty.
Use this method to concatenate text fragments into a single string. Returning null instead of an empty string lets callers skip empty output without further processing.
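The documented behavior can be sketched as a standalone function. This is an approximation based only on the description above, not the Wisej.AI implementation. The parameter types are assumptions, and the actual method may apply additional processing such as trimming.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the documented behavior, not the library code.
static class JoinDocsSketch
{
    public static string JoinDocs(IEnumerable<string> docs, string separator)
    {
        // Concatenate all fragments with the separator between them.
        string text = string.Join(separator, docs);

        // Report an empty result as null so callers can test for it directly.
        return text.Length == 0 ? null : text;
    }
}
```

A caller can then write `if (joined != null) { ... }` rather than checking for an empty string.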
MergeSplits(splits, separator)

Merges a list of text splits into larger chunks of the specified chunk size, ensuring overlap is maintained as needed.
Returns: String[]. An array of merged text chunks.
This method combines a list of text splits into larger chunks while respecting the defined chunk size and overlap. It also applies character replacements to standardize the text format. A chunk that exceeds the defined size is meant to be logged, but the logging mechanism is not yet implemented. Usage example:
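// Requires: using System.Collections.Generic;
// YourTextSplitterService stands for any concrete subclass; the length function here counts characters.
TextSplitterServiceBase splitter = new YourTextSplitterService(100, 10, str => str.Length);
string[] mergedChunks = splitter.MergeSplits(new List<string> { "text1", "text2" }, " ");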
Throws:
ArgumentNullException Thrown when splits is null.
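To make the idea of merging with overlap concrete, the following is a simplified, standalone sketch of one common greedy strategy. It is not the Wisej.AI implementation: the `MergeSketch` class, its `Merge` method, and its parameters are hypothetical, length is measured in characters, and the character replacements mentioned above are omitted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of greedy merging with overlap.
static class MergeSketch
{
    // Length of the parts once joined with the separator.
    static int JoinedLength(List<string> parts, string separator)
    {
        int length = 0;
        foreach (string part in parts) length += part.Length;
        return length + separator.Length * Math.Max(0, parts.Count - 1);
    }

    public static string[] Merge(IList<string> splits, string separator,
                                 int chunkSize, int chunkOverlap)
    {
        if (splits == null) throw new ArgumentNullException(nameof(splits));

        var chunks = new List<string>();
        var current = new List<string>(); // splits in the chunk being built

        foreach (string split in splits)
        {
            // Length of the chunk if this split were appended to it.
            int candidate = JoinedLength(current, separator)
                + (current.Count > 0 ? separator.Length : 0) + split.Length;

            if (current.Count > 0 && candidate > chunkSize)
            {
                // The chunk is full: emit it, then keep only a tail that
                // fits within the overlap and leaves room for the new split.
                chunks.Add(string.Join(separator, current));
                while (current.Count > 0 &&
                       (JoinedLength(current, separator) > chunkOverlap ||
                        JoinedLength(current, separator) + separator.Length
                            + split.Length > chunkSize))
                {
                    current.RemoveAt(0);
                }
            }
            current.Add(split);
        }

        if (current.Count > 0) chunks.Add(string.Join(separator, current));
        return chunks.ToArray();
    }
}
```

For the splits `"aaaa"`, `"bbbb"`, `"cccc"`, `"dddd"` with a single-space separator, a chunk size of 10, and an overlap of 4, this sketch yields `"aaaa bbbb"`, `"bbbb cccc"`, and `"cccc dddd"`: each chunk repeats the last split of the previous one. A single split longer than the chunk size is still emitted as an oversized chunk, which is the case the description above says should eventually be logged.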
Split(text)

Splits the given text into an array of strings based on the implemented logic in derived classes.
Returns: String[]. An array of strings representing the split text chunks.
Inherited By
A service for recursively splitting text into chunks based on specified separators and chunk size constraints. This service attempts to split text by different characters to find an optimal separation strategy.
Implements
ITextSplitterService: Represents a service for splitting text into an array of substrings.