TextSplitterServiceBase
Wisej.AI.Services.TextSplitterServiceBase
Namespace: Wisej.AI.Services
Assembly: Wisej.AI (3.5.0.0)
Provides the base functionality for text splitting services, allowing subclasses to define custom splitting logic.
This abstract class splits text into chunks of a defined size, with a defined overlap between consecutive chunks. It is a foundational component for text processing tasks that require segmentation, such as natural language processing or data analysis. The class is initialized with parameters that determine how text is chunked and how overlap between chunks is handled. Length measurement is customizable: the class uses a function to calculate the length of the text, and that function can be overridden as needed.
: Gets the overlap size between chunks.
: Gets the defined chunk size used for splitting text.
Joins a list of document strings using the specified separator and returns null if the resulting string is empty.
docs
A list of document strings to be joined.
separator
The separator to be used between documents when joining them.
This method concatenates text fragments into a single string. If the result is an empty string, it returns null instead, so callers can skip empty outputs with a simple null check.
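The join-or-null behavior described above can be sketched as follows. This is a minimal standalone illustration, not the Wisej.AI implementation: the class name `JoinSketch` and the method name `JoinOrNull` are hypothetical, and whether the real method also trims whitespace before testing for emptiness is not stated in this page, so no trimming is done here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not part of Wisej.AI.
public static class JoinSketch
{
    // Joins the fragments with the separator and maps an empty result to null.
    public static string JoinOrNull(IEnumerable<string> docs, string separator)
    {
        if (docs == null) throw new ArgumentNullException(nameof(docs));

        string joined = string.Join(separator, docs);

        // An empty result is reported as null so callers can skip it cheaply.
        return joined.Length == 0 ? null : joined;
    }
}
```

A caller can then write `if (joined != null) { ... }` instead of checking for both null and empty strings.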
Merges a list of text splits into larger chunks of the specified chunk size, ensuring overlap is maintained as needed.
splits
The text segments to be merged into larger chunks.
separator
The separator to use when joining segments.
This method combines a list of text splits into larger chunks while respecting the defined chunk size and overlap. It also performs character replacements to standardize the text format. A chunk that exceeds the defined size is meant to be reported through a logging mechanism, which is yet to be implemented.
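To make the merge-with-overlap idea concrete, here is a minimal sketch of one common way to do it: a sliding window of pieces that is flushed when the next piece would not fit, then trimmed from the front until what remains fits within the overlap. It is an assumption about the general technique, not the library's code. The names `MergeSketch` and `Merge` are hypothetical, length is measured in characters, and the character replacements and logging mentioned above are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not part of Wisej.AI.
public static class MergeSketch
{
    public static string[] Merge(IEnumerable<string> splits, string separator,
                                 int chunkSize, int chunkOverlap)
    {
        if (splits == null) throw new ArgumentNullException(nameof(splits));

        var chunks = new List<string>();
        var window = new List<string>(); // pieces of the chunk being built
        int total = 0;                   // window length, separators included

        foreach (string piece in splits)
        {
            int sep = window.Count > 0 ? separator.Length : 0;
            if (total + piece.Length + sep > chunkSize && window.Count > 0)
            {
                // The next piece does not fit: emit the current chunk.
                chunks.Add(string.Join(separator, window));

                // Drop leading pieces until the remainder fits in the overlap
                // and leaves room for the next piece.
                while (window.Count > 0 &&
                       (total > chunkOverlap ||
                        (total + piece.Length + separator.Length > chunkSize && total > 0)))
                {
                    total -= window[0].Length + (window.Count > 1 ? separator.Length : 0);
                    window.RemoveAt(0);
                }
            }

            window.Add(piece);
            total += piece.Length + (window.Count > 1 ? separator.Length : 0);
        }

        if (window.Count > 0) chunks.Add(string.Join(separator, window));
        return chunks.ToArray();
    }
}
```

With the splits `aaa`, `bbb`, `ccc`, `ddd`, a single-space separator, a chunk size of 7 and an overlap of 3, this sketch yields `aaa bbb`, `bbb ccc`, `ccc ddd`: each chunk repeats the last piece of the previous one.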
Throws:
Splits the given text into an array of strings based on the implemented logic in derived classes.
text
The text to be split into chunks.
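The relationship between the abstract base and a derived splitter can be sketched as below. To keep the example self-contained it declares its own small base class instead of referencing the Wisej.AI assembly, so every name here (`SplitterBaseSketch`, `FixedWindowSplitterSketch`, `ChunkSize`, `ChunkOverlap`, `SplitText`) and the constructor shape are assumptions for illustration. The derived class uses a deliberately simple strategy: fixed-size character windows that overlap by the configured amount.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local stand-in for an abstract splitter base; not the Wisej.AI class.
public abstract class SplitterBaseSketch
{
    protected SplitterBaseSketch(int chunkSize, int chunkOverlap)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (chunkOverlap < 0 || chunkOverlap >= chunkSize)
            throw new ArgumentOutOfRangeException(nameof(chunkOverlap));
        ChunkSize = chunkSize;
        ChunkOverlap = chunkOverlap;
    }

    public int ChunkSize { get; }
    public int ChunkOverlap { get; }

    // Derived classes supply the actual splitting logic.
    public abstract string[] SplitText(string text);
}

// Example subclass: fixed-size windows that share ChunkOverlap characters.
public sealed class FixedWindowSplitterSketch : SplitterBaseSketch
{
    public FixedWindowSplitterSketch(int chunkSize, int chunkOverlap)
        : base(chunkSize, chunkOverlap) { }

    public override string[] SplitText(string text)
    {
        var chunks = new List<string>();
        if (string.IsNullOrEmpty(text)) return chunks.ToArray();

        int step = ChunkSize - ChunkOverlap; // always > 0, enforced by the constructor
        for (int start = 0; start < text.Length; start += step)
        {
            int length = Math.Min(ChunkSize, text.Length - start);
            chunks.Add(text.Substring(start, length));
            if (start + length >= text.Length) break; // the last window reached the end
        }
        return chunks.ToArray();
    }
}
```

For instance, a chunk size of 4 with an overlap of 1 splits `abcdefghij` into `abcd`, `defg`, `ghij`.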
A service for recursively splitting text into chunks based on specified separators and chunk size constraints. This service attempts to split text by different characters to find an optimal separation strategy.
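The recursive strategy can be illustrated with a short sketch: try the coarsest separator first, and split any piece that is still too long with the next separator in the list. This is a simplified illustration under assumptions, not the service's implementation. The names `RecursiveSplitSketch` and `Split` are hypothetical, length is measured in characters, and small pieces are not merged back into larger chunks as a full splitter normally would.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration only; not part of Wisej.AI.
public static class RecursiveSplitSketch
{
    // Splits text with separators[depth]; pieces that are still longer than
    // chunkSize are split again with the remaining separators.
    public static List<string> Split(string text, IList<string> separators,
                                     int chunkSize, int depth = 0)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var result = new List<string>();

        if (text.Length <= chunkSize)
        {
            if (text.Length > 0) result.Add(text);
            return result;
        }

        if (depth >= separators.Count)
        {
            // No separators left: fall back to hard cuts of chunkSize characters.
            for (int i = 0; i < text.Length; i += chunkSize)
                result.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
            return result;
        }

        string[] pieces = text.Split(new[] { separators[depth] },
                                     StringSplitOptions.RemoveEmptyEntries);
        foreach (string piece in pieces)
            result.AddRange(Split(piece, separators, chunkSize, depth + 1));

        return result;
    }
}
```

Ordering the separators from coarse to fine (for example a blank line, then a space) keeps paragraphs intact when they fit and only breaks them into words when they do not.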
Represents a service for splitting text into an array of substrings.
Returns: The joined string, or null if the result is empty.
Returns: An array of merged text chunks.
Thrown when splits is null.
Returns: An array of strings representing the split text chunks.