RecursiveCharacterTextSplitterService
Wisej.AI.Services.RecursiveCharacterTextSplitterService
Namespace: Wisej.AI.Services
Assembly: Wisej.AI (3.5.0.0)
A service for recursively splitting text into chunks based on specified separators and chunk size constraints. The service tries the separators in turn, from most to least significant, until the resulting chunks fit within the chunk size.
public class RecursiveCharacterTextSplitterService : TextSplitterServiceBase
Constructors
RecursiveCharacterTextSplitterService(chunkSize, chunkOverlap, separators, lengthFunction)

Initializes a new instance of the RecursiveCharacterTextSplitterService class with the specified chunk size, chunk overlap, separators, and length function.
separators
The array of string separators to be used for splitting the text. Default is an array containing "\n\n", "\n", " ", and "".
lengthFunction
A function that returns the length of a piece of text. The result is compared against the chunk size to enforce the chunk size constraint.
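The choice of length function determines the unit in which the chunk size is measured. The sketch below shows two candidate functions and assumes that lengthFunction is compatible with Func<string, int>, as the lambda in the usage example (text => text.Length) suggests. The names byCharacters and byWords are illustrative and not part of the library.

```csharp
using System;

// Assumption: lengthFunction accepts a Func<string, int>.
// Measures length in characters, matching the documented usage example.
Func<string, int> byCharacters = text => text.Length;

// Measures length in whitespace-separated words, so a chunk size of 500
// would mean "at most 500 words" rather than "at most 500 characters".
Func<string, int> byWords = text =>
    text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;

string sample = "Wisej.AI splits long documents into chunks.";
Console.WriteLine(byCharacters(sample));
Console.WriteLine(byWords(sample));
```

Either delegate could then be passed as the lengthFunction argument, provided the parameter type matches the assumption above.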
Methods
Split(text)

Splits the given text into chunks using the defined separators and chunk size constraints.
Returns: String[]. An array of strings, where each string represents a chunk of the original text.
This method splits the provided text using the list of separators, starting with the most significant separator and moving to less significant ones. If none of the separators is found, the text is treated as a sequence of individual characters. No chunk exceeds the specified chunk size: any text segment larger than the chunk size is recursively split further.
Usage example:
var separators = new[] { ",", " ", "\n" };
// Arguments follow the constructor signature: chunkSize, chunkOverlap, separators, lengthFunction.
var splitterService = new RecursiveCharacterTextSplitterService(500, 100, separators, text => text.Length);
string textToSplit = "This is a sample text that will be split into smaller chunks.";
string[] chunks = splitterService.Split(textToSplit);
foreach (var chunk in chunks)
{
    Console.WriteLine(chunk);
}
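The recursive strategy described for Split can be illustrated with a short standalone sketch. The helper SplitRecursively below is hypothetical and is not the Wisej.AI implementation: it measures length in characters, ignores chunk overlap, and drops empty pieces, all of which are simplifying assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Wisej.AI.
List<string> SplitRecursively(string text, string[] separators, int chunkSize)
{
    if (text == null) throw new ArgumentNullException(nameof(text));

    var chunks = new List<string>();
    if (text.Length <= chunkSize)
    {
        if (text.Length > 0) chunks.Add(text);
        return chunks;
    }

    // Pick the first (most significant) non-empty separator that occurs in the text.
    int found = Array.FindIndex(separators, s => s.Length > 0 && text.Contains(s));
    if (found < 0)
    {
        // No separator applies: fall back to fixed-size slices of characters.
        for (int i = 0; i < text.Length; i += chunkSize)
            chunks.Add(text.Substring(i, Math.Min(chunkSize, text.Length - i)));
        return chunks;
    }

    string separator = separators[found];
    string[] remaining = separators.Skip(found + 1).ToArray();
    string current = "";

    foreach (string piece in text.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries))
    {
        string candidate = current.Length == 0 ? piece : current + separator + piece;
        if (candidate.Length <= chunkSize)
        {
            current = candidate; // the piece still fits into the chunk being built
            continue;
        }

        if (current.Length > 0) chunks.Add(current);
        current = "";

        if (piece.Length <= chunkSize)
            current = piece;
        else
            // The piece alone is too large: recurse with the less significant separators.
            chunks.AddRange(SplitRecursively(piece, remaining, chunkSize));
    }

    if (current.Length > 0) chunks.Add(current);
    return chunks;
}

string sample = "First paragraph.\n\nSecond paragraph is a bit longer.";
List<string> result = SplitRecursively(sample, new[] { "\n\n", "\n", " ", "" }, 20);
foreach (string chunk in result)
    Console.WriteLine($"[{chunk}]");
```

With a chunk size of 20, the first paragraph fits as a single chunk, while the second is too long and is split again on spaces. The actual service may merge pieces and apply overlap differently.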
Throws:
ArgumentNullException Thrown when the text is null.
Implements
Represents a service for splitting text into an array of substrings.