TextTokenizer
Wisej.AI.Helpers.TextTokenizer
Namespace: Wisej.AI.Helpers
Assembly: Wisej.AI (3.5.0.0)
Provides methods for tokenizing text content using a specified encoder.
public class TextTokenizer
Supports the following encodings:
o200K_base Used by gpt-4o and gpt-4o-mini.
cl100k_base Used by gpt-4-turbo, gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, text-embedding-3-large.
p50k_base Used by text-davinci-002, text-davinci-003.
Methods
CountTokens(content, name)

Counts the number of tokens in the specified text content using the given encoding model.
Returns: Int32. The number of tokens in the content.
This method calculates the total number of tokens present in the content based on the specified encoding model.
int tokenCount = TextTokenizer.CountTokens("Sample text");
Console.WriteLine($"Number of tokens: {tokenCount}");
Throws:
ArgumentNullException Thrown when the content is null.
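A common use for a token count is checking whether text fits a model's token limit before sending it. The following is a minimal sketch of that check, not part of Wisej.AI: the TokenBudget class and its Fits method are hypothetical names introduced here, and the counter is passed in as a delegate so the sketch compiles without the Wisej.AI assembly.

```csharp
using System;

// Hypothetical helper (not part of Wisej.AI): checks text against a token limit.
public static class TokenBudget
{
    // Returns true when the text needs no more than maxTokens tokens,
    // as measured by the supplied counting function.
    public static bool Fits(string text, int maxTokens, Func<string, int> countTokens)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (countTokens == null) throw new ArgumentNullException(nameof(countTokens));

        return countTokens(text) <= maxTokens;
    }
}
```

In an application, the counter would typically wrap the tokenizer, for example TokenBudget.Fits(prompt, 4096, s => TextTokenizer.CountTokens(s)). The 4096 limit is an arbitrary illustrative value. To select an encoding explicitly, pass it as the second argument to CountTokens; this assumes that the name parameter accepts one of the encoding names listed above as a string, which this page does not confirm.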
Tokenize(content, name)

Tokenizes the specified text content using the given encoder.
Returns: IReadOnlyCollection<String>. A read-only collection of tokens extracted from the content.
This method uses the specified encoding model to break down the content into individual tokens.
var tokens = TextTokenizer.Tokenize("Sample text");
foreach (var token in tokens)
{
    Console.WriteLine(token);
}
Throws:
ArgumentNullException Thrown when the content is null.