TextTokenizer
Wisej.AI.Helpers.TextTokenizer
Namespace: Wisej.AI.Helpers
Assembly: Wisej.AI (3.5.0.0)
Provides methods for tokenizing text content using a specified encoder.
Supports the following encodings:
o200K_base: Used by gpt-4o and gpt-4o-mini.
cl100k_base: Used by gpt-4-turbo, gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large.
p50k_base: Used by text-davinci-002 and text-davinci-003.
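The list above amounts to a lookup from model name to encoder name. The following Python sketch restates it as a table; the helper `encoding_for_model` and the fallback to "o200K_base" for unlisted models are assumptions made for illustration and are not part of Wisej.AI.

```python
# Illustrative lookup only. The table mirrors the list above; the helper
# name and the fallback behavior are hypothetical, not a Wisej.AI API.
ENCODING_BY_MODEL = {
    "gpt-4o": "o200K_base",
    "gpt-4o-mini": "o200K_base",
    "gpt-4-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-embedding-ada-002": "cl100k_base",
    "text-embedding-3-small": "cl100k_base",
    "text-embedding-3-large": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
}


def encoding_for_model(model: str, default: str = "o200K_base") -> str:
    """Return the encoder name for a model, or the default if unlisted."""
    return ENCODING_BY_MODEL.get(model, default)
```

A caller would pass the resulting name as the encoder argument of the methods described below.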
Counts the number of tokens in the specified text content using the given encoding model.
content
The text content to analyze.
name
The name of the encoder to use for counting tokens. Default is "o200K_base".
Returns: The number of tokens in the content.
This method calculates the total number of tokens present in the content based on the specified encoding model.
Throws:
Thrown when the content is null.
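The contract described for token counting (a default encoder name, an integer result, and an error on null content) can be sketched in Python as follows. This is not the Wisej.AI implementation: the function name `count_tokens`, the rejection of unknown encoder names, and the whitespace split standing in for a real encoder are all assumptions, so the counts it produces will differ from real token counts.

```python
from typing import Optional

# Encoder names taken from the list of supported encodings.
SUPPORTED_ENCODERS = ("o200K_base", "cl100k_base", "p50k_base")


def count_tokens(content: Optional[str], name: str = "o200K_base") -> int:
    """Hypothetical stand-in that mirrors the documented contract."""
    # Documented behavior: null content is an error.
    if content is None:
        raise ValueError("content must not be null")
    # Assumption: unknown encoder names are rejected.
    if name not in SUPPORTED_ENCODERS:
        raise ValueError(f"unsupported encoder: {name}")
    # Stand-in for the real encoder: splits on whitespace, NOT byte-pair encoding.
    return len(content.split())


print(count_tokens("How many tokens is this?"))  # 5 with the whitespace stand-in
```

A real encoder typically yields a different count for the same text, which is why the encoder name should match the target model.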
Tokenizes the specified text content using the given encoder.
content
The text content to tokenize.
name
The name of the encoder to use for tokenization. Default is "o200K_base".
Returns: A read-only collection of tokens extracted from the content.
This method uses the specified encoding model to break down the content into individual tokens.
Throws:
Thrown when the content is null.
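As a rough illustration of the tokenization contract, the Python sketch below returns an immutable tuple in place of the read-only collection and raises on null content. The function name `tokenize`, the string token type, and the whitespace split are assumptions; the real method applies the selected encoder, and this page does not state the element type of the returned collection.

```python
from typing import Optional, Tuple


def tokenize(content: Optional[str], name: str = "o200K_base") -> Tuple[str, ...]:
    """Hypothetical stand-in: returns an immutable sequence of tokens."""
    # Documented behavior: null content is an error.
    if content is None:
        raise ValueError("content must not be null")
    # `name` is accepted for parity with the documented signature but is
    # ignored here; the whitespace split is NOT a real encoder.
    return tuple(content.split())


tokens = tokenize("Hello tokenizer world")
print(tokens)       # ('Hello', 'tokenizer', 'world')
print(len(tokens))  # 3
```

A tuple cannot be modified after creation, which is the closest built-in Python analogue of a read-only collection.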