# ITokenizerService

## Overview

The [ITokenizerService](/ai/components/api/services/itokenizerservice.md) provides a standardized interface for operations involving text tokens, such as counting, splitting, or truncating text based on tokenization.

This service is used internally whenever an adapter or tool needs to limit a block of text to a specific number of tokens, or to count the tokens it contains. The built-in service is based on a [reimplementation](https://github.com/tryAGI/Tiktoken) of OpenAI's [`tiktoken`](https://github.com/openai/tiktoken) library and uses `o200k_base` as the default encoding model.

You can either replace the service entirely or simply change the default model. Since the internal implementation leverages the public [`Wisej.AI.Helpers.TextTokenizer`](/ai/components/api/helpers/wisej.ai.helpers.texttokenizer.md) class, you can also use this helper directly in your code. Note, however, that both the service and the helper are designed specifically for tokenizing and truncating, not for encoding and decoding. If your application requires more advanced tokenization features, consider using the `tiktoken` library directly or the [Microsoft.ML.Tokenizers](https://learn.microsoft.com/en-us/dotnet/api/microsoft.ml.tokenizers.tokenizer) class.
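To make the shape of such a replacement concrete, the sketch below implements the two operations this page relies on: counting tokens and truncating text to a token budget. This is a hypothetical, self-contained illustration: the interface and member names (`ITokenizerServiceSketch`, `CountTokens`, `TruncateContent`) are inferred from the `TruncateContent` call shown later on this page and may not match the actual `ITokenizerService` surface, and the whitespace "tokenizer" is a deliberate stand-in for a real BPE encoder such as `tiktoken`.

```csharp
using System;
using System.Linq;

// Hypothetical sketch only: member names mirror the TruncateContent
// call used elsewhere on this page; the real interface may differ.
public interface ITokenizerServiceSketch
{
    int CountTokens(string text);
    string TruncateContent(string text, int maxTokens);
}

// Naive whitespace "tokenizer" used as a stand-in; a production
// replacement would delegate to a BPE encoder (tiktoken, etc.).
public class WhitespaceTokenizerService : ITokenizerServiceSketch
{
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public int CountTokens(string text) =>
        text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;

    public string TruncateContent(string text, int maxTokens)
    {
        var tokens = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        // Return the text unchanged when it already fits the budget.
        return tokens.Length <= maxTokens
            ? text
            : string.Join(" ", tokens.Take(maxTokens));
    }
}
```

A drop-in replacement following this shape only has to swap the splitting logic for a real encoder; the truncation contract (return at most `maxTokens` tokens, leave shorter text untouched) stays the same.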

The code below provides an example of how Wisej.AI utilizes the tokenization service within the document search tool to ensure that the returned context remains below the maximum allowed number of tokens.

```csharp
[SmartTool.Tool]
[Description("[DocumentSearchTools.search_documents]")]
protected virtual async Task<string> search_documents(

    [Description("[DocumentSearchTools.search_documents.question]")]
    string question)
{
    // Embed the question and retrieve the most similar document chunks.
    var query = await EmbedQuestionAsync(question);
    var documents = await this.EmbeddingStorageService.QueryAsync(
        _collectionName,
        query?.Vectors[0], 100, this.MinSimilarity, _filter);

    // Concatenate up to 100 chunks into a single context block.
    var count = 0;
    var sb = new StringBuilder();
    foreach (var document in documents)
    {
        sb.Append(
@$"
Name:'{document.Name}'
{document.Metadata}
===
{document.GetChunks()[0]}
===
");
        if (++count >= 100)
            break;
    }

    // Truncate the assembled context to the maximum allowed token count.
    return this.TokenizerService.TruncateContent(sb.ToString(), this.MaxContextTokens);
}
```

## Default Implementation

The default implementation of the `ITokenizerService` interface is provided by the [`DefaultTokenizerService`](/ai/components/api/services/itokenizerservice/wisej.ai.services.defaulttokenizerservice.md) class. This implementation utilizes the [`Wisej.AI.Helpers.TextTokenizer`](/ai/components/api/helpers/wisej.ai.helpers.texttokenizer.md) class to efficiently split text into tokens.

The `TextTokenizer` class offers several embedded encoders within the Wisej.AI assembly, specifically: `p50k_base`, `o200k_base`, and `cl100k_base`. This functionality is built upon the tiktoken library developed by tryAGI, which is available under the MIT license.
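When tokenizing and truncating are not enough and you need full encode/decode support, the `Microsoft.ML.Tokenizers` package mentioned above provides a tiktoken-compatible tokenizer. The sketch below assumes the `Microsoft.ML.Tokenizers` NuGet package is installed; the `"gpt-4o"` model name resolves to the `o200k_base` encoding, matching the built-in service's default.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// "gpt-4o" resolves to the o200k_base encoding, the same default
// used by the built-in tokenizer service.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("gpt-4o");

string text = "Tokenization splits text into sub-word units.";

// Counting, as ITokenizerService does...
int count = tokenizer.CountTokens(text);

// ...plus full encode/decode, which the built-in service does not expose.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
string roundTrip = tokenizer.Decode(ids);

Console.WriteLine($"{count} tokens; round-trip ok: {roundTrip == text}");
```

Because the encoding matches the service default, token counts obtained this way should line up with what the built-in service reports for the same text.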

