ISessionTrimmingService

Overview

The ISessionTrimmingService is invoked by the SmartSession when the total number of tokens in the messages being submitted to the AI exceeds the limit set by the SmartEndpoint.ContextWindow property. The service trims the history by removing or consolidating messages, so that the session can keep interacting with the AI without losing significant context.

Wisej.AI invokes this service proactively, before the AI provider returns an error. It does so by comparing the total number of tokens in the last response, read from the Usage property of the last message, with the SmartEndpoint.ContextWindow property of the current endpoint.
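The comparison described above can be pictured with a minimal, self-contained sketch. The ContextWindowCheck class and its NeedsTrimming method are hypothetical names used only for illustration and are not part of Wisej.AI. The two integers stand in for the token count reported by the last message's Usage property and for SmartEndpoint.ContextWindow. The exact threshold Wisej.AI applies is not stated here, so a simple "greater than" test is assumed.

```csharp
using System;

// Hypothetical helper for illustration only; not a Wisej.AI type.
public static class ContextWindowCheck
{
    // Returns true when the token count reported for the last response
    // exceeds the context window of the current endpoint.
    public static bool NeedsTrimming(int lastUsageTokens, int contextWindow)
    {
        if (contextWindow <= 0)
            throw new ArgumentOutOfRangeException(nameof(contextWindow));

        return lastUsageTokens > contextWindow;
    }
}
```

With these assumptions, a session that reported 9000 tokens against an 8192-token window would be trimmed before the next request, while one that reported 4000 tokens would not.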

Default Implementation

The default implementation of the ISessionTrimmingService interface is provided by the DefaultSessionTrimmingService class. This class employs two distinct strategies to manage session data: the RollingWindow strategy and the Summarization strategy.

When trimming is initiated, Wisej.AI calls the TrimAsync() method of the ISessionTrimmingService. This method receives two parameters: the SmartSession that requested the trimming and the collection of messages to be trimmed. By default, the service is set up to employ the RollingWindow trimming strategy, aiming to reduce the message collection by 50%.
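As a rough illustration of what a rolling window with a trimming percentage could look like, the sketch below drops the oldest messages until the requested share of the collection has been removed. It is not the DefaultSessionTrimmingService implementation. The RollingMessage record and the RollingWindowSketch class are hypothetical, and keeping system messages in place is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a chat message; not a Wisej.AI type.
public record RollingMessage(string Role, string Text);

public static class RollingWindowSketch
{
    // Removes the oldest non-system messages until roughly `percentage`
    // of the original collection has been dropped.
    public static void Trim(IList<RollingMessage> messages, float percentage)
    {
        if (percentage <= 0f || percentage > 1f)
            throw new ArgumentOutOfRangeException(nameof(percentage));

        int toRemove = (int)Math.Floor(messages.Count * percentage);
        int index = 0;

        while (toRemove > 0 && index < messages.Count)
        {
            if (messages[index].Role == "system")
            {
                index++; // assumption: system prompts are preserved
                continue;
            }

            messages.RemoveAt(index); // drop the oldest remaining message
            toRemove--;
        }
    }
}
```

For a collection holding one system message followed by five conversation messages, a percentage of 0.5 removes the three oldest conversation messages and leaves the system message plus the two most recent ones.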

Below is an example of how to modify the default parameters for the DefaultSessionTrimmingService. Since the default service is registered with a Shared lifetime, you can adjust the properties on the service singleton:

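// Retrieve the shared trimming service instance.
var service = Application.Services.GetService<ISessionTrimmingService>();

// Aim to reduce the message collection by 30% instead of the default 50%,
// and switch from the RollingWindow strategy to Summarization.
service.TrimmingPercentage = 0.3f;
service.TrimmingStrategy = SmartSession.TrimmingStrategy.Summarization;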

If you wish to implement a different trimming system, you can register your own service implementation at any time. This allows you to use other AI libraries and adopt any strategy that suits your needs. It is crucial to trim the message collection passed to TrimAsync() rather than the messages associated with the session: Wisej.AI internally clones the messages before calling the AI model, which is what allows the same session to be used in parallel.
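The sketch below shows the shape such a custom trimmer might take. It deliberately avoids the real Wisej.AI types: ISketchTrimmer, SketchMessage, and KeepLastTrimmer are hypothetical names, and the TrimAsync signature only mirrors the two parameters described above (the requesting session and the message collection), so check the actual ISessionTrimmingService interface before adapting it. The point it illustrates is that only the collection passed in is modified and the session object is never touched.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shapes for illustration only; not Wisej.AI types.
public record SketchMessage(string Role, string Text);

public interface ISketchTrimmer
{
    Task TrimAsync(object session, IList<SketchMessage> messages);
}

// Keeps only the most recent `keep` messages of the collection it receives.
public sealed class KeepLastTrimmer : ISketchTrimmer
{
    private readonly int _keep;

    public KeepLastTrimmer(int keep)
    {
        if (keep < 0)
            throw new ArgumentOutOfRangeException(nameof(keep));
        _keep = keep;
    }

    public Task TrimAsync(object session, IList<SketchMessage> messages)
    {
        // Modify only the collection passed in; the session's own
        // messages are left alone.
        while (messages.Count > _keep)
            messages.RemoveAt(0);

        return Task.CompletedTask;
    }
}
```

Because the trimmer works on whatever collection it is given, it can be exercised with a plain list and no live session.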

See Context Overflow for a description of the two trimming strategies.
