WebSearchTools
The WebSearchTools class offers a set of functions to search and retrieve content from the web. It uses the IWebSearchService to interact with various web search providers.
It can download HTML pages as well as any document format supported by the current IDocumentConversionService. It also handles redirects and deflated (compressed) responses.
Note that executing JavaScript and loading images on web pages are not supported.
The search functionality is exposed to the AI through two methods. The search method executes a search query and returns relevant URLs along with page content snippets. The read method extracts content from the list of URLs selected by the AI.
When the tool downloads content, it does not use vectorization to condense the text into relevant segments. Instead, it returns the full text content, truncated to the MaxContextTokens property value (default is 4096 tokens). This limit applies to each page being read. Therefore, if the AI uses the tool to read multiple pages and MaxContextTokens is set too high, you may exceed the ContextWindow of the current endpoint.
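The per-page limit described above reduces to simple arithmetic. The following is a minimal, language-neutral sketch in Python. It is not part of the Wisej.AI API: the function names, the reserved_tokens parameter, and the context window sizes are hypothetical and used only for illustration.

```python
def worst_case_read_tokens(pages: int, max_context_tokens: int = 4096) -> int:
    """Upper bound on the tokens returned when `pages` pages are read,
    each truncated to `max_context_tokens` (the limit is per page)."""
    return pages * max_context_tokens


def fits_context_window(pages: int, max_context_tokens: int,
                        context_window: int, reserved_tokens: int = 0) -> bool:
    """True if the worst-case page content, plus tokens reserved for the
    prompt and the answer (an assumed allowance), fits in the window."""
    total = worst_case_read_tokens(pages, max_context_tokens) + reserved_tokens
    return total <= context_window


def max_pages(max_context_tokens: int, context_window: int,
              reserved_tokens: int = 0) -> int:
    """Largest number of fully loaded pages that still fits in the window."""
    return max(0, (context_window - reserved_tokens) // max_context_tokens)


# Five pages at the default limit can return up to 5 * 4096 tokens.
print(worst_case_read_tokens(5))            # 20480
# A hypothetical 16,384-token window cannot hold that worst case.
print(fits_context_window(5, 4096, 16384))  # False
```

If the worst case does not fit, either lower MaxContextTokens or use an endpoint with a larger ContextWindow.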
To enable WebSearchTools, simply add it to a SmartHub, SmartAdapter, or SmartPrompt.
You can add WebSearchTools to any adapter or prompt to extend its core functionality alongside other tools. For instance, paired with another tool or adapter, it lets the AI fill in missing data by conducting web searches.
MaxSites
Read-write and overridable. Default is 5.
The MaxSites property limits the number of sites returned by the search engine.
MaxContextTokens
Read-write and overridable. Default is 4096.
The MaxContextTokens property truncates the content of a single web page when it is converted to text. This truncation is necessary to create the Retrieval-Augmented Generation (RAG) string efficiently.
UserAgent
Read-write and overridable. Default is Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36.
The tool is designed to mimic a desktop browser when accessing websites to prevent being blocked or throttled.
Timeout
Read-write and overridable. Default is 30000.
The Timeout property specifies the maximum duration, in milliseconds, that the system will wait to read the content of a web page before aborting the request.
There is no default IWebSearchService installed. To use the WebSearchTools object, you must register a web search service. The currently available built-in services are BingWebSearchService, GoogleWebSearchService, and BraveWebSearchService. Each requires an API key, and GoogleWebSearchService also needs an engine ID (please refer to Google's documentation for more details).
Here is how to register the BingWebSearchService during startup:
The tool also uses the IDocumentConversionService to convert HTML content into plain text. The default implementation provided by Wisej.AI leverages the HtmlAgilityPack, a library designed for parsing HTML content. It processes HTML by breaking it down and subsequently reassembling it into a coherent and usable text format.
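To illustrate the general idea of breaking HTML down and reassembling it as plain text, here is a minimal sketch using Python's standard html.parser module. It is not the Wisej.AI implementation and does not use HtmlAgilityPack; the TextExtractor class and the html_to_text function are hypothetical names, and skipping script and style content is an assumption about what a reasonable converter would do.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes, ignoring script and style content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        # Reassemble the fragments into a single whitespace-separated string.
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = ("<html><head><style>p{}</style></head><body><h1>Title</h1>"
        "<p>Hello <b>world</b></p><script>var x=1;</script></body></html>")
print(html_to_text(page))  # Title Hello world
```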
If you prefer to specify the API key directly in code, instead of reading it from a file or environment variables, use the following code:
Refer to the IWebSearchService documentation for more information on how to implement and use a custom web search provider.