LogoLogo
HomeNewsSupportVideos
  • Welcome
  • Wisej.NET
  • Concepts
    • Getting Started
    • General Concepts
    • Architecture
    • Extensibility
    • AI Providers
    • Vector Databases
    • Usage Metrics
    • Logging & Error Handling
  • Markup
  • Components
    • API
      • SmartAdapter
        • SmartAudioTTSAdapter
        • SmartAudioWhisperAdapter
        • SmartCalendarAdapter
        • SmartChartAdapter
        • SmartChartJS3Adapter
        • SmartChatBoxAdapter
        • SmartComboBoxAdapter
        • SmartCopilotAdapter
        • SmartDataEntryAdapter
        • SmartDocumentAdapter
        • SmartFullCalendarAdapter
        • SmartObjectAdapter
        • SmartPictureBoxAdapter
        • SmartQueryAdapter
        • SmartRealtimeAdapter
        • SmartReportAdapter
        • SmartTextBoxAdapter
        • SmartAdapter.ExtendsAttribute
        • SmartAdapter.FieldNameAttribute
        • SmartAdapter.FieldPromptAttribute
        • SmartAdapter.FieldRectangleAttribute
        • SmartAdapter.WorksWithAttribute
      • SmartEndpoint
        • AmazonBedrockEndpoint
        • AnthropicEndpoint
        • AzureAIEndpoint
        • CerebrasEndpoint
        • DeepSeekEndpoint
        • GoogleAIEndpoint
        • GroqCloudEndpoint
        • GroqCloudEndpointWhisper
        • HuggingFaceEndpoint
        • HuggingFaceJavaScriptEndpoint
        • LocalAIEndpoint
        • LocalAIEndpointImageGen
        • LocalAIEndpointTTS
        • LocalAIEndpointWhisper
        • NvidiaAIEndpoint
        • OllamaEndpoint
        • OpenAIEndpoint
        • OpenAIEndpointDallE
        • OpenAIEndpointRealtime
        • OpenAIEndpointTTS
        • OpenAIEndpointWhisper
        • SambaNovaEndpoint
        • SmartHttpEndpoint
        • TogetherAIEndpoint
        • XAIEndpoint
        • SmartEndpoint.Metrics
        • SmartEndpoint.Response
      • SmartExtensions
      • SmartHub
        • SmartSession.ConvertParameterEventArgs
        • SmartSession.ConvertParameterEventHandler
        • SmartSession.ErrorEventArgs
        • SmartSession.ErrorEventHandler
        • SmartSession.InvokeToolEventArgs
        • SmartSession.InvokeToolEventHandler
        • SmartSession.MessagesEventArgs
        • SmartSession.MessagesEventHandler
      • SmartObject
      • SmartPrompt
        • SmartAgentPrompt
        • SmartParallelPrompt
        • SmartPrompt.Parameter
        • SmartSession.ConvertParameterEventArgs
        • SmartSession.ConvertParameterEventHandler
        • SmartSession.ErrorEventArgs
        • SmartSession.ErrorEventHandler
        • SmartSession.InvokeToolEventArgs
        • SmartSession.InvokeToolEventHandler
        • SmartSession.MessagesEventArgs
        • SmartSession.MessagesEventHandler
      • SmartRealtimeSession
      • SmartSession
        • SmartSession.ConvertParameterEventArgs
        • SmartSession.ConvertParameterEventHandler
        • SmartSession.ErrorEventArgs
        • SmartSession.ErrorEventHandler
        • SmartSession.InvokeToolEventArgs
        • SmartSession.InvokeToolEventHandler
        • SmartSession.Message
        • SmartSession.MessageCollection
        • SmartSession.MessageRole
        • SmartSession.MessagesEventArgs
        • SmartSession.MessagesEventHandler
        • SmartSession.TrimmingStrategy
      • SmartTool
        • SmartTool.IToolProvider
        • SmartTool.ToolAttribute
        • SmartTool.ToolContext
      • Markup
        • MarkupExtensions
      • Controls
        • UVLightOverlay
      • Embeddings
        • EmbeddedDocument
        • Embedding
        • Matches
        • Metadata
      • Helpers
        • ApiKeys
        • Markdown
        • TextTokenizer
      • Services
        • DefaultSessionTrimmingService
        • IDocumentConversionService
          • DefaultDocumentConversionService
        • IEmbeddingGenerationService
          • DefaultEmbeddingGenerationService
          • HuggingFaceEmbeddingGenerationService
        • IEmbeddingStorageService
          • AzureAISearchEmbeddingStorageService
          • ChromaEmbeddingStorageService
          • FileSystemEmbeddingStorageService
          • MemoryEmbeddingStorageService
          • PineconeEmbeddingStorageService
          • QdrantEmbeddingStorageService
        • IHttpClientService
          • DefaultHttpClientService
        • ILoggerService
          • DefaultLoggerService
        • IOCRService
          • DefaultOCRService
        • IRerankingService
          • DefaultRerankingService
          • LocalAIRerankingService
          • PineconeRerankingService
        • ISessionTrimmingService
          • DefaultSessionTrimmingService
        • ITextSplitterService
          • RecursiveCharacterTextSplitterService
          • TextSplitterServiceBase
        • ITokenizerService
          • DefaultTokenizerService
        • IWebSearchService
          • BingWebSearchService
          • BraveWebSearchService
          • GoogleWebSearchService
      • Tools
        • ArxivTools
        • ChartJS3Tools
        • DatabaseTools
        • DataTableFilterTools
        • DocumentSearchTools
        • DocumentTools
        • FullCalendarTools
        • IToolsContainer
        • MathTools
        • ToolsContainer
        • UtilityTools
        • WebSearchTools
    • Built-in Services
      • IOCRService
      • ILoggerService
      • ITextSplitterService
      • ITokenizerService
      • IHttpClientService
      • IWebSearchService
      • IRerankingService
      • ISessionTrimmingService
      • IDocumentConversionService
      • IEmbeddingStorageService
      • IEmbeddingGenerationService
    • Built-in SmartTools
      • ToolsContainer
      • MathTools
      • UtilityTools
      • DatabaseTools
      • DocumentTools
      • DocumentSearchTools
      • WebSearchTools
      • ChartJS3Tools
      • FullCalendarTools
    • Built-in SmartAdapters
      • SmartAdapter
      • SmartAudioTTSAdapter
      • SmartAudioWhisperAdapter
      • SmartCalendarAdapter
      • SmartChartAdapter
      • SmartChartJS3Adapter
      • SmartChatBoxAdapter
      • SmartComboBoxAdapter
      • SmartCopilotAdapter
      • SmartDataEntryAdapter
      • SmartDocumentAdapter
      • SmartFullCalendarAdapter
      • SmartObjectAdapter
      • SmartPictureBoxAdapter
      • SmartQueryAdapter
      • SmartRealtimeAdapter
      • SmartReportAdapter
      • SmartTextBoxAdapter
    • Configure Services
    • Using SmartHub
    • Using SmartTools
    • Using SmartPrompt
    • Using SmartSession
    • Using SmartRealTimeAdapter
    • UVLightOverlay Control
Powered by GitBook
On this page
  • Overview
  • Default Implementation
Export as PDF
  1. Components
  2. Built-in Services

IDocumentConversionService

PreviousISessionTrimmingServiceNextIEmbeddingStorageService

Last updated 16 days ago

Overview

The is utilized by Wisej.AI components that require converting documents into text.

This service is integral for Wisej.AI components that need to process and analyze textual content extracted from non-text documents, enhancing the capabilities of Wisej.NET controls and third-party widgets by enabling them to handle and interpret a wider range of document formats.

Default Implementation

The default implementation is in . Supports formats such as PDF, HTML, and DOCX. It relies on open-source libraries, including PdfPig and OpenXML, the same libraries used by SemanticKernel. However, these libraries may not offer the same level of performance or features as commercial alternatives like .

Supported conversions:

Format
Description

PDF

The implementation uses to convert PDF documents into pages of text. For improved results, especially when dealing with documents that contain tables, it is recommended to use Aspose or other commercial libraries.

Returns an array of pages and fills the Metadata object from the metadata stored in the document with these values: "Pages", "Title", "Author", "Subject".

Currently, the built-in converter does not support images embedded in PDF documents.

HTML

Utilizes to read HTML content and reconstruct it using simplified tags to represent tables and paragraphs. This approach ensures a more streamlined and consistent representation of the original HTML structure.

Returns the entire content in the first string of the return array and reads the following metadata: "Title", "Author", "Subject", "Description".

Note that reading HTML content cannot execute JavaScript.

DOCX

Uses to read and extract text from the paragraphs of Word documents. It's important to note that Word documents do not inherently store page numbers or page breaks, as these are determined during the document's rendering process. If you need to convert a Word document by pages, it is advisable to use a commercial library such as Aspose, which can handle page-specific conversions.

Returns an array of paragraphs and fills the Metadata object from the metadata stored in the XML with the following values: "Title", "Author", "Subject", "Description".

Images, charts or any other embedded object are not supported.

XLSX

Uses to read and extract worksheets, rows and cells from the Excel document.

Returns an array of worksheets and fills the Metadata object from the metadata stored in the XML with the following values: "Title", "Author", "Subject", "Description".

Images, charts or any other embedded object are not supported.

TXT, CSV

Text and CSV content is read as-is and returned in the first string of the return array.

IDocumentConversionService
DefaultDocumentConversionService
Aspose
DefaultDocumentConversionService
PdfPig
HtmlAgilityPack
OpenXML
OpenXML