DefaultDocumentConversionService
Wisej.AI.Services.DefaultDocumentConversionService
Namespace: Wisej.AI.Services
Assembly: Wisej.AI (3.5.0.0)
Provides functionality to convert documents from various formats into text representations.
This class implements the document conversion service interface and provides methods to convert documents from streams into arrays of text. Supported formats include PDF, DOCX, and HTML. The conversion can also extract document metadata when a metadata object is provided.
Initializes a new instance of DefaultDocumentConversionService.
Converts a document of the specified file type from the given stream into text.
stream
The stream containing the document to be converted. It must be readable and positioned at the start of the document.
fileType
The format of the input stream, given as a file type string such as "pdf", "docx", or "html".
metadata
Optional metadata related to the document conversion. Defaults to null.
iterator
Optional function to process each page, paragraph, or section of the document being converted.
This method reads a document of type fileType from the specified stream and converts it to an array of strings representing lines, paragraphs, or pages, depending on the conversion implementation. If the metadata parameter is provided, the method also outputs additional information about the document, such as the title, author, subject, and pages.
Ensure the stream is positioned at the beginning or the correct position for reading.
Supported file types should be verified with the service documentation.
Handle potential exceptions that may arise from invalid streams or unsupported file types.
Returns: An array of strings representing the converted document.
Throws:
Thrown when stream or fileType is null.
Implements the service interface for converting documents from one format to another.
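The calling pattern described for the conversion method (rewind the stream, pass the file type, optionally collect metadata, and guard against bad input) can be sketched as follows. This is a minimal illustration, not the Wisej.AI implementation: SketchConversionService is a hypothetical stand-in that only handles plain text, and the method name Convert, the dictionary used for metadata, the Func<string, string> iterator type, and the specific exception types are all assumptions made for the example. Check the actual signatures in the Wisej.AI assembly before relying on any of them.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical stand-in that mirrors the documented parameters
// (stream, fileType, metadata, iterator). It is NOT the Wisej.AI class
// and supports only the "txt" file type.
public class SketchConversionService
{
    public string[] Convert(
        Stream stream,
        string fileType,
        IDictionary<string, object> metadata = null,
        Func<string, string> iterator = null)
    {
        // Assumed exception type for null arguments.
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (fileType == null) throw new ArgumentNullException(nameof(fileType));

        // Assumed behavior for unsupported file types.
        if (!string.Equals(fileType, "txt", StringComparison.OrdinalIgnoreCase))
            throw new NotSupportedException("Unsupported file type: " + fileType);

        string text;
        // leaveOpen: true so the caller keeps ownership of the stream.
        using (var reader = new StreamReader(stream, Encoding.UTF8, true, 1024, leaveOpen: true))
        {
            text = reader.ReadToEnd();
        }

        // Split on blank lines so that each array element is one paragraph.
        string[] parts = text.Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries);

        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
            // Let the optional iterator post-process each paragraph.
            if (iterator != null) parts[i] = iterator(parts[i]);
        }

        // Report extra information only when the caller asked for it.
        if (metadata != null) metadata["paragraphs"] = parts.Length;

        return parts;
    }
}

public static class Program
{
    public static void Main()
    {
        var service = new SketchConversionService();
        var metadata = new Dictionary<string, object>();
        byte[] bytes = Encoding.UTF8.GetBytes("First paragraph.\n\nSecond paragraph.");

        using (var stream = new MemoryStream(bytes))
        {
            // Make sure reading starts at the beginning of the document.
            if (stream.CanSeek) stream.Position = 0;

            try
            {
                string[] parts = service.Convert(stream, "txt", metadata, p => p.ToUpperInvariant());
                foreach (string part in parts) Console.WriteLine(part);
                Console.WriteLine("Paragraphs: " + metadata["paragraphs"]);
            }
            catch (ArgumentNullException ex)
            {
                Console.WriteLine("Missing argument: " + ex.ParamName);
            }
            catch (NotSupportedException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }
    }
}
```

Keeping the stream open inside the converter leaves disposal to the caller, which matches the expectation that the caller controls the stream's position and lifetime.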