pdftools_sdk.ocr.processor
Module Attributes
WarningFunc | Event for warnings occurring during OCR processing
Classes
Processor | Process PDF documents with OCR
- pdftools_sdk.ocr.processor.WarningFunc
Event for warnings occurring during OCR processing
Non-critical issues during processing are reported via this event. It is recommended to review the
pdftools_sdk.ocr.warning_category.WarningCategory and handle warnings if necessary for the application.
- Parameters:
message (str) – The message describing the warning
category (pdftools_sdk.ocr.warning_category.WarningCategory) – The category of the warning
pageNo (int) – The page number this warning is associated with, or 0 if not page-specific
context (str) – A description of the context where the warning occurred
alias of
Callable[[str,WarningCategory,int,str],None]
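A handler is any callable that accepts the four positional arguments listed above. The following is a minimal sketch, not part of the SDK: the names `on_warning`, `describe`, and `collected` are hypothetical, and the category is typed as `Any` so the snippet runs without importing `WarningCategory`.

```python
from typing import Any, List, Tuple

# Collected warnings as (page_no, category, message, context) tuples.
collected: List[Tuple[int, Any, str, str]] = []


def on_warning(message: str, category: Any, page_no: int, context: str) -> None:
    """Hypothetical handler matching Callable[[str, WarningCategory, int, str], None]."""
    collected.append((page_no, category, message, context))


def describe(page_no: int, message: str) -> str:
    """Format a warning for logging (hypothetical helper)."""
    # A page number of 0 means the warning is not tied to a specific page.
    where = f"page {page_no}" if page_no > 0 else "document"
    return f"[{where}] {message}"
```

Collecting warnings instead of printing them lets the application decide after processing which categories matter.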
- class pdftools_sdk.ocr.processor.Processor[source]
Bases: _NativeObject
Process PDF documents with OCR
The processor applies Optical Character Recognition (OCR) to PDF documents. It can make scanned documents searchable, fix text extraction issues and generate PDF tagging/structure.
The processor is decoupled from the document: it takes a pdftools_sdk.pdf.document.Document as input and produces a new pdftools_sdk.pdf.document.Document as output.
- process(document: Document, engine: Engine | None, out_stream: IOBase, options: OcrOptions | None = None, out_options: OutputOptions | None = None) → Document[source]
Apply OCR to a PDF document
Process the input PDF document with OCR according to the specified options. The processed document is written to the output stream.
Non-critical processing issues raise the pdftools_sdk.ocr.processor.WarningFunc() event. It is recommended to review the pdftools_sdk.ocr.warning_category.WarningCategory and handle the warnings if necessary for the application.
- Parameters:
document (pdftools_sdk.pdf.document.Document) – The input PDF document to process
engine (Optional[pdftools_sdk.ocr.engine.Engine]) – The OCR engine to use for recognition. This parameter may be None for operations that do not require OCR, such as pdftools_sdk.ocr.image_processing_mode.ImageProcessingMode.REMOVETEXT. For all other modes, a valid engine must be provided.
out_stream (io.IOBase) – The stream to which the output PDF is written. The stream must support both random read and write access.
options (Optional[pdftools_sdk.ocr.ocr_options.OcrOptions]) – The OCR processing options. If None, default options are used.
out_options (Optional[pdftools_sdk.pdf.output_options.OutputOptions]) – The PDF output options, e.g. to encrypt the output document.
- Returns:
The resulting output PDF which can be used as a new input for further processing.
Note that this object must be disposed before the output stream object (method argument out_stream).
- Return type:
pdftools_sdk.pdf.document.Document
- Raises:
pdftools_sdk.license_error.LicenseError – The license check has failed.
OSError – Writing to out_stream failed.
pdftools_sdk.processing_error.ProcessingError – The document could not be processed.
ValueError – An OCR engine is required for the specified options but engine is None.
ValueError – The options specify invalid or contradictory settings.
ValueError – The out_options specify document encryption for a PDF/A file, which is not allowed.
pdftools_sdk.generic_error.GenericError – An unexpected failure occurred.
pdftools_sdk.corrupt_error.CorruptError – An input image in the document is corrupt and cannot be read.
pdftools_sdk.password_error.PasswordError – The document is encrypted and the password is invalid.
pdftools_sdk.conformance_error.ConformanceError – The document has an invalid conformance level.
pdftools_sdk.unsupported_feature_error.UnsupportedFeatureError – The input PDF contains unrendered XFA form fields. See pdftools_sdk.pdf.document.Document.xfa for more information.
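The call order and disposal rule described above can be sketched as follows. The helper `ocr_to_bytes` is hypothetical and duck-typed: it assumes `processor`, `document`, and `engine` are SDK objects created elsewhere, and it assumes the returned document supports the context manager protocol. Only `ValueError` is handled here; the SDK-specific errors listed above are left to propagate.

```python
import io


def ocr_to_bytes(processor, document, engine, options=None, out_options=None):
    """Run OCR and return the output PDF as bytes, or None on invalid arguments.

    Hypothetical helper: all arguments are assumed to be SDK objects
    created by the caller.
    """
    # BytesIO supports random read and write access, as process() requires.
    with io.BytesIO() as out_stream:
        try:
            result = processor.process(
                document, engine, out_stream, options, out_options
            )
        except ValueError as err:
            # Missing engine, contradictory options, or encryption for PDF/A.
            print(f"invalid arguments: {err}")
            return None
        # Assumption: the returned document is a context manager. Leaving
        # this block disposes it while out_stream is still open.
        with result:
            pass
        return out_stream.getvalue()
```

Nesting the result inside the stream's `with` block is one way to guarantee that the output document is disposed before the stream is closed.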
- add_warning_handler(handler: Callable[[str, WarningCategory, int, str], None]) → None[source]
Add handler for the WarningFunc() event.
- Parameters:
handler – Event handler. If a handler is added that is already registered, it is ignored.
- remove_warning_handler(handler: Callable[[str, WarningCategory, int, str], None]) → None[source]
Remove registered handler of the WarningFunc() event.
- Parameters:
handler – Event handler that shall be removed. If a handler is not registered, it is ignored.
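Because adding and removing handlers come in pairs, it can be convenient to scope a handler to a block of code. The sketch below is an assumption about usage, not an SDK feature: `warning_handler` is a hypothetical wrapper that relies only on the two methods documented above.

```python
from contextlib import contextmanager


@contextmanager
def warning_handler(processor, handler):
    """Register `handler` on `processor` for the duration of a with-block.

    Hypothetical helper; `processor` is assumed to expose
    add_warning_handler() and remove_warning_handler().
    """
    processor.add_warning_handler(handler)
    try:
        yield handler
    finally:
        # Runs even if processing fails, so the handler never lingers.
        processor.remove_warning_handler(handler)
```

A call such as `with warning_handler(processor, my_handler): ...` would then wrap the `process()` call, where `my_handler` is any callable with the `WarningFunc` signature.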