pdftools_sdk.ocr.ocr_options

Classes

OcrOptions()

The options for OCR processing

class pdftools_sdk.ocr.ocr_options.OcrOptions

Bases: _NativeObject

The options for OCR processing

This class aggregates all OCR processing options including resolution settings, image processing, text processing and page processing.

__init__()

property dpi: float

The default resolution in DPI used for OCR

Each page’s optimal OCR resolution is determined automatically, such that all images and text can be recognized. The default resolution is chosen if it is within the range of optimal resolutions.

The range from min_dpi to max_dpi should lie within the resolutions supported by the OCR engine. Most OCR engines are optimized for resolutions around 300 DPI.

Default value: 300.0

Returns: float
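The resolution rule above can be illustrated with a small, self-contained sketch. It is not SDK code: the SDK determines each page's optimal range internally, so here that range is passed in as two hypothetical arguments, and the function name `choose_page_dpi` is made up. The fallback used when the default lies outside the optimal range (snapping to the nearest end of that range) is an assumption; the documentation only states that the default is chosen when it lies within the range.

```python
def choose_page_dpi(optimal_low: float, optimal_high: float,
                    default_dpi: float = 300.0) -> float:
    """Pick the OCR resolution for one page (illustrative sketch only).

    optimal_low/optimal_high are hypothetical inputs standing in for the
    per-page range of optimal resolutions that the SDK determines itself.
    """
    if optimal_low > optimal_high:
        raise ValueError("optimal_low must not exceed optimal_high")
    if optimal_low <= default_dpi <= optimal_high:
        # Documented behavior: the default is used when it lies in the range.
        return default_dpi
    # ASSUMPTION (not documented): otherwise use the nearest end of the range.
    return optimal_low if default_dpi < optimal_low else optimal_high


print(choose_page_dpi(250.0, 350.0))  # 300.0 (default is within the range)
print(choose_page_dpi(320.0, 450.0))  # 320.0 (assumed fallback)
```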

property min_dpi: float

The minimum resolution in DPI used for OCR

Default value: 200.0

Returns: float

property max_dpi: float

The maximum resolution in DPI used for OCR

Default value: 400.0

Returns: float
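As a rough illustration of how min_dpi and max_dpi bound the resolution actually used, the following sketch clamps a candidate value into the interval between them, using the documented defaults. The helper `clamp_dpi` is hypothetical, and clamping is an assumption about how out-of-range values are handled, not a description of the SDK's internals.

```python
# Documented defaults of OcrOptions.min_dpi and OcrOptions.max_dpi.
DEFAULT_MIN_DPI = 200.0
DEFAULT_MAX_DPI = 400.0


def clamp_dpi(dpi: float, min_dpi: float = DEFAULT_MIN_DPI,
              max_dpi: float = DEFAULT_MAX_DPI) -> float:
    """Keep a resolution inside [min_dpi, max_dpi] (hypothetical helper).

    ASSUMPTION: out-of-range values are clamped to the nearest bound.
    """
    if min_dpi > max_dpi:
        raise ValueError("min_dpi must not exceed max_dpi")
    return max(min_dpi, min(dpi, max_dpi))


for candidate in (150.0, 300.0, 600.0):
    print(candidate, "->", clamp_dpi(candidate))
```

With the defaults, 150.0 is raised to 200.0, 300.0 is kept, and 600.0 is lowered to 400.0.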

property process_embedded_files: bool

Whether to process embedded files recursively

If enabled, embedded PDF files are also processed with OCR. The default is to copy all embedded files as-is.

Default value: False

Returns: bool
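The difference between the two settings can be modelled with a toy recursive walk. The `Document` class and `process` function below are hypothetical stand-ins, not part of `pdftools_sdk`. With process_embedded_files disabled, every embedded file is copied as-is; with it enabled, embedded PDF files are processed recursively. Treating non-PDF embedded files as still copied unchanged in that case is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a file, possibly with embedded files."""
    name: str
    is_pdf: bool = True
    embedded: List["Document"] = field(default_factory=list)


def process(doc: Document, process_embedded_files: bool = False,
            log: Optional[List[str]] = None) -> List[str]:
    """Record which files would be OCRed and which copied (toy model)."""
    if log is None:
        log = []
    log.append(f"ocr {doc.name}")
    for child in doc.embedded:
        if process_embedded_files and child.is_pdf:
            # Enabled: embedded PDFs are processed, recursing into their own attachments.
            process(child, process_embedded_files, log)
        else:
            # Default (or non-PDF, assumed): copy the file, including anything nested in it, as-is.
            log.append(f"copy {child.name}")
    return log


root = Document("main.pdf", embedded=[
    Document("a.pdf", embedded=[Document("b.pdf")]),
    Document("data.xml", is_pdf=False),
])
print(process(root))
print(process(root, process_embedded_files=True))
```

The first call copies a.pdf (and, implicitly, the b.pdf inside it) unchanged; the second descends into a.pdf and b.pdf while data.xml is still copied.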

property image_options: ImageOptions

The options for image processing

Options controlling how images in the PDF are processed during OCR.

Returns: pdftools_sdk.ocr.image_options.ImageOptions

property text_options: TextOptions

The options for text processing

Options controlling how existing text is processed during OCR.

Returns: pdftools_sdk.ocr.text_options.TextOptions

property page_options: PageOptions

The options for page processing

Options controlling page-level OCR processing and tagging.

Returns: pdftools_sdk.ocr.page_options.PageOptions