pdftools_sdk.ocr.ocr_options

Classes

OcrOptions()

The options for OCR processing

class pdftools_sdk.ocr.ocr_options.OcrOptions

Bases: _NativeObject

The options for OCR processing

This class aggregates all OCR processing options including resolution settings, image processing, text processing and page processing.

__init__()

property dpi: float

The default resolution in DPI used for OCR

The optimal OCR resolution is determined automatically for each page, so that all images and text on the page can be recognized. If the default resolution lies within the page’s range of optimal resolutions, the default is used.

The range should be within the resolutions supported by the OCR engine. Most OCR engines are optimized for resolutions around 300 DPI.

Default value: 300.0

Returns:

float

property min_dpi: float

The minimum resolution in DPI used for OCR

Default value: 200.0

Returns:

float

property max_dpi: float

The maximum resolution in DPI used for OCR

Default value: 400.0

Returns:

float
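
The interplay of dpi, min_dpi and max_dpi can be illustrated with a small sketch. This is not the SDK's implementation: DpiSettings and pick_resolution are hypothetical names, and the per-page optimal range is a hypothetical input that the SDK determines internally. Only the first rule (use the default when it lies within the optimal range) comes from the description above. The fallback to the nearest optimal bound, clamped to the min_dpi/max_dpi limits, is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DpiSettings:
    """Hypothetical stand-in holding the three documented DPI values."""
    dpi: float = 300.0      # documented default
    min_dpi: float = 200.0  # documented default
    max_dpi: float = 400.0  # documented default


def pick_resolution(optimal_low: float, optimal_high: float,
                    settings: DpiSettings = DpiSettings()) -> float:
    """Pick an OCR resolution for one page.

    optimal_low and optimal_high describe the page's range of optimal
    resolutions (hypothetical inputs, not exposed by the documented API).
    """
    if optimal_low > optimal_high:
        raise ValueError("optimal_low must not exceed optimal_high")

    # Documented rule: the default is used if it lies within the optimal range.
    if optimal_low <= settings.dpi <= optimal_high:
        return settings.dpi

    # ASSUMPTION (not documented): otherwise take the optimal bound closest
    # to the default, then clamp it to the [min_dpi, max_dpi] limits.
    nearest = optimal_low if settings.dpi < optimal_low else optimal_high
    return max(settings.min_dpi, min(settings.max_dpi, nearest))


# A page whose optimal range contains the default of 300 DPI.
print(pick_resolution(150.0, 350.0))
# A low-resolution page: the assumed fallback is limited by min_dpi.
print(pick_resolution(72.0, 150.0))
```

Under these assumptions, min_dpi and max_dpi act as hard limits, and dpi acts as a preference that applies whenever the page allows it.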

property process_embedded_files: bool

Whether to process embedded files recursively

If enabled, embedded PDF files are also processed with OCR. The default is to copy all embedded files as-is.

Default value: False

Returns:

bool

property image_options: ImageOptions

The options for image processing

Options controlling how images in the PDF are processed during OCR.

Returns:

pdftools_sdk.ocr.image_options.ImageOptions

property text_options: TextOptions

The options for text processing

Options controlling how existing text is processed during OCR.

Returns:

pdftools_sdk.ocr.text_options.TextOptions

property page_options: PageOptions

The options for page processing

Options controlling page-level OCR processing and tagging.

Returns:

pdftools_sdk.ocr.page_options.PageOptions
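
As a usage illustration, the scalar properties documented above can be read into a plain dictionary, for example to log the effective settings before processing. The helper snapshot_ocr_settings is a hypothetical name and not part of the SDK. It relies only on the documented readable properties dpi, min_dpi, max_dpi and process_embedded_files, and it is shown here with a stand-in object so that the sketch runs without the SDK installed. The bounds check is an assumption about sensible values, not a documented SDK rule.

```python
from types import SimpleNamespace


def snapshot_ocr_settings(options) -> dict:
    """Read the documented scalar properties into a plain dict."""
    snapshot = {
        "dpi": float(options.dpi),
        "min_dpi": float(options.min_dpi),
        "max_dpi": float(options.max_dpi),
        "process_embedded_files": bool(options.process_embedded_files),
    }
    # ASSUMPTION: a sensible configuration keeps the default resolution
    # between the minimum and maximum resolution.
    snapshot["dpi_within_bounds"] = (
        snapshot["min_dpi"] <= snapshot["dpi"] <= snapshot["max_dpi"]
    )
    return snapshot


# Stand-in carrying the documented default values. With the SDK installed,
# an OcrOptions instance could presumably be passed in its place.
defaults = SimpleNamespace(
    dpi=300.0, min_dpi=200.0, max_dpi=400.0, process_embedded_files=False
)
print(snapshot_ocr_settings(defaults))
```

The nested image_options, text_options and page_options objects are left out because their own properties are documented on their respective pages.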