pdftools_sdk.extraction.text_options

Classes

TextOptions()

Options for text extraction

class pdftools_sdk.extraction.text_options.TextOptions

Bases: _NativeObject

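Options for text extraction.

This class specifies how text is extracted: the output format, the spacing used for monospaced text output, and the threshold for detecting word boundaries.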

__init__()
property extraction_format: TextExtractionFormat

Format of the extracted text.

Specifies the format of the extracted text.

Default value: pdftools_sdk.extraction.text_extraction_format.TextExtractionFormat.DOCUMENTORDER

Returns:

pdftools_sdk.extraction.text_extraction_format.TextExtractionFormat

property advance_width: float | None

The horizontal distance on the PDF page, in points, that corresponds to one character in monospaced text output.

If None, a width of 7.2 pt is used.

Default value: None

Returns:

Optional[float]
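To make the relationship concrete, the following sketch converts a horizontal gap on the page into a number of monospaced character columns. The helper `columns_for_gap` is hypothetical and not part of the SDK, and rounding to the nearest whole column is an assumption made for illustration. Because a PDF point is 1/72 inch, the 7.2 pt fallback corresponds to ten characters per inch.

```python
from typing import Optional

# Fallback width documented for advance_width = None.
DEFAULT_ADVANCE_WIDTH_PT = 7.2


def columns_for_gap(gap_pt: float, advance_width: Optional[float] = None) -> int:
    """Hypothetical helper: estimate how many monospaced character columns
    a horizontal distance (in PDF points) occupies.

    Rounding to the nearest whole column is an assumption for illustration;
    it is not taken from the SDK.
    """
    width = DEFAULT_ADVANCE_WIDTH_PT if advance_width is None else advance_width
    if width <= 0:
        raise ValueError("advance_width must be positive")
    return round(gap_pt / width)


# A 72 pt (one inch) gap with the fallback width spans 10 columns.
print(columns_for_gap(72.0))       # 10
# Halving the advance width doubles the number of columns.
print(columns_for_gap(72.0, 3.6))  # 20
```

A smaller advance_width therefore produces wider text output for the same page, because each point of horizontal space maps to more characters.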

property line_height: float | None

The vertical space in a PDF that triggers a new line in monospaced text output.

If None, no extra blank lines are added in the text output.

Default value: None

Returns:

Optional[float]
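The sketch below illustrates one plausible reading of this rule: every full line_height of vertical distance between two baselines starts a new output line, and only the lines beyond the ordinary line break appear as blank lines. The helper `extra_blank_lines` is hypothetical, and this counting rule is an assumption; the SDK's exact behavior may differ. Only the None case follows directly from the description above.

```python
from typing import Optional


def extra_blank_lines(gap_pt: float, line_height: Optional[float] = None) -> int:
    """Hypothetical helper: estimate how many blank lines to emit between two
    text lines whose baselines are gap_pt points apart.

    Assumption for illustration: each full line_height of vertical distance
    starts a new output line; the first one is the ordinary line break, so
    only the remaining ones are blank.
    """
    if line_height is None:
        # Documented behavior: no extra blank lines when line_height is None.
        return 0
    if line_height <= 0:
        raise ValueError("line_height must be positive")
    new_lines = int(gap_pt // line_height)
    return max(new_lines - 1, 0)


print(extra_blank_lines(36.0))        # 0 (line_height is None)
print(extra_blank_lines(12.0, 12.0))  # 0 (one ordinary line break)
print(extra_blank_lines(36.0, 12.0))  # 2 (three line heights, two blank lines)
```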

property word_separation_factor: float

Factor by which the width of the space character is multiplied to obtain the word-separation threshold. If the distance between two adjacent characters exceeds this threshold, it is recognized as a word separation.

Default value: 0.3

Returns:

float
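As a rough illustration of this rule, the following sketch groups the glyphs of a single text line into words, using word_separation_factor * space width as the threshold. The `Glyph` record, the `split_words` helper, and the explicit `space_width` argument are hypothetical; the SDK determines the space width internally and its actual algorithm may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glyph:
    """Hypothetical glyph record: a character and its horizontal extent in points."""
    char: str
    x_start: float
    x_end: float


def split_words(glyphs: List[Glyph], space_width: float,
                word_separation_factor: float = 0.3) -> List[str]:
    """Group the glyphs of one text line into words.

    A gap wider than word_separation_factor * space_width is treated as a
    word boundary, following the rule described for word_separation_factor.
    """
    threshold = word_separation_factor * space_width
    words: List[str] = []
    current = ""
    previous_end: Optional[float] = None
    # Process glyphs from left to right.
    for glyph in sorted(glyphs, key=lambda g: g.x_start):
        if previous_end is not None and glyph.x_start - previous_end > threshold:
            words.append(current)
            current = ""
        current += glyph.char
        previous_end = glyph.x_end
    if current:
        words.append(current)
    return words


line = [Glyph("H", 0.0, 2.0), Glyph("i", 2.1, 3.0),
        Glyph("y", 4.5, 6.0), Glyph("o", 6.2, 8.0), Glyph("u", 8.3, 10.0)]
# Threshold 0.3 * 2.5 = 0.75 pt: the 1.5 pt gap after "i" splits the words.
print(split_words(line, space_width=2.5))                              # ['Hi', 'you']
# Threshold 1.0 * 2.5 = 2.5 pt: no gap is wide enough, so one word remains.
print(split_words(line, space_width=2.5, word_separation_factor=1.0))  # ['Hiyou']
```

Raising the factor makes word detection more conservative (fewer splits), which can help with loosely kerned text; lowering it splits more readily.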