pdftools_sdk.extraction.text_options
Classes
TextOptions | Options for text extraction
- class pdftools_sdk.extraction.text_options.TextOptions
Bases: _NativeObject
Options for text extraction
This class specifies the details of text extraction.
- property extraction_format: TextExtractionFormat
Specifies the format of the extracted text.
Default value: pdftools_sdk.extraction.text_extraction_format.TextExtractionFormat.DOCUMENTORDER
- Returns:
pdftools_sdk.extraction.text_extraction_format.TextExtractionFormat
- property advance_width: float | None
The horizontal space in a PDF that corresponds to one character in monospaced text output.
If None, a horizontal space of 7.2pt (0.1 inch, that is, 10 characters per inch) is used.
Default value: None
- Returns:
Optional[float]
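To make the role of the advance width concrete, the following is a minimal sketch of how a horizontal offset on the page could be mapped to a character column. It is not the SDK's implementation: the helper `column_for_x`, its rounding strategy, and the constant name are assumptions made for illustration. Only the 7.2pt fallback comes from the description above.

```python
from typing import Optional

# Fallback documented above for advance_width = None.
DEFAULT_ADVANCE_WIDTH_PT = 7.2


def column_for_x(x_pt: float, advance_width: Optional[float] = None) -> int:
    """Map a horizontal offset in points to a character column.

    Hypothetical helper, not part of pdftools_sdk.
    """
    width = DEFAULT_ADVANCE_WIDTH_PT if advance_width is None else advance_width
    if width <= 0:
        raise ValueError("advance_width must be positive")
    # round() instead of floor division: 7.2 has no exact binary
    # representation, so floor division can land one column short.
    return round(x_pt / width)


# One inch (72pt) from the origin, using the default advance width.
print(column_for_x(72.0))
# The same kind of lookup with an explicit, narrower advance width.
print(column_for_x(36.0, advance_width=6.0))
```

Under this assumed mapping, a smaller advance width spreads the same page content over more columns, which preserves more of the horizontal layout at the cost of wider output lines.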
- property line_height: float | None
The vertical space in a PDF that triggers a new line in monospaced text output.
If None, no extra blank lines are added in the text output.
Default value: None
- Returns:
Optional[float]
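One plausible reading of the line height setting can be sketched as follows. This is an assumption for illustration, not the SDK's documented algorithm: the helper `blank_lines_between` and the rule "one output line per `line_height` of vertical distance" are hypothetical. Only the behavior for None (no extra blank lines) is taken from the description above.

```python
from typing import Optional


def blank_lines_between(baseline_gap_pt: float,
                        line_height: Optional[float] = None) -> int:
    """Estimate the blank lines emitted between two consecutive text lines.

    Hypothetical helper, not part of pdftools_sdk.
    """
    if line_height is None:
        # Documented behavior: no extra blank lines are added.
        return 0
    if line_height <= 0:
        raise ValueError("line_height must be positive")
    # Assumed rule: every full line_height of vertical distance advances
    # the output by one line; the first advance is the ordinary line break.
    lines_advanced = round(baseline_gap_pt / line_height)
    return max(0, lines_advanced - 1)


# A 36pt gap with no line height set, then with a 12pt line height.
print(blank_lines_between(36.0))
print(blank_lines_between(36.0, line_height=12.0))
```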
- property word_separation_factor: float
A factor that is multiplied by the width of the space character to obtain the word-separation threshold. If the distance between two characters exceeds this threshold, a word boundary is recognized between them.
Default value: 0.3
- Returns:
float
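The threshold rule described for word_separation_factor can be sketched in a few lines. The glyph representation (a character with start and end x-coordinates) and the helper `split_words` are assumptions made for illustration and are not part of pdftools_sdk. Only the rule "gap greater than factor times space width" and the default of 0.3 come from the description above.

```python
from typing import List, Tuple

# (character, x_start, x_end) in points; a hypothetical glyph record.
Glyph = Tuple[str, float, float]


def split_words(glyphs: List[Glyph], space_width: float,
                factor: float = 0.3) -> List[str]:
    """Group glyphs on one line into words using the threshold rule.

    Hypothetical helper, not part of pdftools_sdk.
    """
    threshold = factor * space_width
    words: List[str] = []
    current = ""
    previous_end = None
    for char, x_start, x_end in glyphs:
        # A gap wider than the threshold closes the current word.
        if previous_end is not None and x_start - previous_end > threshold:
            words.append(current)
            current = ""
        current += char
        previous_end = x_end
    if current:
        words.append(current)
    return words


line = [("H", 0.0, 5.0), ("i", 5.2, 7.0), ("y", 9.0, 14.0), ("o", 14.1, 19.0)]
# With a 2.5pt space and the default factor, the threshold is 0.75pt.
print(split_words(line, space_width=2.5))
```

In this sketch, raising the factor makes the splitter more tolerant of wide letter spacing (fewer, longer words), while lowering it splits on smaller gaps.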