pdftools_sdk.extraction.extractor

Classes

Extractor()

Allows for extracting page-wide content of a PDF.

class pdftools_sdk.extraction.extractor.Extractor

Bases: _NativeObject

Allows for extracting page-wide content of a PDF.

__init__()

extract_text(in_doc: Document, out_stream: IOBase, options: TextOptions | None = None, first_page: int | None = None, last_page: int | None = None) → None

Extract text from a PDF document.

Parameters:

in_doc (pdftools_sdk.pdf.document.Document) – The input PDF document.
out_stream (io.IOBase) – The stream to which the extracted text is written.
options (Optional[pdftools_sdk.extraction.text_options.TextOptions]) – The option object that controls the text extraction.
first_page (Optional[int]) –
Optional parameter denoting the index of the first page from which text is extracted. This index is one-based. If set, the number must be in the range of 1 (first page) to pdftools_sdk.pdf.document.Document.page_count (last page).

If not set, 1 is used.
last_page (Optional[int]) –
Optional parameter denoting the index of the last page from which text is extracted. This index is one-based. If set, the number must be in the range of 1 (first page) to pdftools_sdk.pdf.document.Document.page_count (last page).

If not set, pdftools_sdk.pdf.document.Document.page_count is used.
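
The defaulting and range rules for first_page and last_page can be sketched in plain Python. This is an illustration of the documented rules only, not the SDK's implementation; resolve_page_range is a hypothetical helper name, and page_count stands in for Document.page_count. The documentation does not say what happens when first_page is greater than last_page, so the sketch does not check for it.

```python
def resolve_page_range(page_count, first_page=None, last_page=None):
    """Hypothetical helper: apply the documented defaults and range check."""
    # Documented defaults: 1 for first_page, page_count for last_page.
    first = 1 if first_page is None else first_page
    last = page_count if last_page is None else last_page

    # Both indices are one-based and must lie within 1..page_count.
    for name, value in (("first_page", first), ("last_page", last)):
        if not 1 <= value <= page_count:
            raise ValueError(f"{name}={value} is outside 1..{page_count}")
    return first, last
```

For a 10-page document, resolve_page_range(10) returns (1, 10) and resolve_page_range(10, first_page=3) returns (3, 10), while resolve_page_range(10, last_page=11) raises ValueError.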

Raises:

pdftools_sdk.license_error.LicenseError – The license check has failed.
pdftools_sdk.processing_error.ProcessingError – The processing has failed.
OSError – Writing to the output text file has failed.
pdftools_sdk.generic_error.GenericError – A generic error occurred.
ValueError – first_page or last_page is not in the allowed range.
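
A caller might separate the two built-in exceptions listed above from the SDK-specific ones. The following is a minimal sketch under stated assumptions: try_extract_text is a hypothetical wrapper, the extractor and in_doc objects are created elsewhere, and the sketch relies only on the extract_text signature documented above. It assumes the SDK error classes do not derive from ValueError or OSError, which this page does not confirm; LicenseError, ProcessingError and GenericError are left to propagate.

```python
def try_extract_text(extractor, in_doc, out_stream, first_page=None, last_page=None):
    """Hypothetical wrapper: return None on success, else a short message."""
    try:
        # options is omitted, so it takes its documented default of None.
        extractor.extract_text(
            in_doc, out_stream, first_page=first_page, last_page=last_page
        )
    except ValueError as exc:
        # Documented: first_page or last_page is not in the allowed range.
        return f"invalid page range: {exc}"
    except OSError as exc:
        # Documented: writing to the output text file has failed.
        return f"could not write output: {exc}"
    # SDK-specific errors are deliberately not caught here.
    return None
```

Returning a message rather than re-raising keeps the sketch short. Code that needs the original exception should catch it directly instead.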