pdftools_sdk.extraction.extractor
Classes
Extractor – Allows for extracting page-wide content of a PDF.
- class pdftools_sdk.extraction.extractor.Extractor
Bases: _NativeObject
Allows for extracting page-wide content of a PDF.
- extract_text(in_doc: Document, out_stream: IOBase, options: TextOptions | None = None, first_page: int | None = None, last_page: int | None = None) → None
Extract text from a PDF document.
- Parameters:
in_doc (pdftools_sdk.pdf.document.Document) – The input PDF document.
out_stream (io.IOBase) – The stream to which the extracted text is written.
options (Optional[pdftools_sdk.extraction.text_options.TextOptions]) – The options object that controls the text extraction.
first_page (Optional[int]) – Optional one-based index of the first page to be extracted. If set, the number must be in the range of 1 (first page) to pdftools_sdk.pdf.document.Document.page_count (last page). If not set, 1 is used.
last_page (Optional[int]) – Optional one-based index of the last page to be extracted. If set, the number must be in the range of 1 (first page) to pdftools_sdk.pdf.document.Document.page_count (last page). If not set, pdftools_sdk.pdf.document.Document.page_count is used.
- Raises:
pdftools_sdk.license_error.LicenseError – The license check has failed.
pdftools_sdk.processing_error.ProcessingError – The processing has failed.
OSError – Writing to the output text file has failed.
pdftools_sdk.generic_error.GenericError – A generic error occurred.
ValueError – first_page or last_page is not in the allowed range.
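The following sketch illustrates the page-range rules described above and one possible way to call extract_text. The names resolve_page_range and extract_pages_to_file are hypothetical helpers, not part of the SDK. The import paths follow the module names shown on this page. Document.open(stream) and the use of Document as a context manager are assumptions, and license initialization is omitted.

```python
import io
from typing import Optional, Tuple


def resolve_page_range(page_count: int,
                       first_page: Optional[int] = None,
                       last_page: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical helper: apply the documented defaults and range check."""
    # Documented defaults: 1 for first_page, page_count for last_page.
    first = 1 if first_page is None else first_page
    last = page_count if last_page is None else last_page
    # Documented rule: each index is one-based and within 1..page_count.
    for name, value in (("first_page", first), ("last_page", last)):
        if not 1 <= value <= page_count:
            raise ValueError(f"{name}={value} is outside 1..{page_count}")
    return first, last


def extract_pages_to_file(pdf_path: str, txt_path: str,
                          first_page: Optional[int] = None,
                          last_page: Optional[int] = None) -> None:
    """Hypothetical wrapper: write the text of a page range to a file."""
    # Imported lazily so resolve_page_range works without the SDK installed.
    from pdftools_sdk.pdf.document import Document
    from pdftools_sdk.extraction.extractor import Extractor

    with io.FileIO(pdf_path, "rb") as in_stream:
        # Assumption: Document.open accepts a binary stream and the
        # returned Document can be used as a context manager.
        with Document.open(in_stream) as in_doc:
            first, last = resolve_page_range(
                in_doc.page_count, first_page, last_page)
            with io.FileIO(txt_path, "wb") as out_stream:
                # options is left at its default (None).
                Extractor().extract_text(
                    in_doc, out_stream,
                    first_page=first, last_page=last)
```

Checking the range before the call is optional, since extract_text is documented to raise ValueError itself; doing it up front simply gives a more specific error message.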