Class Extractor


  • public class Extractor
    extends NativeObject
    Allows for extracting page-wide content of a PDF.
    • Constructor Detail

      • Extractor

        public Extractor()
    • Method Detail

      • extractText

        public void extractText(Document inDoc,
                                Stream outStream)
                         throws java.io.IOException,
                                GenericException,
                                LicenseException,
                                ProcessingException

        Extracts the text of a PDF document and writes it to a stream.

        Parameters:
        inDoc - The input PDF document.
        outStream - The stream to which the extracted text is written.
        Throws:
        LicenseException - The license check has failed.
        ProcessingException - The processing has failed.
        java.io.IOException - Writing to the output text file has failed.
        GenericException - A generic error occurred.
        java.lang.IllegalArgumentException - The firstPage or lastPage is not in the allowed range.
        java.lang.IllegalArgumentException - if inDoc is null
        java.lang.IllegalArgumentException - if outStream is null
      • extractText

        public void extractText(Document inDoc,
                                Stream outStream,
                                TextOptions options)
                         throws java.io.IOException,
                                GenericException,
                                LicenseException,
                                ProcessingException

        Extracts the text of a PDF document and writes it to a stream, applying the given extraction options.

        Parameters:
        inDoc - The input PDF document.
        outStream - The stream to which the extracted text is written.
        options - The option object that controls the text extraction.
        Throws:
        LicenseException - The license check has failed.
        ProcessingException - The processing has failed.
        java.io.IOException - Writing to the output text file has failed.
        GenericException - A generic error occurred.
        java.lang.IllegalArgumentException - The firstPage or lastPage is not in the allowed range.
        java.lang.IllegalArgumentException - if inDoc is null
        java.lang.IllegalArgumentException - if outStream is null
      • extractText

        public void extractText(Document inDoc,
                                Stream outStream,
                                TextOptions options,
                                java.lang.Integer firstPage)
                         throws java.io.IOException,
                                GenericException,
                                LicenseException,
                                ProcessingException

        Extracts the text of a PDF document, starting at firstPage, and writes it to a stream.

        Parameters:
        inDoc - The input PDF document.
        outStream - The stream to which the extracted text is written.
        options - The option object that controls the text extraction.
        firstPage -

        Optional one-based index of the first page to be extracted. If set, the number must be in the range from 1 (first page) to pdftools.pdf.Document.getPageCount (last page).

        If not set, 1 is used.

        Throws:
        LicenseException - The license check has failed.
        ProcessingException - The processing has failed.
        java.io.IOException - Writing to the output text file has failed.
        GenericException - A generic error occurred.
        java.lang.IllegalArgumentException - The firstPage or lastPage is not in the allowed range.
        java.lang.IllegalArgumentException - if inDoc is null
        java.lang.IllegalArgumentException - if outStream is null
      • extractText

        public void extractText(Document inDoc,
                                Stream outStream,
                                TextOptions options,
                                java.lang.Integer firstPage,
                                java.lang.Integer lastPage)
                         throws java.io.IOException,
                                GenericException,
                                LicenseException,
                                ProcessingException

        Extracts the text of pages firstPage through lastPage of a PDF document and writes it to a stream.

        Parameters:
        inDoc - The input PDF document.
        outStream - The stream to which the extracted text is written.
        options - The option object that controls the text extraction.
        firstPage -

        Optional one-based index of the first page to be extracted. If set, the number must be in the range from 1 (first page) to pdftools.pdf.Document.getPageCount (last page).

        If not set, 1 is used.

        lastPage -

        Optional one-based index of the last page to be extracted. If set, the number must be in the range from 1 (first page) to pdftools.pdf.Document.getPageCount (last page).

        If not set, pdftools.pdf.Document.getPageCount is used.

        Throws:
        LicenseException - The license check has failed.
        ProcessingException - The processing has failed.
        java.io.IOException - Writing to the output text file has failed.
        GenericException - A generic error occurred.
        java.lang.IllegalArgumentException - The firstPage or lastPage is not in the allowed range.
        java.lang.IllegalArgumentException - if inDoc is null
        java.lang.IllegalArgumentException - if outStream is null
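
The page-range rules described for firstPage and lastPage can be illustrated with a small standalone sketch. It is not part of this API: the class PageRangeSketch and its resolve method are hypothetical names introduced only for illustration, and the sketch merely mirrors the documented behavior (a null value means "not set", defaults are 1 and the page count, and an out-of-range value raises IllegalArgumentException). How the library itself treats a firstPage greater than lastPage is not stated above, so the sketch leaves that case unchecked.

```java
import java.util.Arrays;

// Hypothetical helper, NOT part of the documented API. It only models the
// documented defaults and range checks for firstPage and lastPage.
final class PageRangeSketch {

    private PageRangeSketch() {}

    // Returns {first, last} as one-based page indices.
    // A null argument stands for "not set", matching the java.lang.Integer
    // parameters of extractText.
    static int[] resolve(Integer firstPage, Integer lastPage, int pageCount) {
        // Documented defaults: 1 for firstPage, the page count for lastPage.
        int first = (firstPage != null) ? firstPage : 1;
        int last = (lastPage != null) ? lastPage : pageCount;

        // Documented range for both values: 1 (first page) to page count (last page).
        if (first < 1 || first > pageCount) {
            throw new IllegalArgumentException(
                "firstPage " + first + " is not in the range 1.." + pageCount);
        }
        if (last < 1 || last > pageCount) {
            throw new IllegalArgumentException(
                "lastPage " + last + " is not in the range 1.." + pageCount);
        }
        // Assumption: no check for first > last, because the documentation
        // above does not say how that case is handled.
        return new int[] { first, last };
    }

    public static void main(String[] args) {
        // Neither bound set: the whole 10-page document.
        System.out.println(Arrays.toString(resolve(null, null, 10))); // prints [1, 10]
        // Only firstPage set: from page 3 to the end.
        System.out.println(Arrays.toString(resolve(3, null, 10)));    // prints [3, 10]
    }
}
```

Validating a range this way before calling extractText lets a caller report a precise message for an invalid page number instead of relying on the generic IllegalArgumentException.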