Class Processor
- java.lang.Object
  - com.pdftools.internal.NativeBase
    - com.pdftools.internal.NativeObject
      - com.pdftools.ocr.Processor
public class Processor extends NativeObject
Process PDF documents with OCR
The processor applies Optical Character Recognition (OCR) to PDF documents. It can make scanned documents searchable, fix text extraction issues and generate PDF tagging/structure.
The processor is decoupled from the document: it takes a pdftools.pdf.Document as input and produces a new pdftools.pdf.Document as output.
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | Processor.Warning | Event for warnings occurring during OCR processing
static interface | Processor.WarningListener | Listener interface for the Processor.Warning event.
Constructor Summary
Constructors
Constructor | Description
Processor() |
Method Summary
Modifier and Type | Method | Description
void | addWarningListener(Processor.WarningListener listener) | Add a listener for the Processor.Warning event.
Document | process(Document document, Engine engine, Stream outStream) | Apply OCR to a PDF document
Document | process(Document document, Engine engine, Stream outStream, OcrOptions options) | Apply OCR to a PDF document
Document | process(Document document, Engine engine, Stream outStream, OcrOptions options, OutputOptions outOptions) | Apply OCR to a PDF document
void | removeWarningListener(Processor.WarningListener listener) | Remove registered listener for the Processor.Warning event.
Methods inherited from class com.pdftools.internal.NativeObject
equals, hashCode
Method Detail
addWarningListener
public void addWarningListener(Processor.WarningListener listener)
Add a listener for the Processor.Warning event.
Parameters:
listener - Listener for the Processor.Warning event. If a listener is added that is already registered, it is ignored.
removeWarningListener
public void removeWarningListener(Processor.WarningListener listener)
Remove a registered listener for the Processor.Warning event.
Parameters:
listener - Listener for the Processor.Warning event that should be removed. If the listener is not registered, it is ignored.
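The add and remove rules described above (a duplicate registration is ignored, and removing an unregistered listener is ignored) can be illustrated with a small stand-in. This is only a sketch of the documented contract, not the library's implementation: `ListenerRegistrySketch`, its `WarningListener` interface, `onWarning`, and `fireWarning` are hypothetical names, and this page does not show the real callback signature or say whether listeners are compared by identity or by `equals`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the documented add/remove contract.
class ListenerRegistrySketch {
    // Hypothetical listener shape; the real Processor.WarningListener
    // signature is not shown on this page.
    interface WarningListener {
        void onWarning(String message);
    }

    private final List<WarningListener> listeners = new ArrayList<>();

    // Adding an already registered listener is ignored.
    void addWarningListener(WarningListener listener) {
        if (!listeners.contains(listener)) {
            listeners.add(listener);
        }
    }

    // Removing a listener that is not registered is ignored.
    void removeWarningListener(WarningListener listener) {
        listeners.remove(listener);
    }

    // Deliver one warning to every registered listener.
    void fireWarning(String message) {
        for (WarningListener l : new ArrayList<>(listeners)) {
            l.onWarning(message);
        }
    }

    // Returns the messages the listener actually received.
    static List<String> demo() {
        ListenerRegistrySketch registry = new ListenerRegistrySketch();
        List<String> received = new ArrayList<>();
        WarningListener listener = message -> received.add(message);

        registry.addWarningListener(listener);
        registry.addWarningListener(listener);    // duplicate, ignored
        registry.fireWarning("first");            // delivered once
        registry.removeWarningListener(listener);
        registry.removeWarningListener(listener); // not registered, ignored
        registry.fireWarning("second");           // nobody is listening
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [first]
    }
}
```

Under this contract a caller can register or unregister defensively without tracking whether the listener is currently attached.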
process
public Document process(Document document, Engine engine, Stream outStream) throws java.io.IOException, GenericException, LicenseException, CorruptException, PasswordException, ConformanceException, UnsupportedFeatureException, ProcessingException
Apply OCR to a PDF document
Process the input PDF document with OCR according to the specified options. The processed document is written to the output stream.
Non-critical processing issues raise a Processor.Warning event, which is delivered to the registered Processor.WarningListener instances. It is recommended to review the WarningCategory of each warning and handle it if necessary for the application.
Parameters:
document - The input PDF document to process.
engine - The OCR engine to use for recognition. This parameter may be null for operations that do not require OCR, such as ImageProcessingMode.REMOVE_TEXT. For all other modes, a valid engine must be provided.
outStream - The stream to which the output PDF is written. The stream must support both random read and write access.
Returns:
The resulting output PDF, which can be used as a new input for further processing. Note that this object must be disposed before the output stream object (method argument outStream).
Throws:
LicenseException - The license check has failed.
java.io.IOException - Writing to the outStream failed.
ProcessingException - The document could not be processed.
java.lang.IllegalArgumentException - An OCR engine is required for the specified options but engine is null.
java.lang.IllegalArgumentException - The options specifies invalid or contradictory settings.
java.lang.IllegalArgumentException - The outOptions specifies document encryption for a PDF/A file, which is not allowed.
GenericException - An unexpected failure occurred.
CorruptException - An input image in the document is corrupt and cannot be read.
PasswordException - The document is encrypted and the password is invalid.
ConformanceException - The document has an invalid conformance level.
UnsupportedFeatureException - The input PDF contains unrendered XFA form fields. See pdftools.pdf.Document.getXfa for more information.
java.lang.IllegalArgumentException - if document is null
java.lang.IllegalArgumentException - if outStream is null
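The requirement that the returned document be disposed before the output stream maps naturally onto Java's try-with-resources, which closes resources in the reverse order of their declaration. The sketch below demonstrates only that ordering. `FakeStream`, `FakeDocument`, and the local `process` method are hypothetical stand-ins, and it assumes the real Document and stream types implement `AutoCloseable`, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates close ordering only; no PDF processing happens here.
class DisposalOrderSketch {
    // Records the order in which the stand-in resources are closed.
    static final List<String> closed = new ArrayList<>();

    // Hypothetical stand-in for the stream passed as outStream.
    static final class FakeStream implements AutoCloseable {
        @Override
        public void close() {
            closed.add("outStream");
        }
    }

    // Hypothetical stand-in for the Document returned by process(...).
    static final class FakeDocument implements AutoCloseable {
        @Override
        public void close() {
            closed.add("document");
        }
    }

    // Hypothetical stand-in for Processor.process(...).
    static FakeDocument process(FakeStream outStream) {
        return new FakeDocument();
    }

    static List<String> run() {
        closed.clear();
        // Declare the stream first and the result second: try-with-resources
        // closes in reverse order, so the document is closed before the stream.
        try (FakeStream outStream = new FakeStream();
             FakeDocument result = process(outStream)) {
            // The result would be used here, for example as input for further processing.
        }
        return new ArrayList<>(closed);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [document, outStream]
    }
}
```

Declaring the stream before the result in a single try header keeps the ordering correct even when an exception is thrown inside the block.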
process
public Document process(Document document, Engine engine, Stream outStream, OcrOptions options) throws java.io.IOException, GenericException, LicenseException, CorruptException, PasswordException, ConformanceException, UnsupportedFeatureException, ProcessingException
Apply OCR to a PDF document
Process the input PDF document with OCR according to the specified options. The processed document is written to the output stream.
Non-critical processing issues raise a Processor.Warning event, which is delivered to the registered Processor.WarningListener instances. It is recommended to review the WarningCategory of each warning and handle it if necessary for the application.
Parameters:
document - The input PDF document to process.
engine - The OCR engine to use for recognition. This parameter may be null for operations that do not require OCR, such as ImageProcessingMode.REMOVE_TEXT. For all other modes, a valid engine must be provided.
outStream - The stream to which the output PDF is written. The stream must support both random read and write access.
options - The OCR processing options. If null, default options are used.
Returns:
The resulting output PDF, which can be used as a new input for further processing. Note that this object must be disposed before the output stream object (method argument outStream).
Throws:
LicenseException - The license check has failed.
java.io.IOException - Writing to the outStream failed.
ProcessingException - The document could not be processed.
java.lang.IllegalArgumentException - An OCR engine is required for the specified options but engine is null.
java.lang.IllegalArgumentException - The options specifies invalid or contradictory settings.
java.lang.IllegalArgumentException - The outOptions specifies document encryption for a PDF/A file, which is not allowed.
GenericException - An unexpected failure occurred.
CorruptException - An input image in the document is corrupt and cannot be read.
PasswordException - The document is encrypted and the password is invalid.
ConformanceException - The document has an invalid conformance level.
UnsupportedFeatureException - The input PDF contains unrendered XFA form fields. See pdftools.pdf.Document.getXfa for more information.
java.lang.IllegalArgumentException - if document is null
java.lang.IllegalArgumentException - if outStream is null
process
public Document process(Document document, Engine engine, Stream outStream, OcrOptions options, OutputOptions outOptions) throws java.io.IOException, GenericException, LicenseException, CorruptException, PasswordException, ConformanceException, UnsupportedFeatureException, ProcessingException
Apply OCR to a PDF document
Process the input PDF document with OCR according to the specified options. The processed document is written to the output stream.
Non-critical processing issues raise a Processor.Warning event, which is delivered to the registered Processor.WarningListener instances. It is recommended to review the WarningCategory of each warning and handle it if necessary for the application.
Parameters:
document - The input PDF document to process.
engine - The OCR engine to use for recognition. This parameter may be null for operations that do not require OCR, such as ImageProcessingMode.REMOVE_TEXT. For all other modes, a valid engine must be provided.
outStream - The stream to which the output PDF is written. The stream must support both random read and write access.
options - The OCR processing options. If null, default options are used.
outOptions - The PDF output options, e.g. to encrypt the output document.
Returns:
The resulting output PDF, which can be used as a new input for further processing. Note that this object must be disposed before the output stream object (method argument outStream).
Throws:
LicenseException - The license check has failed.
java.io.IOException - Writing to the outStream failed.
ProcessingException - The document could not be processed.
java.lang.IllegalArgumentException - An OCR engine is required for the specified options but engine is null.
java.lang.IllegalArgumentException - The options specifies invalid or contradictory settings.
java.lang.IllegalArgumentException - The outOptions specifies document encryption for a PDF/A file, which is not allowed.
GenericException - An unexpected failure occurred.
CorruptException - An input image in the document is corrupt and cannot be read.
PasswordException - The document is encrypted and the password is invalid.
ConformanceException - The document has an invalid conformance level.
UnsupportedFeatureException - The input PDF contains unrendered XFA form fields. See pdftools.pdf.Document.getXfa for more information.
java.lang.IllegalArgumentException - if document is null
java.lang.IllegalArgumentException - if outStream is null