Pdftools SDK
Macros
#define PDFTOOLS_CALL
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_Extractor_ExtractText(
    TPdfToolsExtraction_Extractor* pExtractor,
    TPdfToolsPdf_Document* pInDoc,
    const TPdfToolsSys_StreamDescriptor* pOutStreamDesc,
    TPdfToolsExtraction_TextOptions* pOptions,
    const int* pFirstPage,
    const int* pLastPage)
Extract text from a PDF document.
[in,out] pExtractor: Acts as a handle to the native object of type TPdfToolsExtraction_Extractor.
[in,out] pInDoc: The input PDF document.
[in,out] pOutStreamDesc: The stream to which the extracted text is written.
[in,out] pOptions: The options object that controls the text extraction.
[in] pFirstPage: Optional one-based index of the first page from which text is extracted. If set, the value must be in the range 1 (first page) to PdfToolsPdf_Document_GetPageCount (last page). If not set (NULL), 1 is used.
[in] pLastPage: Optional one-based index of the last page from which text is extracted. If set, the value must be in the range 1 (first page) to PdfToolsPdf_Document_GetPageCount (last page). If not set (NULL), the value of PdfToolsPdf_Document_GetPageCount is used.
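The page-range rules above can be captured in a small helper that resolves the optional pointers to a concrete range before calling PdfToolsExtraction_Extractor_ExtractText. This is a minimal sketch, not part of the Pdftools SDK: resolve_page_range is a hypothetical name, pageCount stands in for the value returned by PdfToolsPdf_Document_GetPageCount, and rejecting a first page greater than the last page is an assumption, since the reference does not say how that case is handled.

```c
#include <stddef.h>

/* Hypothetical helper, not an SDK function: applies the documented defaults
 * and range rules for pFirstPage/pLastPage. pageCount stands in for the value
 * returned by PdfToolsPdf_Document_GetPageCount. Returns 1 and writes the
 * resolved one-based range on success, 0 if the range is invalid. */
int resolve_page_range(const int* pFirstPage, const int* pLastPage,
                       int pageCount, int* pFirst, int* pLast)
{
    /* An unset (NULL) parameter falls back to its documented default. */
    int first = (pFirstPage != NULL) ? *pFirstPage : 1;
    int last = (pLastPage != NULL) ? *pLastPage : pageCount;

    /* Both indices are one-based and must lie in [1, pageCount]. */
    if (first < 1 || first > pageCount || last < 1 || last > pageCount)
        return 0;

    /* Assumption: an inverted range is rejected. The reference text does
     * not state how the SDK treats first > last. */
    if (first > last)
        return 0;

    *pFirst = first;
    *pLast = last;
    return 1;
}
```

For a ten-page document, passing NULL for both pointers resolves to pages 1 to 10, while a first page of 3 in a two-page document is rejected as out of range.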
PDFTOOLS_EXPORT TPdfToolsExtraction_Extractor* PDFTOOLS_CALL PdfToolsExtraction_Extractor_New(void)
Returns NULL if there is an error.
Note: If NULL was returned, retrieve the specific error code by calling PdfTools_GetLastError. Get the error message with PdfTools_GetLastErrorMessage.
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_GetAdvanceWidth(TPdfToolsExtraction_TextOptions* pTextOptions, double* pAdvanceWidth)
The horizontal space in a PDF that corresponds to a character in monospaced text output.
If NULL, the horizontal space is 7.2 pt.
Default value: NULL
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[out] pAdvanceWidth: Retrieved value.
The [out] argument can return NULL (the default value). To determine whether an error has occurred, check the error code as described in the note section below.
PDFTOOLS_EXPORT TPdfToolsExtraction_TextExtractionFormat PDFTOOLS_CALL PdfToolsExtraction_TextOptions_GetExtractionFormat(TPdfToolsExtraction_TextOptions* pTextOptions)
Format of the extracted text.
Specifies the format of the extracted text.
Default value: ePdfToolsExtraction_TextExtractionFormat_DocumentOrder
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
The return value may indicate an error in certain scenarios. For further information see the note section below.
Note: If 0 was returned, retrieve the specific error code by calling PdfTools_GetLastError. Get the error message with PdfTools_GetLastErrorMessage.
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_GetLineHeight(TPdfToolsExtraction_TextOptions* pTextOptions, double* pLineHeight)
The vertical space in a PDF that triggers a new line in monospaced text output.
If NULL, no extra blank lines are added in the text output.
Default value: NULL
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[out] pLineHeight: Retrieved value.
The [out] argument can return NULL (the default value). To determine whether an error has occurred, check the error code as described in the note section below.
PDFTOOLS_EXPORT double PDFTOOLS_CALL PdfToolsExtraction_TextOptions_GetWordSeparationFactor(TPdfToolsExtraction_TextOptions* pTextOptions)
This parameter defines a factor multiplied by the width of the space character to determine word boundaries. If the distance between two characters exceeds this calculated value, it is recognized as a word separation.
Default value: 0.3
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
The return value may indicate an error in certain scenarios. For further information see the note section below.
Note: If -1.0 was returned, retrieve the specific error code by calling PdfTools_GetLastError. Get the error message with PdfTools_GetLastErrorMessage. Possible error codes:
PDFTOOLS_EXPORT TPdfToolsExtraction_TextOptions* PDFTOOLS_CALL PdfToolsExtraction_TextOptions_New(void)
Returns NULL if there is an error.
Note: If NULL was returned, retrieve the specific error code by calling PdfTools_GetLastError. Get the error message with PdfTools_GetLastErrorMessage.
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_SetAdvanceWidth(TPdfToolsExtraction_TextOptions* pTextOptions, const double* pAdvanceWidth)
The horizontal space in a PDF that corresponds to a character in monospaced text output.
If NULL, the horizontal space is 7.2 pt.
Default value: NULL
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[in] pAdvanceWidth: Value to set.
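As a worked example of the advance width: the 7.2 pt default equals one tenth of an inch (1 in = 72 pt), so the default monospaced output corresponds to 10 characters per inch. The sketch below resolves the optional const double* value the way the description states (NULL means 7.2 pt) and maps a horizontal offset to a character column. The helper names are hypothetical, and the column mapping (offset divided by advance width, rounded down) is an assumed model of monospaced layout, not the SDK's documented algorithm.

```c
#include <stddef.h>

/* Documented default used when the advance width is NULL: 7.2 pt,
 * i.e. 10 characters per inch because 1 in = 72 pt. */
#define DEFAULT_ADVANCE_WIDTH_PT 7.2

/* Hypothetical helper: resolves the optional value passed to
 * PdfToolsExtraction_TextOptions_SetAdvanceWidth (NULL selects the default). */
double effective_advance_width(const double* pAdvanceWidth)
{
    return (pAdvanceWidth != NULL) ? *pAdvanceWidth : DEFAULT_ADVANCE_WIDTH_PT;
}

/* Assumed model, not the SDK's algorithm: the zero-based column of text that
 * starts x points from the left edge. x must be non-negative, so the cast to
 * int truncates toward zero, which equals rounding down here. */
int monospace_column(double x, const double* pAdvanceWidth)
{
    return (int)(x / effective_advance_width(pAdvanceWidth));
}
```

With the default, text starting 75 pt from the left edge falls in column 10 (75 / 7.2 is about 10.4); with an advance width of 6 pt, text at 37 pt falls in column 6.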
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_SetExtractionFormat(TPdfToolsExtraction_TextOptions* pTextOptions, TPdfToolsExtraction_TextExtractionFormat iExtractionFormat)
Format of the extracted text.
Specifies the format of the extracted text.
Default value: ePdfToolsExtraction_TextExtractionFormat_DocumentOrder
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[in] iExtractionFormat: Value to set.
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_SetLineHeight(TPdfToolsExtraction_TextOptions* pTextOptions, const double* pLineHeight)
The vertical space in a PDF that triggers a new line in monospaced text output.
If NULL, no extra blank lines are added in the text output.
Default value: NULL
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[in] pLineHeight: Value to set.
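The line-height option can be illustrated in a similar way. The sketch below is a hypothetical model built only from the description above: with NULL, consecutive text lines are separated by a single line break (no extra blank lines); with a line height set, it assumes each full line height of vertical distance between two baselines produces one new line, so larger gaps turn into blank lines. The function name, the baseline-distance input, and the rounding rule are all assumptions, not documented SDK behavior.

```c
#include <stddef.h>

/* Hypothetical model, not taken from the SDK: the number of newline
 * characters a monospaced writer might emit between two text lines whose
 * baselines are dy points apart (dy is assumed non-negative). */
int newlines_between(double dy, const double* pLineHeight)
{
    /* NULL: documented to add no extra blank lines, so one plain break.
     * The non-positive check is a guard added by this sketch. */
    if (pLineHeight == NULL || *pLineHeight <= 0.0)
        return 1;

    /* Assumption: each full line height of vertical space triggers one new
     * line; truncation equals rounding down because dy is non-negative. */
    int n = (int)(dy / *pLineHeight);
    return (n < 1) ? 1 : n;
}
```

Under this model, a 12 pt line height turns baselines 40 pt apart into 3 line breaks (two blank lines), while baselines 5 pt apart still produce a single break.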
PDFTOOLS_EXPORT BOOL PDFTOOLS_CALL PdfToolsExtraction_TextOptions_SetWordSeparationFactor(TPdfToolsExtraction_TextOptions* pTextOptions, double dWordSeparationFactor)
This parameter defines a factor multiplied by the width of the space character to determine word boundaries. If the distance between two characters exceeds this calculated value, it is recognized as a word separation.
Default value: 0.3
[in,out] pTextOptions: Acts as a handle to the native object of type TPdfToolsExtraction_TextOptions.
[in] dWordSeparationFactor: Value to set.
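The word-separation rule stated above reduces to a single comparison. This sketch restates it in C with a hypothetical function name; it illustrates the documented threshold, not the SDK's internal implementation, and it assumes the gap and the space-character width are measured in the same unit (for example PDF points).

```c
#include <stdbool.h>

/* Default documented for the word separation factor. */
#define DEFAULT_WORD_SEPARATION_FACTOR 0.3

/* Documented rule: a gap between two characters that exceeds
 * factor * (width of the space character) is a word separation. */
bool is_word_separation(double gap, double spaceWidth, double factor)
{
    return gap > factor * spaceWidth;
}
```

With a space width of 2.5 pt and the default factor of 0.3, the threshold is 0.75 pt: a 1 pt gap between two characters starts a new word, while a 0.5 pt gap does not. Raising the factor raises the threshold, so fewer gaps are treated as word separations.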