GdPictureTextExtraction Class Members
The following tables list the members exposed by GdPictureTextExtraction.
Public Constructors
| Name | Description |
 | GdPictureTextExtraction Constructor | GdPictureTextExtraction is a streamlined class that converts any document format supported by GdPicture into plain text. It covers scenarios such as indexing and improving the performance of LLM inference. Internally, it combines page layout analysis, encoding detection, and OCR components dynamically to optimize extraction accuracy and minimize processing time. The same API processes raster images, PDFs, CAD files, email files, and office formats. Documents can be loaded from file paths, Stream objects, or remote URIs. |
Public Properties
| Name | Description |
 | Dictionary | Specifies the dictionary to be used during the optional OCR process. |
 | EnableKeyValuePairsExtraction | Specifies whether key-value pair extraction is enabled. |
 | EnableOCR | Specifies whether OCR is enabled. |
 | EnableOrientationDetection | Specifies whether document orientation detection is enabled. |
 | EnableTablesExtraction | Specifies whether table extraction is enabled. |
 | PageRange | Specifies the range of pages to process. Set this property before loading a document; restricting the range speeds up loading. |
 | ParagraphSeparator | Specifies the separator used to split paragraphs. It takes effect only when the PreserveParagraphs property is set to true. |
 | PreserveParagraphs | Specifies whether the text extraction engine preserves text paragraphs. This is particularly useful for improving the accuracy of NLP engines. |
 | ResourcesFolder | Specifies the path to the directory containing the engine resources (mostly dictionaries). |
 | TimeoutMilliseconds | Specifies the timeout for any subsequent process, in milliseconds. The default value is -1, which means there is no timeout. |
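
Two of the rules in the table above are conditional: ParagraphSeparator only matters when PreserveParagraphs is true, and a TimeoutMilliseconds value of -1 means no timeout. The sketch below models just those two rules in plain Python so they can be checked in isolation. It does not call GdPicture: the ExtractionSettings class, the join_paragraphs and timeout_seconds helpers, the default separator, and the space-joined output used when paragraphs are not preserved are all assumptions made for illustration, not the library's API or documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractionSettings:
    """Hypothetical stand-in for a few GdPictureTextExtraction properties."""
    preserve_paragraphs: bool = False
    paragraph_separator: str = "\n\n"  # assumed default, not taken from the docs
    timeout_milliseconds: int = -1     # -1 means no timeout, per the table


def join_paragraphs(paragraphs: List[str], settings: ExtractionSettings) -> str:
    # The separator is only consulted when paragraph preservation is on.
    if settings.preserve_paragraphs:
        return settings.paragraph_separator.join(paragraphs)
    # Assumption: without preservation, the text is emitted as a single run.
    return " ".join(paragraphs)


def timeout_seconds(settings: ExtractionSettings) -> Optional[float]:
    # Map the -1 sentinel to "no deadline"; otherwise convert ms to seconds.
    if settings.timeout_milliseconds == -1:
        return None
    return settings.timeout_milliseconds / 1000.0
```

In this model, changing paragraph_separator while preserve_paragraphs is false has no effect on the output, which mirrors the dependency described for the two properties.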
Public Methods
See Also