GdPicture Steps
The addition of the GdPicture SDK library has enabled the development of valuable new steps that give users more choice in how documents are processed. The new steps are listed below:
- Validate PDFA
- Linearize PDF
- Convert Any File to PDF (GdPicture)
- Combine Any File to PDF
- Combine PDFs
- PDF to JPEG
- PDF to PNG
- PDF to TIFF (GdPicture)
- PDF to Text
- PDF to Searchable PDF (GdPicture)
- Create PDF Portfolio
- Convert PDF to PDF/A
- Compress PDF
The GdPicture library has many capabilities beyond the new steps that have been created, so users can expect more steps utilizing this library to be added in the future.
Useful information for the new steps can be found in the sections below.
PDFA Validation
When archiving PDF files, conformance to the PDF/A ISO standard ensures that a document can still be rendered in the future and will appear as expected. Converting a file to a PDF/A version ensures its preservation, a necessity in certain industries when archiving for extended periods.
The PDFA Validation step ensures that files in a directory fit all the requirements of the selected PDF/A version:
- If a file is in valid PDF/A format for the selected version, it will be copied to the output folder.
- If the file does not fit the selected format, it will go to the selected error folder for the job.
Users can run the files in the error folder through the Convert PDF to PDF/A step to create valid PDF/A files that conform to the selected PDF/A version.
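The routing behaviour described above can be pictured with a short Python sketch. This is not the step's actual implementation: `route_pdfs` and the `is_valid_pdfa` callback are hypothetical names, and the real PDF/A check is performed by the GdPicture engine. The sketch also assumes that failing files are copied rather than moved, which the text above does not specify.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict


def route_pdfs(input_dir: Path, output_dir: Path, error_dir: Path,
               is_valid_pdfa: Callable[[Path], bool]) -> Dict[str, str]:
    """Copy each PDF to output_dir if it validates, otherwise to error_dir.

    is_valid_pdfa is a placeholder for the real PDF/A conformance check.
    Returns a mapping of file name to the name of the folder it was sent to.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    error_dir.mkdir(parents=True, exist_ok=True)
    routed = {}
    for pdf in sorted(input_dir.glob("*.pdf")):
        target = output_dir if is_valid_pdfa(pdf) else error_dir
        shutil.copy2(pdf, target / pdf.name)
        routed[pdf.name] = target.name
    return routed
```

Files that land in the error folder are the natural input for a follow-up conversion job.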
Linearize PDF
This step optimizes PDFs by enabling Fast Web View mode, which allows a web viewer to render the document one page at a time. This improves the user experience when viewing larger PDFs on the web.
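To spot-check whether an output file has been linearized, a rough heuristic is to look for the `/Linearized` key near the start of the file, since the PDF specification places the linearization parameter dictionary within the first 1024 bytes. The function name `looks_linearized` is hypothetical, and the check does not verify that the rest of the linearized structure is intact.

```python
from pathlib import Path


def looks_linearized(pdf_path: Path) -> bool:
    """Heuristic check: is /Linearized declared in the first 1024 bytes?

    This only inspects the file header region; it does not validate the
    hint tables or confirm the file is still correctly linearized.
    """
    with open(pdf_path, "rb") as handle:
        head = handle.read(1024)
    return head.startswith(b"%PDF-") and b"/Linearized" in head
```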
Convert Any File To PDF (GdPicture)
This step can convert a large variety of file types to PDF.
Description | Suffix |
Windows bitmap format | BMP |
Microsoft Word (.doc) binary file format | DOC |
Microsoft Word OpenXML | DOCX |
Microsoft Word Macro-Enabled OpenXML format | DOCM |
Enhanced Windows Meta-format | EMF |
Graphics Interchange Format | GIF |
HTML format | HTML |
Icon and cursor format (single or multi page) | ICO |
Joint Photographic Experts Group | JPEG |
Portable Gray-map File | PGM |
Portable Network Graphics Format | PNG |
Portable Pix-map File | PPM |
Microsoft PowerPoint Presentation format | PPTX |
Microsoft PowerPoint Macro-Enabled Presentation format | PPTM |
Rich Text File Format | RTF |
Tagged Image Format | TIFF |
Plain text file | TXT |
Standard Windows Meta-format | WMF |
Microsoft Excel (.xls) binary file format | XLS |
Microsoft Excel Spreadsheet format | XLSX |
Electronic Mail format | EML |
Outlook Item File Format | MSG |
Scalable Vector Graphics File | SVG |
Device Independent Bitmap format | DIB |
24-bit compressed JPEG Graphic format | JPE |
MIME HTML format | MHTML |
OpenDocument Text file format | ODT |
Portable Bitmap Image file format | PBM |
Picture Exchange image file format | PCX |
Targa raster graphics format | TGA |
This step uses the GdPicture engine to render the file and thus does not require an Office installation to process Office files.
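When preparing an input folder, it can help to pre-filter files against the suffixes in the table above. The sketch below is an illustration only: `SUPPORTED_SUFFIXES` and `is_listed_format` are hypothetical names, and the set contains exactly the suffixes listed in the table. Whether the step also accepts common aliases such as JPG or TIF is not stated here, so those are deliberately left out.

```python
from pathlib import Path

# Suffixes copied from the table above; aliases (e.g. JPG, TIF, HTM) are
# not listed in the table and are therefore not assumed to be supported.
SUPPORTED_SUFFIXES = frozenset({
    "BMP", "DOC", "DOCX", "DOCM", "EMF", "GIF", "HTML", "ICO", "JPEG", "PGM",
    "PNG", "PPM", "PPTX", "PPTM", "RTF", "TIFF", "TXT", "WMF", "XLS", "XLSX",
    "EML", "MSG", "SVG", "DIB", "JPE", "MHTML", "ODT", "PBM", "PCX", "TGA",
})


def is_listed_format(path) -> bool:
    """Return True if the file's suffix appears in the table (case-insensitive)."""
    return Path(path).suffix.lstrip(".").upper() in SUPPORTED_SUFFIXES
```

For example, `is_listed_format("report.docx")` is true, while `is_listed_format("archive.zip")` is false.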
Combine Any File To PDF
Converts a folder of files to PDF and then merges them to create a single output PDF.
See Convert Any File To PDF (GdPicture) for the file formats.
This step uses the GdPicture engine to render the file and thus does not require an Office installation to process Office files.
Combine PDFs
Merges a folder of PDF files to create a single output PDF.
PDF To JPEG
Converts an input PDF page by page into a set of JPEG files using the GdPicture toolkit.
PDF To PNG
Converts an input PDF page by page into a set of PNG files using the GdPicture toolkit.
PDF To TIFF (GdPicture)
Converts an input PDF into a multipage TIFF file using the GdPicture toolkit.
PDF To Text
Extracts the searchable text from the pages of a PDF file and creates an output text file.
PDF To Searchable PDF (GdPicture)
Carries out Optical Character Recognition on the input PDF using the GdPicture toolkit, creating an invisible searchable text layer over the document.
OCR Language Codes
For the new GdPicture OCR step, a user can choose from over 100 languages in the table below by adding the language code to the Additional Dictionary field. Multiple languages can be specified in this field by separating their codes with a ‘+’ symbol. For example, ‘deu+fra+spa’ will include all three dictionaries in the OCR process.
New language files need to be added to the “…\Autobahn DX\distribution\gdpicture\ocr” folder. The OCR language pack, which includes over 100 languages, can be downloaded from: http://www.gdpicture.com/download/tesseract_ocr_4x_language_pack.zip
Language | Code | Language | Code | Language | Code |
Afrikaans | afr | German - Fraktur | deu_frak | Portuguese | por |
Albanian | sqi | Greek | ell | Pushto | pus |
Amharic | amh | GreekAncient | grc | Quechua | que |
Arabic | ara | Gujarati | guj | Romanian | ron |
Armenian | hye | HaitianCreole | hat | Russian | rus |
Assamese | asm | Hebrew | heb | Sanskrit | san |
Azerbaijani | aze | Hindi | hin | Scottish Gaelic | gla |
AzerbaijaniCyrillic | aze_cyrl | Hungarian | hun | Serbian | srp |
Basque | eus | Icelandic | isl | SerbianLatin | srp-latn |
Belarusian | bel | Indonesian | ind | Sindhi | snd |
Bengali | ben | Inuktitut | iku | Sinhala | sin |
Bosnian | bos | Irish | gle | Slovak | slk |
Breton | bre | Italian | ita | Slovak (Fraktur) | slk_frak |
Bulgarian | bul | Italian_Old | ita_old | Slovenian | slv |
Burmese | mya | Japanese | jpn | Spanish | spa |
CatalanValencian | cat | Javanese | jav | Spanish_Old | spa_old |
Cebuano | ceb | Kannada | kan | Sundanese | sun |
CentralKhmer | khm | Kazakh | kaz | Swahili | swa |
Cherokee | chr | Kirghiz | kir | Swedish | swe |
ChineseSimplified | chi_sim | Korean | kor | Syriac | syr |
ChineseTraditional | chi_tra | Kurdish | kur | Tagalog | tgl |
Corsican | cos | Kurmanji | kmr | Tajik | tgk |
Croatian | hrv | Lao | lao | Tamil | tam |
Czech | ces | Latin | lat | Tatar | tat |
Danish | dan | Latvian | lav | Telugu | tel |
Danish - Fraktur | dan_frak | Lithuanian | lit | Thai | tha |
Dutch | nld | Luxembourgish | ltz | Tibetan | bod |
Dzongkha | dzo | Macedonian | mkd | Tigrinya | tir |
English | eng | Malay | msa | Tonga | ton |
English (Middle) | enm | Malayalam | mal | Turkish | tur |
Esperanto | epo | Maltese | mlt | Uighur | uig |
Estonian | est | Maori | mri | Ukrainian | ukr |
Faroese | fao | Marathi | mar | Urdu | urd |
Filipino | fil | Maths | equ | Uzbek | uzb |
Finnish | fin | Mongolian | mon | UzbekCyrillic | uzb-cyrl |
Frankish | frk | Nepali | nep | Vietnamese | vie |
French | fra | Norwegian | nor | Welsh | cym |
French (Middle) | frm | Occitan | oci | Western Frisian | fry |
Galician | glg | Oriya | ori | Yiddish | yid |
Georgian | kat | Panjabi | pan | Yoruba | yor |
Georgian_Old | kat_old | Persian | fas |
German | deu | Polish | pol |
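As an illustration of the ‘+’ separated format used in the Additional Dictionary field, the following Python sketch joins a list of codes into a single value. The helper name `build_dictionary_field` is hypothetical. The shape check (three lowercase letters, optionally followed by a suffix such as `_frak` or `-latn`) is an assumption based on the codes in the table above; it does not confirm that a matching language file is installed.

```python
import re

# Assumed shape of a code, inferred from the table: e.g. "deu", "chi_sim", "srp-latn".
_CODE_PATTERN = re.compile(r"^[a-z]{3}(?:[_-][a-z]+)?$")


def build_dictionary_field(codes) -> str:
    """Join language codes with '+', dropping duplicates and surrounding spaces."""
    cleaned = []
    for code in codes:
        code = code.strip()
        if not _CODE_PATTERN.match(code):
            raise ValueError(f"not a recognisable language code: {code!r}")
        if code not in cleaned:
            cleaned.append(code)
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)
```

Calling `build_dictionary_field(["deu", "fra", "spa"])` produces the value `deu+fra+spa` from the example above.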
PDF Portfolio
Creates a PDF Portfolio file by embedding files of various types. When the PDF Portfolio is opened, each embedded file is displayed when it is selected.
Convert to PDFA
Convert a PDF file to a PDF/A format file.
Compression
Compress a PDF file to reduce the output file size.
Detect Signatures
Detect if a PDF file contains digital signatures.
Smart Redaction
Redact text in a PDF file based on common categories for sensitive information.
Key Value Pair Extraction
Extract important data pairs from PDF or supported image files.
Pattern Redaction
Redact text in a PDF file based on regex patterns or a terms list.
Pattern Highlighting
Highlight text in a PDF file based on regex patterns or a terms list.
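For the two pattern-based steps, it can be useful to test patterns and terms against sample text before running a job. The sketch below uses Python's `re` module purely for illustration: the steps themselves may use a different regex flavor, and the case-insensitive handling of terms is an assumption. `find_matches` and `mask` are hypothetical helper names.

```python
import re


def find_matches(text, patterns=(), terms=()):
    """Return sorted (start, end, matched_text) spans for patterns and terms.

    Terms are matched literally and, as an assumption, case-insensitively.
    """
    compiled = [re.compile(p) for p in patterns]
    compiled += [re.compile(re.escape(t), re.IGNORECASE) for t in terms]
    spans = set()
    for rx in compiled:
        for match in rx.finditer(text):
            spans.add((match.start(), match.end(), match.group()))
    return sorted(spans)


def mask(text, spans, fill="X"):
    """Replace every matched span with fill characters of the same length."""
    chars = list(text)
    for start, end, _ in spans:
        chars[start:end] = fill * (end - start)
    return "".join(chars)
```

A pattern such as `\b\d{3}-\d{2}-\d{4}\b` combined with a short terms list is a typical starting point for trial runs on sample text.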
Split PDF (GdPicture)
Split PDF files based on page ranges, bookmarks, or into single pages.
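To make the idea of splitting by page ranges concrete, the sketch below parses a range specification into inclusive page spans. The syntax shown (`1-3, 5, 8-9`, with 1-based page numbers) is an assumption for illustration, as is the helper name `parse_page_ranges`; the step's own configuration format may differ.

```python
from typing import List, Tuple


def parse_page_ranges(spec: str, page_count: int) -> List[Tuple[int, int]]:
    """Parse a spec such as "1-3, 5, 8-9" into inclusive (first, last) pairs.

    Page numbers are 1-based; a ValueError is raised for ranges that are
    reversed or fall outside the document.
    """
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        ranges.append((start, end))
    return ranges
```

Each returned pair would correspond to one output file; splitting into single pages is the special case where every pair has the same first and last page.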
Split by Barcode
Split PDF pages based on barcodes found in the document.