Enhance document processing with the .NET SDK
Integrating the Nutrient .NET SDK adds new steps that give users more choice in how documents are processed.
The sections below summarize these steps. For detailed information on any of the steps, refer to the Nutrient .NET SDK steps guide.
PDF/A validation
Archiving PDF files as PDF/A, an ISO-standardized format, ensures a document can be rendered in the future and will appear as expected. Preserving files in a PDF/A version is a requirement in certain industries that archive documents for extended periods.
The Validate PDFA step ensures files in a directory fit all the requirements of the selected PDF/A version:
- If a file is in valid PDF/A format for the selected version, it’ll be copied to the output folder.
- If the file doesn’t fit the selected format, the file will go into the selected error folder for the job.
Users can run the files in the error folder through the Convert PDF To PDFA step to create valid PDF/A files that conform to the selected PDF/A version.
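The routing described above can be pictured with a short Python sketch. It only illustrates the behavior and isn't how the Validate PDFA step is implemented: `route_by_validity` and the `is_valid_pdfa` callback are hypothetical names, the callback stands in for the conformance check the Nutrient .NET SDK engine performs, and the sketch assumes files are copied (not moved) in both cases.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def route_by_validity(
    input_dir: Path,
    output_dir: Path,
    error_dir: Path,
    is_valid_pdfa: Callable[[Path], bool],
) -> dict[str, list[str]]:
    """Copy each PDF to output_dir if it validates, otherwise to error_dir.

    is_valid_pdfa is a hypothetical callback standing in for a real
    PDF/A conformance check; this sketch performs no validation itself.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    error_dir.mkdir(parents=True, exist_ok=True)
    routed: dict[str, list[str]] = {"output": [], "error": []}
    for pdf in sorted(input_dir.glob("*.pdf")):
        bucket = "output" if is_valid_pdfa(pdf) else "error"
        target = output_dir if bucket == "output" else error_dir
        shutil.copy2(pdf, target / pdf.name)
        routed[bucket].append(pdf.name)
    return routed


# Demo with placeholder files and a stand-in validator (not a real PDF/A check).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "in").mkdir()
    (root / "in" / "good.pdf").write_bytes(b"placeholder")
    (root / "in" / "bad.pdf").write_bytes(b"placeholder")
    routed = route_by_validity(
        root / "in",
        root / "out",
        root / "err",
        is_valid_pdfa=lambda p: p.stem == "good",
    )
    copied_ok = (root / "out" / "good.pdf").exists()
    copied_err = (root / "err" / "bad.pdf").exists()
```

In this picture, the contents of the error folder are what a user would then feed to the Convert PDF To PDFA step.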
Linearizing PDFs
This step optimizes PDFs for web viewing by enabling Fast Web View mode, which allows a document to be rendered one page at a time. This improves the user experience when viewing larger PDFs on the web.
Converting any file to PDF (Nutrient .NET SDK)
This step can convert a wide variety of file types to PDF.
Description | Suffix |
---|---|
Windows bitmap | BMP |
Word (.doc) Binary File Format | DOC |
Word Open XML | DOCX |
Microsoft Word Document with Macros | DOCM |
Windows Enhanced Metafile | EMF |
Graphics Interchange Format | GIF |
HTML format | HTML |
Icon and cursor format (single or multi page) | ICO |
Joint Photographic Experts Group | JPEG |
Portable Graymap Format | PGM |
Portable Network Graphics | PNG |
Portable Pixmap Format | PPM |
Microsoft PowerPoint Presentation format | PPTX |
Microsoft PowerPoint Macro-Enabled Presentation format | PPTM |
Rich Text Format | RTF |
Tagged Image File Format | TIFF |
Plain text file | TXT |
Windows Metafile | WMF |
Microsoft Excel (.xls) binary file format | XLS |
Microsoft Excel Spreadsheet format | XLSX |
Electronic Mail format | EML |
Outlook Item | MSG |
Scalable Vector Graphics | SVG |
Device-independent bitmap | DIB |
24-bit compressed JPEG Graphic format | JPE |
MIME HTML | MHTML |
OpenDocument Text | ODT |
Portable bitmap format | PBM |
PiCture eXchange | PCX |
Truevision Graphics Adapter | TGA |
This step uses the Nutrient .NET SDK engine to render the file, and as a result, doesn’t require an Office installation to process Office files.
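When preparing a folder for this step, it can help to check in advance which files have a suffix from the table above. The following Python sketch does that; `SUPPORTED_SUFFIXES` and `is_listed_input` are hypothetical names, and the set contains only the suffixes listed in the table, so the step itself may accept variants that aren't shown there.

```python
from pathlib import Path

# Suffixes copied from the table above, lowercased for comparison.
SUPPORTED_SUFFIXES = frozenset({
    "bmp", "doc", "docx", "docm", "emf", "gif", "html", "ico", "jpeg",
    "pgm", "png", "ppm", "pptx", "pptm", "rtf", "tiff", "txt", "wmf",
    "xls", "xlsx", "eml", "msg", "svg", "dib", "jpe", "mhtml", "odt",
    "pbm", "pcx", "tga",
})


def is_listed_input(path: str) -> bool:
    """Return True if the file's suffix appears in the table above."""
    suffix = Path(path).suffix.lstrip(".").lower()
    return suffix in SUPPORTED_SUFFIXES
```

For example, `is_listed_input("report.DOCX")` is true because the comparison ignores case, while `is_listed_input("archive.zip")` is false.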
Combining any file to PDF
This converts a folder of files into PDF format and then merges them to create a single output PDF.
This step uses the Nutrient .NET SDK engine to render the file and thus doesn’t require an Office installation to process Office files.
Combining PDFs
This merges a folder of PDF files to create a single output PDF.
PDF to JPEG
This converts an input PDF page by page into a set of JPEG files using Nutrient .NET SDK.
PDF to PNG
This converts an input PDF page by page into a set of PNG files using Nutrient .NET SDK.
PDF to TIFF (Nutrient .NET SDK)
This converts an input PDF into a multipage TIFF file using Nutrient .NET SDK.
PDF to text
This extracts the searchable text from the pages of a PDF file and creates an output text file.
PDF to searchable PDF (Nutrient .NET SDK)
This carries out optical character recognition (OCR) on the input PDF using Nutrient .NET SDK, creating an invisible searchable text layer over the document.
OCR language codes
For the Nutrient .NET SDK OCR step, a user can choose from more than 100 languages in the table below by adding the language code to the Additional Dictionary field. It’s also possible to specify multiple languages in this field by separating the codes with a + symbol. For example, using deu+fra+spa will include all three dictionaries in the OCR process.
New language files need to be added to the “…\Autobahn DX\distribution\gdpicture\ocr” folder. Download the OCR language pack, which includes more than 100 languages, from the Tesseract OCR 4x Language Pack.
Language | Code | Language | Code | Language | Code |
---|---|---|---|---|---|
Afrikaans | afr | German - Fraktur | deu_frak | Portuguese | por |
Albanian | sqi | Greek | ell | Pushto | pus |
Amharic | amh | GreekAncient | grc | Quechua | que |
Arabic | ara | Gujarati | guj | Romanian | ron |
Armenian | hye | HaitianCreole | hat | Russian | rus |
Assamese | asm | Hebrew | heb | Sanskrit | san |
Azerbaijani | aze | Hindi | hin | Scottish Gaelic | gla |
AzerbaijaniCyrillic | aze_cyrl | Hungarian | hun | Serbian | srp |
Basque | eus | Icelandic | isl | SerbianLatin | srp-latn |
Belarusian | bel | Indonesian | ind | Sindhi | snd |
Bengali | ben | Inuktitut | iku | Sinhala | sin |
Bosnian | bos | Irish | gle | Slovak | slk |
Breton | bre | Italian | ita | Slovak (Fraktur) | slk_frak |
Bulgarian | bul | Italian_Old | ita_old | Slovenian | slv |
Burmese | mya | Japanese | jpn | Spanish | spa |
CatalanValencian | cat | Javanese | jav | Spanish_Old | spa_old |
Cebuano | ceb | Kannada | kan | Sundanese | sun |
CentralKhmer | khm | Kazakh | kaz | Swahili | swa |
Cherokee | chr | Kirghiz | kir | Swedish | swe |
ChineseSimplified | chi_sim | Korean | kor | Syriac | syr |
ChineseTraditional | chi_tra | Kurdish | kur | Tagalog | tgl |
Corsican | cos | Kurmanji | kmr | Tajik | tgk |
Croatian | hrv | Lao | lao | Tamil | tam |
Czech | ces | Latin | lat | Tatar | tat |
Danish | dan | Latvian | lav | Telugu | tel |
Danish - Fraktur | dan_frak | Lithuanian | lit | Thai | tha |
Dutch | nld | Luxembourgish | ltz | Tibetan | bod |
Dzongkha | dzo | Macedonian | mkd | Tigrinya | tir |
English | eng | Malay | msa | Tonga | ton |
English (Middle) | enm | Malayalam | mal | Turkish | tur |
Esperanto | epo | Maltese | mlt | Uighur | uig |
Estonian | est | Maori | mri | Ukrainian | ukr |
Faroese | fao | Marathi | mar | Urdu | urd |
Filipino | fil | Maths | equ | Uzbek | uzb |
Finnish | fin | Mongolian | mon | UzbekCyrillic | uzb-cyrl |
Frankish | frk | Nepali | nep | Vietnamese | vie |
French | fra | Norwegian | nor | Welsh | cym |
French (Middle) | frm | Occitan | oci | Western Frisian | fry |
Galician | glg | Oriya | ori | Yiddish | yid |
Georgian | kat | Panjabi | pan | Yoruba | yor |
Georgian_Old | kat_old | Persian | fas | | |
German | deu | Polish | pol | | |
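The `+`-separated value for the Additional Dictionary field can also be assembled programmatically, for instance when job settings are generated by a script. The Python sketch below shows one way to do that; `build_dictionary_value` is a hypothetical helper, and trimming whitespace and dropping duplicate codes are assumptions made for tidiness, not documented requirements of the field.

```python
def build_dictionary_value(codes: list[str]) -> str:
    """Join OCR language codes with '+' for the Additional Dictionary field."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    # Drop duplicates while keeping the caller's order.
    unique = list(dict.fromkeys(cleaned))
    return "+".join(unique)


# Reproduces the example from the text: German, French, and Spanish.
value = build_dictionary_value(["deu", "fra", "spa"])
```

Here `value` is the string `deu+fra+spa`, matching the example given earlier in this section.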
PDF Portfolio
This creates a PDF Portfolio file by embedding files of various types. When the PDF Portfolio is opened, each embedded file is displayed when it’s selected.
Converting to PDF/A
This converts a PDF file to a PDF/A file.
Compression
This compresses a PDF file to reduce the output file size.
Detecting signatures
This detects whether a PDF file contains digital signatures.
Smart redaction
This redacts text in a PDF file based on common categories for sensitive information.
Key-value pair extraction
This extracts key-value pairs from PDF files or supported image files.
Pattern redaction
This redacts text in a PDF file based on regex patterns or a terms list.
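To get a feel for how a regex pattern and a terms list select text, the Python sketch below applies both to a plain string. It's an illustration of the matching idea only: it doesn't touch PDF content, `redact` and its parameters are hypothetical names, and the email pattern and sample term are made up for the example. The step's own regex dialect may differ from Python's `re` module.

```python
import re


def redact(text: str, patterns: list[str], terms: list[str], mask: str = "*") -> str:
    """Mask every regex match and every literal term in plain text."""
    # Literal terms are escaped so characters like '.' aren't treated as regex syntax.
    alternatives = [f"(?:{p})" for p in patterns] + [re.escape(t) for t in terms]
    if not alternatives:
        return text
    combined = re.compile("|".join(alternatives))
    # Replace each match with a mask of the same length.
    return combined.sub(lambda m: mask * len(m.group(0)), text)


sample = "Contact jane@example.com about Project Falcon."
redacted = redact(
    sample,
    patterns=[r"[\w.+-]+@[\w-]+\.[\w.]+"],  # simplified email pattern
    terms=["Project Falcon"],
)
```

Testing patterns against sample text this way before running a job can help catch expressions that match too much or too little.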
Pattern highlighting
This highlights text in a PDF file based on regex patterns or a terms list.
Splitting PDF (Nutrient .NET SDK)
This splits PDF files based on page ranges and bookmarks, or into single pages.
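Splitting by page ranges amounts to turning a range description into page spans. The Python sketch below shows that idea under stated assumptions: the comma-and-hyphen syntax (`1-3, 5, 8-10`) and the `parse_page_ranges` helper are hypothetical and chosen for illustration, so the step's own settings may express ranges differently.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[tuple[int, int]]:
    """Parse a spec such as '1-3,5,8-10' into inclusive 1-based (start, end) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        # A single page such as '5' is treated as the range 5-5.
        end = int(end_text) if end_text else start
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"range {part!r} is outside 1..{page_count}")
        ranges.append((start, end))
    return ranges


# Each pair would correspond to one output file in this sketch.
chunks = parse_page_ranges("1-3, 5, 8-10", page_count=10)
```

Splitting into single pages is the special case where every page forms its own range.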
Splitting by barcode
This splits PDF pages based on barcodes found in a document.