Enhance document processing with the .NET SDK

The addition of the Nutrient .NET SDK makes new steps available, giving users more choice in how documents are processed.

The sections below summarize each step. For details on any step, refer to the Nutrient .NET SDK steps guide.

PDF/A validation

PDF/A is an ISO standard (ISO 19005) for the long-term archiving of PDF files. A file that conforms to a PDF/A version can be rendered in the future and appear as expected. Converting files to a PDF/A version helps preserve them, which is a requirement in certain industries that archive documents for extended periods.

The Validate PDFA step checks whether each file in a directory meets all the requirements of the selected PDF/A version:

If a file is valid PDF/A for the selected version, it’s copied to the output folder.
If a file doesn’t conform to the selected version, it’s sent to the error folder selected for the job.

Users can run the files in the error folder through the Convert PDF To PDFA step to create valid PDF/A files that conform to the selected PDF/A version.
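The routing rule above can be sketched in Python. This is an illustration only, not the step’s implementation: `route_by_pdfa_validity` and the `is_valid_pdfa` callback are hypothetical names, the callback stands in for a real PDF/A validator, and this sketch copies files into both folders.

```python
import shutil
from pathlib import Path
from typing import Callable


def route_by_pdfa_validity(
    input_dir: Path,
    output_dir: Path,
    error_dir: Path,
    is_valid_pdfa: Callable[[Path], bool],
) -> list[Path]:
    """Copy each PDF to output_dir or error_dir based on a validity check.

    is_valid_pdfa is a hypothetical stand-in for a real PDF/A validator.
    Returns the files placed in error_dir so they can be queued for
    conversion to PDF/A.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    error_dir.mkdir(parents=True, exist_ok=True)
    needs_conversion: list[Path] = []
    for pdf in sorted(input_dir.glob("*.pdf")):
        if is_valid_pdfa(pdf):
            # Valid for the selected version: copy to the output folder.
            shutil.copy2(pdf, output_dir / pdf.name)
        else:
            # Not valid: copy to the error folder and remember it.
            target = error_dir / pdf.name
            shutil.copy2(pdf, target)
            needs_conversion.append(target)
    return needs_conversion
```

The returned list mirrors the follow-up described above: those error-folder files are the candidates for the Convert PDF To PDFA step.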

Linearizing PDFs

This step optimizes PDFs for Fast Web View (linearization), which lets a web viewer render a document one page at a time instead of waiting for the whole file to download. This improves the experience of viewing large PDFs on the web.
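A linearized file declares itself with a linearization parameter dictionary, containing the `/Linearized` key, which the PDF specification requires to appear within the first 1024 bytes of the file. The hypothetical helpers below use that rule as a quick check on a step’s output. It’s a heuristic sketch, not part of the product: it only detects the declaration and doesn’t verify that the file is correctly linearized.

```python
from pathlib import Path


def looks_linearized(head: bytes) -> bool:
    """Return True if /Linearized appears in the first 1024 bytes.

    Heuristic only: it checks that the file declares linearization,
    not that the hint tables and object order are actually correct.
    """
    return b"/Linearized" in head[:1024]


def file_looks_linearized(path: Path) -> bool:
    # Only the start of the file is needed for this check.
    with open(path, "rb") as f:
        return looks_linearized(f.read(1024))
```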

Converting any file to PDF (Nutrient .NET SDK)

This step can convert a large variety of file types to PDF.

Description	Suffix
Windows bitmap	BMP
Word (.doc) Binary File Format	DOC
Word Open XML	DOCX
Microsoft Word Document with Macros	DOCM
Windows Enhanced Metafile	EMF
Graphics Interchange Format	GIF
HTML format	HTML
Icon and cursor format (single or multipage)	ICO
Joint Photographic Experts Group	JPEG
Portable Graymap Format	PGM
Portable Network Graphics	PNG
Portable Pixmap Format	PPM
Microsoft PowerPoint Presentation format	PPTX
Microsoft PowerPoint Macro-Enabled Presentation format	PPTM
Rich Text Format	RTF
Tagged Image File Format	TIFF
Plain text file	TXT
Windows Metafile	WMF
Microsoft Excel (.xls) Binary File Format	XLS
Microsoft Excel Spreadsheet format	XLSX
Electronic Mail format	EML
Outlook Item	MSG
Scalable Vector Graphics	SVG
Device-independent bitmap	DIB
24-bit compressed JPEG Graphic format	JPE
MIME HTML	MHTML
OpenDocument Text	ODT
Portable bitmap format	PBM
PiCture eXchange	PCX
Truevision Graphics Adapter	TGA

This step uses the Nutrient .NET SDK engine to render the file, and as a result, doesn’t require an Office installation to process Office files.
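Before a job runs, it can help to flag input files whose extensions don’t appear in the table above. The sketch below is an illustration, not part of the product: `SUPPORTED_SUFFIXES`, `is_listed_format`, and `partition_inputs` are hypothetical names, and the set contains only the suffixes listed in the table, so common aliases such as `.jpg` or `.tif` are deliberately not assumed.

```python
from pathlib import Path

# Suffixes taken from the table above, lowercased.
# Aliases not listed there (for example .jpg, .tif, .htm) are not assumed.
SUPPORTED_SUFFIXES = frozenset({
    "bmp", "doc", "docx", "docm", "emf", "gif", "html", "ico", "jpeg",
    "pgm", "png", "ppm", "pptx", "pptm", "rtf", "tiff", "txt", "wmf",
    "xls", "xlsx", "eml", "msg", "svg", "dib", "jpe", "mhtml", "odt",
    "pbm", "pcx", "tga",
})


def is_listed_format(path: str) -> bool:
    """Return True if the file's extension appears in the supported table."""
    return Path(path).suffix.lower().lstrip(".") in SUPPORTED_SUFFIXES


def partition_inputs(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split candidate inputs into (listed, unlisted) before conversion."""
    listed = [p for p in paths if is_listed_format(p)]
    unlisted = [p for p in paths if not is_listed_format(p)]
    return listed, unlisted
```

Matching is case-insensitive because file systems on Windows commonly produce uppercase extensions such as `.DOCX`.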

Combining any file to PDF

This converts a folder of files into PDF format and then merges them to create a single output PDF.

This step uses the Nutrient .NET SDK engine to render the file and thus doesn’t require an Office installation to process Office files.

Combining PDFs

This merges a folder of PDF files to create a single output PDF.

PDF to JPEG

This converts an input PDF page by page into a set of JPEG files using Nutrient .NET SDK.

PDF to PNG

This converts an input PDF page by page into a set of PNG files using Nutrient .NET SDK.

PDF to TIFF (Nutrient .NET SDK)

This converts an input PDF into a multipage TIFF file using Nutrient .NET SDK.

PDF to text

This extracts the searchable text from the pages of a PDF file and creates an output text file.

PDF to searchable PDF (Nutrient .NET SDK)

This carries out optical character recognition (OCR) on the input PDF using Nutrient .NET SDK, creating an invisible searchable text layer over the document.

OCR language codes

For the Nutrient .NET SDK OCR step, users can choose from more than 100 languages in the table below by entering a language’s code in the Additional Dictionary field. To use multiple languages, separate their codes with a + symbol. For example, deu+fra+spa includes the German, French, and Spanish dictionaries in the OCR process.

Add new language files to the “…\Autobahn DX\distribution\gdpicture\ocr” folder. Language files for more than 100 languages can be downloaded from the Tesseract OCR 4x Language Pack.
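Before running a job, it can help to check that every code in the Additional Dictionary field has a matching file in the ocr folder. This is a minimal sketch under two assumptions: the files follow the Tesseract naming convention `<code>.traineddata`, and the folder path is passed in by the caller. `parse_dictionary_field` and `missing_language_files` are hypothetical helpers, not part of the product.

```python
from pathlib import Path


def parse_dictionary_field(value: str) -> list[str]:
    """Split an Additional Dictionary value such as 'deu+fra+spa' into codes."""
    codes = [code.strip() for code in value.split("+")]
    if any(not code for code in codes):
        raise ValueError(f"empty language code in {value!r}")
    return codes


def missing_language_files(value: str, ocr_dir: Path) -> list[str]:
    """Return codes whose <code>.traineddata file is absent from ocr_dir.

    Assumes the Tesseract file naming convention (<code>.traineddata);
    ocr_dir would be the installation's ocr folder.
    """
    return [
        code
        for code in parse_dictionary_field(value)
        if not (ocr_dir / f"{code}.traineddata").is_file()
    ]
```

For example, with only `deu.traineddata` and `fra.traineddata` installed, checking `deu+fra+spa` reports `spa` as missing.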

Language	Code	Language	Code	Language	Code
Afrikaans	afr	German (Fraktur)	deu_frak	Portuguese	por
Albanian	sqi	Greek	ell	Pushto	pus
Amharic	amh	Greek (Ancient)	grc	Quechua	que
Arabic	ara	Gujarati	guj	Romanian	ron
Armenian	hye	Haitian Creole	hat	Russian	rus
Assamese	asm	Hebrew	heb	Sanskrit	san
Azerbaijani	aze	Hindi	hin	Scottish Gaelic	gla
Azerbaijani (Cyrillic)	aze_cyrl	Hungarian	hun	Serbian	srp
Basque	eus	Icelandic	isl	Serbian (Latin)	srp_latn
Belarusian	bel	Indonesian	ind	Sindhi	snd
Bengali	ben	Inuktitut	iku	Sinhala	sin
Bosnian	bos	Irish	gle	Slovak	slk
Breton	bre	Italian	ita	Slovak (Fraktur)	slk_frak
Bulgarian	bul	Italian (Old)	ita_old	Slovenian	slv
Burmese	mya	Japanese	jpn	Spanish	spa
Catalan; Valencian	cat	Javanese	jav	Spanish (Old)	spa_old
Cebuano	ceb	Kannada	kan	Sundanese	sun
Central Khmer	khm	Kazakh	kaz	Swahili	swa
Cherokee	chr	Kirghiz	kir	Swedish	swe
Chinese (Simplified)	chi_sim	Korean	kor	Syriac	syr
Chinese (Traditional)	chi_tra	Kurdish	kur	Tagalog	tgl
Corsican	cos	Kurmanji	kmr	Tajik	tgk
Croatian	hrv	Lao	lao	Tamil	tam
Czech	ces	Latin	lat	Tatar	tat
Danish	dan	Latvian	lav	Telugu	tel
Danish (Fraktur)	dan_frak	Lithuanian	lit	Thai	tha
Dutch	nld	Luxembourgish	ltz	Tibetan	bod
Dzongkha	dzo	Macedonian	mkd	Tigrinya	tir
English	eng	Malay	msa	Tonga	ton
English (Middle)	enm	Malayalam	mal	Turkish	tur
Esperanto	epo	Maltese	mlt	Uighur	uig
Estonian	est	Maori	mri	Ukrainian	ukr
Faroese	fao	Marathi	mar	Urdu	urd
Filipino	fil	Maths	equ	Uzbek	uzb
Finnish	fin	Mongolian	mon	Uzbek (Cyrillic)	uzb_cyrl
Frankish	frk	Nepali	nep	Vietnamese	vie
French	fra	Norwegian	nor	Welsh	cym
French (Middle)	frm	Occitan	oci	Western Frisian	fry
Galician	glg	Oriya	ori	Yiddish	yid
Georgian	kat	Panjabi	pan	Yoruba	yor
Georgian (Old)	kat_old	Persian	fas
German	deu	Polish	pol

PDF Portfolio

This creates a PDF Portfolio by embedding files of various types in a single PDF. When the PDF Portfolio is opened, each embedded file is displayed when selected.

Converting to PDF/A

This converts a PDF file to a PDF/A file.

Compression

This compresses a PDF file to reduce the output file size.

Detecting signatures

This detects if a PDF file contains digital signatures.
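For a quick triage outside the product, a signed PDF can often be spotted by the `/ByteRange` entry of its signature dictionary, which records the byte ranges the signature covers. The sketch below is a rough heuristic, not how the Detecting signatures step works: it scans raw bytes, doesn’t validate any signature, and could miss or misreport signatures in unusual files. `find_signature_byte_ranges` and `probably_signed` are hypothetical names.

```python
import re
from pathlib import Path

# Heuristic only: matches the /ByteRange entry of a signature dictionary,
# for example "/ByteRange [0 840 960 240]". The PDF is not parsed and
# no signature is validated.
_BYTE_RANGE = re.compile(rb"/ByteRange\s*\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]")


def find_signature_byte_ranges(pdf_bytes: bytes) -> list[tuple[int, ...]]:
    """Return the four offsets/lengths of every /ByteRange found in the raw bytes."""
    return [tuple(int(n) for n in m.groups()) for m in _BYTE_RANGE.finditer(pdf_bytes)]


def probably_signed(path: Path) -> bool:
    """True if at least one signature byte range appears in the file."""
    return bool(find_signature_byte_ranges(path.read_bytes()))
```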

Smart redaction

This redacts text in a PDF file based on common categories of sensitive information.

Key-value pair extraction

This extracts key-value pairs, such as a field label and its value, from PDF files or supported image files.

Pattern redaction

This redacts text in a PDF file based on regex patterns or a terms list.
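Combining regex patterns with a terms list raises one practical point: terms should match literally, so characters such as `.` or `+` in a term must be escaped before they’re mixed with regex patterns. The sketch below shows that idea on plain text. It’s an illustration under assumptions, not the step’s implementation: `build_matcher`, `find_spans`, and `redact_text` are hypothetical names, case-insensitive matching is an assumed default, and real redaction in a PDF also removes the underlying content rather than just masking characters.

```python
import re


def build_matcher(patterns: list[str], terms: list[str], ignore_case: bool = True) -> re.Pattern:
    """Combine regex patterns and literal terms into one compiled regex.

    Terms are escaped so characters such as '.' or '+' match literally;
    patterns are used as written.
    """
    parts = [f"(?:{p})" for p in patterns] + [re.escape(t) for t in terms if t]
    if not parts:
        raise ValueError("no patterns or terms supplied")
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile("|".join(parts), flags)


def find_spans(text: str, matcher: re.Pattern) -> list[tuple[int, int]]:
    """Return (start, end) offsets of each non-empty match, i.e. the text to redact."""
    return [m.span() for m in matcher.finditer(text) if m.end() > m.start()]


def redact_text(text: str, spans: list[tuple[int, int]], mask: str = "X") -> str:
    """Replace every character inside the given spans with the mask character."""
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)
```

For example, the pattern `\d{3}-\d{4}` plus the term `jo.doe@example.com` turns `Call 555-0100 or email jo.doe@example.com` into `Call XXXXXXXX or email XXXXXXXXXXXXXXXXXX`.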

Pattern highlighting

This highlights text in a PDF file based on regex patterns or a terms list.
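When several patterns or terms match the same text, their matches can overlap. One way to avoid stacking several highlights on the same words is to merge overlapping ranges first. This is a hypothetical post-processing sketch, not a description of how the step handles overlaps: `collect_spans` runs each pattern and term separately, and `merge_spans` joins overlapping or touching ranges.

```python
import re


def collect_spans(text: str, patterns: list[str], terms: list[str]) -> list[tuple[int, int]]:
    """Run each regex pattern and literal term separately; matches may overlap."""
    regexes = [re.compile(p) for p in patterns]
    regexes += [re.compile(re.escape(t)) for t in terms if t]
    return [
        m.span()
        for rx in regexes
        for m in rx.finditer(text)
        if m.end() > m.start()  # skip zero-length matches
    ]


def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching spans so each region gets one highlight."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, the pattern `INV-\d{4}-\d{3}` and the term `2024` both match inside `INV-2024-001`, and merging leaves a single highlight range.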

Splitting PDF (Nutrient .NET SDK)

This splits PDF files based on page ranges and bookmarks, or into single pages.
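The three split modes can be modeled as different ways of grouping page numbers, with one group per output file. The sketch below is hypothetical: the comma and hyphen syntax for page ranges (`1-3,4,5-8`) is an assumption for illustration, bookmark splitting is modeled as the start pages of bookmarks, and none of the function names come from the product.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[list[int]]:
    """Turn a spec like '1-3,4,5-8' into 1-based page groups, one per output file.

    The comma/hyphen syntax is an assumption for illustration.
    """
    groups = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first_s, last_s = part.split("-", 1)
            first, last = int(first_s), int(last_s)
        else:
            first = last = int(part)
        if not 1 <= first <= last <= page_count:
            raise ValueError(f"range {part!r} is outside 1-{page_count}")
        groups.append(list(range(first, last + 1)))
    return groups


def groups_from_bookmarks(start_pages: list[int], page_count: int) -> list[list[int]]:
    """Start a new group at each bookmarked page; page 1 always starts a group."""
    starts = sorted({p for p in start_pages if 1 <= p <= page_count} | {1})
    bounds = starts + [page_count + 1]
    return [list(range(a, b)) for a, b in zip(bounds, bounds[1:])]


def single_pages(page_count: int) -> list[list[int]]:
    """Splitting into single pages: one group per page."""
    return [[page] for page in range(1, page_count + 1)]
```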

Splitting by barcode

This splits a PDF into separate documents based on barcodes found on its pages.
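A common way to think about barcode splitting is that each page carrying a barcode marks the start of a new document. The sketch below models that rule on a list of per-page barcode values. It rests on assumptions not stated here: barcode detection has already happened (`None` means no barcode on that page), pages before the first barcode form their own document, and the hypothetical `keep_separator_pages` flag decides whether the barcode page is kept as the first page of the new document or dropped.

```python
from typing import Optional, Sequence


def split_on_barcodes(
    barcodes_per_page: Sequence[Optional[str]],
    keep_separator_pages: bool = False,
) -> list[list[int]]:
    """Group 1-based page numbers into documents, starting a new one at each barcode page."""
    documents: list[list[int]] = []
    current: list[int] = []
    for page, barcode in enumerate(barcodes_per_page, start=1):
        if barcode is not None:
            # A barcode closes the current document and opens a new one.
            if current:
                documents.append(current)
            current = [page] if keep_separator_pages else []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For a six-page file with barcodes on pages 2 and 5, dropping separator pages yields `[[1], [3, 4], [6]]`, while keeping them yields `[[1], [2, 3, 4], [5, 6]]`.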