Effortlessly design and manage document workflows

The Job Designer lets you define and edit a job definition using a tree-list model coupled with a Visual Studio-style property list. The different step types are listed on the left under the Designer Task group box. The step types are grouped into subcategories, and each step type has its own icon. Steps can be reordered by drag and drop.

Configure Job Designer Tasks

| Menu Item | Action |
| --- | --- |
| Run Now | Executes the job that is being edited. The output is displayed in the Run tab. |
| Save | Validates the current job and, if it is valid, saves the current job definition to %JOBID%.xml in the %JOBDEFDIR% directory. |
| OCR | This expander contains the steps that perform OCR. Document Automation Server (DAS) will gray out the invalid steps. The step types in this group are:<br>- Image to Searchable PDF (Standard)<br>- Image to Searchable PDF (Extended)<br>- PDF to Searchable PDF (Standard)<br>- PDF to Searchable PDF (Extended)<br>- Any File to Searchable PDF (Standard)<br>- Any File to Searchable PDF (Extended)<br>- Merge Image to Searchable PDF (Standard)<br>- Merge Image to Searchable PDF (Extended)<br>- PDF to Searchable PDF (GdPicture) |
| Convert | - Convert PDF to TIFF<br>- Convert Any File to PDF<br>- Convert PDF to PDFA<br>- Convert Any File to PDF (GdPicture)<br>- Combine Any File to PDF<br>- PDF to JPEG<br>- PDF to PNG<br>- PDF to TIFF<br>- PDF to Text<br>- Convert PDF to Office<br>- Convert Any File to Office |
| Split and Merge | - Merge PDF<br>- Split PDF<br>- Merge TIFF<br>- Split TIFF<br>- Combine PDFs<br>- Split PDF (GdPicture) |
| Connectors | - Read Mailbox<br>- Send Documents<br>- SharePoint Download<br>- SharePoint Upload<br>- Azure Storage Download<br>- Azure Storage Upload |
| Barcode | - Barcode TIFF/PDF<br>- Split PDF by Barcode |
| PDF Operations | - Set PDF Properties<br>- Create XML Property File<br>- Extract Text from PDF File<br>- Optimize PDF<br>- Stamp PDF Files<br>- Modern Compress PDF<br>- Validate PDFA<br>- Linearize PDF<br>- Create PDF Portfolio<br>- Get Document Information |
| Advanced | - Custom Script Step<br>- High Availability<br>- DAS Content Extraction (Kingfisher) Job<br>- Distributed Polling<br>- PDF Recognition to JSON<br>- Image to Searchable PDF (Microsoft Cloud)<br>- PDF to Searchable PDF (Microsoft Cloud)<br>- Image to Searchable PDF (Google Cloud)<br>- PDF to Searchable PDF (Google Cloud)<br>- Detect Signatures<br>- Smart Redaction<br>- Key Value Pair Extraction<br>- Pattern Redaction<br>- Pattern Highlight<br>- Pattern Enumeration |
| Delete Step | Deletes the currently selected step node. |
| Clear Error | Click this before running a job that is in an error state. |
| Help | Takes you to the ‘Help’ tab, which has links to many useful blogs, documents, and other resources. It also has contacts if you need help from our support or sales team. |

Fields

| Menu Item | Description |
| --- | --- |
| Job ID | A sequential Job ID is allocated for the job by DAS. This cannot be changed. |
| Job Name | A descriptive title for the job. |
| Source Folder | The folder containing the documents to be processed. |
| Destination Folder | The folder where the processed files will be placed if “Move input files to target folder after processing” is chosen. |
| Use Work Folders | By default, DAS processes job steps by using a separate folder for each step. Files from the source folder are copied to a work folder, processed for each step into another work folder, and then finally moved to the target folder. This approach ensures integrity (e.g., correctly processing files that are added to the source folder after a job has started) but can slow down large jobs. |
| Process Sub-Folders | If checked, all sub-folders will be recursively processed. |
| Delete Empty Input Folders | Checking this property will delete empty folders under the source folder after DAS moves or deletes your input files. |
| Input files | This option determines what happens to the input files once processing has been completed. The options are:<br>- Leave input files after processing: Files are left in the Source Folder.<br>- Move to archive after processing: Files are moved to the Archive Folder.<br>- Copy to archive after processing: Files are copied to the Archive Folder.<br>- Move input files to target folder after processing: Input files are placed in the same folder as the output files.<br>- Delete input files after successful processing: Input files are deleted. |
| Rename Input Files | This determines how input files will be renamed when moved to the Target or Archive folder. The default is:<br>%FILENAME%%TIMESTAMP%.%EXT%<br>You can also use %EMAILNAME% for files named in the email format. This will rename the file to its original name. |
| Filter Files | See the Filter File Option table below for more details.<br>Note: Work Folders must be used to enable the use of filters. |
| Filter Expression | One or more search options used to determine the files in the source folder that should be processed. Multiple expressions may be used, separated by spaces.<br>Examples:<br>- `*.pdf`<br>- `*.doc`<br>- `*.ppt`<br>- `*.xls` |
| Batch Size | Limits the number of documents to be processed to the given size. To use this feature, you must use a “Filter File Option” with “Document Count Limit”. |
| File Order | The order in which the files will be processed. There are UTC and local time variants of the date options, totaling nine options:<br>- Alphabetically<br>- Created Date (Ascending)<br>- Created Date (Descending)<br>- Modified Date (Ascending)<br>- Modified Date (Descending)<br>Note: This setting does not work for “Merge Image to PDF” steps; the merge and OCR must be done in two separate job steps. |
| Log File | Path of the job log file. This will include %DATESTAMP%, which is the date of the day the job started. A new log file will be created for each day. |
| CSV Log File | Path of the job's CSV log file. This will include %DATESTAMP%, which is the date of the day the job started. A new CSV file will be created for each day. The columns in the CSV file are:<br>- Job Start – Time the job started<br>- Source Files – Full path to the source file<br>- Target File – Full path to the target file<br>- Job Stopped – Time the job finished<br>- Success – True or False; files that could not be processed will have a value of False.<br>- Page counts (not all steps generate page counts; this depends on the configuration setting) |
| Retention Period | An integer value representing the number of days the log file will be kept before being deleted.<br>Leaving it blank or setting it to a number less than one will keep the log files indefinitely. |
| Max Size | Set the maximum log file size. If a log file grows above this size, it will be split into smaller segments. |
| Stop Processing on Error | If checked, the job will stop if it returns an error, and will not run again until the error is cleared from the Monitor screen. |
| Skip Long File Names | Check this box to make DAS skip files with long filenames. If this box is not checked, DAS will throw an error if it encounters one of these files. |
| Skip Folders That Autobahn Can’t Access | Check this box to make DAS skip folders it has no permissions to access. If this box is not checked, DAS will throw an error if it encounters one of these folders. |
| Archive Folder | The folder where the processed files will be placed if “Move to archive after processing” is chosen. |
| Work Folder | The folder where files will be temporarily stored during conversion and processing. |
| Error Folder | Source documents that have errors during processing will be placed in the specified folder. |
| Temp Folder | Some job steps can require a significant amount of temporary storage, particularly those steps involving OCR. This folder defines the location of the temporary space. |
| Trigger File | You can find this setting under the Processing tab. If you provide a Trigger File value, DAS will not process a folder until the Trigger File is present. The file will be deleted after each job run. |
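
To make the default Rename Input Files template more concrete, here is a minimal Python sketch of how a template such as %FILENAME%%TIMESTAMP%.%EXT% could be expanded. It is an illustration only, not DAS code: the function name `expand_rename_template` is hypothetical, and the timestamp format (yyyyMMddHHmmss) is an assumption, since the format DAS uses is not stated here.

```python
from datetime import datetime
from pathlib import Path


def expand_rename_template(template: str, source: Path, when: datetime) -> str:
    """Expand %FILENAME%, %TIMESTAMP% and %EXT% in a rename template."""
    values = {
        "%FILENAME%": source.stem,                      # name without the extension
        "%TIMESTAMP%": when.strftime("%Y%m%d%H%M%S"),   # assumed timestamp format
        "%EXT%": source.suffix.lstrip("."),             # extension without the dot
    }
    result = template
    for token, value in values.items():
        result = result.replace(token, value)
    return result


example = expand_rename_template(
    "%FILENAME%%TIMESTAMP%.%EXT%",
    Path("invoice.pdf"),
    datetime(2024, 1, 31, 9, 30, 0),
)
print(example)  # invoice20240131093000.pdf
```

Appending a timestamp in this way keeps archived copies of files with the same name from overwriting each other.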

Filter file option

| Filter File Option | Description |
| --- | --- |
| Include Files Matching | Only files matching the Filter Expression are included. |
| Exclude | Files matching the Filter Expression are excluded. |
| Include with Document Count Limit | For example, “*.pdf; 3000” would limit the job to 3000 PDF files. |
| Include Unprocessed PDFs Only | This limits the files selected to PDFs that have not been OCRed.<br>A file is deemed to have been OCRed if:<br>- It has a custom metadata tag AQUAFORESTOCR<br>- Or it has one image per page and only has “invisible” text.<br>This should be used in conjunction with a “Non-Image PDF” setting of “Rasterize and OCR” to ensure that all PDF files are processed. |
| Include Unprocessed PDFs Only – with Document Count Limit | As above, but limited to the number of files specified in the filter. |
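
The interaction between a Filter Expression, the Include/Exclude options and a Document Count Limit can be sketched in Python as follows. This is a hedged illustration rather than the DAS implementation: the function names are hypothetical, case-insensitive matching is an assumption, and the "patterns; limit" syntax is inferred from the “*.pdf; 3000” example above.

```python
from fnmatch import fnmatch


def parse_filter(expression: str):
    """Split 'patterns; limit' into (patterns, limit). The limit part is optional."""
    pattern_part, _, limit_part = expression.partition(";")
    patterns = pattern_part.split()  # multiple expressions are separated by spaces
    limit = int(limit_part) if limit_part.strip() else None
    return patterns, limit


def select_files(names, expression, exclude=False):
    """Return the file names kept by the filter, honouring the optional count limit."""
    patterns, limit = parse_filter(expression)
    selected = []
    for name in names:
        # Assumption: matching ignores case, so "c.PDF" matches "*.pdf".
        matched = any(fnmatch(name.lower(), p.lower()) for p in patterns)
        if matched != exclude:
            selected.append(name)
    return selected if limit is None else selected[:limit]


names = ["a.pdf", "b.doc", "c.PDF", "d.pdf"]
print(select_files(names, "*.pdf; 2"))             # ['a.pdf', 'c.PDF']
print(select_files(names, "*.pdf", exclude=True))  # ['b.doc']
```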

Job scheduling

To use the Job Schedule, you will need to click the Schedule tab under the Designer Tab.

Set up the Job Schedule

The product supports three types of scheduling which are implemented via the DAS service:

Ad-hoc

This means that the job does not have a fixed schedule, but may be run explicitly via the management GUI or via one of the API methods.

Watched folder / Continuous scheduling

This allows the job to be scheduled to run periodically between a start time and end time each day. The periods may be seconds, minutes, or hours. For example, a job may be specified to run every 30 seconds between 9:00 and 17:00.

If you check the “Run Continuously” checkbox, the job will run for 24 hours a day. This option is the default for all continuous jobs.
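
As a rough illustration of continuous scheduling, the Python sketch below computes the next run time for a job that repeats at a fixed interval inside a daily window. It is not how the DAS service is implemented; the function name, the inclusive end time, and the assumption that the window does not cross midnight are all choices made for this example.

```python
from datetime import datetime, time, timedelta


def next_run(last_run: datetime, interval: timedelta,
             start: time, end: time, run_continuously: bool = False) -> datetime:
    """Return the next run time for a periodic job limited to a daily window."""
    candidate = last_run + interval
    if run_continuously:
        return candidate  # 24 hours a day: no window to respect
    if candidate.time() < start:
        # Too early: wait until the window opens today.
        return datetime.combine(candidate.date(), start)
    if candidate.time() > end:
        # Too late: wait until the window opens tomorrow.
        return datetime.combine(candidate.date() + timedelta(days=1), start)
    return candidate


# A job that runs every 30 seconds between 9:00 and 17:00.
every_30s = timedelta(seconds=30)
window = (time(9, 0), time(17, 0))
print(next_run(datetime(2024, 1, 31, 10, 0, 0), every_30s, *window))    # 2024-01-31 10:00:30
print(next_run(datetime(2024, 1, 31, 16, 59, 45), every_30s, *window))  # 2024-02-01 09:00:00
```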

Daily scheduling

This allows the job to be scheduled to run at a specified time each day.

Alerts

This allows you to send emails to your mailbox when the job succeeds or fails. To open it, click the Alerts tab under the Designer tab.

Note: You will need to enter your SMTP settings in the Modules and Options tab before the email alerts will work properly.

Set up an Alert

| Menu Item | Action |
| --- | --- |
| Send Email Alerts on Job Completion | If checked, DAS will send an email if the job ends naturally or prematurely. This alert can be further tailored using the properties in the section below. |
| Only Send Email Alerts if: | |
| At least one file was processed | If you check this option, DAS will not send any email until it processes at least one file in the job. This is meant to reduce the number of irrelevant messages you get. |
| Job Terminated Prematurely | Check this if you only want to receive emails when an error occurs during the processing of a job.<br>Note: Individual file errors will not put the job in error; a job error occurs only in more serious circumstances. |
| At least one file error occurred | Check this option if you only want to receive emails when individual file errors occur. |
| Attach Log File | Check this option if you want DAS to attach the log file of the job to the email alert. |
| Attach Job Report | Check this option if you want DAS to attach a report/summary of the job to the email alert. |
| From Email Address | The “from” email address that will be used for the message. |
| To Email Address | The email address that the message will be sent to. |
| Email Title | The title of the email. |
| Email Message | The body of the email. This can be HTML content. |

Alert variables

When sending emails, there are several variables that can be used to customize the alerts you send out. Each variable is enclosed in percent signs (for example, %JOBID%). DAS will replace any occurrences of the variables with an appropriate value at run time. The table below shows the variables that can be used.

| Variable | Meaning |
| --- | --- |
| %JOBID% | The Job ID. This works with both the email title and email message. |
| %JOBNAME% | The Job Name. This works with both the email title and email message. |
| %JOBSTATUS% | The Job Status. This works with both the email title and email message. |
| %LOGFILE% | The location of the log file. This works with both the email title and email message. |
| %JOBSOURCE% | The Source Directory of the job. This works with the email message only. |
| %JOBTARGET% | The Destination Directory of the job. This works with the email message only. |
| %DATESTAMP% | The date that the alert was generated. This works with both the email title and email message. |
| %TIMESTAMP% | The time the alert was generated. This works with both the email title and email message. |
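
The substitution described above can be pictured with a short Python sketch. It is illustrative only: the function name `render_alert` and the sample values are hypothetical, and the only behaviour taken from the table is that %JOBSOURCE% and %JOBTARGET% are replaced in the message but not in the title.

```python
# Variables that, according to the table, are replaced in the message only.
MESSAGE_ONLY = {"%JOBSOURCE%", "%JOBTARGET%"}


def render_alert(title_template: str, message_template: str, values: dict):
    """Replace alert variables in the email title and message."""
    title, message = title_template, message_template
    for token, value in values.items():
        message = message.replace(token, value)
        if token not in MESSAGE_ONLY:
            title = title.replace(token, value)
    return title, message


values = {
    "%JOBID%": "12",
    "%JOBNAME%": "Invoices",
    "%JOBSTATUS%": "Completed",
    "%JOBSOURCE%": r"C:\In",
}
title, message = render_alert(
    "Job %JOBID% (%JOBNAME%): %JOBSTATUS%",
    "<p>Job %JOBNAME% read files from %JOBSOURCE%.</p>",
    values,
)
print(title)    # Job 12 (Invoices): Completed
print(message)  # <p>Job Invoices read files from C:\In.</p>
```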

Workflow Processing versus In-Place Processing

DAS is designed as a Workflow product where there is an input folder and an output folder. At the end of the process, there are options to copy, delete or move the input files that have been successfully processed.

With “in-place” processing, the input documents are turned into searchable PDFs and returned to the same location. It is possible to replace the existing file if the output file format produces the same file name. The input files can be copied to an archive location if they need to be kept (this is recommended during the development process and during testing – if this is not set, the original file cannot be recovered).

DAS can be used for in-place processing, but we have an OCR product named Document Searchability that is designed specifically for in-place conversions to searchable PDFs and may handle this use case more effectively. Searchlight records all the files it processes, so it is more efficient when there are many files, because they do not need to be opened to be identified as previously processed.

Example in-place job setup

The job shown below will convert PDFs under the tree “C:\ADX Demo\Documents” to searchable PDFs, processing up to 5 files each time the job is run.

Set up In-Place Job Properties

The Source Folder and the Target Folder must be the same.

The Use Work Folders check box must be checked when processing in place. When the folders are set to the same location in the UI, a message is displayed and the check box is set automatically.

Message When Folders are set to the Same Location

Select the Process Sub-Folders check box.

For audit purposes, the Input Files option should be set to Copy to archive after processing.

To avoid re-processing files, select the Include Unprocessed PDFs Only – with Document Count Limit option in the Filter Files combo box.

Because the Filter Files option selected includes the Document Count Limit, the Batch Size of the job can be set to 5 files per run (You can increase this to a suitable batch size).

The Output file Name is set in the Conversion Settings for the step and should be configured to **%FILENAME.pdf** so that it will replace the input file.
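
The settings listed above can be summarized as a checklist. The Python sketch below expresses that checklist as a small validation function over a dictionary. The dictionary keys and the function name are hypothetical and do not correspond to the DAS job definition XML or to any DAS API; the sketch only restates the rules from this section.

```python
from pathlib import PureWindowsPath


def check_in_place_job(job: dict) -> list:
    """Return a list of problems found in a (hypothetical) in-place job description."""
    problems = []
    # PureWindowsPath compares case-insensitively and ignores a trailing backslash.
    if PureWindowsPath(job["source_folder"]) != PureWindowsPath(job["target_folder"]):
        problems.append("Source Folder and Target Folder must be the same.")
    if not job.get("use_work_folders", False):
        problems.append("Use Work Folders must be checked.")
    if job.get("input_files") != "Copy to archive after processing":
        problems.append("Input Files should be set to Copy to archive after processing.")
    if "Unprocessed PDFs Only" not in job.get("filter_files", ""):
        problems.append("Filter Files should select unprocessed PDFs only.")
    if job.get("output_file_name") != "%FILENAME.pdf":
        problems.append("Output File Name should be %FILENAME.pdf.")
    return problems


job = {
    "source_folder": r"C:\ADX Demo\Documents",
    "target_folder": r"C:\ADX Demo\Documents",
    "use_work_folders": True,
    "process_sub_folders": True,
    "input_files": "Copy to archive after processing",
    "filter_files": "Include Unprocessed PDFs Only with Document Count Limit",
    "batch_size": 5,
    "output_file_name": "%FILENAME.pdf",
}
print(check_in_place_job(job))  # []
```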

Step types

This section explains each of the step types.

The DAS Server edition is licensed to use the Standard and GdPicture steps. The Extended edition adds the Extended OCR steps.

| Step Group | Step Name |
| --- | --- |
| OCR | Image to Searchable PDF (Standard) |
| OCR | Image to Searchable PDF (Extended) |
| OCR | PDF to Searchable PDF (Standard) |
| OCR | PDF to Searchable PDF (Extended) |
| OCR | Any File to Searchable PDF (Standard) |
| OCR | Any File to Searchable PDF (Extended) |
| OCR | Merge Image to Searchable PDF (Standard) |
| OCR | Merge Image to Searchable PDF (Extended) |
| OCR | PDF To Searchable PDF (GdPicture) |
| Convert | Convert PDF to TIFF |
| Convert | Convert Any File to PDF |
| Convert | Convert PDF to PDFA |
| Convert | Convert Any File To PDF (GdPicture) |
| Convert | Combine Any File To PDF |
| Convert | PDF To JPEG |
| Convert | PDF To PNG |
| Convert | PDF To TIFF |
| Convert | PDF To Text |
| Convert | Convert PDF To Office |
| Convert | Convert Any File To Office |
| Split and Merge | Merge PDF |
| Split and Merge | Split PDF |
| Split and Merge | Merge TIFF, JPEG, BMP, PNG, GIF |
| Split and Merge | Split TIFF |
| Split and Merge | Combine PDFs |
| Split and Merge | Split PDF (GdPicture) |
| Connectors | Read Mailbox |
| Connectors | Send Documents |
| Connectors | SharePoint Download |
| Connectors | SharePoint Upload |
| Connectors | Azure Storage Download |
| Connectors | Azure Storage Upload |
| Barcode | Barcode TIFF/PDF |
| Barcode | Split PDF by Barcode |
| PDF Operations | Set PDF Properties |
| PDF Operations | Create XML Property File |
| PDF Operations | Extract Text from PDF File |
| PDF Operations | Optimize PDF |
| PDF Operations | Stamp PDF Files |
| PDF Operations | Modern Compress PDF |
| PDF Operations | Validate PDFA |
| PDF Operations | Linearize PDF |
| PDF Operations | Create Pdf Portfolio |
| PDF Operations | Get Document Information |
| Advanced | Custom Script Step |
| Advanced | High Availability |
| Advanced | DAS Content Extraction Job |
| Advanced | Distributed Polling |
| Advanced | PDF Recognition to JSON |
| Advanced | Image to Searchable PDF (Microsoft Cloud OCR) |
| Advanced | PDF to Searchable PDF (Microsoft Cloud OCR) |
| Advanced | Image to Searchable PDF (Google Cloud OCR) |
| Advanced | PDF to Searchable PDF (Google Cloud OCR) |
| Advanced | Detect Signatures |
| Advanced | Smart Redaction |
| Advanced | Key Value Pair Extraction |
| Advanced | Pattern Redaction |
| Advanced | Pattern Highlighting |
| Advanced | Pattern Enumeration |

Image to searchable PDF

This step can be found under the OCR expander. It creates a searchable PDF file from input image types, e.g. .png, .tiff, .jpg, .gif, .bmp.

Depending upon the Step Type Properties chosen, separate text, HTML and Office files may be produced from the OCR process.

This step is not available for the GdPicture engine; however, it can be replicated by using a combination of the Convert Any File To PDF (GdPicture) and PDF To Searchable PDF (GdPicture) steps.

Standard engine

| Parameter | Notes |
| --- | --- |
| Output File Name | Target file template, which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file). |
| Create Directories if Required | Force creation of any output directories if they do not already exist. |
| Continue on Error | Continue processing TIFF files after an error occurs. |
| OCR | - Choose “No” to generate an image-only PDF.<br>- Choose “Yes” to generate searchable PDF and/or text files. |
| OCR Language | Select the language the original file is written in. This will determine the dictionary that is used. |
| Deskew | Straighten the image. |
| Auto-Rotate | Automatically rotate pages so that text flows left to right. |
| Despeckle | Remove specks below the specified pixel size from the image. |
| OCR to Text File | Choose “Yes” to generate text output. |
| Output File | - Plain Text (txt)<br>- Plain Text (txt) No PDF<br>- MS Word (rtf)<br>- HTML |
| PDF/A Options | Select the PDF/A version that the output PDF should comply with:<br>- PDF/A1-b<br>- PDF/A2-b<br>- PDF/A3-b |
| Validate PDF/A | Whether or not to validate the PDF/A document after conversion. |
| JBIG2 Compression | This option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time. |
| Box/Graphics Options | By default, if an area of the document is identified as a graphic area, then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but do contain useful text.<br>To force the OCR engine to process such areas, there are two options:<br>- Treat all Graphics Areas as Text: This option will ensure the entire document is processed as text.<br>- Remove Box Lines in OCR Processing: This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels). |
| Line Removal in OCR Processing | This removes lines and boxes during OCR processing to improve recognition – particularly in cases where characters “touch” lines. |
| MRC | This enables Mixed Raster Compression, which can dramatically reduce the output size of PDFs comprising color scans. |
| Save Pre-Despeckle | This will use the original image (i.e., before applying pre-processing) in the output PDF. The default value is true. |
| StampName | This has been deprecated; use the Stamp PDF Files step. |
| StampValue | This has been deprecated; use the Stamp PDF Files step. |
| Advanced Flags | Command line flags to be passed through to the underlying executable. |
| Maximum Cores | This specifies the number of files to be processed in parallel at a given time.<br>Note: This needs a multi-core license, and the number of cores used will depend on the availability of cores. |
| Debug | Set this to true to execute the step in debug mode. |
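
As an illustration of the Output File Name template, the following Python sketch expands %FILENAME and %DIRNAME for a given source path. It is a guess at the behaviour rather than DAS code: the function name is hypothetical, and %DIRNAME is assumed to mean the name of the folder that contains the file, which the description above does not spell out.

```python
from pathlib import PureWindowsPath


def expand_output_name(template: str, source: str) -> str:
    """Expand %FILENAME and %DIRNAME in an output file name template."""
    path = PureWindowsPath(source)
    return (
        template
        .replace("%FILENAME", path.stem)        # original filename without the extension
        .replace("%DIRNAME", path.parent.name)  # assumed: name of the containing folder
    )


print(expand_output_name("%FILENAME.pdf", r"C:\Scans\Invoices\page001.tif"))
# page001.pdf
print(expand_output_name("%DIRNAME_%FILENAME.pdf", r"C:\Scans\Invoices\page001.tif"))
# Invoices_page001.pdf
```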

Extended engine

| Parameter | Notes |
| --- | --- |
| Output File Name | The output filename excluding the extension (which will be added according to the output file type). |
| Output File Type | One or more of the following, separated by commas if more than one is required.<br>- CSV*<br>- DOCX<br>- EPUB<br>- EXCELML*<br>- HTM<br>- OPENTXT<br>- PDF<br>- RTF<br>- TXT<br>- WORDML<br>- XLSX*<br>- XPS<br>*These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format. |
| Create Folders If Required | Create an output folder if it does not exist. Default true. |
| OCR Engine | The OCR engine to use. This must be set to use the IRIS engine. |
| OCR Language 1-8 | You can set up to eight different languages for OCR recognition on one page, provided they are in the same character set. English is available as a language. |
| Automatic language detection | Enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page.<br>If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages. |
| Auto rotate | Detect page orientation and correct it if required. |
| Deskew | Rotates the image to correct its skew angle. |
| Advanced Deskew | Set this to true to define advanced deskew properties. |
| Force Deskew | Under certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect, from an OCR accuracy point of view, whether it is better to rotate the image or not. Because the skew angle may be visible in the output document (e.g. if KeepDeskew is set to 'true'), you can choose to force the deskew to rotate the image, even if it affects the accuracy.<br>If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result.<br>If turned on, the image is rotated to correct the skew angle. |
| Adjustment Mode | Set the behavior regarding dimension adjustment for the deskew operation. |
| Despeckle | Removes all groups of connected pixels with a number of pixels below this parameter. Suggested range: 1-20. |
| Advanced Despeckle | Set the advanced despeckle settings. Advanced despeckle provides advanced image noise reduction features through the image despeckle filter. |
| Remove White Pixels | By default, Advanced Despeckle removes black pixels. If this setting is set to 'true', white pixels will be removed instead of black pixels. |
| Dilate | Despeckle removes all groups of connected pixels with a number of pixels below the SpeckleSize parameter. Those connected pixels are not removed if the distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels.<br>The default value is '0'. |
| Layout | The layout for the docx or rtf document:<br>- Standard<br>- Flow |
| PDFVersion | This determines the PDF version of the generated PDF:<br>- 1.4<br>- 1.5<br>- 1.6<br>- 1.7<br>- 1.7 Extension Level 3<br>- 1.7 Extension Level 5<br>- 1.7 Extension Level 8<br>- PDF/A-1a<br>- PDF/A-1b<br>- PDF/A-2a<br>- PDF/A-2b<br>- PDF/A-3a<br>- PDF/A-3b |
| Remove Blank Page | Set this to true to remove blank pages from TIFF or PDF documents. A value needs to be set for Sensitivity (see below). |
| Sensitivity | The sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected. |
| Work Depth | This parameter (0 – 255) defines how deeply the OCR engine will analyze a page, with 255 being the deepest. For poorer quality documents, higher values can give better recognition results. |
| JPEG Quality | This parameter (0 – 255) determines the compression/quality of color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128. |
| JPEG2000 Compression | Enable/Disable JPEG2000 Compression. |
| JPEG2000 Compression Mode | The JPEG2000 Compression Mode to use. |
| JPEG2000 Compression Value | The value to set for the selected Compression Mode. |
| IHQC Compression | Apply Intelligent High-Quality Compression. |
| IHQC Compression Level | Level 1 is the basic compression level, while level 3 is the most advanced Intelligent High-Quality Compression mode. |
| IHQC Quality Factor | The quality factor for IHQC. |
| No OCR | Whether or not to perform OCR on the document (Yes to not perform OCR, No to perform OCR). |
| Binarization | Whether or not to perform binarization on the document. |
| Brightness | The brightness (higher values will make the result darker). |
| Contrast | The contrast (lower values will make the result darker). |
| Smoothing Level | Smoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more). |
| Undithering | Whether or not to use automatic undithering while processing a page.<br>Note: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level).<br>Dithering is a scanning technique that consists of representing a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general aspect of the image. This technique is known to create images that are more difficult for OCR technology to handle; therefore, specific image preprocessing is needed to detect and revert it. |
| Threshold | Sets the threshold for fixed threshold binarization (0 for automatic threshold computation). |
| Remove Lines | Whether or not to remove lines from an image (the image must be black and white). |
| Horizontal Clean X | The parameter for cleaning noisy pixels attached to the horizontal lines. |
| Horizontal Clean Y | The parameter for cleaning noisy pixels attached to the horizontal lines. |
| Vertical Clean X | The parameter for cleaning noisy pixels attached to the vertical lines. |
| Vertical Clean Y | The parameter for cleaning noisy pixels attached to the vertical lines. |
| Horizontal Dilate | The dilate parameter that helps the detection of horizontal lines. |
| Vertical Dilate | The dilate parameter that helps the detection of vertical lines. |
| Horizontal Max Gap | The maximum horizontal line gap to close. It is useful for removing broken lines. |
| Vertical Max Gap | The maximum vertical line gap to close. It is useful for removing broken lines. |
| Horizontal Max Thickness | The maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. It can also be useful to keep vertical letter strokes. |
| Vertical Max Thickness | The maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. It can also be useful to keep horizontal letter strokes. |
| Horizontal Min Length | The minimum length of the horizontal lines to remove. |
| Vertical Min Length | The minimum length of the vertical lines to remove. |
| Remove Dark Borders | Removes the dark surrounding from bitonal, grayscale or color images. The dark surrounding of the image is whitened (Note: The dark border should be touching the edge of the page for this to work). |
| Punch Hole Removal | Attempts to remove punch holes from pages.<br>Note: The punch hole algorithm can be used on images with the following minimum dimensions: width 300px, height 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution. |
| Interpolation | Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image's resolution. |
| Interpolation Mode | Sets the interpolation mode. |
| Keep Original Image | Set this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is 'true'. |
| Keep Deskewed Image | Set this to true if you want to use the deskewed image in the output document.<br>Note: This property only applies when Keep Original Image is set to No. |
| Keep Despeckled Image | Set this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white.<br>Note: This property only applies when Keep Original Image is set to No. |
| Keep Dark Border Removal | Set this to true if you want to use the image after dark borders have been removed in the output document.<br>Note: This property only applies when Keep Original Image is set to No. |
| Keep Punch Hole Removal | Set this to true if you want to use the image after punch holes have been removed in the output document.<br>Note: This property only applies when Keep Original Image is set to No. |

PDF to searchable PDF

Creates a searchable PDF file from the set of images from an image-only PDF file.

Depending upon the Step Type Properties chosen, separate text, HTML and Office files may be produced from the OCR process.

Standard engine

ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Continue on ErrorContinue processing TIFF files after an error occurs.
OCR- Choose “No” to generate an image-only PDF.
- Choose “Yes” to generate searchable PDF and/or text files.
OCR LanguageSelect the language the original file is written in. This will determine the dictionary that is used.
DeskewStraighten the image.
Auto-RotateAutomatically rotate pages so that text flows left to right.
DespeckleRemove specks below the specified pixel size from the image.
OCR to Text FileChoose “Yes” to Generate text Output.
Output File- Plain Text (txt)
- Plain Text (txt) No PDF
- MS Word (rtf)
- HTML
Non-Image PDFsThis allows control over the treatment of non-image PDFs, i.e. PDFs that have some text in them as well as images. The options are:
- OCR: The document will be OCRed using the image method defined by “Image Method”.
- Raise Error: The task will terminate with an error. If “On Error Continue” is set, this then behaves as Skip. This is the default.
- Skip: The document will not be processed.
- Pass Through: The file will not be processed, but a copy of the document will be made and named as if the processing had occurred.
Remove Hidden TextThis applies only when a PDF is being used as the source for OCR. When set to true this will not include any searchable text layers that already exist from the source document. Such functionality might be useful if the source document was created by OCR of an image only PDF or other image file and the quality of the text from the previous OCR is poor.
Note: There is no way to distinguish text added as a result of OCR from text added by other means and as a result, this option should be used with care.
Convert to TIFFChoose the method for PDF image extraction:
- No – (Native)
- Yes – (Convert to TIFF)
DPIWhen OCRing a PDF, the PDF is rasterized to produce a TIFF file which is then OCRed. By default, the TIFF image resolution is determined from the images embedded in the source PDF but this flag can be used to override default processing and specify the DPI of the TIFF that will be generated.
TIFF CompressionSets the Compression for the TIFF file used if the “Convert To TIFF” Option above is used.
- Auto (Selects Group 4 if the page is Black AND White else it uses LZW Compression)
- Group 4 (Black and White)
- LZW (Colored)
Retain MetadataCopy metadata from the source PDF to the Searchable result PDF.
Retain BookmarksCopy bookmarks from the source PDF to the Searchable result PDF.
Retain Viewer PreferencesRetains any PDF Viewer Preferences, Page Mode and Page Layout from the source file in the output when using Convert To TIFF=’Yes’.
PDF/A OptionsSelect the output PDF/A compliant version you would like the output PDF to be:
- PDF/A1-b
- PDF/A2-b
- PDF/A3-b
Validate PDF/AWhether or not to validate the PDF/A document after conversion.
Box/Graphics ProcessingBy default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but that actually do contain useful text.
To ensure that the OCR engine can be forced to process such areas there are two options:
- Treat all Graphics Areas as Text: This option will ensure the entire document is processed as text.
- Remove Box Lines in OCR Processing: This option is ideal for forms where boxes around text can cause an area to be identified as graphics. It removes boxes from the temporary copy of the image used by the OCR engine; it does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels).
Line Removal in OCR ProcessingThis removes lines and boxes during OCR processing to improve recognition – particularly in cases where characters “touch” lines.
JBIG2 CompressionThis option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time.
MRC CompressionApplies Mixed Raster Compression which can drastically reduce the size of PDF documents.
Save Pre-DespeckleThis will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
StampNameThis has been deprecated, use the Stamp PDF Files step.
StampValueThis has been deprecated, use the Stamp PDF Files step.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThis specifies the number of parallel files you want to be processed at a given time.
Note: This needs a multi-core license and the number of cores used will depend on the availability of cores.
Password FilesThis option specifies what DAS does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder:
- Take no action.
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.
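
The three Password Files options can be pictured with a short Python sketch. This is only an illustration of the behavior described above, not how DAS implements it: the function name, the action names (`none`, `move`, `copy`) and the literal folder name `password` are assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Optional

# Hypothetical action names mirroring the three Password Files options.
ACTIONS = ("none", "move", "copy")


def handle_password_file(pdf: Path, error_folder: Path, action: str) -> Optional[Path]:
    """Apply a Password Files action and return the new location, if any."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "none":
        # "Take no action": the file stays where it is.
        return None
    # Assumption: the sub directory is literally named "password".
    target_dir = error_folder / "password"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / pdf.name
    if action == "move":
        shutil.move(str(pdf), str(target))
    else:
        shutil.copy2(str(pdf), str(target))
    return target
```

With `copy` the source file remains in place, while `move` removes it from the input location.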

Extended engine

ParameterNotes
Output File NameThe output filename excluding the extension (which will be added according to the output file type).
Output File TypeOne or more of the following, separated by commas if more than one is required:
- CSV*
- DOCX
- EPUB
- EXCELML*
- HTM
- OPENTXT
- PDF
- RTF
- TXT
- WORDML
- XLSX*
- XPS
*These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format.
OCR EngineThe OCR engine to use. This must be set to use the IRIS engine.
OCR Language 1-8You can set up to eight different languages for OCR recognition in one page as long as they are in the same character set.
Automatic Language DetectionProperty that enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page.
If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages.
Auto RotateDetect page orientation and correct if required
DeskewRotates the image to correct its skew angle.
Advanced DeskewSet this to true to define advanced deskew properties.
Force DeskewUnder certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect from an OCR accuracy point of view whether it's better to rotate the image or not. Because the skew angle may be visible in the output document (e.g. if KeepDeskew is set to 'true'), you can choose to force the deskew to rotate the image, even if it affects the accuracy.
If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result.
If turned on, the image is rotated to correct skew angle.
Adjustment ModeSet the behavior regarding dimension adjustment for deskew operation.
DespeckleRemoves all the groups of connected pixels with a number of pixels below the parameter. Suggested range: 1-20.
Advanced DespeckleSet the advanced despeckle settings. Advanced despeckle provides advanced image noise reduction features through the image despeckle filter.
Remove White PixelsBy default, Advanced Despeckle removes black pixels. If this setting is set to 'true', white pixels will be removed instead of black pixels.
DilateDespeckle removes all the groups of connected pixels whose pixel count is below the SpeckleSize parameter. Those connected pixels are not removed if the distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels.
The default value is '0'.
Retain BookmarkThis option allows you to retain the bookmarks in the new PDF if the old PDF was Converted to TIFF before it was OCRed.
Note: This will only work if Extract Images Method = Convert to TIFF.
Retain MetadataThis option allows you to retain the metadata in the new PDF if the old PDF was Converted to TIFF before it was OCRed.
Note: This will only work if Extract Images Method = Convert to TIFF.
LayoutThe layout for the DOCX or RTF document:
- Standard
- Flow
PDFVersionThis determines the PDF version of the generated PDF:
- 1.4
- 1.5
- 1.6
- 1.7
- 1.7 Extension Level 3
- 1.7 Extension Level 5
- 1.7 Extension Level 8
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-3a
- PDF/A-3b
Note: This will only work if Extract Images Method = Convert to TIFF.
Extract Images MethodWhether to convert the images in a PDF document to TIFF or not:
- Convert to TIFF: The pages in the PDF document are rasterized and saved as TIFF images
- Native: This method places the OCRed text directly into a copy of the original PDF rather than creating an entirely new PDF.
Remove Blank PageSet this to true to remove blank pages from TIFF or PDF documents. A Sensitivity value must also be set (see below).
SensitivityThe sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected.
Work DepthThis parameter (0 – 255) defines how deeply the OCR engine will analyze a page with 255 being the deepest. For poorer quality documents, higher values can give better recognition results.
JPEG QualityThis parameter (0 – 255) determines the compression/quality of Color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128.
JPEG2000 CompressionEnable/Disable JPEG2000 Compression.
JPEG2000 Compression ModeThe JPEG2000 Compression Mode to use.
JPEG2000 Compression ValueThe Value to set for the selected Compression Mode.
IHQC CompressionApply Intelligent High-Quality Compression
IHQC Compression LevelLevel 1 is the basic compression level while level 3 is the most advanced Intelligent High-Quality Compression Mode.
IHQC Quality FactorThe quality Factor for IHQC
BinarizationWhether or not to perform binarization on the document.
BrightnessThe brightness (higher values will make the result darker).
ContrastThe contrast (lower values will make the result darker).
Smoothing LevelSmoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more).
UnditheringWhether or not to use automatic undithering while processing a page. NOTE: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level).
Dithering is a scanning technique that represents a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general appearance of the image. Dithered images are known to be more difficult for OCR technology to handle, so specific image preprocessing is needed to detect and revert the dithering.
ThresholdSets the threshold for fixed threshold binarization (0 for automatic threshold computation).
Remove LinesWhether or not to remove lines from an image (The image must be black and white).
Horizontal Clean XThe parameter for cleaning noisy pixels attached to the horizontal lines.
Horizontal Clean YThe parameter for cleaning noisy pixels attached to the horizontal lines.
Vertical Clean XThe parameter for cleaning noisy pixels attached to the vertical lines.
Vertical Clean YThe parameter for cleaning noisy pixels attached to the vertical lines.
Horizontal DilateThe dilate parameter that helps the detection of horizontal lines.
Vertical DilateThe dilate parameter that helps the detection of vertical lines.
Horizontal Max GapThe maximum horizontal line gap to close. It is useful to remove broken lines.
Vertical Max GapThe maximum vertical line gap to close. It is useful to remove broken lines.
Horizontal Max ThicknessThe maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. Can be also useful to keep vertical letter strokes.
Vertical Max ThicknessThe maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. Can be also useful to keep horizontal letter strokes.
Horizontal Min LengthThe minimum length of the horizontal lines to remove.
Vertical Min LengthThe minimum length of the vertical lines to remove.
Remove Dark BordersRemoves the dark surrounding from bitonal, grayscale or color images. The dark surrounding of the image is whitened (Note: The dark border should be touching the edge of the page for this to work).
Punch Hole RemovalAttempts to remove punch holes from pages.
Note: The punch hole algorithm can be used on images with the following minimum dimensions width: 300px, height: 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.
InterpolationInterpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image's resolution.
Interpolation ModeSets the interpolation mode.
Keep Original ImageSet this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is 'true'.
Note: This property only applies when processing PDF files with the Convert To TIFF set to Yes.
Keep Deskewed ImageSet this to true if you want to use the deskewed image in the output document.
Note: This property only applies when Keep Original Image is set to No.
Keep Despeckled ImageSet this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white.
Note: This property only applies when Keep Original Image is set to No.
Keep Dark Border RemovalSet this to true if you want to use the image after dark borders have been removed, in the output document.
Note: This property only applies when Keep Original Image is set to No.
Keep Punch Hole RemovalSet this to true if you want to use the image after punch holes have been removed, in the output document.
Note: This property only applies when Keep Original Image is set to No.
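
The Output File Type parameter above takes a comma-separated list of formats. The Python sketch below shows one way such a value could be split and checked against the formats listed in the table. The function name and the choice to normalize to upper case are assumptions for illustration; DAS may parse the value differently.

```python
from typing import List

# Formats listed for Output File Type in the table above.
SUPPORTED = {
    "CSV", "DOCX", "EPUB", "EXCELML", "HTM", "OPENTXT",
    "PDF", "RTF", "TXT", "WORDML", "XLSX", "XPS",
}
# Formats marked with * (suited to table-oriented pages).
TABLE_ORIENTED = {"CSV", "EXCELML", "XLSX"}


def parse_output_types(value: str) -> List[str]:
    """Split a comma-separated Output File Type value and validate each entry."""
    types = [part.strip().upper() for part in value.split(",") if part.strip()]
    unknown = [t for t in types if t not in SUPPORTED]
    if unknown:
        raise ValueError(f"unsupported output type(s): {', '.join(unknown)}")
    return types


def table_oriented(types: List[str]) -> List[str]:
    """Return the requested formats that target spreadsheet-style output."""
    return [t for t in types if t in TABLE_ORIENTED]
```

For example, `parse_output_types("pdf, docx, xlsx")` yields the three format names, and `table_oriented` then picks out `XLSX` as the one intended for table-oriented pages.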

Merge TIFFs to PDF

This step first merges the input images in a folder into a multi-page PDF file, then performs OCR on the file. Depending on the Step Type Properties chosen, separate text, HTML and Office files may be produced from the OCR process.

Standard engine

ParameterNotes
Output File NameTarget file template which can include %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
OCR OptionsChoose “No OCR” to generate an image-only PDF.
Choose “OCR” to generate searchable PDF and/or text files.
Continue on ErrorContinue processing TIFF files after an error occurs.
OCR LanguageSelect the language the original file is written in. This will determine the dictionary that is used.
DeskewStraighten the image.
Auto-RotateAutomatically rotate pages so that text flows left to right.
DespeckleRemove specks below the specified pixel size from the image.
Save Pre-DespeckleThis will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
Output PDFChoose “Yes” to generate a PDF file.
Output TXTChoose “Yes” to generate a .txt file (only applicable if OCR is specified).
Output RTFChoose “Yes” to generate a .rtf file (only applicable if OCR is specified).
Output HTMLChoose “Yes” to generate a .htm file (only applicable if OCR is specified).
Advanced FlagsCommand line flags to be passed through to the underlying executable.
PDF/A OptionsSelect the output PDF/A compliant version you would like the output PDF to be:
- PDF/A1-b
- PDF/A2-b
- PDF/A3-b
Validate PDF/AWhether or not to validate the PDF/A document after conversion.

Convert any file to PDF

This converts any printable document to PDF, such as Microsoft Word, Excel, PowerPoint, HTML, etc. subject to the native application being available on the server. See ToPDF (BCL easyPDF) for more details.

ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file).
Continue on ErrorContinue processing files after an error occurs.
Conversion Timeout (ms)Limits the amount of time in milliseconds that can be spent on conversion. A value of zero means there is no time limit.
Convert BookmarksFor MS Word, convert bookmarks.
Bookmark DepthThis property will take effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine will be mapped to level one through nine) will be used. Therefore, an empty string is functionally equivalent to
Heading 1 mapped to 1, Heading 2 mapped to 2, Heading 3 mapped to 3, Heading 4 mapped to 4, Heading 5 mapped to 5, Heading 6 mapped to 6, Heading 7 mapped to 7, Heading 8 mapped to 8, Heading 9 mapped to 9.
Note: If you use a non-English version of Microsoft Word, then you may need to replace the word "Heading" with its localized version.
Convert HyperlinksSets the flag to indicate whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel)The flag that indicates whether to print all Excel worksheets or not.
Print Background Color (IE)For files printed via IE, sets the flag that indicates whether to print the background color.
Print Scale % (Visio)For Visio files, sets the print scale.
Header (IE)This property modifies Internet Explorer's header setting.
Footer (IE)This property modifies Internet Explorer's footer setting.
Image CompressionIf you want a lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image DownsizingIf this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPIIf the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG QualityThe allowed value range is from 5 to 100 with 100 being the highest quality.
Font EmbeddingThe option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) causes a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embed a subset of fonts) is a better choice.
Font SubstitutionFor the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the "Device Setting" section of the printer driver properties (can be accessed from the Control Panel).
Embed Fonts as Type 0This option is recommended if you have non-standard fonts like barcode font.
Top MarginSets top margin. (Inches)
Bottom MarginSets bottom margin. (Inches)
Left MarginSets left margin. (Inches)
Right MarginSets right margin. (Inches)
Page WidthSets a custom page width. (Inches)
Page HeightSets a custom page height. (Inches)
Paper OrientationSets paper orientation to:
- Default (Maintain Source Orientation)
- Landscape
- Portrait
PDF ComplianceAllows the user to choose PDF/A or PDF/X compliant output:
- None (No PDF/A Output)
- PDF/A-1b (PDF/A-1b compliant)
- PDF/X-1a (PDF/X-1a compliant)
- PDF/X-3 (PDF/X-3 compliant)
Convert MSG AttachmentsIf you set this to true, DAS will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDFIf set to true, DAS will attach the converted MSG attachments to the output PDF as PDF attachments.
If set to false, DAS will merge the converted MSG attachments into the PDF file generated from the message body.
Preserve Word AttachmentsDetermines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).
Note: This will work with WordExtensionEX only.
Convert PDF Attachments (PDF)Convert PDF Attachments to create a combined PDF file.
Merge PDF Attachments (PDF)Set this flag to true if you want to convert PDF attachments and merge them into the output PDF file. Otherwise, the converted files will be merged back to the PDF.
Retain PDF Attachment (PDF)Switch this on to retain the original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office)Set this flag if you want the MS Office properties to be transferred to the target PDF document.
Color Type (PowerPoint)Use this property to set PowerPoint to print with either color, grayscale, or black and white.
Handout Order (PowerPoint)Sets the handout order. This flag only applies to PowerPoint jobs. The possible values are:
- Vertical First
- Horizontal First
Output Type (PowerPoint)Sets the output type. This only works with PowerPoint files. The possible values are:
- Slides
- Build slides
- Two slides handouts
- Three slides handouts
- Four slides handouts
- Six slides handouts
- Nine slides handouts
- Notes
- Outline
Print Graphics (Publisher)Sets the graphics setting for printing:
- Print Full Resolution
- Print Low Resolution
- Print Graphics
Frame Slides (PowerPoint)Indicate whether to draw a frame around the border of the slides.
Zoom (Excel)Sets printing zoom of the worksheet. The allowed value range is from 10 to 400.
Fit to Pages Wide (Excel)Sets number of pages wide the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Fit to Pages Tall (Excel)Sets number of pages tall the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Include Document MarkupsDetermines whether document markups are retained.
When this property is False (the default), document markups are omitted.
When this property is True, markups are included.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesThis option specifies what DAS does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.
- Take no action.
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.
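
The Bookmark Depth rules above (levels of one or more, no repeated style names, and a default mapping of Heading 1 to 9 onto levels 1 to 9) can be expressed as a small Python sketch. It works on (style name, level) pairs and deliberately does not model the string syntax of the property. The function names and the `heading_word` parameter are hypothetical.

```python
from typing import List, Tuple


def default_bookmark_levels(heading_word: str = "Heading") -> List[Tuple[str, int]]:
    """Default mapping: "<heading_word> 1".."<heading_word> 9" to levels 1..9.

    heading_word can be swapped for the localized word in a
    non-English version of Microsoft Word.
    """
    return [(f"{heading_word} {n}", n) for n in range(1, 10)]


def validate_bookmark_levels(pairs: List[Tuple[str, int]]) -> None:
    """Check the two rules stated for Bookmark Depth; raise ValueError if broken."""
    seen = set()
    for style, level in pairs:
        if level < 1:
            raise ValueError(f"level for {style!r} must be 1 or larger")
        if style in seen:
            raise ValueError(f"style name {style!r} is repeated")
        seen.add(style)
```

Calling `validate_bookmark_levels(default_bookmark_levels())` passes, since the default mapping satisfies both rules.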

Set PDF properties

This is used to set PDF Metadata properties (such as Author, Title, etc.), Security settings and Document Display properties.

ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension), %DIRNAME (directory name of the original file), %UNIQUEn (e.g. %UNIQUE4 for 4 digits), %BOOKMARK and %PAGEn (e.g. %PAGE4 for 4 digits).
Encryption StrengthMust be set to 128 bits if security attributes are to be set.
User PasswordA password that will be required to open the document.
Owner PasswordA password that will be required to change the document permissions.
Allow PrintingAllow high-quality printing.
Allow Modify ContentsAllow assembly and other document modifications.
Allow CopyAllow text and graphics copying and extraction.
Allow Modify AnnotationsAllow modification of annotations.
Allow FillingAllow filling of form fields.
Allow Screen ReadersAllow extraction of text and graphics in support of accessibility.
Allow AssemblyAllow rotation, insertion or deletion of pages.
Allow Degraded PrintingAllow low-quality printing.
AuthorSets the Author property.
TitleSets the Title property.
SubjectSets the Subject property.
KeywordsSets the Keywords property.
CreatorSets the Creator property.
Page LayoutThe setting for the initial document page display.
Page ModeThe setting for initial viewer mode.
Non-Full Screen ModeOnly applicable where Page Mode=Full Screen. The setting for document page display when exiting Full-Screen mode.
Hide Menu BarThe viewer's menu bar will be hidden.
Hide Window UIThe viewer's UI elements (scrollbars etc.) will be hidden.
Hide Tool BarThe viewer's toolbar will be hidden.
Fit WindowThe viewer will resize the document's window to fit the size of the first displayed page.
Center WindowThe document window will be positioned in the center of the screen.

Custom script

This can be used to support a custom scripted step in the process. See Scripting Custom Steps for more details.

ParameterNotes
Custom Script FileName of the custom script file to run, located in the DAS custom folder.
Job ID(Optional) Will send an additional flag with the jobdef file location. For example, a value of 1024 will give the flag "/jobdef:C:\Aquaforest\Autobahn DX/jobdef/1024.xml" given that DAS is installed on the default C drive location.
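
As a rough illustration of the Job ID example above, the following Python sketch builds the flag string from a job ID. The function name is hypothetical, and the default install directory is simply the one shown in the example; the mixed slashes are reproduced as they appear there.

```python
def jobdef_flag(job_id: int, install_dir: str = r"C:\Aquaforest\Autobahn DX") -> str:
    """Build the /jobdef flag for a job ID, following the documented example."""
    # The jobdef sub directory and .xml extension come from the example above.
    return f"/jobdef:{install_dir}/jobdef/{job_id}.xml"


print(jobdef_flag(1024))
```

A custom script would receive this value as an extra command line argument and could strip the `/jobdef:` prefix to locate the job definition file.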

Stamp PDF files

This step can be used to add stamps to PDF pages. The stamps can be customized extensively through the step properties below.

ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension), %DIRNAME (directory name of the original file).
Stamp OperationDAS can apply stamps to a page in several ways, which gives the user some flexibility.
- StampTextAsString: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as text.
- StampPDFText: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as an image.
- StampPageNumber: When this operation is selected, every page in the PDF file will be stamped with a page number, starting from the start number. For example, if StartNumber = 6, the first page number will be 6.
- StampPageNumberBates: When this operation is selected, every page in the PDF file will be stamped with a Bates number, starting from the start number. For example, if StartNumber = 6, the first page number will be 000006.
- StampVariable: This option allows a user to specify a variable such as a date, filename or time. The variable specified by the StampObject will be stamped on the document. See the table below for the stamp variables provided.
- StampPDFImage: When this operation is selected, the text passed as the StampObject is the path of the image to be stamped on the PDF document.
Stamp PlacementSpecifies where on the page the stamp is placed. The available options are:
- Bottom Center
- Bottom Left
- Bottom Right
- Center
- Center Left
- Center Right
- Top Center
- Top Left
- Top Right
Stamp DirectionThis represents the direction of the stamp on the output PDF.
- Normal
- Diagonal Up
- Diagonal Down
Stamp TextEnter any static text to be stamped on a PDF page. This works with the StampPDFText stamp operation.
Stamp VariableEnter a stamp variable to be stamped on a PDF page. This works with the StampVariable stamp operation. See the "Stamp Variables" table below for more details.
Image PathThe path to the image if you are using the StampPDFImage operation.
Page RangeSet of page ranges separated by commas that define which pages from the original should be stamped. Using * or leaving it blank will process all pages.
Start NumberThe number that the page numbering will start with. This works with StampPageNumber and StampPageNumberBates.
Start PageSpecifies the page that the stamping should start.
End PageSpecifies the page that the stamping should stop.
Bates PrefixSpecifies the prefix of the Bates stamp.
Bates SuffixSpecifies the suffix of the Bates stamp.
Bates LengthSpecifies the length of the Bates stamp.
Stamp ColorThe color of non-image stamps. Enter a valid color name or black will be used.
Stamp OpacityThe opacity of non-image stamps.
Font NameThe font name of non-image stamps. Choose the font you want from a drop-down list of different fonts.
Font SizeThe font size of non-image stamps. The default value is 20.
Stamp Text as ImageSet this to Yes if you want DAS to convert text-based stamps to images before applying them to the PDF page.
Image Background ColorWhen you set Stamp Text as Image to Yes, use this property to set the background color of the image (rectangle) that the text is converted to.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesThis option specifies what DAS does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.
- Take no action
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.
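
The StampPageNumberBates example above (StartNumber = 6 gives 000006) suggests zero-padded numbering combined with the Bates Prefix and Bates Suffix. The Python sketch below shows that idea under two assumptions: that Bates Length is the number of digits, and that the number increases by one per page. The function name is hypothetical.

```python
def bates_label(start_number: int, page_index: int, length: int = 6,
                prefix: str = "", suffix: str = "") -> str:
    """Return the Bates label for a page (page_index 0 is the first stamped page)."""
    number = start_number + page_index
    # Zero-pad the running number to `length` digits (assumed meaning of Bates Length).
    return f"{prefix}{number:0{length}d}{suffix}"


for i in range(3):
    print(bates_label(6, i, prefix="ABC-"))
```

With a start number of 6 and a length of 6, the first three pages are labeled ABC-000006, ABC-000007 and ABC-000008.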

Stamp variables

The table below shows the stamp variables supported by DAS. DAS replaces each occurrence of a variable in the text string with the appropriate value before applying the stamp. For example, to stamp “Today is Monday” on a PDF page, use the stamp variable text “Today is %A”.

| Variable | Stamp |
| -------- | ------------------------------------------------------ |
| %a | Short Day (Mon) |
| %A | Long Day (Monday) |
| %b | Short Month (Jan) |
| %B | Long Month (January) |
| %c | Date and time (30 October 2013 17:21) |
| %C | Date and Time with seconds (30 October 2013 17:21:50) |
| %d | Month and Year (October 2013) |
| %D | Day and Month (30 October) |
| %e | Short Year (13) |
| %E | Long Year (2013) |
| %f | Short Time of Day (17:21) |
| %F | Time of Day with Seconds (17:21:20) |
| %G | Full Date and time (Wed, 30 October 2013 17:21:50 GMT) |
| %Y | File Name |
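
To make the substitution concrete, here is a minimal Python sketch that expands the stamp variables listed in the table. It is an approximation for illustration only: the function name is hypothetical, the output formats are inferred from the examples in the table, English day and month names are assumed, and `GMT` is appended as literal text without any time zone conversion.

```python
import re
from datetime import datetime

# Stamp variable letter -> strftime pattern, inferred from the table's examples.
_FORMATS = {
    "a": "%a", "A": "%A", "b": "%b", "B": "%B",
    "c": "%d %B %Y %H:%M",
    "C": "%d %B %Y %H:%M:%S",
    "d": "%B %Y",
    "D": "%d %B",
    "e": "%y", "E": "%Y",
    "f": "%H:%M", "F": "%H:%M:%S",
    "G": "%a, %d %B %Y %H:%M:%S GMT",  # literal "GMT"; no conversion is done
}


def expand_stamp(text: str, when: datetime, file_name: str = "") -> str:
    """Replace each stamp variable in text with its value."""
    def replace(match):
        code = match.group(1)
        if code == "Y":
            return file_name
        fmt = _FORMATS.get(code)
        # Unknown variables are left untouched.
        return when.strftime(fmt) if fmt else match.group(0)

    return re.sub(r"%([A-Za-z])", replace, text)


print(expand_stamp("Today is %A", datetime(2013, 10, 30, 17, 21, 50)))
```

The day of the month is zero-padded here (for example `05 October`), which may differ from what DAS produces for single-digit days.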

Merge PDF

Merges a folder of PDF files into a single file.

ParameterNotes
Output File NameTarget file template which can include %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Retain BookmarksGenerated files will include bookmarks from the original file.
Retain MetadataGenerated files will include metadata (such as Author and Title) from the original file.
File Names as BookmarksGenerate bookmarks in the output PDF using filenames of source PDF files.
Continue on ErrorContinue processing if an error occurs.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesThis option specifies what DAS does when it encounters a password protected PDF file.
- Take no action.
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.

Split PDF

Splits each input PDF file into a set of files, either a single page per file or by page ranges.

ParameterNotes
Output File NameThe target file template which can include %UNIQUEn (a unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Retain BookmarksGenerated files will include bookmarks from the original file.
Retain MetadataGenerated files will include metadata (such as Author and Title) from the original file.
Split Type- Single Pages: Splits the file into single pages.
- Page Ranges: Splits the file based on the range
- Repeated Ranges: Splits the file based on the range and the repeated range.
- Bookmarks: Splits the file based on the original bookmarks.
Ranges (e.g. 1,3-10)Set of page ranges separated by commas that define which pages from the original should be extracted.
Repeat Every (Pages)Apply the page range to each set of Page Ranges within the document. For example, if 2-4 is specified for page ranges, and 4 is specified as the repeating range, then the range is re-applied every 4 pages.
Continue on ErrorContinue processing if an error occurs.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesThis option specifies what DAS does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.
- Take no action.
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.
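
The Ranges and Repeat Every parameters above can be illustrated with a short Python sketch. It parses a range string such as `1,3-10` and then re-applies the ranges at a fixed page interval. The function names are hypothetical, and shifting each range by the repeat interval is one reading of the description, not confirmed DAS behavior.

```python
from typing import List, Tuple


def parse_ranges(spec: str) -> List[Tuple[int, int]]:
    """Parse a comma-separated range string like "1,3-10" into (first, last) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        ranges.append((first, last))
    return ranges


def repeated_ranges(spec: str, repeat_every: int, page_count: int) -> List[Tuple[int, int]]:
    """Re-apply the ranges every repeat_every pages, clipped to page_count."""
    result = []
    for offset in range(0, page_count, repeat_every):
        for first, last in parse_ranges(spec):
            lo, hi = first + offset, min(last + offset, page_count)
            if lo <= hi:
                result.append((lo, hi))
    return result


print(repeated_ranges("2-4", 4, 12))
```

For a 12-page document with ranges `2-4` repeated every 4 pages, this produces pages 2 to 4, 6 to 8 and 10 to 12.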

Merge TIFFs

Merges a folder of TIFF files into a single file.

ParameterNotes
Output File NameTarget file template which can include %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Continue on ErrorContinue processing if an error occurs.
DebugSet this to true to execute the step in debug mode.

Split TIFF

Splits each input TIFF file into a set of files, either a single page per file or by page ranges.

ParameterNotes
Output File NameThe target file template which can include %UNIQUEn (a unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Split Type- Single Pages: Splits the file into single pages
- Page Ranges: Splits the file based on the range
- Repeated Ranges: Splits the file based on the range and the repeated range
Ranges (e.g. 1,3-10)Set of page ranges separated by commas that define which pages from the original should be extracted.
Repeat Every (Pages)Apply the page range to each set of Page Ranges within the document. For example, if 2-4 is specified for page ranges, and 4 is specified as the repeating range, then the range is re-applied every 4 pages.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Continue on ErrorContinue processing if an error occurs.
DebugSet this to true to execute the step in debug mode.
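
The Output File Name template used by the split steps can be sketched in Python as follows. This is an illustration rather than the DAS implementation: the function name is hypothetical, %DIRNAME is assumed to mean the name of the folder that directly contains the source file, and forward-slash paths are used to keep the example platform independent.

```python
import re
from pathlib import PurePosixPath


def expand_template(template: str, source: str, unique: int) -> str:
    """Expand %UNIQUEn, %FILENAME and %DIRNAME for one output file."""
    src = PurePosixPath(source)
    # %UNIQUEn: the running number, zero padded to n digits.
    out = re.sub(r"%UNIQUE(\d+)",
                 lambda m: str(unique).zfill(int(m.group(1))), template)
    # %FILENAME: original filename without the extension.
    out = out.replace("%FILENAME", src.stem)
    # %DIRNAME: assumed to be the immediate parent folder name.
    out = out.replace("%DIRNAME", src.parent.name)
    return out


print(expand_template("%DIRNAME_%FILENAME_%UNIQUE4", "scans/invoices/batch7.tif", 1))
```

Calling the function once per output page with an increasing `unique` value gives each split file a distinct name.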

Read inbox

This can read mailboxes and extract attachments using IMAP4 or OAuth2 (Modern) Authentication, in accordance with the parameters specified below. Use of this step type requires a Server License.

Check with your System Administrator and ensure the following for IMAP4:

  • IMAP4 is enabled for the mail server and your account.

  • You have the IMAP address of the mail server.

For OAuth2, you require an access token from the Microsoft Identity Platform, which will supply you with the credentials to use our email steps with Modern Authentication.

Note: The files will be downloaded in the following format, name@timestamp@[email protected] where:

  • name = Filename

  • timestamp= Date of the email

  • email= ‘From’ address

Example: file1@[email protected]@[email protected]

ParameterNotes
Authentication ModeChoose between IMAP and Modern Authentication
IMAP ServerThe IMAP server address e.g. imap.company.co.uk
Require AuthenticationSet this option to ‘No’ if anonymous authentication is set up on your server; a username and password are then not needed
UsernameThe username for the account to access the IMAP server
PasswordPassword for the account. This is held encrypted
Azure Client IDThe Client ID for OAuth2 Authentication
Azure TenantThe Tenant for OAuth2 Authentication
Azure AD InstanceThe address of the Azure AD Instance. For example, https://login.microsoftonline.com
Credential TypeSelect the credential type for OAuth2 Authentication. The options are Client Secret or Certificate.
Client SecretThe client secret generated by Azure
Certificate PathThe path to the certificate generated by Azure
Certificate PasswordThe password of the certificate generated by Azure
Source Email AccountThe email account to be read. For example, [email protected]
MailboxMailbox to read. For example, Inbox
Processed MailboxMailbox to move processed emails to. For example, Deleted Items. If left blank, the emails remain in the source mailbox, which can be useful for testing
Output TemplateThe template for the name of the output file. This can include %FILENAME% for the original filename, %TIMESTAMP% for the job timestamp, and %FROMADDRESS% for the ‘From’ email address
IncludeRegular expression. If specified, only files matching the expression will be processed. For example, *.tif. This allows alternate jobs to be created for different file types
ExcludeRegular expression. If specified, files matching the expression will not be processed. For example, *.pdf
Subject FilterDAS only downloads attachments from emails whose subject contains this filter text
DebugSet this to true to execute the step in debug mode
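
The Output Template, Include, and Exclude parameters above can be sketched in Python as follows. This is an illustration under stated assumptions rather than DAS's implementation: the function names `expand_template` and `is_selected` are hypothetical, the timestamp value and the email address are made up, and the filters are assumed to be applied as regular-expression searches against the attachment filename.

```python
import re


def expand_template(template, filename, timestamp, from_address):
    """Substitute the three placeholders described for Output Template."""
    return (template
            .replace("%FILENAME%", filename)
            .replace("%TIMESTAMP%", timestamp)
            .replace("%FROMADDRESS%", from_address))


def is_selected(filename, include=None, exclude=None):
    """Apply Include and Exclude as regular expressions (assumed search semantics)."""
    if include and not re.search(include, filename):
        return False
    if exclude and re.search(exclude, filename):
        return False
    return True


# Hypothetical values; the timestamp format DAS uses is not specified here.
name = expand_template("%FILENAME%@%TIMESTAMP%@%FROMADDRESS%",
                       "invoice.tif", "20240101120000", "sender@example.com")
print(name)
print(is_selected("invoice.tif", include=r"\.tif$"))
print(is_selected("invoice.pdf", exclude=r"\.pdf$"))
```

In strict regular-expression syntax, a wildcard-style pattern such as *.tif would be written as `.*\.tif$`. Whether DAS also accepts wildcard-style patterns is not stated here, so test a filter against a sample mailbox before relying on it.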

Send documents

Use of this step type requires a Server License. The attachment limit is 50 MB, but email providers’ limits are normally lower.

Note: The input file of this step must be named in the format name@timestamp@email, where:

  • name = Filename

  • timestamp = Date of the email

  • email = The address to which the output files will be sent

Example: file1@[email protected]@[email protected]

ParameterNotes
Authentication ModeChoose between SMTP and Modern Authentication
DomainThe sending domain. For example, nutrient.io
SMTP ServerSMTP Server address. For example, smtp.nutrient.io
Require AuthenticationSet this option to ‘No’ if anonymous authentication is set up on your server; a username and password are then not needed
UsernameThe username for the account to access the SMTP server
PasswordPassword for the account. This is held encrypted
Azure Client IDThe Client ID for OAuth2 Authentication
Azure TenantThe Tenant for OAuth2 Authentication
Azure AD InstanceThe address of the Azure AD Instance. For example, https://login.microsoftonline.com
Credential TypeSelect the credential type for OAuth2 Authentication. The options are Client Secret or Certificate
Client SecretThe Client secret generated by Azure
Certificate PathThe path to the certificate generated by Azure
Certificate PasswordThe password of the certificate generated by Azure
Sender NameName of the sending user. For example, John
From Email AddressSending user. For example, [email protected]
CC AddressesEmail list of CC’d email addresses. Separate addresses with a comma. For example, [email protected], [email protected]
BCC AddressesEmail list of Bcc’d email addresses. Separate addresses with a comma. For example, [email protected], [email protected]
Email TitleThe title of the Email
Email BodyThe body of the Email
Allow Multiple AttachmentsBy default, DAS sends files as individual emails. If set to ‘Yes’ DAS will try to group files by destination and send multiple files in one email
Attachment Number LimitLimits the number of files that can be attached to one email sent by DAS
Attachment Total Size LimitIn MB. Limits the total size of all the files sent in each individual email by DAS
Use Original FilenameInput filenames must fit a specific format. Select true if you want the final attachment to revert to its original name
DebugSet this to true to execute the step in debug mode
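
The interaction between Allow Multiple Attachments, Attachment Number Limit, and Attachment Total Size Limit can be pictured with the Python sketch below. It is a hypothetical illustration, not DAS's algorithm: it assumes files are grouped by destination address in arrival order and that a new email is started whenever adding a file would exceed either limit. The function name `batch_attachments` and the sample addresses are invented for the example.

```python
from collections import defaultdict


def batch_attachments(files, max_count, max_total_mb):
    """Group (recipient, filename, size_mb) tuples into per-email batches.

    Files for the same recipient share an email until adding another file
    would exceed the count limit or the total size limit. A single file
    larger than the size limit is still placed in an email of its own.
    """
    if max_count < 1:
        raise ValueError("max_count must be at least 1")

    by_recipient = defaultdict(list)
    for recipient, filename, size_mb in files:
        by_recipient[recipient].append((filename, size_mb))

    emails = []
    for recipient, items in by_recipient.items():
        batch, batch_size = [], 0.0
        for filename, size_mb in items:
            too_many = len(batch) >= max_count
            too_big = bool(batch) and batch_size + size_mb > max_total_mb
            if too_many or too_big:
                emails.append((recipient, batch))
                batch, batch_size = [], 0.0
            batch.append(filename)
            batch_size += size_mb
        if batch:
            emails.append((recipient, batch))
    return emails


files = [
    ("a@example.com", "f1.pdf", 4),
    ("b@example.com", "f2.pdf", 1),
    ("a@example.com", "f3.pdf", 5),
    ("a@example.com", "f4.pdf", 3),
]
for recipient, names in batch_attachments(files, max_count=2, max_total_mb=10):
    print(recipient, names)
```

With a limit of two attachments and 10 MB per email, the first recipient receives two emails and the second recipient receives one.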

Convert PDF to TIFF

Rasterizes a PDF file, converting it into a multi-page TIFF file.

ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension)
CompressionGroup 4 (For bitonal images) or LZW (for color).
ResolutionThe DPI of the resulting TIFF File.
Continue on ErrorContinue processing if an error occurs.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesSpecifies what DAS does when it encounters a password-protected PDF file. With the Move and Copy options, the file is placed in the password subdirectory of the Error Folder. The options are:
- Take no action
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.

Extract text from PDF

Extracts the raw text from a searchable PDF.

Note:

  • This does not perform OCR; it only extracts the existing text from the PDF file.
  • A GdPicture-based step, PDF to Text, is also available.
ParameterNotes
Output File NameTarget file template which can include %FILENAME (original filename without the extension).
Continue on ErrorContinue processing if an error occurs.
Page FromThe start of the range of pages from which to extract text. If not specified, a start page of 1 is assumed.
Page ToThe end of the range of pages from which to extract text. If not specified, the last page is assumed.
Page SeparatorThis allows the definition of an optional page separator string in the output text file.
Page Separator PlacementSpecifies whether the Page Separator will appear at the beginning or the end of the page.
Extract Text EngineThe Extract Text Engine to use:
- 0 = PDFBox with Formatting
- 1 = BCL
- 2 = PDFBox
Copy Input PDF to Target FolderSet to true if you want DAS to copy the input PDF file to the target folder.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesSpecifies what DAS does when it encounters a password-protected PDF file. With the Move and Copy options, the file is placed in the password subdirectory of the Error Folder. The options are:
- Take no action
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.
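
The Page From, Page To, Page Separator, and Page Separator Placement parameters above combine as in the following Python sketch. It assumes the per-page text has already been extracted and only shows how the range defaults and the separator placement could be applied. The function name `join_pages` and the placement values "beginning" and "end" are hypothetical.

```python
def join_pages(pages, page_from=None, page_to=None,
               separator="", placement="end"):
    """Concatenate per-page text for the selected 1-based page range.

    `pages` is a list of strings, one per page. `placement` is "beginning"
    or "end" and controls where the separator goes relative to each page.
    """
    first = page_from if page_from is not None else 1      # default: page 1
    last = page_to if page_to is not None else len(pages)  # default: last page
    parts = []
    for text in pages[first - 1:last]:
        if placement == "beginning":
            parts.append(separator + text)
        else:
            parts.append(text + separator)
    return "".join(parts)


pages = ["one\n", "two\n", "three\n"]
print(join_pages(pages, page_from=2, separator="--- page break ---\n"))
```

Here Page To is left unset, so extraction runs from page 2 to the last page, and the separator follows each extracted page.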

SharePoint download

This step downloads documents from the specified SharePoint document library ready for processing.

ParameterNotes
SharePoint Site URLThe URL of the SharePoint site that you want to access. For example, http://localhost/testsite
SharePoint Online (Office 365)Whether or not the download location is in SharePoint Online (Office 365).
Use ADFSSwitch this on if you use Active Directory for your SharePoint User Management.
UsernameThe username used to connect to the SharePoint site. Leave empty to use Windows Credentials (for local SharePoint only).
PasswordThe password used to connect to the SharePoint site. Leave empty to use Windows Credentials (for local SharePoint only).
ADFS HostProvide the name of the Active Directory server.
ADFS Relying Party IdentifierProvide the Relying Party Trust identifier for your SharePoint.
SharePoint LibraryThe name of the library that you want to access. For example, "Test Library"
SharePoint Sub FolderDownload documents from the specified subfolder in the SharePoint library only.
Extension FilterAn optional extension mask that limits which files are processed. For example, “pdf,tiff”
Recurse SharePoint LibraryIf set to “Yes”, subfolders of the SharePoint library are also processed.
Include PatternDAS will only include the files that match this pattern.
Exclude PatternAny file that matches this pattern will be excluded.
DebugSet to “Yes” to see more processing information on the console.
Continue on ErrorContinue processing if an error occurs.

SharePoint upload

This step uploads documents to the specified SharePoint document library.

ParameterNotes
SharePoint Site URLThe URL of the SharePoint site that you want to access. For example, http://localhost/testsite
SharePoint Online (Office 365)Whether or not the upload location is in SharePoint Online (Office 365).
Use ADFSSwitch this on if you use Active Directory for your SharePoint User Management.
UsernameThe username used to connect to the SharePoint site.
PasswordThe password used to connect to the SharePoint site.
ADFS HostProvide the name of the Active Directory server.
ADFS Relying Party IdentifierProvide the Relying Party Trust identifier for your SharePoint.
SharePoint LibraryThe name of the library that you want to access. For example, "Test Library"
SharePoint Sub FolderThe subfolder inside the SharePoint library to upload the files into. The subfolder should be present in the library or else the following message will be displayed:
“The remote server returned an error: (409) Conflict.”
Extension FilterAn optional extension mask that limits which files are processed. For example, “pdf,tiff”
Recurse Source FolderRecurse the source folder and its subfolders for files to upload and create the folders in SharePoint if they do not already exist.
Note: If “Use Work Folders” is checked, then “Process Sub-Folders” must also be checked for this to work.
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Include PatternOnly files that match this pattern will be included.
Exclude PatternAny file that matches this pattern will be excluded.
DebugIf set to “Yes” the user will see more processing information on the console.
Continue on ErrorContinue processing if an error occurs.

Azure storage download

This step will download files to your local machine from an Azure storage Container.

ParameterNotes
Storage Account NameThe name of the Azure storage account you want to download files from.
Azure Account KeyKey 1 under the Access keys section of the storage account in the portal.
Container NameThe name of the Azure blob container you want to download files from.
Extension FilterFile extension filters separated by commas. For example, .tif,.pdf
Recurse Azure StorageDownload documents from folders and subfolders in the Azure storage container.
DebugIf set to “Yes” the user will see more processing information on the console.

Azure storage upload

This step will upload files from your local machine to an Azure storage Container.

ParameterNotes
Storage Account NameThe name of the Azure storage account you want to upload files to.
Azure Account KeyKey 1 under the Access keys section of the storage account in the portal.
Container NameThe name of the Azure blob container you want to upload files to.
Extension FilterFile extension filters separated by commas. For example, .tif,.pdf
Recurse Local FolderUpload documents from folders and subfolders of the local folder.
Replace Invalid Characters WithA replacement for any character in the file name that is invalid for Windows file storage, applied before downloading. Invalid characters are: " * : \ < > ?
DebugIf set to “Yes” the user will see more processing information on the console.
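
The effect of Replace Invalid Characters With can be sketched in Python as below. This is an illustrative assumption about the behavior, not DAS code: each of the invalid characters listed in the table is replaced by the configured replacement string. The function name `sanitize_name` and the sample filename are hypothetical.

```python
INVALID_CHARS = '"*:\\<>?'  # the characters listed in the table above


def sanitize_name(name, replacement="_"):
    """Replace each listed invalid character with `replacement`."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


print(sanitize_name('report: "Q1"?.pdf'))
```

With an underscore as the replacement, the sample name becomes report_ _Q1__.pdf.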

Create XML property file

This step takes a PDF input file and generates an XML output file.

ParameterNotes
Copy the Source PDF to Target FolderSet to true if you want DAS to copy the input PDF file to the target folder.
Continue on ErrorContinue processing files after an error occurs.
DebugSet this to true to execute the step in debug mode.

Optimize PDF

This step creates web-optimized (linearized) PDFs.

ParameterNotes
Linearize – Fast Web ViewSet to true to Linearize a PDF file.
Continue on ErrorContinue processing files after an error occurs.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
Password FilesSpecifies what DAS does when it encounters a password-protected PDF file. With the Move and Copy options, the file is placed in the password subdirectory of the Error Folder. The options are:
- Take no action
- Move to Error Folder
- Copy to Error Folder
DebugSet this to true to execute the step in debug mode.

OCR any file to PDF

This step attempts to convert all files to searchable PDFs. DAS may provide the following OCR engines:

  • Standard Engine

  • GdPicture Engine

  • Extended Engine

See Standard OCR vs Extended OCR for the differences.

Standard engine

ParameterNotes
General Settings
Output File NameTarget file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file).
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Continue on ErrorContinue processing TIFF files after an error occurs.
Overwrite ExistingOverwrites the target document if it exists.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Password FilesSpecifies what DAS does when it encounters a password-protected PDF file. With the Move and Copy options, the file is placed in the password subdirectory of the Error Folder. The options are:
- Take no action.
- Move to Error Folder
- Copy to Error Folder
Maximum CoresThis specifies the number of parallel files you want to be processed at a given time.
Note: You need the Multicore license for this.
DebugSet this to true to execute the step in debug mode.
Standard OCR Settings
OCR LanguageSelect the language the original file is written in. This will determine the dictionary that is used.
DeskewStraighten the image.
Auto-RotateAutomatically rotate pages so that text flows left to right.
DespeckleRemove specks below the specified pixel size from the image.
OCR to Text FileChoose “Yes” to generate text output.
Output File- Plain Text (txt)
- Plain Text (txt) No PDF
- MS Word (rtf)
- HTML
Non-Image PDFsThis allows control over the treatment of non-image PDFs. For example, PDFs that have some text in them as well as images. The options are:
- OCR: The document will be OCRed using the image method defined by “Image Method”.
- Raise Error: The task will terminate with an error. If “Continue on Error” is set, this behaves as Skip. This is the default.
- Skip: The document will not be processed.
- Pass Through: The file will not be processed, but a copy of the document will be made and named as if the processing had occurred.
Remove Hidden TextThis applies only when a PDF is being used as the source for OCR. When set to true this will not include any searchable text layers that already exist from the source document. Such functionality might be useful if the source document was created by OCR of an image only PDF or other image file and the quality of the text from the previous OCR is poor.
Note: There is no way to distinguish text added as a result of OCR from text added by other means and as a result, this option should be used with care.
Convert to TIFFChoose the method for PDF image extraction.
- No: (Native)
- Yes: (Convert to TIFF)
DPIWhen OCRing a PDF, the PDF is rasterized to produce a TIFF file which is then OCRed. By default, the TIFF image resolution is determined from the images embedded in the source PDF but this flag can be used to override default processing and specify the DPI of the TIFF that will be generated.
TIFF CompressionSets the Compression for the TIFF file used if the “Convert To TIFF” Option above is used.
- Auto (selects Group 4 if the page is black and white; otherwise uses LZW compression)
- Group 4 (Black and White)
- LZW (Colored)
Retain MetadataCopy metadata from the source PDF to the Searchable result PDF.
Retain BookmarksCopy bookmarks from the source PDF to the Searchable result PDF.
Retain Viewer PreferencesRetains any PDF Viewer Preferences, Page Mode and Page Layout from the source file in the output when using Convert To TIFF='Yes'.
PDF/A OptionsSelect the output PDF/A compliant version you would like the output PDF to be.
- PDF/A1-b
- PDF/A2-b
- PDF/A3-b
Validate PDF/AWhether or not to validate the PDF/A document after conversion.
Box/Graphics ProcessingBy default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but that actually do contain useful text.
To ensure that the OCR engine can be forced to process such areas there are two options:
- Treat all Graphics Areas as Text: This option will ensure the entire document is processed as text.
- Remove Box Lines in OCR Processing: This option is ideal for forms where sometimes boxes around text can cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels).
Line Removal in OCR ProcessingThis removes lines and boxes during OCR processing to improve recognition – particularly in cases where characters “touch” lines.
JBIG2 CompressionThis option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time.
MRC CompressionApplies Mixed Raster Compression which can drastically reduce the size of PDF documents.
Save Pre-DespeckleThis will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
StampNameThis has been deprecated, use the Stamp PDF Files step.
StampValueThis has been deprecated, use the Stamp PDF Files step.
Any File To PDF Conversion Settings
Conversion Timeout (ms)Limits the amount of time in milliseconds that can be spent on conversion. A value of zero means wait indefinitely.
Convert BookmarksFor MS Word, convert bookmarks
Bookmark DepthThis property will take effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine will be mapped to level one through nine) will be used. Therefore, an empty string is functionally equivalent to
Heading 1 mapped to 1, Heading 2 mapped to 2, Heading 3 mapped to 3, Heading 4 mapped to 4, Heading 5 mapped to 5, Heading 6 mapped to 6, Heading 7 mapped to 7, Heading 8 mapped to 8, Heading 9 mapped to 9.
Note: If you use a non-English version of Microsoft Word, then you may need to replace the word "Heading" with its localized version.
Convert HyperlinksSets the flag to indicate whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel)The flag that indicates whether to print all Excel worksheets or not.
Print Background Color (IE)For files printed via IE, sets the flag that indicates whether to print the background color.
Print Scale % (Visio)For Visio files, sets the print scale.
Header (IE)This property modifies Internet Explorer's header setting.
Footer (IE)This property modifies Internet Explorer's footer setting.
Image CompressionIf you want a lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image DownsizingIf this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPIIf the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG QualityThe allowed value range is from 5 to 100 with 100 being the highest quality.
Font EmbeddingThe option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) causes a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embed a subset of fonts) is a better choice.
Font SubstitutionFor the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the "Device Setting" section of the printer driver properties (can be accessed from the Control Panel).
Embed Fonts as Type 0This option is recommended if you have non-standard fonts like barcode font.
Top MarginSets top margin. (Inches)
Bottom MarginSets bottom margin. (Inches)
Left MarginSets left margin. (Inches)
Right MarginSets right margin. (Inches)
Page WidthSets a custom page width. (Inches)
Page HeightSets a custom page height. (Inches)
Paper OrientationSets paper orientation to
- Default (Maintain Source Orientation)
- Landscape
- Portrait
PDF ComplianceAllows the user to choose PDF/A- or PDF/X-compliant output:
- None (No PDF/A Output)
- PDF/A-1b (PDF/A-1b compliant)
- PDF/X-1a (PDF/X-1a compliant)
- PDF/X-3 (PDF/X-3 compliant)
Convert MSG AttachmentsIf set to true, DAS will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDFIf set to true, DAS will attach the converted MSG attachments to the PDF as PDF attachments.
If set to false, DAS will merge the converted MSG attachments into the PDF file generated from the message body.
Preserve Word AttachmentsDetermines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).
Note: This will work with WordExtensionEX only
Convert PDF Attachments (PDF)Convert PDF Attachments to create a combined PDF file.
Merge PDF Attachments (PDF)Set this flag to true if you want to convert PDF attachments and merge them into the output PDF file. Otherwise, the converted files will be merged back to the PDF.
Retain PDF Attachment (PDF)Switch this on to retain the original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office)Set this flag if you want the MS Office properties to be transferred to the target PDF document.
Color Type (PowerPoint)Use this property to set PowerPoint to print with either color, grayscale, or black and white.
Handout Order (PowerPoint)Sets the handout order, this flag only applies to PowerPoint jobs. The possible values are:
- Vertical First
- Horizontal First
Output Type (PowerPoint)Sets the output type, it only works with the PowerPoint files. The possible values are:
- Slides
- Build slides
- Two slides handouts
- Three slides handouts
- Four slides handouts
- Six slides handouts
- Nine slides handouts
- Notes
- Outline
Print Graphics (Publisher)Sets the graphics setting for printing.
- Print Full Resolution
- Print Low Resolution
- Print Graphics
Frame Slides (PowerPoint)Indicate whether to draw a frame around the border of the slides.
Zoom (Excel)Sets printing zoom of the worksheet. The allowed value range is from 10 to 400.
Fit to Pages Wide (Excel)Sets number of pages wide the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Fit to Pages Tall (Excel)Sets number of pages tall the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Include Document MarkupsDetermines whether document markups are retained.
When this property is False (the default), document markups are omitted.
When this property is True, markups are included.
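
The Non-Image PDFs options in the Standard OCR Settings above, together with Continue on Error, amount to a small decision table. The Python sketch below restates that table; it is an illustration of the documented behavior, not DAS's implementation. The function name `resolve_non_image_action` and the returned action names are hypothetical.

```python
def resolve_non_image_action(option, continue_on_error):
    """Map the Non-Image PDFs option to an action name.

    Returns "ocr", "skip", or "pass_through". "Raise Error" raises
    RuntimeError unless Continue on Error is set, in which case the
    file is skipped.
    """
    if option == "OCR":
        return "ocr"
    if option == "Skip":
        return "skip"
    if option == "Pass Through":
        return "pass_through"
    if option == "Raise Error":
        if continue_on_error:
            return "skip"
        raise RuntimeError("non-image PDF encountered")
    raise ValueError(f"unknown option: {option!r}")


print(resolve_non_image_action("Raise Error", continue_on_error=True))
print(resolve_non_image_action("Pass Through", continue_on_error=False))
```

Because Raise Error is the default, a job that should keep running when it meets PDFs that already contain text needs either Continue on Error or one of the other three options.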

Extended engine

ParameterNotes
General Settings
Output File NameTarget file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Create Directories if RequiredForce creation of any output directories if they do not already exist.
Continue on ErrorContinue processing TIFF files after an error occurs.
Overwrite ExistingOverwrites the target document if it exists.
Advanced FlagsCommand line flags to be passed through to the underlying executable.
Password FilesSpecifies what DAS does when it encounters a password-protected PDF file. With the Move and Copy options, the file is placed in the password subdirectory of the Error Folder. The options are:
- Take no action.
- Move to Error Folder
- Copy to Error Folder
Maximum CoresThis specifies the number of parallel files you want to be processed at a given time.
Note: You need the multi-core license for this.
DebugSet this to true to execute the step in debug mode.
Extended OCR Settings
Output File TypeOne or more of the following, separated by commas if more than one is required.
- CSV *
- DOCX
- EPUB
- EXCELML *
- HTM
- OPENTXT
- PDF
- RTF
- TXT
- WORDML
- XLSX *
- XPS
* These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format.
OCR EngineThe OCR engine to use. This must be set to use the IRIS engine.
OCR Language 1-8You can set up to eight different languages for OCR recognition in one page, as long as they are in the same character set.
Automatic language detectionProperty that enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page.
If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages.
Auto rotateDetect page orientation and correct if required
DeskewRotates the image to correct its skew angle.
Advanced DeskewSet this to true to define advanced deskew properties.
Force DeskewUnder certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect from an OCR accuracy point of view whether it's better to rotate the image or not. Because the skew angle may be visible in the output document (For example, if KeepDeskew is set to 'true'), you can choose to force the deskew to rotate the image, even if it affects the accuracy.
If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result.
If turned on, the image is rotated to correct skew angle.
Adjustment ModeSet the behavior regarding dimension adjustment for deskew operation.
DespeckleRemoves all groups of connected pixels that contain fewer pixels than this parameter. Suggested range: 1-20.
Advanced DespeckleEnables the advanced despeckle settings, which provide advanced image noise reduction through the image despeckle filter.
Remove White PixelsBy default, Advanced Despeckle removes black pixels. If this setting is set to 'true', white pixels will be removed instead of black pixels.
DilateDespeckle removes all groups of connected pixels that contain fewer pixels than the SpeckleSize parameter. Those connected pixels are not removed if the distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels.
The default value is '0'.
LayoutThe layout for the docx or rtf document:
- Standard
- Flow
PDF VersionThis determines the PDF version of the generated PDF:
- 1.4
- 1.5
- 1.6
- 1.7
- 1.7 Extension Level 3
- 1.7 Extension Level 5
- 1.7 Extension Level 8
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-3a
- PDF/A-3b
Remove Blank PageSet this to true to remove blank pages from TIFF or PDF documents. A Sensitivity value must also be set (see below).
SensitivityThe sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected.
Work DepthThis parameter (0 – 255) defines how deeply the OCR engine will analyze a page with 255 being the deepest. For poorer quality documents, higher values can give better recognition results.
JPEG QualityThis parameter (0–255) determines the compression/quality of color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128.
JPEG2000 CompressionEnable/Disable JPEG2000 Compression.
JPEG2000 Compression ModeThe JPEG2000 Compression Mode to use.
JPEG2000 Compression ValueThe Value to set for the selected Compression Mode.
IHQC CompressionApply Intelligent High-Quality Compression.
IHQC Compression LevelLevel 1 is the basic compression level while level 3 is the most advanced Intelligent High-Quality Compression Mode.
IHQC Quality FactorThe quality Factor for IHQC.
No OCRWhether or not to perform OCR on the document (Yes to not perform OCR, No to perform OCR).
BinarizationWhether or not to perform binarization on the document.
BrightnessThe brightness (higher values will make the result darker).
ContrastThe contrast (lower values will make the result darker).
Smoothing LevelSmoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more).
UnditheringWhether or not to use automatic undithering while processing a page.
Note: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level).
Dithering is a scanning technique that represents a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general appearance of the image. Dithered images are known to be more difficult for OCR technology to handle; therefore, specific image preprocessing is needed to detect and revert the dithering.
ThresholdSets the threshold for fixed threshold binarization (0 for automatic threshold computation).
Remove LinesWhether or not to remove lines from an image (The image must be black and white).
Horizontal Clean XThe parameter for cleaning noisy pixels attached to the horizontal lines.
Horizontal Clean YThe parameter for cleaning noisy pixels attached to the horizontal lines.
Vertical Clean XThe parameter for cleaning noisy pixels attached to the vertical lines.
Vertical Clean YThe parameter for cleaning noisy pixels attached to the vertical lines.
Horizontal DilateThe dilate parameter that helps the detection of horizontal lines.
Vertical DilateThe dilate parameter that helps the detection of vertical lines.
Horizontal Max GapThe maximum horizontal line gap to close. It is useful to remove broken lines.
Vertical Max GapThe maximum vertical line gap to close. It is useful to remove broken lines.
Horizontal Max ThicknessThe maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. Can be also useful to keep vertical letter strokes.
Vertical Max ThicknessThe maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. Can be also useful to keep horizontal letter strokes.
Horizontal Min LengthThe minimum length of the horizontal lines to remove.
Vertical Min LengthThe minimum length of the vertical lines to remove.
Remove Dark BordersRemoves the dark surrounding from bitonal, grayscale or color images. The dark surrounding of the image is whitened (Note: The dark border should be touching the edge of the page for this to work).
Punch Hole RemovalAttempts to remove punch holes from pages.
Note: The punch hole algorithm can be used on images with the following minimum dimensions width: 300px, height: 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.
InterpolationInterpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image's resolution.
Interpolation ModeSets the interpolation mode.
Keep Original ImageSet this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is 'true'.
Note: This property only applies when processing image files or PDF files with the Convert To TIFF set to Yes.
Keep Deskewed ImageSet this to true if you want to use the deskewed image in the output document.
Note: This property only applies when Keep Original Image is set to No.
Keep Despeckled ImageSet this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white.
Note: This property only applies when Keep Original Image is set to No.
Keep Dark Border RemovalSet this to true if you want to use the image after dark borders have been removed, in the output document.
Note: This property only applies when Keep Original Image is set to No.
Keep Punch Hole RemovalSet this to true if you want to use the image after punch holes have been removed, in the output document.
Note: This property only applies when Keep Original Image is set to No.
Any File To PDF Conversion Settings
Conversion Timeout (ms)Limits the amount of time, in milliseconds, that can be spent on conversion. A value of zero means wait indefinitely.
Convert BookmarksFor MS Word, convert bookmarks.
Bookmark DepthThis property will take effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine will be mapped to level one through nine) will be used. Therefore, an empty string is functionally equivalent to:
Heading 1 mapped to 1, Heading 2 mapped to 2, Heading 3 mapped to 3, Heading 4 mapped to 4, Heading 5 mapped to 5, Heading 6 mapped to 6, Heading 7 mapped to 7, Heading 8 mapped to 8, Heading 9 mapped to 9.
Note: If you use a non-English version of Microsoft Word, then you may need to replace the word "Heading" with its localized version.
Convert HyperlinksSets the flag to indicate whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel)The flag that indicates whether to print all Excel worksheets or not.
Print Background Color (IE)For files printed via IE, sets the flag that indicates whether to print the background color.
Print Scale % (Visio)For Visio files, sets the print scale.
Header (IE)This property modifies Internet Explorer's header setting.
Footer (IE)This property modifies Internet Explorer's footer setting.
Image CompressionIf you want a lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image DownsizingIf this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPIIf the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG QualityThe allowed value range is from 5 to 100 with 100 being the highest quality.
Font EmbeddingThe option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) causes a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embed a subset of fonts) is a better choice.
Font SubstitutionFor the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the "Device Setting" section of the printer driver properties (can be accessed from the Control Panel).
Embed Fonts as Type 0This option is recommended if you have non-standard fonts like barcode font.
Top MarginSets top margin. (Inches)
Bottom MarginSets bottom margin. (Inches)
Left MarginSets left margin. (Inches)
Right MarginSets right margin. (Inches)
Page WidthSets a custom page width. (Inches)
Page HeightSets a custom page height. (Inches)
Paper OrientationSets paper orientation to:
- Default (Maintain Source Orientation)
- Landscape
- Portrait
PDF ComplianceAllows the user to choose PDF/A- or PDF/X-compliant output files:
- None (No PDF/A Output)
- PDF/A-1b (PDF/A-1b compliant)
- PDF/X-1a (PDF/X-1a compliant)
- PDF/X-3 (PDF/X-3 compliant)
Convert MSG AttachmentsIf you set this to true, DAS will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDFIf set to true, DAS will attach the converted MSG attachments to the output PDF as PDF attachments.
If set to false, DAS will merge the converted MSG attachments into the PDF file generated from the message body.
Preserve Word AttachmentsDetermines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).
Note: This will work with WordExtensionEX only.
Convert PDF Attachments (PDF)Convert PDF Attachments to create a combined PDF file.
Merge PDF Attachments (PDF)Set this flag to true if you want to convert PDF attachments and merge them into the output PDF file. Otherwise, the converted files will be merged back to the PDF.
Retain PDF Attachment (PDF)Switch this on to retain the original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office)Set this flag if you want the MS Office properties to be transferred to the target pdf document.
Color Type (PowerPoint)Use this property to set PowerPoint to print with either color, grayscale, or black and white.
Handout Order (PowerPoint)Sets the handout order. This flag only applies to PowerPoint jobs. The possible values are:
- Vertical First
- Horizontal First
Output Type (PowerPoint)Sets the output type. This only works with PowerPoint files. The possible values are:
- Slides
- Build slides
- Two slides handouts
- Three slides handouts
- Four slides handouts
- Six slides handouts
- Nine slides handouts
- Notes
- Outline
Print Graphics (Publisher)Sets the graphics setting for printing.
- Print Full Resolution
- Print Low Resolution
- Print Graphics
Frame Slides (PowerPoint)Indicate whether to draw a frame around the border of the slides.
Zoom (Excel)Sets printing zoom of the worksheet. The allowed value range is from 10 to 400.
Fit to Pages Wide (Excel)Sets number of pages wide the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Fit to Pages Tall (Excel)Sets number of pages tall the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Include Document MarkupsDetermines whether document markups are retained.
When this property is False (the default), document markups are omitted.
When this property is True, markups are included.
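
The Bookmark Depth rules above (levels of one or more, no repeated style names, and a default of Heading 1 through Heading 9 mapped to levels 1 through 9) can be illustrated with a short Python sketch. This section does not show the exact string syntax or delimiter of the property, so the sketch represents the mapping as a list of (style name, level) pairs. The function names are hypothetical and this is not DAS code.

```python
def default_bookmark_mapping():
    """Default mapping: 'Heading 1'..'Heading 9' map to levels 1..9."""
    return [(f"Heading {n}", n) for n in range(1, 10)]


def validate_bookmark_mapping(pairs):
    """Check the stated rules and return a list of problems (empty if valid)."""
    problems = []
    seen = set()
    for style, level in pairs:
        if level < 1:
            problems.append(f"level for {style!r} must be >= 1")
        if style in seen:
            problems.append(f"style {style!r} is repeated")
        seen.add(style)
    return problems


mapping = default_bookmark_mapping()
print(mapping[0], mapping[-1])
print(validate_bookmark_mapping(mapping))
# A mapping that breaks both rules: a level below one and a repeated style.
print(validate_bookmark_mapping([("Title", 0), ("Title", 2)]))
```

For a non-English version of Microsoft Word, the style names in the default mapping would use the localized word for "Heading", as noted above.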

Barcode TIFF/PDF

This step detects barcodes in TIFF/PDF files and either splits or renames the files based on the barcodes detected.

Screen Field/ButtonDescription
Output File NameThe output file path template where the split files will be saved.
- %VALUE%: Replaced by the barcode value found.
- %INDEX%: Replaced by the current split index.
- %FILENAME%: Replaced by the file name.
Output File Name (No Barcode)The renaming template to use for page ranges where no barcodes were identified. Allowed templates:
- %INDEX%: Replaced by the current split index.
- %FILENAME%: Replaced by the filename of the source file.
Barcode OperationSelect between Split by Barcode or Rename by Barcode.
- Split by Barcode: Choose this option to split the TIFF/PDF file by barcode.
- Rename by Barcode: Choose this option to rename the TIFF/PDF file based on Barcode.
Split ModeThe options for splitting files by barcode:
- Barcode on First Page
- Barcode on Last Page
- Remove Barcode Page
Barcode FormatBarcode formats supported.
Try HarderSpend more time to try to find a barcode; optimize for accuracy, not speed. The default is true.
Overwrite ExistingOverwrites any file that exists with the same name in the output folder.
Note: If you have the same barcode in different pages or files, they will be overwritten if this is set to true.
Metadata NameChoose the Metadata field you want to set the ‘Metadata Value’ for. The named fields below will have the value added to them when set.
- Author
- Creator
- Keywords
- Producer
- Subject
- Title
- Trapped
Any other entry will be used as the name for a new custom metadata item.
Metadata ValueEnter a value for the Metadata Value. Alternatively, you can use the following file naming variables:
- %VALUE%: Replaced by the barcode value found.
- %INDEX%: Replaced by the current split index.
- %FILENAME%: Replaced by the file name.
Note: ‘Trapped’ metadata only accepts either ‘True’, ‘False’ or ‘Unknown’ as a value.
Perform Pre-processingDo not enable this option unless instructed by Nutrient support.
BinarizeSet this to true to get better results from colored files.
DeskewStraighten the image.
Remove LinesWhether or not to remove lines from an image.
DespeckleRemove specks below the specified pixel size from the image.
Box SizeThis option is ideal for forms where boxes around text can cause an area to be identified as graphics. It removes boxes from the temporary copy of the image used by the barcode reader. Technically, this option removes connected elements with a minimum area (in pixels, defined by this property). This option is currently only applied to bitonal images.
ZonesOnly examine the region specified for barcode(s).
Note: To specify the zone, you need to set the following in the step properties:
- Left
- Top
- Width
- Height
PDF DPIThe DPI of TIFF images generated from the source PDF file. These images are then used for barcode recognition.
TIFF CompressionThe compression to set to the TIFF images generated or converted from the source PDF file. These images are then used for barcode recognition.
Advanced FlagsAdditional advanced command-line flags may be entered here (see Advanced Flags).
Continue on ErrorContinue processing TIFF/PDF files after an error occurs.
Maximum CoresThe number of parallel files DAS will attempt to process at the same time.
DebugSet this to true to execute the step in debug mode.
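
The Output File Name and Output File Name (No Barcode) templates above substitute %VALUE%, %INDEX% and %FILENAME% into a file name. The following is a minimal Python sketch of that kind of substitution, not DAS code. The function name `expand_template`, the sample values, and the assumption that %FILENAME% carries no extension are all hypothetical.

```python
import re


def expand_template(template, values):
    """Replace %NAME% tokens in a single pass.

    `values` maps token names (without the percent signs) to strings.
    Tokens with no entry in `values` are left untouched.
    """
    return re.sub(
        r"%([A-Z]+)%",
        lambda match: values.get(match.group(1), match.group(0)),
        template,
    )


# A page range where a barcode was found.
print(expand_template(
    "%FILENAME%_%INDEX%_%VALUE%.pdf",
    {"FILENAME": "invoice", "INDEX": "2", "VALUE": "ABC123"},
))

# A page range with no barcode: there is no %VALUE% to substitute.
print(expand_template(
    "%FILENAME%_nobarcode_%INDEX%.pdf",
    {"FILENAME": "invoice", "INDEX": "3"},
))
```

Because barcode values can repeat across pages or files, including %INDEX% in the template is one way to reduce name collisions when Overwrite Existing is enabled.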

High availability

The high availability step in DAS is designed to utilize two instances of the product running on separate hosts.

Screen Field/ButtonDescription
Current Job IDThe Job ID on the current host.
Default StatusSelect the Default status of the current host (Controller
Shared Status FileEnter the shared.txt file location – this needs to be on a shared network location accessible to both hosts.
HostnameName of the paired host.
ADX Install PathInstall path of DAS on the paired host.
Job IDThe Job ID on the paired host.

Distributed polling

This step can be used to implement load balancing in DAS. It achieves this by copying a fraction of the files from a central input location to the local system where DAS is running. Multiple DAS servers can point to one input folder; as a result, the files are shared across several servers, which optimizes processing. See Distributed Polling for more details.

Screen Field/ButtonDescription
Autobahn Job IDThe Job ID of the Job that will be processing your input files.
Note: The Source Folder of this job will be the Destination Folder of the Distributed Polling Job.
LimitThe maximum number of files to be copied to the shared folder per run.
ExtensionsEnter the file extensions you want to copy, separated by commas. For example, “.pdf,.tif,.tiff”
Process Sub FolderSelect true if you want to copy subfolders.
DebugSelect true if you want to see more debug output.
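
The Limit, Extensions and Process Sub Folder settings above describe picking a bounded batch of matching files from a central folder. The Python sketch below illustrates that selection logic only. It is not how DAS implements the step: the function names are hypothetical, files are chosen in name order as an assumption, and nothing here coordinates between multiple servers.

```python
from pathlib import Path
import shutil


def parse_extensions(text):
    """Turn a string such as '.pdf,.tif,.tiff' into a set of lowercase extensions."""
    return {part.strip().lower() for part in text.split(",") if part.strip()}


def pick_files(source, extensions, limit, include_subfolders=False):
    """Return at most `limit` files under `source` whose extension matches."""
    pattern = "**/*" if include_subfolders else "*"
    matches = sorted(
        path for path in Path(source).glob(pattern)
        if path.is_file() and path.suffix.lower() in extensions
    )
    return matches[:limit]


def copy_batch(source, destination, extensions, limit, include_subfolders=False):
    """Copy one batch of files to `destination` and return the copied paths."""
    dest = Path(destination)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in pick_files(source, extensions, limit, include_subfolders):
        copied.append(shutil.copy2(path, dest / path.name))
    return copied


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        central = Path(tmp) / "central"
        central.mkdir()
        for name in ("a.pdf", "b.tif", "c.txt"):
            (central / name).write_text("demo")
        batch = copy_batch(
            central, Path(tmp) / "local",
            parse_extensions(".pdf,.tif,.tiff"), limit=1,
        )
        print([Path(p).name for p in batch])
```

In this sketch the destination folder plays the role of the Source Folder of the job named in Autobahn Job ID.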

DAS content extraction job

This step allows a DAS Content Extraction job to be integrated as a DAS step. See DAS Content Extraction Job Step for more details.

Screen Field/ButtonDescription
Kingfisher Job IDThe DAS Content Extraction Job ID

PDF to PDFA job

This step uses GdPicture libraries to convert a PDF document to PDF/A format.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME% to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
PDF/A Output TypeSelect the type of PDF/A to output. The selection is: PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b, PDF/A-3u, PDF/A-4, PDF/A-4e, PDF/A-4f
Allow VectorizationIf set to false, the job will attempt to create the PDF/A files without Vectorization.
Allow RasterizationIf set to false, the job will attempt to create the PDF/A files without Rasterization.
DebugSelect true if you want to see more debug output.

PDF recognition to JSON job

This step extracts important data from PDF files in the form of key/value pairs. Users can define their expected keys and easily retrieve the data from those fields. No templates are needed.

Screen Field/ButtonDescription
Output Expected Key JSONCreates a JSON file of expected key-values as output.
Output Expected Key Values By Page JSONCreates a JSON file of expected key-values by page as output.
Output PDF Data Pages TextCreates a .txt file of the PDF data by page.
Output PDF Data Page DetailsCreates a .txt file of keys and values with their bounding boxes, by page.
Output PDF Data Pages As CSVCreates a CSV file containing page number, key, key bounding box, value, value bounding box, and page dimensions.
Output PDF Data Pages As JSONCreates a JSON file containing page number, key, key bounding box, value, value bounding box, and page dimensions.
List PDF Data Pages As JSONIf true, the results of 'Output PDF Data Pages As JSON' will be included in the logging.
Date FormatSet to input date format.
Use Currency SymbolsSet to false if you want symbols and strings to be removed before returning currency values.
Page LimitMaximum number of pages to be processed.
Page RangeA string representation of the page numbers you want to process. For example, 1,3-4.
Current CultureChoose the expected format of dates and times when they are ambiguous. For example, 03/07/12.
Expected Keys File PathsFile paths of the text files containing expected keys. (use '
Ignore Case Expected KeysChoose whether casing is ignored when comparing recognition values to the Expected Keys set.
Custom Keys File PathsFile path of the text files containing custom keys. (use '
Ignore Case Custom KeysChoose whether casing is ignored when comparing recognition values to the Custom Keys set.
Custom Keys Default File PathThe default file path of the text file containing custom keys. (use '
Load Default Custom KeysSet to true if you want custom keys to be taken from the default path.
Skip Line WidthThis value will be multiplied by page width and any line with its width below this calculated value will NOT be skipped.
Skip Line Word CountDo not skip line if the number of words in the line is less than this value.
Skip Line Word SpaceAny line with an average space greater than this value will NOT be skipped.
Ignore Don’t Skip SpaceThe only time special chunks are broken into smaller chunks is if the space between two adjacent words in the chunk is greater than this value.
Chunk Break SpaceAny chunk that has two adjacent words with a space between them greater than this value will be chunked.
Chunk Break MinimumIf the average space of words in a chunk is smaller than this value, 'Chunk break space' will be used to break the chunk instead of this value.
Chunk Header Font SizeAny chunk with an average font size below this value will not be considered as a header candidate.
Chunk Break Space HeaderAny header chunk that has two adjacent words with a space between them greater than this value will be chunked.
Break Words By DelimiterSwitch this to true to break words by any of the Chunk Delimiters available (wordDelimiter, chunkDelimiter and chunkSpaceDelimiter).
Word DelimiterEnter one delimiter per index. If any series of characters match this pattern, we will break the word on that index.
Chunk DelimiterEnter one delimiter per line. If any word ends with any of these delimiters, they will be broken into chunks.
Chunk Space DelimiterEnter one delimiter per line.
Max Horizontal SpaceSkip analyzing key/value chunks that have a horizontal space greater than this value (points) between them.
Max Vertical SpaceSkip analyzing key/value chunks that have a vertical space greater than this value (points) between them.
Data Types To SplitChoose the data types that the Chunker will attempt to split into smaller chunks.
Data Types To CheckChoose the data types that will not be split once identified.
Data Types To RemoveChoose the unwanted data types that will be removed in post processing.
Error On No Expected KeysWhen set to 'Yes', a file that does not contain any values for expected keys will be considered an error.
Regex Dictionary Terms File PathFile path of a text file containing regex dictionary terms. (leave blank for default)
Plain Dictionary Terms File PathFile path of a text file containing plain dictionary terms. (leave blank for default)
DebugSelect true if you want to see more debug output.
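
The Chunk Break Space setting above splits a chunk wherever two adjacent words are separated by more than a given space. The Python sketch below is a simplified illustration of that single rule. It is not the recognition engine's code: the function name, the `(text, x_start, x_end)` word representation and the sample coordinates are hypothetical, and the interaction with Chunk Break Minimum and the delimiter settings is ignored.

```python
def split_into_chunks(words, break_space):
    """Split one line of words into chunks at large horizontal gaps.

    `words` is a list of (text, x_start, x_end) tuples sorted left to right.
    A new chunk starts when the gap to the previous word exceeds `break_space`.
    """
    chunks = []
    current = []
    previous_end = None
    for text, x_start, x_end in words:
        if current and x_start - previous_end > break_space:
            chunks.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = x_end
    if current:
        chunks.append(" ".join(current))
    return chunks


# "Invoice" and "No:" are close together; "12345" sits far to the right.
line = [("Invoice", 10, 50), ("No:", 53, 70), ("12345", 120, 150)]
print(split_into_chunks(line, break_space=20))
```

With these sample coordinates the label and the number end up in separate chunks, which is the kind of separation a key/value pairing step could then work from.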

Modern compress PDF

This step uses GdPicture libraries to compress PDF documents with various options.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME% to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Remove AnnotationsSelect 'Yes' if you want to remove annotations.
Remove Blank PagesSelect 'Yes' if you want to remove blank pages.
Remove BookmarksSelect 'Yes' if you want to remove bookmarks.
Remove Embedded FilesSelect 'Yes' if you want to remove embedded files.
Remove Form FieldsSelect 'Yes' if you want to remove form fields.
Remove HyperlinksSelect 'Yes' if you want to remove hyperlinks.
Remove JavaScriptSelect 'Yes' if you want to remove JavaScript.
Remove MetadataSelect 'Yes' if you want to remove metadata.
Remove Page ThumbnailsSelect 'Yes' if you want to remove page thumbnails.
Pack FontsSelect 'Yes' if you want to pack fonts. This greatly optimizes output file size by focusing on fonts.
Pack DocumentsSelect 'Yes' if you want to pack document content before saving.
Recompress ImagesSelect 'Yes' if you want to recompress images.
Enable MRCSelect 'Yes' if you want to enable MRC.
Downscale Resolution MRCSet the downscale resolution of the MRC compression. The default value is 100.
Preserve SmoothingSelect 'Yes' if you want to preserve smoothing.
Image QualityChoose which Image Quality the output files will be. The default value is Medium.
Downscale ImagesSelect 'Yes' if you want to downscale images.
Downscale ResolutionSet the downscale resolution of the compression. The default value is 150.
Enable Color DetectionSelect 'Yes' if you want to enable automatic color detection.
Enable Char RepairSelect 'Yes' if you want to enable character repair.
Enable JPEG2000Select 'Yes' if you want to enable JPEG2000.
Enable JBIG2Select 'Yes' if you want to enable JBIG2.
JBIG2 PMS ThresholdSet the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
DebugSelect true if you want to see more debug output.

Validate PDFA

This step uses GdPicture libraries to validate whether the input PDF document conforms to the selected PDF/A version.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME% to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
PDF/A Validation TypeChoose which PDF/A version the files will be validated against.
Report LocationTarget folder to save reports for files that failed to validate. The location must already exist, or the report will not save.
DebugSelect true if you want to see more debug output.

Linearize PDF

This step uses GdPicture libraries to optimize PDFs for web-viewing, rendering the document one page at a time.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME% to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Pack DocumentSelect 'Yes' if you want the document to be packed before it is saved, reducing its size.
Enable CompressionSelect 'Yes' if you want to enable compression on the output PDF.
DebugSelect true if you want to see more debug output.

Convert any file to PDF (GdPicture)

This step uses GdPicture libraries to convert a large variety of file types to PDF. This step does not require an Office installation to process Office files.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME% to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
AuthorSet the Author metadata field in the output PDF. This can include %FILENAME% (original filename without the extension) or %DIRNAME% (directory name of original file).
TitleSet the Title metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
SubjectSet the Subject metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
KeywordsSet the Keywords metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
ProducerSet the Producer metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
MetadataSet the Metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
Convert Email AttachmentsSelect 'Yes' if you want to convert email attachments to PDF.
Attach Email Attachments To PdfSelect 'Yes' if you want to attach the email attachments to the output PDF. If set to 'No', the files will be merged to the PDF if they have been converted to PDF, otherwise they will be removed.
Email Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Email file.
Email Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Email file.
Email Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Email file.
Email Prefer One PageSelect 'Yes' if you want the email to be converted to a single page PDF if possible.
Enable ICCSpecifies if the converter shall favor preserving the International Color Consortium (ICC) profile, if present in the loaded document, during the conversion.
Html Emulation TypeSpecifies the type of media to emulate.
Html Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Html file.
Html Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Html file.
Html Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Html file.
Html Prefer CSS Page SizeGive any CSS page size declared in the page priority over what is declared in Html Page Width and Html Page Height. If set to false, the renderer will scale the content to fit the paper size.
Html Prefer One PageSpecifies whether the output document should contain a single page.
Load Only First PageSpecifies that all executed actions with the loaded document will be processed using only the first page of the document.
Page RangeUse the string "1-5" for pages 1 to 5, or the string "1,5,6" for pages 1, 5 and 6. You can use the string "1,5,8-12" to specify pages 1 and 5 and all pages from page 8 to page 12, etc.
Pdf Bitonal Image CompressionSets the scheme to be used to compress bitonal image data when converting/saving the currently loaded document to PDF format.
Scheme IDs:
- 0: None
- 1: Flate
- 2: CCITT4
- 3: JPEG
- 4: JBIG2
- 5: JPEG2000
JBIG2 PMS ThresholdSets the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Pdf Color Image CompressionSets the scheme to be used to compress color image data when converting/saving the currently loaded document to PDF format.
Pdf Enable Color DetectionEnables or disables the automatic color detection feature when converting/saving the currently loaded document to PDF format.
Pdf Image QualitySets the level of quality used to compress images with a lossy compression scheme, which are embedded in the newly produced PDF document when converting/saving the currently loaded document to PDF format. It must be a value from 0 to 100. 0 means the worst quality and the best compression, 100 means the best quality and the worst compression.
PDF Use Deflate On JPEGSpecifies if the converter shall use additional Deflate compression for JPEG images in PDF output.
Rasterization DPISets the rendering resolution to be used when converting vector content to raster content, if any is included in the currently loaded document.
Tiff Enable Exif RotateSpecifies whether the TIFF encoder uses the Exif rotate flag to handle page rotations.
Timeout MillisecondsSpecifies the timeout of the subsequent conversion process, in milliseconds. Default value is -1, which means no timeout.
Txt Font BoldSpecifies whether the font used for the resulting document when converting from the source txt file must have a bold style.
Txt Font ItalicSpecifies whether the font used for the resulting document when converting from the source txt file must have an italic style.
Txt Font FamilySpecifies the name of the font to be used for the resulting document when converting from the source txt file.
Txt Font SizeSpecifies the text size, in points, to be used for the resulting document when converting from the source txt file.
Txt Horizontal Text AlignmentSpecifies the horizontal text alignment of the resulting document when converting from the source txt file.
Txt Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Txt file.
Txt Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Txt file.
DebugSelect true if you want to see more debug output.
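
The Page Range strings described above, such as "1-5", "1,5,6" and "1,5,8-12", can be expanded into individual page numbers. The Python sketch below shows one way to read that syntax. It is an illustration rather than the parser DAS uses: the function name is hypothetical, and the handling of whitespace, duplicates and reversed ranges is an assumption.

```python
def parse_page_range(text):
    """Expand a string such as '1,5,8-12' into an ordered list of page numbers."""
    pages = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"range {part!r} runs backwards")
            candidates = range(start, end + 1)
        else:
            candidates = [int(part)]
        for page in candidates:
            if page not in pages:  # assumption: repeated pages are kept once
                pages.append(page)
    return pages


print(parse_page_range("1-5"))
print(parse_page_range("1,5,6"))
print(parse_page_range("1,5,8-12"))
```

Under these assumptions, "1,5,8-12" expands to pages 1, 5, 8, 9, 10, 11 and 12.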

Combine any file to PDF

This step uses GdPicture libraries to convert a large variety of file types to PDF, and then merges them to create a single output PDF. This step does not require an Office installation to process Office files.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %DIRNAME% (original directory name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
AuthorSet the Author metadata field in the output PDF. This can include %FILENAME% (original filename without the extension) or %DIRNAME% (directory name of original file).
TitleSet the Title metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
SubjectSet the Subject metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
KeywordsSet the Keywords metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
ProducerSet the Producer metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
MetadataSet the Metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
Convert Email AttachmentsSelect 'Yes' if you want to convert email attachments to PDF.
Attach Email Attachments To PdfSelect 'Yes' if you want to attach the email attachments to the output PDF. If set to 'No', the files will be merged to the PDF if they have been converted to PDF, otherwise they will be removed.
Email Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Email file.
Email Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Email file.
Email Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Email file.
Email Prefer One PageSelect 'Yes' if you want the email to be converted to a single page PDF if possible.
Enable ICCSpecifies if the converter shall favor preserving the ICC profile, if present in the loaded document, during the conversion.
Html Emulation TypeSpecifies the type of media to emulate.
Html Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Html file.
Html Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Html file.
Html Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Html file.
Html Prefer CSS Page SizeGive any CSS page size declared in the page priority over what is declared in Html Page Width and Html Page Height. If set to false, the renderer will scale the content to fit the paper size.
Html Prefer One PageSpecifies whether the output document should contain a single page.
Load Only First PageSpecifies that all executed actions with the loaded document will be processed using only the first page of the document.
Page RangeUse the string "1-5" for pages 1 to 5, or the string "1,5,6" for pages 1, 5 and 6. You can use the string "1,5,8-12" to specify pages 1 and 5 and all pages from page 8 to page 12, etc.
Pdf Bitonal Image CompressionSets the scheme to be used to compress bitonal image data when converting/saving the currently loaded document to PDF format.
Scheme IDs:
- 0: None
- 1: Flate
- 2: CCITT4
- 3: JPEG
- 4: JBIG2
- 5: JPEG2000
JBIG2 PMS ThresholdSets the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Pdf Color Image CompressionSets the scheme to be used to compress color image data when converting/saving the currently loaded document to PDF format.
Pdf Enable Color DetectionEnables or disables the automatic color detection feature when converting/saving the currently loaded document to PDF format.
Pdf Image QualitySets the level of quality used to compress images with a lossy compression scheme, which are embedded in the newly produced PDF document when converting/saving the currently loaded document to PDF format. It must be a value from 0 to 100. 0 means the worst quality and the best compression, 100 means the best quality and the worst compression.
Pdf Use Deflate On JPEGSpecifies if the converter shall use additional Deflate compression for JPEG images in PDF output.
Rasterization DPISets the rendering resolution to be used when converting vector content to raster content, if any is included in the currently loaded document.
Tiff Enable Exif RotateSpecifies whether the TIFF encoder uses the Exif rotate flag to handle page rotations.
Timeout MillisecondsSpecifies the timeout of the subsequent conversion process, in milliseconds. Default value is -1, which means no timeout.
Txt Font BoldSpecifies whether the font used for the resulting document when converting from the source TXT file must have a bold style.
Txt Font ItalicSpecifies whether the font used for the resulting document when converting from the source TXT file must have an italic style.
Txt Font FamilySpecifies the name of the font to be used for the resulting document when converting from the source TXT file.
Txt Font SizeSpecifies the text size, in points, to be used for the resulting document when converting from the source TXT file.
Txt Horizontal Text AlignmentSpecifies the horizontal text alignment of the resulting document when converting from the source TXT file.
Txt Page HeightSpecifies the page height, in points, of the resulting document when converting from the source TXT file.
Txt Page WidthSpecifies the page width, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source TXT file.
DebugSelect true if you want to see more debug output.
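
The Page Range syntax described in the table above can be illustrated with a short sketch. This is not the parser DAS uses; the function name and the handling of whitespace, duplicates, and reversed ranges are assumptions made for illustration only.

```python
def parse_page_range(spec: str) -> list[int]:
    """Expand a Page Range string such as "1,5,8-12" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # assumption: empty entries are ignored
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_page_range("1,5,8-12"))
```

Here "1,5,8-12" expands to pages 1, 5, 8, 9, 10, 11, and 12.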

Combine PDFs

This step uses GdPicture libraries to convert a large variety of file types to PDF, and then merges them to create a single output PDF. This step does not require an Office installation to process Office files.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %DIRNAME (original directory name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Enable Numerical OrderingWhen enabled, documents are merged in numerical order, for example: file1, file3, file11, file20, file101. Otherwise, they are ordered lexicographically, for example: file1, file101, file11, file20, file3.
DebugSelect true if you want to see more debug output.
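
The difference between the two orderings in Enable Numerical Ordering can be reproduced with a small sketch. The key function below is an illustrative assumption about how numerical ordering could work, not the product's implementation.

```python
import re


def numerical_key(name: str) -> list:
    """Split a name into text and digit runs so digit runs compare by value."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = ["file1", "file101", "file11", "file20", "file3"]
lexicographic = sorted(files)
numerical = sorted(files, key=numerical_key)

print(lexicographic)
print(numerical)
```

The first list matches the lexicographic example in the table, and the second matches the numerical example.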

PDF to JPEG/PDF to PNG/PDF to TIFF

These steps use GdPicture libraries to convert PDF files into the JPEG, PNG or TIFF format.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Tiff Compression (PDF to TIFF only)Specifies the TIFF compression when saving images in TIFF format.
DPIThe DPI resolution to be used for rendering. A value of 72 will give the same result as Acrobat when zoom level is 100%. Values over 300 will cause excessive memory usage.
BrightnessAdjust the Brightness of the output image. Value must be between -100 and 100.
ContrastAdjust the Contrast of the output image. Value must be between -100 and 100.
SaturationAdjust the Saturation of the output image. Value must be between -100 and 100.
GammaAdjust the Gamma of the output image. Value must be between -100 and 100.
Threshold 1BPPIf set, converts the output image to a 1-bit BW indexed color image specifying a threshold value. Pixel values less than the threshold will be turned black, while the values equal to or larger will be turned white. Value must be between 0 and 255.
Auto DeskewSelect 'Yes' to try to deskew the image by up to about 15 degrees. Deskewing can significantly improve OCR, OMR, and barcode detection, or simply make an image more readable.
Crop Black BordersDetects and removes margins consisting of black color around the image.
Crop Black Borders ExDetects margins consisting of black color around the image and sets them to white. Unlike Crop Black Borders, the black borders are not removed but blanked, so the image dimensions stay the same.
Crop Area HeightSpecifies the page height, in pixels, of the resulting document when cropping.
Crop Area WidthSpecifies the page width, in pixels, of the resulting document when cropping.
Crop Location LeftSpecifies the distance, in pixels, to crop from the left of the resulting document.
Crop Location BottomSpecifies the distance, in pixels, to crop from the bottom of the resulting document.
DespecklePerforms a 3x3 despeckle filter. It can remove black noise pixels from white backgrounds and vice versa. It can also remove random noise from multicolored backgrounds.
Despeckle MorePerforms a 5x5 despeckle filter. It can remove black noise pixels from white backgrounds and vice versa. It can also remove random noise from multicolored backgrounds.
Enable ICMSpecifies if color correction is used for images embedding an ICC profile. Enabling ICM results in automatic pixel transformation when opening an image that includes an ICC profile.
Remove Hole PunchRemoves all punch holes situated on the margins of your image.
Remove LinesPerforms line removal on the image in the direction specified.
Resize New HeightNew image height in pixels, of the resulting document when resizing.
Resize New WidthNew image width in pixels, of the resulting document when resizing.
Resize Interpolation ModeThe interpolation mode to use when resizing the image.
Rotate By AngleSelects whether to rotate by an angle specified, or by a preset type of rotation.
Rotation AngleThe angle of rotation for the image.
Rotation TypeThe method of rotation to apply to the image.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
DebugSelect true if you want to see more debug output.
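
The Threshold 1BPP rule in the table above (values below the threshold become black, values equal to or above it become white) can be sketched on plain grayscale values. The function name and the use of nested lists in place of a real image are assumptions for illustration.

```python
BLACK, WHITE = 0, 1


def threshold_1bpp(gray_rows: list[list[int]], threshold: int) -> list[list[int]]:
    """Map 8-bit grayscale values to 1-bit black/white using a threshold."""
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must be between 0 and 255")
    return [
        [BLACK if value < threshold else WHITE for value in row]
        for row in gray_rows
    ]


print(threshold_1bpp([[0, 127, 128, 255]], 128))
```

With a threshold of 128, the values 0 and 127 become black, while 128 and 255 become white.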

PDF to text

This step uses GdPicture libraries to extract the searchable text from the pages of a PDF file, and creates an output text file. If the page is non-searchable, there is the option to enable OCR.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
Page SeparatorA text separator that is placed between the text of consecutive pages.
Page Separator PlacementThe placement of the Page Separator. It can go above or below each page of text.
Copy Input PDF To Target FolderSet to true to copy the input PDF to the output location after the text is extracted.
Preserve ParagraphSpecifies that the text extraction engine must preserve text paragraphs.
Paragraph SeparatorSpecifies the separator used for splitting paragraphs. It only takes effect when Preserve Paragraph is set to Yes.
Enable OCREnables the use of the GdPicture OCR engine if the page is non-searchable.
OCR DictionaryAdd the code of languages for OCR, separated by '+'. For example, 'eng+deu+fra' would add English, German, and French.
DebugSelect true if you want to see more debug output.

PDF to searchable PDF (GdPicture)

This step uses GdPicture libraries to carry out Optical Character Recognition on the input PDF, creating an invisible searchable text layer over the document.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
OCR DictionaryAdd the code of any additional languages for OCR, separated by '+'. For example, 'eng+deu+fra' would add English, German and French. Codes can be found in the OCR Language Codes section.
DPIDPI of TIFF images generated or converted from the source PDF File. These images are then OCRed to create the searchable PDF.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
Thread LimitThe GdPicture OCR engine processes multiple pages concurrently for optimal performance. This can take a heavy toll on the CPU. If needed, this option limits the number of pages processed concurrently.
DebugSelect true if you want to see more debug output.

PDF portfolio

This step uses GdPicture libraries to combine a folder of files into an integrated PDF unit. A wide range of file types can be used to create the PDF Portfolio.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %DIRNAME (original directory name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Pdf Portfolio TypeThe initial view mode for the PDF Portfolio. This affects the way the user views the component files after opening the PDF Portfolio file.
DebugSelect true if you want to see more debug output.

Smart redaction

This step uses GdPicture libraries to identify and redact selected sensitive information in the input document.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Redact Credit Card NumbersSet to true if you want to redact Credit Card Numbers.
Redact Email AddressesSet to true if you want to redact Email Addresses.
Redact IBANsSet to true if you want to redact IBANs.
Redact Phone NumbersSet to true if you want to redact Phone Numbers.
Redact URIsSet to true if you want to redact URIs.
Redact VAT IDsSet to true if you want to redact VAT IDs.
Redact Vehicle Identification NumbersSet to true if you want to redact Vehicle Identification Numbers.
Redact Social Security NumbersSet to true if you want to redact Social Security Numbers.
Redact Postal AddressesSet to true if you want to redact Postal Addresses.
Redaction ColorChoose which color will be used for redacting.
OCR DictionaryAdd the code of any additional languages for OCR, separated by '+'. For example, 'eng+deu+fra' would add English, German and French. To install additional dictionaries, see the language codes.
Detect OrientationSelect ‘Yes’ if you want to auto detect orientation.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
Redaction Timeout (ms)Limits the amount of time in milliseconds that can be spent on a redaction. A value of zero means it will wait indefinitely.
DebugSelect true if you want to see more debug output.

Detect signatures

This step uses GdPicture libraries to identify PDF documents that contain digital signatures.

Any step that alters a digitally signed PDF will invalidate that PDF’s signature. This step allows signed files to be identified, and either copied or moved to a specified folder so the signature can be preserved.

If the Copy option is selected, the original signed file can also be attached to the copy that continues through processing, so the signed original stays with the file that later steps modify.

Flow diagram on how digital signature is detected

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Signed File NameSigned file name template which can include %FILENAME (original file name).
Signed File PathThe full path (excluding file name) for the location to copy/move the signed file before processing.
Create Signed PathSetting this to 'Yes' will create the signed file path directory if it does not exist. File processing will fail if a signed file is processed, the signed path does not exist, and this is set to 'No'.
Overwrite SignedSetting this to 'Yes' will automatically overwrite any file in the signed file path with the same name as the current signed file. File processing will fail if the signed file already exists and this is set to 'No'.
Signed ActionThe action to take if a signed file is detected. It can either be copied or moved to the Signed File Path.
Attach Signed Document to OutputSetting this to 'Yes' will attach a copy of the original signed document to the output file before it is saved in the output location. This ensures a signed copy remains with the copy that is processed.
DebugSelect true if you want to see more debug information.

Key value pair extraction

This step uses the GdPicture engine to extract information about key-value pairs in a PDF document. The extra information included can be the Key or Value Bounding Box, Page Number, Confidence, and Data Type.

The user can also use a JSON file to declare Expected Keys. These specific keys will be added to a separate output file if a value is found. Synonyms can also be declared for each Expected Key, so that a match for any of the synonyms will be counted as a match for the Expected Key.

In the following example, total and invoice number are the expected keys. grand total is a synonym for total, and invoice number has two synonyms: invoice no and inv no.

[
  {
    "expectedKey": "total",
    "synonyms": ["grand total"]
  },
  {
    "expectedKey": "invoice number",
    "synonyms": ["invoice no", "inv no"]
  }
]
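
To make the synonym behavior concrete, here is a minimal sketch of how extracted keys could be mapped back to Expected Keys. The case-insensitive exact matching, the function names, and the sample extracted pairs are all assumptions for illustration; the matching rules of the actual engine are not described here.

```python
import json

EXPECTED_KEYS_JSON = """
[
  {"expectedKey": "total", "synonyms": ["grand total"]},
  {"expectedKey": "invoice number", "synonyms": ["invoice no", "inv no"]}
]
"""


def build_lookup(expected_keys: list[dict]) -> dict[str, str]:
    """Map every expected key and synonym (lowercased) to its expected key."""
    lookup = {}
    for entry in expected_keys:
        canonical = entry["expectedKey"]
        for name in [canonical, *entry.get("synonyms", [])]:
            lookup[name.strip().lower()] = canonical
    return lookup


def match_expected(pairs: list[tuple[str, str]], lookup: dict[str, str]) -> dict[str, str]:
    """Keep only pairs whose key is an expected key or one of its synonyms."""
    matched = {}
    for key, value in pairs:
        canonical = lookup.get(key.strip().lower())
        if canonical is not None and value:
            matched[canonical] = value
    return matched


lookup = build_lookup(json.loads(EXPECTED_KEYS_JSON))
# Made-up sample pairs standing in for extraction results.
extracted = [("Grand Total", "120.00"), ("Inv No", "A-1001"), ("Date", "2024-01-05")]
result = match_expected(extracted, lookup)
print(result)
```

In this sketch, "Grand Total" and "Inv No" are reported under total and invoice number, while "Date" is ignored because it is not an Expected Key.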

CSV output warning

CSV is a format commonly used by spreadsheet programs. These programs often transform numerical data or formulas automatically and save the transformed values, overwriting the original data. To prevent this, an apostrophe is added to the start of any value that could be transformed.

For example, the phone number +44 115 496 0999 will appear as '+44 115 496 0999 in the CSV output only.

The affected value types are listed below.

  • Formulas - values that begin with +, -, =, or @ are prefixed with an apostrophe in the CSV output. This prevents spreadsheet programs from interpreting these values as formulas or functions.

  • Dates/Times - this covers many date and time formats, because data can often be mistaken for a date or time and then irreversibly transformed.

  • Long Numbers - this covers numbers that are 11 digits or longer, because spreadsheet programs convert them to scientific notation.

We recommend removing the apostrophes when extracting the data. This only affects CSV output, so it may be easier to extract data from the other formats if possible.
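
The guarding and the recommended clean-up can be sketched as follows. This covers only the formula-prefix and long-number rules; the date and time rule is omitted because the exact formats it covers are not listed. The function names and the long-number pattern are assumptions, not the product's implementation.

```python
import re

FORMULA_PREFIXES = ("+", "-", "=", "@")
LONG_NUMBER = re.compile(r"\d{11,}")  # assumption: digits only, 11 or more


def guard_csv_value(value: str) -> str:
    """Prefix an apostrophe to values a spreadsheet program might transform."""
    if value.startswith(FORMULA_PREFIXES) or LONG_NUMBER.fullmatch(value):
        return "'" + value
    return value


def unguard_csv_value(value: str) -> str:
    """Remove a leading apostrophe only if the guard would have added it."""
    if value.startswith("'") and guard_csv_value(value[1:]) != value[1:]:
        return value[1:]
    return value


print(guard_csv_value("+44 115 496 0999"))
print(unguard_csv_value("'+44 115 496 0999"))
```

Stripping the apostrophe only when the remainder would have been guarded avoids damaging values that legitimately start with an apostrophe.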

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
OCR LanguageAdd the codes of the languages for OCR and KVP extraction, separated by ‘+’. For example, eng+fra. Codes can be found in the OCR Language Codes section.
DPIDPI used when performing OCR on the file as part of the KVP extraction process.
KVP Output FormatThis setting determines the file output format(s). KVP data can be output in JSON, CSV and XML. Separate multiple formats with commas, for example: json,csv,xml.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
AutorotateAutomatically rotate the page if the text does not have the correct orientation.
Trim SymbolsSetting this to 'Yes' will remove any symbols from the start/end of values, with the exception of the hash '#' or period '.' symbols.
Include Key Bounding BoxSetting this to 'Yes' will include the bounding box values for the key in the output.
Include Value Bounding BoxSetting this to 'Yes' will include the bounding box values for the value in the output.
Include Page NumberSetting this to 'Yes' will include the page number of the key value pair in the output.
Include ConfidenceSetting this to 'Yes' will include the confidence score of the key value pair in the output. Confidence is measured between 0 (no confidence) and 100 (full confidence).
Confidence ThresholdThe value of confidence (0-100) that a KVP must reach to be included in the output. Results under this confidence threshold will be discarded.
Include TypeSetting this to 'Yes' will include the data type of the key value pair in the output.
Expected KeysThe path to a JSON file for the expected keys and synonyms.
DebugSelect true if you want to see more debug information.
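
The effect of Confidence Threshold can be shown with a short sketch. The dictionary field names and the sample records are hypothetical and do not reflect the actual output schema.

```python
def filter_by_confidence(pairs: list[dict], threshold: int) -> list[dict]:
    """Discard key-value pairs whose confidence is under the threshold."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")
    return [pair for pair in pairs if pair["confidence"] >= threshold]


# Made-up sample records for illustration.
sample = [
    {"key": "total", "value": "120.00", "confidence": 92},
    {"key": "inv no", "value": "A-1001", "confidence": 40},
]
kept = filter_by_confidence(sample, 60)
print([pair["key"] for pair in kept])
```

A pair whose confidence equals the threshold is kept, since only results under the threshold are discarded.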

Pattern redaction/pattern highlight

These steps use GdPicture libraries to identify and redact sensitive information (Redaction) or highlight important information (Highlight) in the input document based on a regular expression or terms list.

Screen Field/ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
PatternA Regex pattern. The input pdf will be searched for matches to this Regex pattern, and any matches will be redacted/highlighted.
Terms FilepathThe path to a text file containing a list of terms to redact/highlight. Each line will be treated as a pattern, and any matches will be redacted/highlighted.
Case SensitiveDetermines whether or not the regex pattern matching should be case sensitive.
RedThe amount of red color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
GreenThe amount of green color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
BlueThe amount of blue color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
AlphaThe transparency value of the resulting region color. Use the value between 0 (full transparency) and 255 (full opacity). Default is 255.
DebugSelect true if you want to see more debug output.
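
The way Pattern, Terms Filepath, and Case Sensitive interact can be sketched with Python's standard regex module. The sketch works on plain text rather than a PDF, and the function names and sample text are assumptions; the regex dialect used by the product may differ from Python's.

```python
import re


def compile_patterns(pattern: str, terms: list[str], case_sensitive: bool) -> list[re.Pattern]:
    """Compile the Pattern plus each non-empty terms line as its own regex."""
    flags = 0 if case_sensitive else re.IGNORECASE
    sources = ([pattern] if pattern else []) + [t.strip() for t in terms if t.strip()]
    return [re.compile(source, flags) for source in sources]


def find_matches(text: str, compiled: list[re.Pattern]) -> list[tuple[int, int, str]]:
    """Return (start, end, matched text) for every match, in reading order."""
    spans = []
    for regex in compiled:
        spans.extend((m.start(), m.end(), m.group()) for m in regex.finditer(text))
    return sorted(spans)


text = "Contact: Jane Doe, account 1234-5678."
insensitive = compile_patterns(r"\d{4}-\d{4}", ["jane doe"], case_sensitive=False)
sensitive = compile_patterns(r"\d{4}-\d{4}", ["jane doe"], case_sensitive=True)

print([m[2] for m in find_matches(text, insensitive)])
print([m[2] for m in find_matches(text, sensitive)])
```

With case sensitivity off, both the name and the account number match; with it on, the lowercase term no longer matches "Jane Doe".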

Split PDF (GdPicture)

This step uses GdPicture libraries to split PDF files based on the ranges, bookmarks, or into single pages.

Screen Field/ButtonDescription
Output File NameTarget file template, which can include %UNIQUEn (unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension), and %PAGEn (first page of split, zero padded to n digits).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Retain MetadataGenerated files will include metadata (such as Author and Title) from the original file.
Split TypeSets the way that the input file will be split. One of:
- Split into single pages
- Split by ranges (See below)
- Split by repeating ranges (See below)
- Split by bookmarks
RangesSet of page ranges separated by commas that defines which pages from the original should be extracted.
Repeat Every (Pages)Re-applies the page ranges to each consecutive block of this many pages within the document. For example, if 2-4 is specified for Ranges and 4 is specified for Repeat Every (Pages), then the range is re-applied every 4 pages.
Remove Unused ResourcesRemoves unused resources from a PDF file to minimize file size.
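
One possible reading of Split by repeating ranges is that the ranges are shifted forward by the repeat interval until the end of the document. The sketch below follows that reading; it is an interpretation of the description above, not confirmed behavior, and the function name is hypothetical.

```python
def repeating_ranges(
    ranges: list[tuple[int, int]], repeat_every: int, page_count: int
) -> list[tuple[int, int]]:
    """Shift each (start, end) range by multiples of repeat_every within the document."""
    if repeat_every <= 0:
        raise ValueError("repeat_every must be positive")
    result = []
    for offset in range(0, page_count, repeat_every):
        for start, end in ranges:
            first = start + offset
            last = min(end + offset, page_count)  # clip at the last page
            if first <= page_count:
                result.append((first, last))
    return result


print(repeating_ranges([(2, 4)], repeat_every=4, page_count=12))
```

Under this reading, a 12-page document with range 2-4 repeated every 4 pages yields pages 2-4, 6-8, and 10-12.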

Split PDF by barcode

This step uses GdPicture libraries to identify different barcode types in a PDF, and split the PDF document at each instance of a barcode.

Screen Field/ButtonDescription
Output File NameTarget file template, which can include %UNIQUEn or %INDEXn (unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension), and %PAGEn (first page of split, zero padded to n digits).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Read QRCodeSet this to true to recognize QRCode barcodes.
Read MicroQRSet this to true to recognize MicroQR barcodes.
Read DataMatrixSet this to true to recognize DataMatrix barcodes.
Read PDF417Set this to true to recognize PDF417 barcodes.
Read AztecSet this to true to recognize Aztec barcodes.
Read MaxiCodeSet this to true to recognize MaxiCode barcodes.
Read Industrial2of5Set this to true to recognize Industrial2of5 barcodes.
Read Inverted2of5Set this to true to recognize Inverted2of5 barcodes.
Read Interleaved2of5Set this to true to recognize Interleaved2of5 barcodes.
Read Iata2of5Set this to true to recognize Iata2of5 barcodes.
Read Matrix2of5Set this to true to recognize Matrix2of5 barcodes.
Read Code39Set this to true to recognize Code39 barcodes.
Read CodabarSet this to true to recognize Codabar barcodes.
Read BcdMatrixSet this to true to recognize BcdMatrix barcodes.
Read DataLogic2of5Set this to true to recognize DataLogic2of5 barcodes.
Read Code128Set this to true to recognize Code128 barcodes.
Read Code93Set this to true to recognize Code93 barcodes.
Read EAN13Set this to true to recognize EAN13 barcodes.
Read EAN8Set this to true to recognize EAN8 barcodes.
Read UPCASet this to true to recognize UPCA barcodes.
Read UPCESet this to true to recognize UPCE barcodes.
Read ADD5Set this to true to recognize ADD5 barcodes.
Read ADD2Set this to true to recognize ADD2 barcodes.
Page RangeSpecifies the page range to be scanned for barcodes. A value of * will scan every page for barcodes.
PatternA Regex pattern. The input pdf will be searched for matches to this Regex pattern, and any matches will be redacted.
DPIDPI of TIFF images generated or converted from the source PDF File. These images are then scanned for barcodes.
Retain MetadataGenerated files will include metadata (such as Author and Title) from the original file.
Remove Unused ResourcesRemoves unused resources from a PDF file to minimize file size.
LeftX coordinate of the top-left point of the rectangle in which barcodes should be recognized.
TopY coordinate of the top-left point of the rectangle in which barcodes should be recognized.
WidthWidth of the rectangle in which barcodes should be recognized.
HeightHeight of the rectangle in which barcodes should be recognized.
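
The Output File Name tokens can be illustrated with a small expansion sketch. It assumes that n is written as a digit directly after the token (for example %UNIQUE3), which is an assumption about the syntax; the function name is hypothetical.

```python
import re


def expand_template(template: str, filename: str, index: int, first_page: int) -> str:
    """Expand %UNIQUEn/%INDEXn, %PAGEn and %FILENAME in an output file template."""

    def padded(value: int):
        # Build a replacement function that zero pads value to n digits.
        return lambda match: str(value).zfill(int(match.group(1)))

    expanded = re.sub(r"%(?:UNIQUE|INDEX)(\d+)", padded(index), template)
    expanded = re.sub(r"%PAGE(\d+)", padded(first_page), expanded)
    # Replace %FILENAME last so the file name itself is never re-expanded.
    return expanded.replace("%FILENAME", filename)


print(expand_template("%FILENAME_%UNIQUE3_p%PAGE2.pdf", "invoice", 7, 12))
```

For the seventh split of invoice.pdf starting at page 12, this template expands to invoice_007_p12.pdf.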

Pattern enumeration

This step uses GdPicture libraries to identify terms and/or a pattern, and produces a report based on the frequency of each term.

Screen field/buttonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
PatternA regex pattern. The input PDF will be searched for matches to this regex pattern, and any matches will be counted in the report.
Terms FilepathThe path to a text file containing a list of terms to count. Each line will be treated as a pattern, and any matches will be counted in the report.
Case SensitiveDetermines whether or not the regex pattern matching should be case sensitive.
Pass ThroughDetermines whether or not the input PDF will be copied to the output folder.
DebugSelect true if you want to see more debug output.
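
A frequency report of this kind can be sketched with a counter over regex matches. The sketch works on plain text, and the function name, sample text, and report shape are assumptions rather than the format the step actually writes.

```python
import re
from collections import Counter


def count_terms(text: str, patterns: list[str], case_sensitive: bool = False) -> Counter:
    """Count how many times each pattern matches in the text."""
    flags = 0 if case_sensitive else re.IGNORECASE
    counts = Counter()
    for pattern in patterns:
        counts[pattern] = sum(1 for _ in re.finditer(pattern, text, flags))
    return counts


text = "Invoice total: 10. Total due: 10. invoice paid."
report = count_terms(text, ["invoice", "total"])
strict_report = count_terms(text, ["invoice", "total"], case_sensitive=True)
print(dict(report))
```

Turning on case sensitivity halves both counts here, because one occurrence of each term is capitalized.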

Get document information

This step uses GdPicture libraries to produce a report on the number of PDF pages that are searchable vs. image-only. It also calculates how many searchable pages contain visible text vs. a hidden text layer.

Screen field/buttonDescription
Output File NameThe template for the output file, which can include %FILENAME (original file name).
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Output FormatChoose the output format for the report:
- .txt
- .csv
- .json
- .xml
Pass ThroughDetermines whether or not the input PDF will be copied to the output folder.
DebugSelect true if you want to see more debug output.

Convert PDF to office

This step uses GdPicture libraries to convert PDF input files to various Office output formats, including .docx, .pptx, .xlsx, and .svg.

Screen field/buttonDescription
Output File NameThe template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Output FormatChoose the output format for the converted file:
- .docx
- .pptx
- .xlsx
- .svg
Enable ICCSpecifies if the converter shall favor preserving the ICC profile if present in the loaded document during the conversion.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
Timeout MillisecondsSpecifies the timeout of the subsequent conversion process in milliseconds. Default value is -1, which means no timeout.
DebugSelect true if you want to see more debug output.

Convert any file to office

This step uses GdPicture libraries to convert various input file types to various Office output formats, including .docx, .pptx, .xlsx, and .svg. Not all file conversions are supported.

Screen field/buttonDescription
Output File NameThe template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
Output FormatChoose the output format for the converted file:
- .docx
- .pptx
- .xlsx
- .svg
AuthorSet the Author metadata field in the output PDF. This can include %FILENAME% (original file name without the extension) or %DIRNAME% (directory name of the original file).
TitleSet the Title metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
SubjectSet the Subject metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
KeywordsSet the Keywords metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
ProducerSet the Producer metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
MetadataSet the Metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%.
Inject Email HeaderSpecifies whether the email header should be injected into the output document.
Convert Email Attachments To OfficeSelect Yes if you want to convert email attachments to Office.
Email Attachments FilterA regular expression that specifies the attachments that will be converted to Office format. Attachments that don’t match will be skipped.
Email Page HeightSpecifies the page height, in points, of the resulting document when converting from the source email file.
Email Page WidthSpecifies the page width, in points, of the resulting document when converting from the source email file.
Email Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source email file.
Email Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source email file.
Email Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source email file.
Email Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source email file.
Email Prefer One PageSelect Yes if you want the email to be converted to a single page PDF if possible.
Enable ICCSpecifies if the converter shall favor preserving the ICC profile, if present in the loaded document during the conversion.
Html Emulation TypeSpecifies a type of media to emulate.
Html Page HeightSpecifies the page height, in points, of the resulting document when converting from the source HTML file.
Html Page WidthSpecifies the page width, in points, of the resulting document when converting from the source HTML file.
Html Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source HTML file.
Html Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source HTML file.
Html Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source HTML file.
Html Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source HTML file.
Html Prefer CSS Page SizeGive any CSS page size declared in the page priority over what is declared in Html Page Width and Html Page Height. If set to false, the renderer will scale the content to fit the paper size.
Html Prefer One PageSpecifies whether the output document should contain a single page.
Load Only First PageSpecifies that all executed actions with the loaded document will be processed using only the first page of the document.
Page RangeUse "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5, and 6. Single pages and ranges can be combined: "1,5,8-12" selects pages 1 and 5 and all pages from 8 to 12.
Pdf Bitonal Image CompressionSets the scheme to be used to compress bitonal image data when converting/saving the currently loaded document to PDF format.
- 0: None
- 1: Flate
- 2: CCITT4
- 3: JPEG
- 4: JBIG2
- 5: JPEG2000
JBIG2 PMS ThresholdSets the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Pdf Color Image CompressionSets the scheme to be used to compress color image data when converting/saving the currently loaded document to PDF format.
Pdf Enable Color DetectionEnables or disables the automatic color detection feature when converting/saving the currently loaded document to PDF format.
Pdf Image QualitySets the level of quality used to compress images with a lossy compression scheme, which are embedded in the newly produced PDF document when converting/saving the currently loaded document to PDF format. It must be a value from 0 to 100. 0 means the worst quality and the best compression, while 100 means the best quality and the worst compression.
Pdf Use Deflate On JPEGSpecifies if the converter shall use additional Deflate compression for JPEG images in PDF output.
Rasterization DPISets the rendering resolution to be used when converting vector content to raster content, if any is included in the currently loaded document.
Render Sheets Headers and FootersSpecifies that the .xls and .xlsx headers and footers should be rendered. Affects XLSX/XLS input only.
Split Excel Sheets Into PagesSpecifies that .xls and .xlsx sheets should be split into pages according to the PageSetup element of each sheet. Affects XLSX/XLS input only.
Spreadsheet Bottom Margin OverrideSpecifies the spreadsheet bottom margin height in millimeters. If the height isn’t given or is negative, the margin specified in the document will be used instead. Affects XLSX/XLS input only.
Spreadsheet Left Margin OverrideSpecifies the spreadsheet left margin width in millimeters. If the width isn’t given or is negative, the margin specified in the document will be used instead. Affects XLSX/XLS input only.
Spreadsheet Maximum Content Height Per SheetDecimal value indicating the maximum height of the sheet content, in millimeters. Maximum content height ignores header and footer height. Affects XLSX/XLS input only.
Spreadsheet Maximum Content Width Per SheetDecimal value indicating the maximum width of the sheet content, in millimeters. Maximum content width ignores margins. Affects XLSX/XLS input only.
Spreadsheet Page Height OverrideSpecifies the spreadsheet page height in millimeters. If the height isn’t given or is negative, the page height specified in the document will be used instead. Affects XLSX/XLS input only.
Spreadsheet Page Width OverrideSpecifies the spreadsheet page width in millimeters. If the width isn’t given or is negative, the page width specified in the document will be used instead. Affects XLSX/XLS input only.
Spreadsheet Render Only Print AreaSpecifies that only the print areas of each sheet are rendered. If a sheet has no print area, the whole sheet is rendered. Affects XLSX/XLS input only.
Spreadsheet Right Margin OverrideSpecifies the spreadsheet right margin width in millimeters. If the width isn’t given or is negative, the margin specified in the document will be used instead. Affects XLSX/XLS input only.
Spreadsheet Top Margin OverrideSpecifies the spreadsheet top margin height in millimeters. If the height isn’t given or is negative, the margin specified in the document will be used instead. Affects XLSX/XLS input only.
Tiff Enable Exif RotateSpecifies whether the TIFF encoder uses the Exif rotate flag to handle page rotations.
Timeout MillisecondsSpecifies the timeout of the subsequent conversion process, in milliseconds. The default value is -1, which means no timeout.
Txt Font BoldSpecifies whether the font used for the resulting document when converting from the source TXT file must have a bold style.
Txt Font ItalicSpecifies whether the font used for the resulting document when converting from the source TXT file must have an italic style.
Txt Font FamilySpecifies the name of the font to be used for the resulting document when converting from the source TXT file.
Txt Font SizeSpecifies the text size, in points, to be used for the resulting document when converting from the source TXT file.
Txt Horizontal Text AlignmentSpecifies the horizontal text alignment of the resulting document when converting from the source TXT file.
Txt Page HeightSpecifies the page height, in points, of the resulting document when converting from the source TXT file.
Txt Page WidthSpecifies the page width, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source TXT file.
Txt Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source TXT file.
DebugSelect true if you want to see more debug output.
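
The value rules described in the table above can be summarised in a short sketch. The function names below are hypothetical and are not part of DAS; they only restate the documented rules: Pdf Image Quality must be from 0 to 100, a missing or negative spreadsheet override falls back to the value in the document, and a Timeout Milliseconds value of -1 means no timeout. The unit helper relies on the standard definitions of 72 points per inch and 25.4 millimeters per inch, which is useful because the Txt page settings are in points while the spreadsheet settings are in millimeters.

```python
from typing import Optional

POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def validate_image_quality(quality: int) -> int:
    """Pdf Image Quality must lie between 0 and 100 inclusive."""
    if not 0 <= quality <= 100:
        raise ValueError(f"Pdf Image Quality must be 0 to 100, got {quality}")
    return quality


def effective_dimension_mm(override_mm: Optional[float], document_mm: float) -> float:
    """Apply a spreadsheet override (hypothetical helper).

    A missing (None) or negative override keeps the document's own value.
    """
    if override_mm is None or override_mm < 0:
        return document_mm
    return override_mm


def effective_timeout_seconds(timeout_ms: int) -> Optional[float]:
    """Timeout Milliseconds of -1 means no timeout, represented here as None."""
    if timeout_ms == -1:
        return None
    return timeout_ms / 1000.0


def points_to_mm(points: float) -> float:
    """Convert typographic points to millimeters."""
    return points * MM_PER_INCH / POINTS_PER_INCH


if __name__ == "__main__":
    print(effective_dimension_mm(None, 19.05))   # document margin is kept
    print(effective_dimension_mm(12.5, 19.05))   # override is applied
    print(effective_timeout_seconds(-1))         # None, i.e. no timeout
    print(round(points_to_mm(842), 1))           # a Txt Page Height in points, shown in mm
```

Under these assumptions, leaving an override empty and entering a negative number behave the same way, so either can be used to keep the page setup stored in the workbook.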

Step type properties

Each step type described in the previous section has a set of properties, such as those shown below for “Convert Any File to PDF”. Each property has an associated description, which is displayed when the property is highlighted.

Conversion Settings for Step Type

To find a property, use either the scroll bar on the right-hand side or the search bar at the top. The search bar looks for an exact match of the text you type, but it also offers suggestions that start with the text typed so far. Selecting a suggestion takes you to that property and selects it for editing.
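
As a rough illustration of the search behaviour described above, the sketch below separates the two lookups: suggestions are names that start with the typed text, while a direct match requires the full property name. The function names are hypothetical, and case-insensitive comparison is an assumption, since the documentation does not say how the search bar treats letter case.

```python
from typing import List, Optional

# A few property names taken from the table above.
PROPERTIES = [
    "Timeout Milliseconds",
    "Txt Font Bold",
    "Txt Font Family",
    "Txt Font Italic",
    "Txt Font Size",
]


def suggest(typed: str, names: List[str]) -> List[str]:
    """Return the property names that start with the typed text."""
    if not typed:
        return []
    prefix = typed.casefold()  # assumption: case is ignored
    return [name for name in names if name.casefold().startswith(prefix)]


def find_exact(typed: str, names: List[str]) -> Optional[str]:
    """Return the property whose full name equals the typed text, if any."""
    wanted = typed.casefold()
    for name in names:
        if name.casefold() == wanted:
            return name
    return None


if __name__ == "__main__":
    print(suggest("Txt Font", PROPERTIES))       # the four Txt Font properties
    print(suggest("Font", PROPERTIES))           # no suggestions: nothing starts with "Font"
    print(find_exact("Txt Font", PROPERTIES))    # None: a partial name is not an exact match
    print(find_exact("Txt Font Size", PROPERTIES))
```

In practice this means typing the beginning of a property name, rather than a word from the middle of it, is the quickest way to get a useful suggestion.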