Job Designer

The Job Designer is used to define and edit a job definition, using a tree-list model coupled with a Visual Studio-style property list. The step types are listed on the left under the Designer Task group box, grouped into subcategories, and each step type has its own icon. Steps can be reordered by drag and drop.

Menu Item Action
Run Now Executes the job that is being edited. The output is displayed on the Run tab.
Save Validates the current job and, if it is valid, saves the job definition to %JOBID%.xml in the %JOBDEFDIR% directory.
OCR

This expander contains the steps that perform OCR. Autobahn grays out any steps that are invalid. The step types in these groups are:

  • Image to Searchable PDF (Standard)

  • Image to Searchable PDF (Extended)

  • PDF to Searchable PDF (Standard)

  • PDF to Searchable PDF (Extended)

  • Any File to Searchable PDF (Standard)

  • Any File to Searchable PDF (Extended)

  • Merge Image to Searchable PDF (Standard)

  • Merge Image to Searchable PDF (Extended)

  • PDF to Searchable PDF (GdPicture)

Convert
  • Convert PDF to TIFF

  • Convert Any File to PDF

  • Convert PDF to PDFA

  • Convert Any File to PDF (GdPicture)

  • Combine Any File to PDF

  • PDF to JPEG

  • PDF to PNG

  • PDF to TIFF

  • PDF to Text

Split and Merge
  • Merge PDF

  • Split PDF

  • Merge TIFF

  • Split TIFF

  • Combine PDFs

Connectors
  • Read Mailbox

  • Send Documents

  • SharePoint Download

  • SharePoint Upload

  • Azure Storage Download

  • Azure Storage Upload

Barcode
  • Barcode TIFF/PDF

PDF Operations
  • Set PDF Properties

  • Create XML Property File

  • Extract Text from PDF File

  • Optimize PDF

  • Stamp PDF Files

  • Modern Compress PDF

  • Validate PDFA

  • Linearize PDF

  • Create PDF Portfolio

Advanced
  • Custom Script Step

  • High Availability

  • Kingfisher Job

  • Distributed Polling

  • PDF Recognition to JSON

  • Image to Searchable PDF (Microsoft Cloud)

  • PDF to Searchable PDF (Microsoft Cloud)

  • Image to Searchable PDF (Google Cloud)

  • PDF to Searchable PDF (Google Cloud)

  • Smart Redaction

  • Key Value Pair Extraction

  • Pattern Redaction

  • Split PDF (GdPicture)

  • Split by Barcode

Delete Step Deletes the currently selected step node.
Clear Error A job that is in an error state cannot be run until you click this.
Help Takes you to the ‘Help’ tab, which has links to useful blogs, documents, and other resources. It also lists contacts for our support and sales teams.

Fields

Menu Item Description
Job ID A sequential Job ID is allocated for the Job by Autobahn DX. This cannot be changed.
Job Name A descriptive title for the job.
Source Folder The folder containing the documents to be processed.
Destination Folder The folder where the processed files will be placed if “Move input files to target folder after processing” is chosen.
Use Work Folders By default, Autobahn DX processes job steps by using a separate folder for each step. Hence files from the source folder are copied to a work folder, processed for each step to another work folder and then finally to the target folder. This approach ensures integrity (e.g., correctly processing files that are added to the source folder after a job has started) but can slow down large jobs.
Process Sub-Folders If checked, all sub-folders will be recursively processed.
Delete Empty Input Folders Checking this property will delete empty folders under the source folder after we move or delete your input files.
Input files This option determines what happens to the input files once processing has been completed. The options are:
Leave input files after processing: Files are left in the Source Folder.
Move to archive after processing: Files are moved to the Archive Folder.
Copy to archive after processing: Files are copied to the Archive Folder.
Move input files to target folder after processing: Input files are placed in the same folder as the output files.
Delete input files after successful processing: Input files are deleted.
Rename Input Files This determines how input files will be renamed when moved to the Target or Archive folder. The default is: %FILENAME%%TIMESTAMP%.%EXT%. You can also use %EMAILNAME% for files named in the email format. This will rename the file to its original name.
Filter Files
Filter Files Option Description
Include Files Matching Only files matching the Filter Expression are included.
Exclude Files matching the Filter Expression are excluded.
Include with Document Count Limit For example, “*.pdf; 3000” would limit the job to 3000 PDF files.
Include Unprocessed PDFs Only

This would limit files selected to PDFs that have not been OCRed.

A file is deemed to have been OCRed if either

  1. It has a custom metadata tag “AQUAFORESTOCR”

  2. It has one image per page and only has “invisible” text.

This should be used in conjunction with a “Non-Image PDF” setting of “Rasterize and OCR” to ensure that all PDF files are processed.

Include Unprocessed PDFs Only – with Document Count Limit As above, but limited to the number of files specified in the filter.

N.B.: Work Folders must be used to enable the use of filters.

Filter Expression One or more search options used to determine the files in the source folder that should be processed. Multiple expressions may be used, separated by spaces. Examples: *.pdf, *.doc *.ppt *.xls
Batch Size Limits the number of documents processed in each run to the given number. To use this feature, you must select a Filter Files option that includes “Document Count Limit”.
File Order The order in which the files will be processed. The options are Alphabetically, Created Date (Ascending), Created Date (Descending), Modified Date (Ascending) and Modified Date (Descending). Each date option has UTC and local time variants, giving nine options in total. Note: this setting does not work for “Merge Image to PDF…” steps; the merge and OCR must be done in two separate job steps.
Log File Path of the job log file. This will include %DATESTAMP%, which is the date of the day the job started. A new log file will be created for each day.
CSV Log File Path of the CSV log file. This will include %DATESTAMP%, which is the date of the day the job started. A new CSV file will be created for each day. The columns in the CSV file are:
Job Start – Time Job Started
Source Files – Full path to the source file
Target File – Full path to the target file
Job Stopped – Time Job Finished
Success – True or False; Files that could not be processed will have a value of False.
Page Counts – Not all steps generate page counts, and this column depends on the configuration setting.
Retention Period An integer giving the number of days the log file is kept before being deleted. Leaving it blank or setting it to a number less than one keeps the log files indefinitely.
Max Size The maximum log file size. If a log file exceeds this size, it will be split into smaller segments.
Stop Processing on Error If checked, the job will stop if it returns an error, and will not run again until the error is cleared from the Monitor screen.
Skip Long File Names Check this box to make Autobahn DX skip files with long filenames. If this box is not checked, Autobahn DX will throw an error if it encounters one of these files.
Skip Folders That Autobahn Can’t Access Check this box to make Autobahn DX skip folders it has no permission to access. If this box is not checked, Autobahn DX will throw an error if it encounters one of these folders.
Archive Folder The folder where the processed files will be placed if “Move to archive after processing” is chosen.
Work Folder The folder where files will be temporarily stored during conversion and processing.
Error Folder Source documents that have errors during processing will be placed in the specified folder.
Temp Folder Some job steps can require a significant amount of temporary storage, particularly those steps involving OCR. This folder defines the location of the temporary space.
Trigger File This setting is found under the Processing tab. If you provide a Trigger File value, Document Automation Server will not process a folder until the Trigger File is present. The file is deleted after each job run.
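
The Filter Files options and the Filter Expression described above can be pictured with a small sketch. This is only an illustration of the documented behavior, not Autobahn DX code: the function names are hypothetical, and case-insensitive matching is an assumption.

```python
from fnmatch import fnmatchcase


def parse_filter(expression):
    """Split an expression such as '*.pdf; 3000' into (patterns, limit).

    Patterns are separated by spaces. The optional document count limit
    follows a semicolon. A missing limit is returned as None.
    """
    patterns_part, _, limit_part = expression.partition(";")
    patterns = patterns_part.split()
    limit = int(limit_part) if limit_part.strip() else None
    return patterns, limit


def select_files(names, expression, exclude=False):
    """Return the file names an Include (or Exclude) filter would select."""
    patterns, limit = parse_filter(expression)
    selected = []
    for name in names:
        # Assumption: matching ignores case, as is usual for Windows file names.
        matched = any(fnmatchcase(name.lower(), p.lower()) for p in patterns)
        if matched != exclude:
            selected.append(name)
    return selected if limit is None else selected[:limit]
```

For example, `select_files(names, "*.pdf; 3000")` keeps at most 3000 names ending in .pdf, while `select_files(names, "*.doc *.ppt *.xls", exclude=True)` keeps everything except those three file types.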

Job Scheduling

To use the Job Schedule, you will need to click the Schedule tab under the Designer Tab.

The product supports three types of scheduling which are implemented via the Autobahn DX service:

Ad-Hoc

This means that the job does not have a fixed schedule, but may be run explicitly via the management GUI or via one of the API methods.

Watched Folder / Continuous Scheduling

This allows the job to be scheduled to run periodically between a start time and end time each day. The periods may be seconds, minutes, or hours. For example, a job may be specified to run every 30 seconds between 9:00 and 17:00.

If you check the “Run Continuously” checkbox, the job will run for 24 hours a day. This option is the default for all continuous jobs.
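
As a rough illustration of the periodic schedule described above, the sketch below lists the run times for one day. It is not how the Autobahn DX service is implemented. The function name is hypothetical, and it assumes that runs are spaced from the start time and that the end time itself is included.

```python
from datetime import date, datetime, time, timedelta


def run_times(day, start, end, interval):
    """Yield each scheduled run time on `day`, from `start` to `end` inclusive."""
    current = datetime.combine(day, start)
    last = datetime.combine(day, end)
    while current <= last:
        yield current
        current += interval


# The example from the text: every 30 seconds between 9:00 and 17:00.
schedule = list(run_times(date(2024, 1, 1), time(9, 0), time(17, 0),
                          timedelta(seconds=30)))
```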

Daily Scheduling

This allows the job to be scheduled to run at a specified time each day.

Alerts

This allows you to send emails to your mailbox when the job succeeds or fails. To get to the Alerts tab, click the Alerts tab under the Designer Tab.

Note: You will need to enter your SMTP settings in the modules and options tab before the email alerts will work properly.

Menu Item Action
Send Email Alerts on Job Completion If checked, Autobahn DX will send an email whether the job ends naturally or prematurely. This alert can be further tailored using the properties in the section below.
Only Send Email Alerts if
At least one file was processed If you check this option, Autobahn DX will not send any email until it processes at least one file in the job. This is meant to reduce the number of irrelevant messages you get.
Job Terminated Prematurely Check this if you only want to receive emails when an error occurs during the processing of a job. Note: Individual file errors will not put the job in error; a job error occurs only in more serious circumstances.
At least one file error occurred Check this option if you only want to receive emails when individual file errors occur.
Attach Log File Check this option if you want Autobahn DX to attach the Log file of the job to the email alert.
Attach Job Report Check this option if you want Autobahn DX to attach a report/summary of the job to the email alert.
From Email Address The “from” email address that will be used for the message.
To Email Address The email address that the message will be sent to.
Email Title The title of the email.
Email Message The body of the email. This can be HTML content.

Alert variables

When sending emails, several variables can be used to customize the alerts you send out. Each variable name is enclosed in percent signs, for example %JOBID%. Autobahn DX will replace any occurrences of the variables with an appropriate value at run time. The table below shows the variables that can be used.

Variable Meaning
%JOBID% The Job ID. Works in both the email title and the email message.
%JOBNAME% The Job Name. Works in both the email title and the email message.
%JOBSTATUS% The Job Status. Works in both the email title and the email message.
%LOGFILE% The location of the log file. Works in both the email title and the email message.
%JOBSOURCE% The Source Directory of the job. Works in the email message only.
%JOBTARGET% The Destination Directory of the job. Works in the email message only.
%DATESTAMP% The date the alert was generated. Works in both the email title and the email message.
%TIMESTAMP% The time the alert was generated. Works in both the email title and the email message.
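
The substitution rules in the table can be sketched as follows. This is an illustrative model only: the function and constant names are hypothetical, and leaving an unsupported or unknown variable unchanged in the text is an assumption about what the product does.

```python
import re

# Variables that the table lists as working in both the title and the message.
TITLE_AND_MESSAGE = {"JOBID", "JOBNAME", "JOBSTATUS", "LOGFILE",
                     "DATESTAMP", "TIMESTAMP"}
# Variables that the table lists as working in the message only.
MESSAGE_ONLY = {"JOBSOURCE", "JOBTARGET"}


def expand(template, values, is_title):
    """Replace %NAME% variables that are allowed for this part of the email."""
    allowed = TITLE_AND_MESSAGE if is_title else TITLE_AND_MESSAGE | MESSAGE_ONLY

    def substitute(match):
        name = match.group(1)
        if name in allowed and name in values:
            return str(values[name])
        # Assumption: anything else is left exactly as written.
        return match.group(0)

    return re.sub(r"%([A-Z]+)%", substitute, template)
```

With `values = {"JOBID": 7, "JOBSTATUS": "Completed"}`, the title template `Job %JOBID%: %JOBSTATUS%` expands to `Job 7: Completed`.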

Workflow Processing versus In-Place Processing

Autobahn DX is designed as a Workflow product where there is an input folder and an output folder. At the end of the process, there are options to copy, delete or move the input files that have been successfully processed.

With “in-place” processing, the input documents are turned into searchable PDFs and returned to the same location. It is possible to replace the existing file if the output file format produces the same file name. The input files can be copied to an archive location if they need to be kept (this is recommended during the development process and during testing – if this is not set, the original file cannot be recovered).

Autobahn DX can be used for in-place processing, but we have an OCR product named Aquaforest Searchlight that is designed specifically for in-place conversions to searchable PDFs and may handle this use case more effectively. Searchlight records all the files it processes, so it is more efficient when there are a lot of files, as they do not need to be opened to be identified as previously processed.

Example In-Place Job Setup

The job shown below will convert PDFs under the tree “C:\ADX Demo\Documents” to searchable PDFs, processing up to 5 files each time the job is run.

The Source Folder and the Target Folder must be the same.

The Use Work Folders check box must be checked when processing in place. When the folders are set to the same location in the UI, a message is displayed and the check box is set automatically.

Select the Process Sub-Folders check box.

For audit purposes, the Input Files option should be set to Copy to archive after processing.

To avoid re-processing files, select the Include Unprocessed PDFs Only – with Document Count Limit option in the Filter Files combo box.

Because the Filter Files option selected includes the Document Count Limit, the Batch Size of the job can be set to 5 files per run (You can increase this to a suitable batch size).

The Output File Name is set in the Conversion Settings for the step and should be configured to %FILENAME.pdf so that it will replace the input file.
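
To see why %FILENAME.pdf replaces the input file only when the input is itself a PDF, here is a minimal sketch of the template expansion. It relies on %FILENAME being the original file name without its extension, as described for the Output File Name parameter. The function names are hypothetical, and the case-insensitive comparison is an assumption based on Windows file naming.

```python
from pathlib import PureWindowsPath


def output_name(input_path, template):
    """Expand %FILENAME to the input file name without its extension."""
    stem = PureWindowsPath(input_path).stem
    return template.replace("%FILENAME", stem)


def replaces_input(input_path, template):
    """Return True if the output file name equals the input file name."""
    original = PureWindowsPath(input_path).name
    # Assumption: file names are compared without regard to case.
    return output_name(input_path, template).lower() == original.lower()
```

For `C:\ADX Demo\Documents\scan01.pdf` the template yields `scan01.pdf`, so the output overwrites the input. For `scan01.tif` it also yields `scan01.pdf`, which is written alongside the original rather than replacing it.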

Step Types

This section explains each of the step types.

Autobahn DX Server edition is licensed to use Standard and GdPicture steps. The Extended edition adds the Extended OCR steps.

Step group Step name
OCR Image to Searchable PDF (Standard)
OCR Image to Searchable PDF (Extended)
OCR PDF to Searchable PDF (Standard)
OCR PDF to Searchable PDF (Extended)
OCR Any File to Searchable PDF (Standard)
OCR Any File to Searchable PDF (Extended)
OCR Merge Image to Searchable PDF (Standard)
OCR Merge Image to Searchable PDF (Extended)
OCR PDF To Searchable PDF (GdPicture)
Convert Convert PDF to TIFF
Convert Convert Any File to PDF
Convert Convert PDF to PDFA
Convert Convert Any File To PDF (GdPicture)
Convert Combine Any File To PDF
Convert PDF To JPEG
Convert PDF To PNG
Convert PDF To TIFF
Convert PDF To Text
Split and Merge Merge PDF
Split and Merge Split PDF
Split and Merge Merge TIFF, JPEG, BMP, PNG, GIF
Split and Merge Split TIFF
Split and Merge Combine PDFs
Split and Merge Split PDF (GdPicture)
Connectors Read Mailbox
Connectors Send Documents
Connectors SharePoint Download
Connectors SharePoint Upload
Connectors Azure Storage Download
Connectors Azure Storage Upload
Barcode Barcode TIFF/PDF
Barcode Split by Barcode
PDF Operations Set PDF Properties
PDF Operations Create XML Property File
PDF Operations Extract Text from PDF File
PDF Operations Optimize PDF
PDF Operations Stamp PDF Files
PDF Operations Modern Compress PDF
PDF Operations Validate PDFA
PDF Operations Linearize PDF
PDF Operations Create PDF Portfolio
Advanced Custom Script Step
Advanced High Availability
Advanced Kingfisher Job
Advanced Distributed Polling
Advanced PDF Recognition to JSON
Advanced Image to Searchable PDF (Microsoft Cloud OCR)
Advanced PDF to Searchable PDF (Microsoft Cloud OCR)
Advanced Image to Searchable PDF (Google Cloud OCR)
Advanced PDF to Searchable PDF (Google Cloud OCR)
Advanced Detect Signatures
Advanced Smart Redaction
Advanced Key Value Pair Extraction
Advanced Pattern Redaction

Image To Searchable PDF

This step can be found under the OCR Expander. It creates a searchable PDF file from input image types e.g. .png, .tiff, .jpg, .gif, .bmp.

Depending upon the Step Type Properties chosen, separate text, HTML and Office files may also be produced from the OCR process.

This step is not available for the GdPicture engine; however, it can be replicated by using a combination of the Convert Any File To PDF (GdPicture) and PDF To Searchable PDF (GdPicture) steps.

Standard Engine

Parameter Notes
Output File Name Target file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Continue on Error Continue processing TIFF files after an error occurs.
OCR Choose “No” to generate an image-only PDF. Choose “Yes” to generate searchable PDF and/or text files.
OCR Language Select the language the original file is written in. This will determine the dictionary that is used.
Deskew Straighten the image.
Auto-Rotate Automatically rotate pages so that text flows left to right.
Despeckle Remove specks below the specified pixel size from the image.
OCR to Text File Choose “Yes” to generate text output.
Output File - Plain Text (txt).
- Plain Text (txt) No PDF
- MS Word (rtf)
- HTML
PDF/A Options Select the PDF/A conformance level for the output PDF.
- PDF/A1-b
- PDF/A2-b
- PDF/A3-b
Validate PDF/A Whether or not to validate the PDF/A document after conversion.
JBIG2 Compression This option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time.
Box/Graphics Options By default, if an area of the document is identified as a graphic area, then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but do contain useful text. To ensure that the OCR engine can be forced to process such areas there are two options:
“Treat all Graphics Areas as Text”. This option will ensure the entire document is processed as text.
“Remove Box Lines in OCR Processing”. This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels).
Line Removal in OCR Processing This removes lines and boxes during OCR processing to improve recognition – particularly in cases where characters “touch” lines.
MRC This enables Mixed Raster Compression which can dramatically reduce the output size of PDFs comprising Color scans.
Save Pre-Despeckle This will use the original image (i.e., before applying pre-processing) in the output PDF. The default value is true.
StampName Deprecated. Use the Stamp PDF Files step instead.
StampValue Deprecated. Use the Stamp PDF Files step instead.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores This specifies the number of files to be processed in parallel at a given time. Note: This needs a multi-core license, and the number of cores used will depend on the availability of cores.
Debug Set this to true to execute the step in debug mode.

Extended Engine

Parameter Notes
Output File Name The output filename excluding the extension (which will be added according to the output file type).
Output File Type One or more of the following, separated by commas if more than one is required.
- CSV *
- DOCX
- EPUB
- EXCELML *
- HTM
- OPENTXT
- PDF
- RTF
- TXT
- WORDML
- XLSX *
- XPS
*These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format.
Create Folders If Required Create an output folder if it does not exist. Default true.
OCR Engine The OCR engine to use. This must be set to use the IRIS engine.
OCR Language 1-8 You can set up to eight different languages for OCR recognition on one page, provided they all use the same character set. English is available as a language.
Automatic language detection Property that enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page. If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages.
Auto rotate Detect page orientation and correct it if required.
Deskew Rotates the image to correct its skew angle.
Advanced Deskew Set this to true to define advanced deskew properties.
Force Deskew Under certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect from an OCR accuracy point of view whether it’s better to rotate the image or not. Because the skew angle may be visible in the output document (e.g. if KeepDeskew is set to ‘true’), you can choose to force the deskew to rotate the image, even if it affects the accuracy. If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result. If turned on, the image is rotated to correct skew angle.
Adjustment Mode Set the behavior regarding dimension adjustment for deskew operation.
Despeckle Removes all groups of connected pixels that contain fewer pixels than this parameter. Suggested range: 1-20.
Advanced Despeckle Set the advanced despeckle settings. Advanced despeckle provides additional image noise reduction features for the despeckle filter.
Remove White Pixels By default, Advanced Despeckle removes black pixels. If this setting is set to ‘true’, white pixels will be removed instead of black pixels.
Dilate Despeckle removes all groups of connected pixels that contain fewer pixels than the SpeckleSize parameter. Those connected pixels are not removed if the distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels. The default value is ‘0’.
Layout The layout for the docx or rtf document.
- Standard
- Flow
PDF version This determines the PDF version of the generated PDF:
- 1.4
- 1.5
- 1.6
- 1.7
- 1.7 Extension Level 3
- 1.7 Extension Level 5
- 1.7 Extension Level 8
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-3a
- PDF/A-3b
Remove Blank Page Set this to true to remove blank pages from TIFF or PDF documents. A value must also be set for Sensitivity (see below).
Sensitivity The sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected.
Work Depth This parameter (0 – 255) defines how deeply the OCR engine will analyze a page with 255 being the deepest. For poorer quality documents, higher values can give better recognition results.
JPEG Quality This parameter (0 – 255) determines the compression/quality of color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128.
JPEG2000 Compression Enable/Disable JPEG2000 Compression.
JPEG2000 Compression Mode The JPEG2000 Compression Mode to use.
JPEG2000 Compression Value The Value to set for the selected Compression Mode.
IHQC Compression Apply Intelligent High-Quality Compression
IHQC Compression Level Level 1 is the basic compression level while level 3 is the most advanced Intelligent High-Quality Compression Mode.
IHQC Quality Factor The quality Factor for IHQC
No OCR Whether or not to perform OCR on the document (Yes to not perform OCR, No to perform OCR).
Binarization Whether or not to perform binarization on the document.
Brightness The brightness (higher values will make the result darker).
Contrast The contrast (lower values will make the result darker).
Smoothing Level Smoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more).
Undithering Whether or not to use automatic undithering while processing a page. NOTE: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level). Dithering is a scanning technique that represents a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general appearance of the image. The technique is known to create images that are more difficult for OCR technology to handle; therefore, specific image preprocessing is needed to detect and revert it.
Threshold Sets the threshold for fixed threshold binarization (0 for automatic threshold computation).
Remove Lines Whether or not to remove lines from an image (The image must be black and white).
Horizontal Clean X The parameter for cleaning noisy pixels attached to the horizontal lines.
Horizontal Clean Y The parameter for cleaning noisy pixels attached to the horizontal lines.
Vertical Clean X The parameter for cleaning noisy pixels attached to the vertical lines.
Vertical Clean Y The parameter for cleaning noisy pixels attached to the vertical lines.
Horizontal Dilate The dilate parameter that helps the detection of horizontal lines.
Vertical Dilate The dilate parameter that helps the detection of vertical lines.
Horizontal Max Gap The maximum horizontal line gap to close. It is useful to remove broken lines.
Vertical Max Gap The maximum vertical line gap to close. It is useful to remove broken lines.
Horizontal Max Thickness The maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. Can be also useful to keep vertical letter strokes.
Vertical Max Thickness The maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. Can be also useful to keep horizontal letter strokes.
Horizontal Min Length The minimum length of the horizontal lines to remove.
Vertical Min Length The minimum length of the vertical lines to remove.
Remove Dark Borders Removes the dark surrounding from bitonal, grayscale or color images. The dark surrounding of the image is whitened (Note: The dark border should be touching the edge of the page for this to work).
Punch Hole Removal Attempts to remove punch holes from pages.
Note: The punch hole algorithm can be used on images with the following minimum dimensions width: 300px, height: 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.
Interpolation Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image’s resolution.
Interpolation Mode Sets the interpolation mode.
Keep Original Image Set this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is ‘true’. Note: This property only applies when processing image files or when processing PDF files with Convert To TIFF set to Yes.
Keep Deskewed Image Set this to true if you want to use the deskewed image in the output document. Note: This property only applies when Keep Original Image is set to No.
Keep Despeckled Image Set this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white. Note: This property only applies when Keep Original Image is set to No.
Keep Dark Border Removal Set this to true if you want to use the image after dark borders have been removed in the output document. Note: This property only applies when Keep Original Image is set to No.
Keep Punch Hole Removal Set this to true if you want to use the image after punch holes have been removed in the output document. Note: This property only applies when Keep Original Image is set to No.
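
The Output File Type parameter of the Extended Engine takes a comma-separated list of formats. The sketch below shows one way such a value could be checked before a job is saved. It is illustrative only: the function names are hypothetical, and tolerance of extra spaces and lower-case input is an assumption.

```python
# Formats listed for the Output File Type parameter.
OUTPUT_TYPES = {"CSV", "DOCX", "EPUB", "EXCELML", "HTM", "OPENTXT",
                "PDF", "RTF", "TXT", "WORDML", "XLSX", "XPS"}
# Formats marked with * in the table as suited to table-oriented pages.
TABLE_ORIENTED = {"CSV", "EXCELML", "XLSX"}


def parse_output_types(value):
    """Split a comma-separated Output File Type value and validate each entry."""
    types = [item.strip().upper() for item in value.split(",") if item.strip()]
    unknown = [item for item in types if item not in OUTPUT_TYPES]
    if unknown:
        raise ValueError(f"Unsupported output type(s): {', '.join(unknown)}")
    return types


def table_oriented(types):
    """Return the requested types that suit table-oriented pages."""
    return [item for item in types if item in TABLE_ORIENTED]
```

For example, `parse_output_types("PDF, DOCX, XLSX")` returns the three formats in order, and `table_oriented` then picks out `XLSX` as the one suited to spreadsheet-like pages.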

PDF to Searchable PDF

Creates a searchable PDF file from the set of images from an image-only PDF file.

Depending upon the Step Type Properties chosen, separate text, HTML and Office files may also be produced from the OCR process.

Standard Engine

Parameter Notes
Output File Name Target file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Continue on Error Continue processing TIFF files after an error occurs.
OCR

Choose “No” to generate an image-only PDF.

Choose “Yes” to generate searchable PDF and/or text files.

OCR Language Select the language the original file is written in. This will determine the dictionary that is used.
Deskew Straighten the image.
Auto-Rotate Automatically rotate pages so that text flows left to right.
Despeckle Remove specks below the specified pixel size from the image.
OCR to Text File Choose “Yes” to generate text output.
Output File
  • Plain Text (txt).

  • Plain Text (txt) No PDF

  • MS Word (rtf)

  • HTML

Non-Image PDFs

This allows control over the treatment of non-image PDFs, i.e. PDFs that have some text in them as well as images. The options are:

  • OCR: The document will be OCRed using the image method defined by “Image Method”.

  • Raise Error: The task will terminate with an error. If “Continue on Error” is set, this then behaves as Skip. This is the default.

  • Skip: The document will not be processed.

  • Pass Through: The file will not be processed, but a copy of the document will be made and named as if the processing had occurred.

Remove Hidden Text This applies only when a PDF is being used as the source for OCR. When set to true, this will not include any searchable text layers that already exist in the source document. Such functionality might be useful if the source document was created by OCR of an image-only PDF or other image file and the quality of the text from the previous OCR is poor. NOTE: There is no way to distinguish text added as a result of OCR from text added by other means, so this option should be used with care.
Convert to TIFF

Choose the method for PDF image extraction.

  • No – (Native)

  • Yes – (Convert to TIFF)

DPI When OCRing a PDF, the PDF is rasterized to produce a TIFF file which is then OCRed. By default, the TIFF image resolution is determined from the images embedded in the source PDF, but this flag can be used to override default processing and specify the DPI of the TIFF that will be generated.
TIFF Compression

Sets the Compression for the TIFF file used if the “Convert To TIFF” Option above is used.

  • Auto (selects Group 4 if the page is black and white; otherwise uses LZW compression)

  • Group 4 (Black and White)

  • LZW (Colored)

Retain Metadata Copy metadata from the source PDF to the Searchable result PDF.
Retain Bookmarks Copy bookmarks from the source PDF to the Searchable result PDF.
Retain Viewer Preferences Retains any PDF Viewer Preferences, Page Mode and Page Layout from the source file in the output when using Convert To TIFF=’Yes’.
PDF/A Options

Select the output PDF/A compliant version you would like the output PDF to be.

  • PDF/A1-b

  • PDF/A2-b

  • PDF/A3-b

Validate PDF/A Whether or not to validate the PDF/A document after conversion.
Box/Graphics Processing

By default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but that actually do contain useful text.

To force the OCR engine to process such areas, there are two options:

“Treat all Graphics Areas as Text”. This option will ensure the entire document is processed as text.

“Remove Box Lines in OCR Processing”. This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels).

Line Removal in OCR Processing This removes lines and boxes during OCR processing to improve recognition, particularly in cases where characters “touch” lines.
JBIG2 Compression This option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time.
MRC Compression Applies Mixed Raster Compression, which can drastically reduce the size of PDF documents.
Save Pre-Despeckle This will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
StampName This has been deprecated; use the Stamp PDF Files step.
StampValue This has been deprecated; use the Stamp PDF Files step.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores

This specifies the number of parallel files you want to be processed at a given time.

Note: This needs a multi-core license and the number of cores used will depend on the availability of cores.

Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.
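
The Auto rule described under TIFF Compression above can be sketched in a few lines of Python. This is only an illustration of the documented rule, not Autobahn's implementation; the function name and the boolean argument are hypothetical.

```python
def choose_tiff_compression(setting: str, page_is_black_and_white: bool) -> str:
    """Return the compression used for one page, following the option list above.

    Hypothetical helper: 'setting' is one of "Auto", "Group 4" or "LZW".
    """
    normalized = setting.strip().lower()
    if normalized == "auto":
        # Auto: Group 4 for black and white pages, LZW for everything else.
        return "Group 4" if page_is_black_and_white else "LZW"
    if normalized == "group 4":
        return "Group 4"
    if normalized == "lzw":
        return "LZW"
    raise ValueError(f"Unknown TIFF compression setting: {setting!r}")
```

With Auto selected, a document that mixes bitonal and color pages would therefore be expected to use a different compression per page.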

Extended Engine

Parameter Notes
Output File Name The output filename excluding the extension (which will be added according to the output file type).
Output File Type One or more of the following, separated by commas if more than one is required.
- CSV *
- DOCX
- EPUB
- EXCELML *
- HTM
- OPENTXT
- PDF
- RTF
- TXT
- WORDML
- XLSX *
- XPS
*These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format.
OCR Engine The OCR engine to use. This must be set to use the IRIS engine.
OCR Language 1-8 You can set up to eight different languages for OCR recognition in one page as long as they are in the same character set.
Automatic language detection Property that enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page. If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages.
Auto rotate Detect page orientation and correct if required
Deskew Rotates the image to correct its skew angle.
Advanced Deskew Set this to true to define advanced deskew properties.
Force Deskew Under certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect from an OCR accuracy point of view whether it’s better to rotate the image or not. Because the skew angle may be visible in the output document (e.g. if KeepDeskew is set to ‘true’), you can choose to force the deskew to rotate the image, even if it affects the accuracy. If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result. If turned on, the image is rotated to correct skew angle.
Adjustment Mode Set the behavior regarding dimension adjustment for deskew operation.
Despeckle Removes all the groups of connected pixels with a number of pixels below the parameter. Suggested range: 1-20.
Advanced Despeckle Defines the advanced despeckle settings, which provide additional image noise reduction features for the despeckle filter.
Remove White Pixels By default, Advanced Despeckle removes black pixels. If this setting is set to ‘true’, white pixels will be removed instead of black pixels.
Dilate Despeckle removes all groups of connected pixels whose pixel count is below the SpeckleSize parameter. Such a group is not removed, however, if its distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels. The default value is ‘0’.
Retain Bookmark This option allows you to retain the bookmarks in the new PDF if the old PDF was converted to TIFF before it was OCRed. Note: This will only work if:
“Extract Images Method = Convert to TIFF”
Retain Metadata This option allows you to retain the metadata in the new PDF if the old PDF was converted to TIFF before it was OCRed. Note: This will only work if:
“Convert to TIFF = Yes”
Layout The layout for the docx or rtf document
- Standard
- Flow
PDFVersion This determines the PDF version of the generated PDF:
- 1.4
- 1.5
- 1.6
- 1.7
- 1.7 Extension Level 3
- 1.7 Extension Level 5
- 1.7 Extension Level 8
- PDF/A-1a
- PDF/A-1b
- PDF/A-2a
- PDF/A-2b
- PDF/A-3a
- PDF/A-3b
Note: This will only work if:
“Extract Images Method = Convert to TIFF”
Extract Images Method Whether to convert the images in a PDF document to TIFF or not.
- Convert to TIFF – The pages in the PDF document are rasterized and saved as TIFF images
- Native - This method places the OCRed text directly into a copy of the original PDF rather than creating an entirely new PDF.
Remove Blank Page Set this to true to remove blank pages from Tiff or PDF documents. Value needs to be set for sensitivity (see below).
Sensitivity The sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected.
Work Depth This parameter (0 – 255) defines how deeply the OCR engine will analyze a page with 255 being the deepest. For poorer quality documents, higher values can give better recognition results.
JPEG Quality This parameter (0 – 255) determines the compression/quality of Color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128.
JPEG2000 Compression Enable/Disable JPEG2000 Compression.
JPEG2000 Compression Mode The JPEG2000 Compression Mode to use.
JPEG2000 Compression Value The Value to set for the selected Compression Mode.
IHQC Compression Apply Intelligent High-Quality Compression
IHQC Compression Level Level 1 is the basic compression level while level 3 is the most advanced Intelligent High-Quality Compression Mode.
IHQC Quality Factor The quality Factor for IHQC
Binarization Whether or not to perform binarization on the document.
Brightness The brightness (higher values will make the result darker).
Contrast The contrast (lower values will make the result darker).
Smoothing Level Smoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more).
Undithering Whether or not to use automatic undithering while processing a page. NOTE: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level). Dithering is a scanning technique that represents a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general appearance of the image. However, dithered images are known to be more difficult for OCR technology to handle, so specific image preprocessing is needed to detect and revert the dithering.
Threshold Sets the threshold for fixed threshold binarization (0 for automatic threshold computation).
Remove Lines Whether or not to remove lines from an image (The image must be black and white).
Horizontal Clean X The parameter for cleaning noisy pixels attached to the horizontal lines.
Horizontal Clean Y The parameter for cleaning noisy pixels attached to the horizontal lines.
Vertical Clean X The parameter for cleaning noisy pixels attached to the vertical lines.
Vertical Clean Y The parameter for cleaning noisy pixels attached to the vertical lines.
Horizontal Dilate The dilate parameter that helps the detection of horizontal lines.
Vertical Dilate The dilate parameter that helps the detection of vertical lines.
Horizontal Max Gap The maximum horizontal line gap to close. It is useful to remove broken lines.
Vertical Max Gap The maximum vertical line gap to close. It is useful to remove broken lines.
Horizontal Max Thickness The maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. Can be also useful to keep vertical letter strokes.
Vertical Max Thickness The maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. Can be also useful to keep horizontal letter strokes.
Horizontal Min Length The minimum length of the horizontal lines to remove.
Vertical Min Length The minimum length of the vertical lines to remove.
Remove Dark Borders Removes the dark surrounding from bitonal, grayscale or color images. The dark surrounding of the image is whitened (Note: The dark border should be touching the edge of the page for this to work).
Punch Hole Removal Attempts to remove punch holes from pages. Note: The punch hole algorithm can be used on images with the following minimum dimensions width: 300px, height: 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.
Interpolation Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image’s resolution.
Interpolation Mode Sets the interpolation mode.
Keep Original Image Set this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is ‘true’. Note: This property only applies when processing image files or when processing PDF files with Convert To TIFF set to Yes.
Keep Deskewed Image Set this to true if you want to use the deskewed image in the output document. Note: This property only applies when Keep Original Image is set to No.
Keep Despeckled Image Set this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white. Note: This property only applies when Keep Original Image is set to No.
Keep Dark Border Removal Set this to true if you want to use the image after dark borders have been removed in the output document. Note: This property only applies when Keep Original Image is set to No.
Keep Punch Hole Removal Set this to true if you want to use the image after punch holes have been removed in the output document. Note: This property only applies when Keep Original Image is set to No.
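
To make the Despeckle parameter in the table above more concrete, the following is a minimal Python sketch of removing small groups of connected black pixels from a bitonal image. It is not the extended engine's algorithm: the image representation (lists of 0/1 values, 1 = black), the use of 4-connectivity and the function name are all assumptions made for illustration, and the Dilate behavior is not modeled.

```python
from collections import deque


def despeckle(image, speckle_size):
    """Return a copy of a bitonal image with small black groups cleared.

    image: list of rows, each a list of 0 (white) or 1 (black).
    speckle_size: groups with fewer black pixels than this are removed.
    Assumption: pixels are connected when they touch horizontally or vertically.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected group of black pixels.
            group = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Groups below the size threshold are treated as specks and whitened.
            if len(group) < speckle_size:
                for cy, cx in group:
                    result[cy][cx] = 0
    return result
```

In this sketch a larger speckle size removes more of the image, which is consistent with the small suggested range of 1-20 given for Despeckle above.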

Merge TIFFs to PDF

This step first merges the input images in a folder into a multi-page PDF file, then performs OCR on the file. Depending upon the Step Type Properties chosen, separate text, HTML and Office files may also be produced by the OCR process.

Standard Engine

Parameter Notes
Output File Name Target file template which can include %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
OCR Options

Choose “No OCR” to generate an image-only PDF.

Choose “OCR” to generate searchable PDF and/or text files.

Continue on Error Continue processing TIFF files after an error occurs.
OCR Language Select the language the original file is written in. This will determine the dictionary that is used.
Deskew Straighten the image.
Auto-Rotate Automatically rotate pages so that text flows left to right.
Despeckle Remove specks below the specified pixel size from the image.
Save Pre-Despeckle This will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
Output PDF Choose “Yes” to generate a PDF file.
Output TXT Choose “Yes” to generate a .txt file (only applicable if OCR is specified).
Output RTF Choose “Yes” to generate a .rtf file (only applicable if OCR is specified).
Output HTML Choose “Yes” to generate a .htm file (only applicable if OCR is specified).
Advanced Flags Command line flags to be passed through to the underlying executable.
PDF/A Options

Select the PDF/A version that the output PDF should comply with.

  • PDF/A1-b

  • PDF/A2-b

  • PDF/A3-b

Validate PDF/A Whether or not to validate the PDF/A document after conversion.

Convert Any File to PDF

This converts any printable document to PDF, such as Microsoft Word, Excel, PowerPoint, HTML, etc. subject to the native application being available on the server. See Section 18.2 for more details.

Parameter Notes
Output File Name Target file template which can include %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Continue on Error Continue processing files after an error occurs.
Conversion Timeout (ms) Limits the amount of time in milliseconds that can be spent on conversion. A value of zero means there is no time limit.
Convert Bookmarks For MS Word, convert bookmarks
Bookmark Depth This property will take effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine will be mapped to level one through nine) will be used. Therefore, an empty string is functionally equivalent to Heading 1, Heading 2, Heading 3, Heading 4, Heading 5, Heading 6, Heading 7, Heading 8, Heading 9.
Note: If you use a non-English version of Microsoft Word, then you may need to replace the word “Heading” with its localized version.
Convert Hyperlinks Sets the flag to indicate whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel) The flag that indicates whether to print all Excel worksheets or not.
Print Background Color (IE) For files printed via IE, sets the flag that indicates whether to print the background color.
Print Scale % (Visio) For Visio files, sets the print scale
Header (IE) This property modifies Internet Explorer’s header setting.
Footer (IE) This property modifies Internet Explorer’s footer setting.
Image Compression If you want a lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image Downsizing If this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPI If the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG Quality The allowed value range is from 5 to 100 with 100 being the highest quality.
Font Embedding The option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) will cause a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embed a subset of fonts) will be a better choice.
Font Substitution For the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the “Device Setting” section of the printer driver properties (can be accessed from the Control Panel).
Embed Fonts as Type 0 This option is recommended if you have non-standard fonts such as barcode fonts.
Top Margin Sets top margin. (Inches)
Bottom Margin Sets bottom margin. (Inches)
Left Margin Sets left margin. (Inches)
Right Margin Sets right margin. (Inches)
Page Width Sets a custom page width. (Inches)
Page Height Sets a custom page height. (Inches)
Paper Orientation Sets paper orientation to
- Default (Maintain Source Orientation)
- Landscape
- Portrait
PDF Compliance Allows the user to choose PDF/A or PDF/X compliant output
- None (No PDF/A Output)
- PDF/A-1b (PDF/A-1b compliant)
- PDF/X-1a (PDF/X-1a compliant)
- PDF/X-3 (PDF/X-3 compliant)
Convert MSG Attachments If you set this to true, Autobahn DX will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDF If set to true, Autobahn DX will attach the converted MSG attachments to the output PDF as PDF attachments. If set to false, Autobahn DX will merge the converted MSG attachments into the PDF file generated from the message body.
Preserve Word Attachments Determines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).
Note: This will work with WordExtensionEX only
Convert PDF Attachments (PDF) Convert PDF Attachments to create a combined PDF file.
Merge PDF Attachments (PDF) Set this flag to true if you want to convert pdf attachments and merge them into the output pdf file. Otherwise, the converted files will be merged back to the pdf.
Retain PDF Attachment (PDF) Switch this on to Retain the Original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office) Set this flag if you want the MS Office properties to be transferred to the target pdf document.
Color Type (PowerPoint) Use this property to set PowerPoint to print with either color, grayscale, or black and white.
Handout Order (PowerPoint) Sets the handout order. This flag only applies to PowerPoint jobs. The possible values are:
- Vertical First
- Horizontal First
Output Type (PowerPoint) Sets the output type. This only works with PowerPoint files. The possible values are:
- Slides
- Build slides
- Two slides handouts
- Three slides handouts
- Four slides handouts
- Six slides handouts
- Nine slides handouts
- Notes
- Outline
Print Graphics (Publisher) Sets the graphics setting for printing.
- Print Full Resolution
- Print Low Resolution
- Print Graphics
Frame Slides (PowerPoint) Indicate whether to draw a frame around the border of the slides.
Zoom (Excel) Sets printing zoom of the worksheet. The allowed value range is from 10 to 400.
Fit to Pages Wide (Excel) Sets number of pages wide the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Fit to Pages Tall (Excel) Sets number of pages tall the worksheet will be scaled to. This property is ignored if the Zoom property is set.
Include Document Markups Determines whether document markups are retained. When this property is False (the default), document markups are omitted. When this property is True, markups are included.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Password Files This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.
- Take no action.
- Move to Error Folder
- Copy to Error Folder
Debug Set this to true to execute the step in debug mode.
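
The three Password Files options listed above can be illustrated with a short Python sketch. It is not Autobahn's code: the function name is hypothetical, and the literal subdirectory name `password` inside the Error Folder is an assumption based on the description of this option.

```python
import shutil
from pathlib import Path

ACTIONS = ("Take no action", "Move to Error Folder", "Copy to Error Folder")


def handle_password_file(pdf_path, error_folder, action):
    """Apply one Password Files option to a password protected PDF.

    Returns the new location inside the error folder, or None when the
    option is "Take no action".
    """
    if action not in ACTIONS:
        raise ValueError(f"Unknown Password Files option: {action!r}")
    if action == "Take no action":
        return None

    # Assumed layout: <Error Folder>/password/<original file name>
    target_dir = Path(error_folder) / "password"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(pdf_path).name

    if action == "Move to Error Folder":
        shutil.move(str(pdf_path), str(target))
    else:
        shutil.copy2(str(pdf_path), str(target))
    return target
```

The practical difference between the two folder options in this sketch is whether the protected file remains in the source location after the step has run.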

Set PDF Properties

This is used to set PDF Metadata properties (such as Author, Title, etc.), Security settings and Document Display properties.

Parameter Notes
Output File Name Target file template which can include %FILENAME (original filename without the extension), %DIRNAME (directory name of the original file), %UNIQUEn (e.g. %UNIQUE4 for 4 digits), %BOOKMARK and %PAGEn (e.g. %PAGE4 for 4 digits)
Encryption Strength Must be set to 128 bits if security attributes are to be set.
User Password A password that will be required to open the document.
Owner Password A password that will be required to change the document permissions.
Allow Printing Allow high-quality printing
Allow Modify Contents Allow assembly and other document modifications
Allow Copy Allow text and graphics copying and extraction
Allow Modify Annotations Allow modification of annotations
Allow Filling Allow filling of form fields
Allow Screen Readers Allow extraction of text and graphics in support of accessibility.
Allow Assembly Allow rotation, insertion or deletion of pages.
Allow Degraded Printing Allow low-quality printing
Author Sets the Author property
Title Sets the Title property
Subject Sets the Subject property
Keywords Sets the Keywords property
Creator Sets the Creator property
Page Layout The setting for the initial document page display
Page Mode The setting for initial viewer mode
Non-Full Screen Mode Only applicable where Page Mode=Full Screen. The setting for document page display when exiting Full-Screen mode.
Hide Menu Bar The viewer’s menu bar will be hidden
Hide Window UI The viewer’s UI elements (scrollbars etc.) will be hidden
Hide Tool Bar The viewer’s toolbar will be hidden
Fit Window The viewer will resize the document’s window to fit the size of the first displayed page.
Center Window The document window will be positioned in the center of the screen.

Custom Script

This can be used to support a custom scripted step in the process. See section 6 for more details.

Parameter Notes
Custom Script File Name of the custom script file to be run, located in the Autobahn custom folder.
Job ID (optional) Will send an additional flag with the jobdef file location, e.g. a value of 1024 will give the flag
"/jobdef:C:\Aquaforest\Autobahn DX/jobdef/1024.xml" given that Autobahn is installed on the default C drive location.

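As an illustration of the Job ID parameter of the Custom Script step, the sketch below builds the flag shown in the example for job 1024. The function name is hypothetical, and the default installation directory is taken from that example rather than from any other configuration source.

```python
def jobdef_flag(job_id, install_dir=r"C:\Aquaforest\Autobahn DX"):
    """Build the /jobdef flag for a job ID, mirroring the documented example.

    Assumption: job definitions are stored as <install_dir>/jobdef/<job_id>.xml.
    """
    return f"/jobdef:{install_dir}/jobdef/{job_id}.xml"


print(jobdef_flag(1024))
```

A custom script that receives this flag could strip the `/jobdef:` prefix to locate the job definition file.
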
Stamp PDF Files

This step can be used to add stamps to PDF pages. The stamps can be customized extensively through the step properties below.

Parameter Notes
Output File Name Target file template which can include %FILENAME (original filename without the extension), %DIRNAME (directory name of the original file).
Stamp Operation Autobahn DX has different ways to apply stamps to a page, which gives the user some flexibility.
- StampTextAsString: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as text.
- StampPDFText: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as an image.
- StampPageNumber: When this operation is selected, every page in the PDF file will be stamped with a page number, starting from the start number. E.g. if StartNumber = 6, the first page number will be 6.
- StampPageNumberBates: When this operation is selected, every page in the PDF file will be stamped with a Bates number, starting from the start number. E.g. if StartNumber = 6, the first page number will be 000006.
- StampVariable: This option allows a user to specify a variable like a date, filename or time. The variable specified by the StampObject will be stamped on the document. Check the table below for different Stamp variables provided.
- StampPDFImage: When this operation is selected the text passed as the StampObject is the address of the image to be stamped on the PDF document.
Stamp Placement This property specifies where on the page the stamp is placed. Below is a list of the options available.
- Bottom Center
- Bottom Left
- Bottom Right
- Center
- Center Left
- Center Right
- Top Center
- Top Left
- Top Right
Stamp Direction This represents the direction of the stamp on the output PDF
- Normal
- Diagonal Up
- Diagonal Down
Stamp Text Enter any static text to be stamped on a PDF page. This works with the StampPDFText stamp operation.
Stamp Variable Enter a stamp variable to be stamped on a PDF page. This works with the StampVariable stamp operation. See the table below for more details.
Image Path The path to the image if you are using the StampPDFImage operation. 
Page Range Set of page ranges separated by commas that define which pages from the original should be stamped. Using * or leaving it blank will process all pages.
Start Number The number that the page numbering will start with, works with StampPageNumber and StampPageNumberBates.
Start Page Specifies the page at which stamping should start.
End Page Specifies the page at which stamping should stop.
Bates Prefix Specifies the prefix of the Bates stamp
Bates Suffix Specifies the suffix of the Bates stamp
Bates Length Specifies the length of the Bates stamp
Stamp Color The color of non-image stamps. Enter a valid color name or black will be used.
Stamp Opacity The opacity of non-image stamps.
Font Name The font name of non-image stamps. Choose the font you want from a drop-down list of different fonts.
Font Size The font size of non-image stamps, default value = 20.
Stamp Text as Image Set this to Yes if you want Autobahn DX to convert text-based stamps to images before applying them to the PDF page.
Image Background Color When you set Stamp Text as Image to Yes, use this property to set the background color of the image (rectangle) that the text is converted to.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Password Files This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.
- Take no action.
- Move to Error Folder
- Copy to Error Folder
Debug Set this to true to execute the step in debug mode.

Stamp Variables

The table below shows the stamp variables supported by Autobahn DX. Autobahn replaces each occurrence of a variable in the text string with the appropriate value before applying the stamp. E.g. to stamp “Today is Monday” on a PDF page, use the stamp variable “Today is %A”.

 

Variable Stamp
%a Short Day (Mon)
%A Long Day (Monday)
%b Short Month (Jan)
%B Long Month (January)
%c Date and time (30 October 2013 17:21)
%C Date and Time with seconds (30 October 2013 17:21:50)
%d Month and Year (October 2013)
%D Day and Month (30 October)
%e Short Year (13)
%E Long Year (2013)
%f Short Time of Day (17:21)
%F Time of Day with Seconds (17:21:20)
%G Full Date and time (Wed, 30 October 2013 17:21:50 GMT)
%Y File Name
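
The substitution of the stamp variables listed above can be approximated with the following Python sketch. It is an illustration only: the function name is hypothetical, the day of the month is assumed not to be zero padded, the date passed in is assumed to already be in GMT for %G, and day and month names depend on the active locale (English by default).

```python
import re
from datetime import datetime


def expand_stamp_variables(text, when, file_name):
    """Replace each stamp variable in 'text' with its value for 'when'."""
    day = str(when.day)  # assumption: no leading zero on the day of month
    values = {
        "%a": when.strftime("%a"),
        "%A": when.strftime("%A"),
        "%b": when.strftime("%b"),
        "%B": when.strftime("%B"),
        "%c": day + when.strftime(" %B %Y %H:%M"),
        "%C": day + when.strftime(" %B %Y %H:%M:%S"),
        "%d": when.strftime("%B %Y"),
        "%D": day + when.strftime(" %B"),
        "%e": when.strftime("%y"),
        "%E": when.strftime("%Y"),
        "%f": when.strftime("%H:%M"),
        "%F": when.strftime("%H:%M:%S"),
        "%G": when.strftime("%a, ") + day + when.strftime(" %B %Y %H:%M:%S GMT"),
        "%Y": file_name,
    }
    # A single regex pass ensures substituted values are never re-expanded.
    pattern = re.compile("|".join(re.escape(name) for name in values))
    return pattern.sub(lambda match: values[match.group(0)], text)


stamp = expand_stamp_variables(
    "Today is %A", datetime(2013, 10, 30, 17, 21, 50), "scan.pdf"
)
print(stamp)
```

Note that these variables use their own meanings and do not follow the usual strftime letters; for example, %Y stands for the file name here rather than the year.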

   

Merge PDF

Merges a folder of PDF files into a single file.

Parameter Notes
Output File Name Target file template which can include %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Retain Bookmarks Generated files will include bookmarks from the original file.
Retain Metadata Generated files will include metadata (such as Author and Title) from the original file.
File Names as Bookmarks Generate bookmarks in the output PDF using filenames of source PDF files.
Continue on Error Continue processing if an error occurs.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.

Split PDF

Splits each input PDF file into a set of files, either a single page per file or by page ranges.

Parameter Notes
Output File Name The target file template which can include %UNIQUEn (a unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Retain Bookmarks Generated files will include bookmarks from the original file.
Retain Metadata Generated files will include metadata (such as Author and Title) from the original file.
Split Type

Single Pages – Splits the file into single pages

Page Ranges – Splits the file based on the range

Repeated Ranges – Splits the file based on the range and the repeated range

Bookmarks – Splits the file based on the original bookmarks

Ranges (e.g. 1,3-10) Set of page ranges separated by commas that define which pages from the original should be extracted.
Repeat Every (Pages) Re-applies the page ranges to each successive block of pages within the document. For example, if 2-4 is specified for page ranges and 4 is specified as the repeating range, then the range is re-applied every 4 pages.
Continue on Error Continue processing if an error occurs.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.
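
The Ranges and Repeat Every (Pages) parameters of the Split PDF step can be illustrated with a small Python sketch. This reflects one reading of the description above (the ranges are shifted by the repeat value for each successive block of pages); the function names are hypothetical, and whether Autobahn writes one output file per repetition is an assumption.

```python
def parse_ranges(spec):
    """Turn a range string such as '1,3-10' into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(number) for number in part.split("-"))
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


def repeated_ranges(spec, repeat_every, page_count):
    """Re-apply the page ranges to each block of 'repeat_every' pages."""
    base = [page for page in parse_ranges(spec) if page <= repeat_every]
    groups = []
    for offset in range(0, page_count, repeat_every):
        # Shift the base pages into the current block and drop any past the end.
        group = [offset + page for page in base if offset + page <= page_count]
        if group:
            groups.append(group)
    return groups


print(repeated_ranges("2-4", 4, 10))
```

For the example in the table (range 2-4 repeated every 4 pages), this sketch selects pages 2-4, then 6-8, and so on until the document ends.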

Merge TIFFs

Merges a folder of TIFF files into a single file.

Parameter Notes
Output File Name Target file template which can include %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Continue on Error Continue processing if an error occurs.
Debug Set this to true to execute the step in debug mode.

Split TIFF

Splits each input TIFF file into a set of files, either a single page per file or by page ranges.

Parameter Notes
Output File Name The target file template which can include %UNIQUEn (a unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension) and %DIRNAME (directory name of the original file)
Create Directories if Required Force creation of any output directories if they do not already exist.
Split Type

Single Pages – Splits the file into single pages

Page Ranges – Splits the file based on the range

Repeated Ranges – Splits the file based on the range and the repeated range

Ranges (e.g. 1,3-10) Set of page ranges separated by commas that define which pages from the original should be extracted.
Repeat Every (Pages) Re-applies the page ranges to each successive block of pages within the document. For example, if 2-4 is specified for page ranges and 4 is specified as the repeating range, then the range is re-applied every 4 pages.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of parallel files Autobahn DX will attempt to process at the same time.
Continue on Error Continue processing if an error occurs.
Debug Set this to true to execute the step in debug mode.
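
The Output File Name template used by the split steps can be illustrated as follows. This Python sketch is not Autobahn's implementation: the function name is hypothetical, %DIRNAME is assumed to mean the name of the folder that contains the original file, and Windows-style paths are assumed.

```python
import re
from pathlib import PureWindowsPath

# Matches %UNIQUEn (n = number of digits), %FILENAME and %DIRNAME.
TOKEN = re.compile(r"%UNIQUE(\d+)|%FILENAME|%DIRNAME")


def expand_output_name(template, source_path, unique_number):
    """Expand the template tokens for one output file."""
    source = PureWindowsPath(source_path)

    def substitute(match):
        token = match.group(0)
        if token == "%FILENAME":
            return source.stem  # original filename without the extension
        if token == "%DIRNAME":
            return source.parent.name  # assumed: name of the containing folder
        # %UNIQUEn: zero pad the running number to n digits.
        return str(unique_number).zfill(int(match.group(1)))

    return TOKEN.sub(substitute, template)


print(expand_output_name(r"C:\out\%DIRNAME\%FILENAME_%UNIQUE4.tif",
                         r"C:\in\batch7\scan.tif", 3))
```

Including %UNIQUEn in the template is what keeps the files produced from a single input from overwriting one another.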

Read Inbox

This can read mailboxes and extract attachments using IMAP4 or OAuth2 (Modern) Authentication, in accordance with the parameters specified below. Use of this step type requires a Server License.

Check with your System Administrator and ensure the following for IMAP4:

  • IMAP4 is enabled for the mail server and your account.

  • You have the IMAP address of the mail server.

For OAuth2, you require an access token from the Microsoft Identity Platform, which will supply you with the credentials to use our email steps with Modern Authentication.

Note: The files will be downloaded in the following format, name@timestamp@[email protected] where:

  • Name = Filename

  • timestamp= date of the email.

  • email= The ‘From’ address

Example: file1@[email protected]@[email protected]

The following Microsoft article provides information on how to verify basic IMAP connectivity by using Telnet: http://support.microsoft.com/kb/189326

Parameter Notes
Authentication Mode Choose between IMAP and Modern Authentication
IMAP Server The IMAP server address e.g. imap.company.co.uk
Require Authentication Set this option to ‘No’ if anonymous authentication is set up on your server; a username and password are then not needed.
Username The username for the account to access the IMAP server.
Password Password for the account. This is held encrypted.
Azure Client ID The Client ID for OAuth2 Authentication
Azure Tenant The Tenant for OAuth2 Authentication
Azure AD Instance The address of the Azure AD Instance. e.g. https://login.microsoftonline.com
Credential Type Select the credential type for OAuth2 Authentication. The options are Client Secret or Certification.
Client Secret The Client secret generated by Azure
Certificate Path The path to the certificate generated by Azure
Certificate Password The password of the certificate generated by Azure
Source Email Account The email account to be read e.g. [email protected]
Mailbox Mailbox to read e.g. Inbox
Processed Mailbox Mailbox to move processed email to e.g. Deleted Items. If left blank, the emails will be left in the inbox which can be useful for testing.
Output Template The template for the name of the output file. This can include %FILENAME% for the original filename, %TIMESTAMP% for the job timestamp, and %FROMADDRESS% for the ‘From’ email address.
Include Regular expression. If specified, only files matching the expression will be processed. E.g. *.tif. This allows alternate jobs to be created for different file types.
Exclude Regular expression. If specified, files matching the expression will not be processed. E.g. *.pdf
Subject Filter Autobahn will only download attachments from emails whose subject contains the filter text.
Debug Set this to true to execute the step in debug mode.
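
The combined effect of the Include and Exclude expressions can be sketched in Python as follows. This is an illustration only, not Autobahn DX code: the function name is hypothetical, and the sketch uses Python's own `re` syntax, in which a pattern such as `*.tif` is not valid, so the equivalent `\.tif$` is used. How Autobahn DX itself interprets the patterns, and whether matching is case-sensitive, is not stated here.

```python
import re


def select_attachments(names, include=None, exclude=None):
    """Keep names that match `include` (if given) and do not match `exclude` (if given)."""
    inc = re.compile(include, re.IGNORECASE) if include else None
    exc = re.compile(exclude, re.IGNORECASE) if exclude else None
    selected = []
    for name in names:
        if inc and not inc.search(name):
            continue  # an Include expression is set and this name does not match it
        if exc and exc.search(name):
            continue  # this name matches the Exclude expression
        selected.append(name)
    return selected


names = ["scan1.tif", "report.pdf", "notes.txt"]
print(select_attachments(names, include=r"\.tif$"))  # ['scan1.tif']
print(select_attachments(names, exclude=r"\.pdf$"))  # ['scan1.tif', 'notes.txt']
```

Two jobs reading the same mailbox could use complementary Include expressions in this way to route different file types to different processing steps.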

Send Documents

Use of this step type requires a Server License.

The attachment limit is 50 MB, but email providers’ limits are normally lower.

Note: The input file of this step must be in the format of name@timestamp@[email protected]

where:

  • name = the filename

  • timestamp = the date of the email

  • email = the address to which the output files will be sent

Example: file1@[email protected]@[email protected]

Parameter Notes
Authentication Mode Choose between SMTP and Modern Authentication
Domain The sending domain e.g., aquaforest.com
SMTP Server SMTP Server address e.g., smtp.aquaforest.com
Require Authentication Set this option to ‘No’ if anonymous authentication is set up on your server; a username and password are then not needed.
Username The username for the account to access the SMTP server.
Password Password for the account. This is held encrypted.
Azure Client ID The Client ID for OAuth2 Authentication
Azure Tenant The Tenant for OAuth2 Authentication
Azure AD Instance The address of the Azure AD Instance. e.g., https://login.microsoftonline.com
Credential Type Select the credential type for OAuth2 Authentication. The options are Client Secret or Certification.
Client Secret The Client secret generated by Azure
Certificate Path The path to the certificate generated by Azure
Certificate Password The password of the certificate generated by Azure
Sender Name Name of the sending User e.g., John
From Email Address Sending user e.g., [email protected]
CC Addresses List of CC email addresses, separated by commas, e.g., [email protected], [email protected]
BCC Addresses List of BCC email addresses, separated by commas, e.g., [email protected], [email protected]
Email Title The title of the Email
Email Body The body of the Email
Allow Multiple Attachments By default, Autobahn sends files as individual emails. If set to ‘Yes’, Autobahn will try to group files by destination and send multiple files in one email.
Attachment Number Limit The maximum number of files that can be attached to one email sent by Autobahn.
Attachment Total Size Limit The maximum total size, in MB, of all the files sent in a single email by Autobahn.
Use Original Filename Input filenames must follow the format described above. Set to true if you want the final attachment to revert to its original name.
Debug Set this to true to execute the step in debug mode.
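
The way Allow Multiple Attachments interacts with the two attachment limits can be pictured with the Python sketch below. It is an assumption about the behaviour, not a description of the actual implementation: the function name and the tuple layout are hypothetical, and the sketch assumes a new email is started whenever adding a file would exceed either limit.

```python
from collections import defaultdict


def batch_attachments(files, max_count, max_total_mb):
    """Group files by destination, then split each group into emails.

    files: iterable of (destination, filename, size_mb) tuples.
    Returns a list of (destination, [filenames]) pairs, one per email.
    """
    by_dest = defaultdict(list)
    for dest, name, size_mb in files:
        by_dest[dest].append((name, size_mb))

    emails = []
    for dest, items in by_dest.items():
        batch, total = [], 0.0
        for name, size_mb in items:
            # Start a new email when adding this file would break either limit.
            # A single file larger than the size limit is still sent on its own.
            if batch and (len(batch) >= max_count or total + size_mb > max_total_mb):
                emails.append((dest, batch))
                batch, total = [], 0.0
            batch.append(name)
            total += size_mb
        if batch:
            emails.append((dest, batch))
    return emails


files = [
    ("a@example.com", "f1.pdf", 10),
    ("b@example.com", "g1.pdf", 5),
    ("a@example.com", "f2.pdf", 10),
    ("a@example.com", "f3.pdf", 10),
]
for dest, names in batch_attachments(files, max_count=2, max_total_mb=25):
    print(dest, names)
```

With a limit of two attachments per email, the three files for the first destination are split across two emails, and the second destination receives one.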

Convert PDF to TIFF

Rasterizes a PDF file, converting it into a multi-page TIFF file.

Parameter Notes
Output File Name Target file template, which can include %FILENAME (the original filename without the extension).
Compression Group 4 (for bitonal images) or LZW (for color).
Resolution The DPI of the resulting TIFF file.
Continue on Error Continue processing if an error occurs.
Advanced Flags Command line flags to be passed through to the underlying executable.
Maximum Cores The number of files Autobahn DX will attempt to process in parallel.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.

Extract Text from PDF

Extracts the raw text from a searchable PDF. Note: this step does not perform OCR; it only extracts the text that already exists in the PDF file. A GdPicture-based alternative step (PDF to Text) is also available.

Parameter Notes
Output File Name Target file template, which can include %FILENAME (the original filename without the extension).
Continue on Error Continue processing if an error occurs.
Page From The start of the range of pages from which to extract text. If not specified, a start page of 1 is assumed.
Page To The end of the range of pages from which to extract text. If not specified, the last page is assumed.
Page Separator An optional page separator string to write to the output text file.
Page Separator Placement Specifies whether the page separator appears at the beginning or the end of each page.
Extract Text Engine

The text extraction engine to use:

  • 0 = PDFBox with Formatting

  • 1 = BCL

  • 2 = PDFBox

Copy Input PDF to Target Folder Set to true if you want Autobahn DX to copy the input PDF file to the target folder.
Maximum Cores The number of files Autobahn DX will attempt to process in parallel.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.
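
The effect of Page From, Page To, Page Separator and Page Separator Placement on the output text can be sketched in Python. The function names are hypothetical and the sketch works on text that has already been extracted per page; it only illustrates the defaults and placement rules described in the table above, not the extraction engines themselves.

```python
def select_pages(pages, page_from=None, page_to=None):
    """Return the requested 1-based page range; defaults are page 1 and the last page."""
    first = page_from or 1
    last = page_to or len(pages)
    return pages[first - 1:last]


def join_pages(pages, separator="", placement="end"):
    """Concatenate per-page text, writing `separator` at the beginning or end of each page."""
    if placement not in ("beginning", "end"):
        raise ValueError("placement must be 'beginning' or 'end'")
    chunks = []
    for text in pages:
        if placement == "beginning":
            chunks.append(separator + text)
        else:
            chunks.append(text + separator)
    return "".join(chunks)


pages = ["first page\n", "second page\n", "third page\n"]
print(join_pages(select_pages(pages, page_from=2), "=====\n", "beginning"))
```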

SharePoint Download

This step downloads documents from the specified SharePoint document library ready for processing.

Parameter Notes
SharePoint Site URL The URL of the SharePoint site that you want to access, e.g. http://localhost/testsite
SharePoint Online (Office 365) Whether or not the download location is in SharePoint Online (Office 365).
Use ADFS Switch this on if you use Active Directory for your SharePoint User Management.
Username The username used to connect to the SharePoint site. Leave empty to use Windows Credentials (for local SharePoint only).
Password The password used to connect to the SharePoint site. Leave empty to use Windows Credentials (for local SharePoint only).
ADFS Host Provide the name of the Active Directory server.
ADFS Relying Party Identifier Provide the Relying Party Trust identifier for your SharePoint.
SharePoint Library The name of the library that you want to access, e.g. “Test Library”
SharePoint Sub Folder Download documents from the specified subfolder in the SharePoint library only.
Extension Filter An optional extension mask that limits which files are processed, e.g. “pdf,tiff”
Recurse SharePoint Library If set to “Yes”, sub-folders of the SharePoint Library are also processed.
Include Pattern Autobahn will only include the files that match this pattern.
Exclude Pattern Any file that matches this pattern will be excluded.
Debug Set to “Yes” to see more processing information on the console.
Continue on Error Continue processing if an error occurs.

SharePoint Upload

This step uploads documents to the specified SharePoint document library.

Parameter Notes
SharePoint Site URL The URL of the SharePoint site that you want to access, e.g. http://localhost/testsite
SharePoint Online (Office 365) Whether or not the upload location is in SharePoint Online (Office 365).
Use ADFS Switch this on if you use Active Directory for your SharePoint User Management.
Username The username used to connect to the SharePoint site.
Password The password used to connect to the SharePoint site.
ADFS Host Provide the name of the Active Directory server.
ADFS Relying Party Identifier Provide the Relying Party Trust identifier for your SharePoint.
SharePoint Library The name of the library that you want to access, e.g. "Test Library"
SharePoint Sub Folder

The subfolder inside the SharePoint library to upload the files into. The subfolder must already exist in the library; otherwise the following message is displayed:

“The remote server returned an error: (409) Conflict.”

Extension Filter An optional extension mask that limits which files are processed, e.g. “pdf,tiff”
Recurse Source Folder

Recurse the source folder and its subfolders for files to upload and create the folders in SharePoint if they do not already exist.

Note: If “Use Work Folders” is checked, then “Process Sub-Folders” must also be checked for this to work.

Create Directories if Required Force creation of any output directories if they do not already exist.
Include Pattern Only files that match this pattern will be included.
Exclude Pattern Any file that matches this pattern will be excluded.
Debug If set to “Yes” the user will see more processing information on the console.
Continue on Error Continue processing if an error occurs.

Azure Storage Download

This step will download files to your local machine from an Azure storage Container.

Parameter Notes
Storage Account Name The name of the Azure storage account you want to download files from.
Azure Account Key Key 1 under the Access keys section of the storage account in the portal.
Container Name The name of the Azure blob container you want to download files from.
Extension Filter File extension filters separated by commas (e.g. .tif,.pdf)
Recurse Azure Storage Download documents from folders and subfolders in the Azure storage container.
Debug If set to “Yes” the user will see more processing information on the console.

Azure Storage Upload

This step will upload files from your local machine to an Azure storage Container.

Parameter Notes
Storage Account Name The name of the Azure storage account you want to upload files to.
Azure Account Key Key 1 under the Access keys section of the storage account in the portal.
Container Name The name of the Azure blob container you want to upload files to.
Extension Filter File extension filters separated by commas (e.g. .tif,.pdf)
Recurse Local Folder Upload documents from folders and subfolders of the local folder.
Replace Invalid Characters With

A pattern that replaces any character in the file name that is invalid in Windows file storage before downloading. Invalid characters are: " * : < > ? \ |

Default replacement pattern is: _

Debug If set to “Yes” the user will see more processing information on the console.
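
The Replace Invalid Characters With setting can be pictured with the following Python sketch. The function name is hypothetical; the character list and the default replacement are taken from the table above, and the sketch assumes each invalid character is replaced individually.

```python
# Characters listed above as invalid: " * : < > ? \ |
INVALID_CHARS = '"*:<>?\\|'


def sanitize_filename(name, replacement="_"):
    """Replace every invalid character in `name` with `replacement`."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


print(sanitize_filename("a<b>|c.pdf"))    # a_b__c.pdf
print(sanitize_filename("a:b.tif", "-"))  # a-b.tif
```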

Create XML Property File

This step takes a PDF input file and generates an XML output file.

Parameter Notes
Copy the Source PDF to Target Folder Set to true if you want Autobahn DX to copy the input PDF file to the target folder.
Continue on Error Continue processing files after an error occurs.
Debug Set this to true to execute the step in debug mode.

Optimize PDF

This allows the creation of web-optimized (linearized) PDFs.

Parameter Notes
Linearize – Fast Web View Set to true to linearize a PDF file.
Continue on Error Continue processing files after an error occurs.
Maximum Cores The number of files Autobahn DX will attempt to process in parallel.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Debug Set this to true to execute the step in debug mode.

OCR Any File to PDF

This step attempts to convert all files to searchable PDFs. Autobahn DX may provide the following OCR engines:

  • Standard Engine

  • GdPicture Engine

  • Extended Engine

See Standard OCR vs Extended OCR for the differences.

Standard Engine

Parameter Notes
General Settings
Output File Name Target file template, which can include %FILENAME (the original filename without the extension) and %DIRNAME (the directory name of the original file).
Create Directories if Required Force creation of any output directories if they do not already exist.
Continue on Error Continue processing TIFF files after an error occurs.
Overwrite Existing Overwrites the target document if it exists.
Advanced Flags Command line flags to be passed through to the underlying executable.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Maximum Cores

This specifies the number of parallel files you want to be processed at a given time.

Note: You need the multi-core license for this.

Debug Set this to true to execute the step in debug mode.
Standard OCR Settings
OCR Language Select the language the original file is written in. This will determine the dictionary that is used.
Deskew Straighten the image.
Auto-Rotate Automatically rotate pages so that text flows left to right.
Despeckle Remove specks below the specified pixel size from the image.
OCR to Text File Choose “Yes” to generate text output.
Output File
  • Plain Text (txt).

  • Plain Text (txt) No PDF

  • MS Word (rtf)

  • HTML

Non-Image PDFs

This allows control over the treatment of non-image PDFs, i.e. PDFs that have some text in them as well as images. The options are:

  • OCR: The document will be OCRed using the image method defined by “Image Method”

  • Raise Error: The task will terminate with an error. If “On Error Continue” is set, this behaves as Skip. This is the default.

  • Skip: The document will not be processed.

  • Pass Through: The file will not be processed, but a copy of the document will be made and named as if the processing had occurred.

Remove Hidden Text This applies only when a PDF is used as the source for OCR. When set to true, any searchable text layers that already exist in the source document are not included in the output. This can be useful if the source document was created by OCR of an image-only PDF or other image file and the quality of the text from the previous OCR is poor. NOTE: There is no way to distinguish text added by OCR from text added by other means, so this option should be used with care.
Convert to TIFF

Choose the method for PDF image extraction.

  • No – (Native)

  • Yes – (Convert to TIFF)

DPI When OCRing a PDF, the PDF is rasterized to produce a TIFF file, which is then OCRed. By default, the TIFF image resolution is determined from the images embedded in the source PDF, but this flag can be used to override the default processing and specify the DPI of the TIFF that will be generated.
TIFF Compression

Sets the compression for the TIFF file used if the “Convert To TIFF” option above is used.

  • Auto (selects Group 4 if the page is black and white; otherwise uses LZW compression)

  • Group 4 (Black and White)

  • LZW (Colored)

Retain Metadata Copy metadata from the source PDF to the searchable result PDF.
Retain Bookmarks Copy bookmarks from the source PDF to the searchable result PDF.
Retain Viewer Preferences Retains any PDF Viewer Preferences, Page Mode and Page Layout from the source file in the output when using Convert To TIFF='Yes'.
PDF/A Options

Select the output PDF/A compliant version you would like the output PDF to be.

  • PDF/A1-b

  • PDF/A2-b

  • PDF/A3-b

Validate PDF/A Whether or not to validate the PDF/A document after conversion.
Box/Graphics Processing

By default, if an area of the document is identified as a graphic area then no OCR processing is run on that area. However, certain documents may include areas or boxes that are identified as “graphics” or “picture” areas but that actually do contain useful text.

To ensure that the OCR engine can be forced to process such areas there are two options:

“Treat all Graphics Areas as Text”. This option will ensure the entire document is processed as text.

“Remove Box Lines in OCR Processing”. This option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the OCR engine. It does not remove boxes from the final image. Technically, this option removes connected elements with a minimum area (by default 100 pixels).

Line Removal in OCR Processing This removes lines and boxes during OCR processing to improve recognition – particularly in cases where characters “touch” lines.
JBIG2 Compression This option will compress bitonal images in generated PDFs using JBIG2 compression rather than the default Group 4 compression scheme. This will result in smaller PDF file sizes, at a cost of increased processing time.
MRC Compression Applies Mixed Raster Compression, which can drastically reduce the size of PDF documents.
Save Pre-Despeckle This will use the original image (i.e. before applying pre-processing) in the output PDF. The default value is true.
StampName This has been deprecated; use the Stamp PDF Files step.
StampValue This has been deprecated; use the Stamp PDF Files step.
Any File To PDF Conversion Settings
Conversion Timeout (ms) Limits the amount of time, in milliseconds, that can be spent on conversion. A value of zero means wait indefinitely.
Convert Bookmarks For MS Word, convert bookmarks.
Bookmark Depth This property takes effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine mapped to level one through nine) is used. Therefore, an empty string is functionally equivalent to:

Heading 1|1|Heading 2|2|Heading 3|3|Heading 4|4|Heading 5|5|Heading 6|6|Heading 7|7|Heading 8|8|Heading 9|9

Note: If you use a non-English version of Microsoft Word, then you may need to replace the word "Heading" with its localized version.
Convert Hyperlinks Indicates whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel) Indicates whether or not to print all Excel worksheets.
Print Background Color (IE) For files printed via IE, indicates whether or not to print the background color.
Print Scale % (Visio) For Visio files, sets the print scale.
Header (IE) This property modifies Internet Explorer's header setting.
Footer (IE) This property modifies Internet Explorer's footer setting.
Image Compression For lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image Downsizing If this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPI If the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG Quality The allowed value range is from 5 to 100, with 100 being the highest quality.
Font Embedding The option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) causes a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embed a subset of fonts) is a better choice.
Font Substitution For the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the "Device Setting" section of the printer driver properties (which can be accessed from the Control Panel).
Embed Fonts as Type 0 This option is recommended if you have non-standard fonts such as barcode fonts.
Top Margin Sets the top margin, in inches.
Bottom Margin Sets the bottom margin, in inches.
Left Margin Sets the left margin, in inches.
Right Margin Sets the right margin, in inches.
Page Width Sets a custom page width, in inches.
Page Height Sets a custom page height, in inches.
Paper Orientation

Sets paper orientation to

  • Default (Maintain Source Orientation)

  • Landscape

  • Portrait

PDF Compliance

Allows the user to choose PDF/A or PDF/X compliant output files:

  • None (No PDF/A Output)

  • PDF/A-1b (PDF/A-1b compliant)

  • PDF/X-1a (PDF/X-1a compliant)

  • PDF/X-3 (PDF/X-3 compliant)

Convert MSG Attachments If set to true, Autobahn DX will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDF

If set to true, Autobahn DX will attach the converted MSG attachments to the PDF as PDF attachments.

If set to false, Autobahn DX will merge the converted MSG attachments into the PDF file generated from the message body.

Preserve Word Attachments

Determines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).

Note: This will work with WordExtensionEX only

Convert PDF Attachments (PDF) Convert PDF attachments to create a combined PDF file.
Merge PDF Attachments (PDF) Set this flag to true if you want to convert PDF attachments and merge them into the output PDF file. Otherwise, the converted files will be merged back to the PDF.
Retain PDF Attachment (PDF) Switch this on to retain the original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office) Set this flag if you want the MS Office properties to be transferred to the target PDF document.
Color Type (PowerPoint) Use this property to set PowerPoint to print in color, grayscale, or black and white.
Handout Order (PowerPoint)

Sets the handout order. This flag only applies to PowerPoint jobs.

The possible values are:

  • Vertical First

  • Horizontal First

Output Type (PowerPoint)

Sets the output type. This only applies to PowerPoint files. The possible values are:

  • Slides

  • Build slides

  • Two slides handouts

  • Three slides handouts

  • Four slides handouts

  • Six slides handouts

  • Nine slides handouts

  • Notes

  • Outline

Print Graphics (Publisher)

Sets the graphics setting for printing.

  • Print Full Resolution

  • Print Low Resolution

  • Print Graphics

Frame Slides (PowerPoint) Indicates whether to draw a frame around the border of the slides.
Zoom (Excel)

Sets printing zoom of the worksheet.

The allowed value range is from 10 to 400.

Fit to Pages Wide (Excel)

Sets number of pages wide the worksheet will be scaled to.

This property is ignored if the Zoom property is set.

Fit to Pages Tall (Excel)

Sets number of pages tall the worksheet will be scaled to.

This property is ignored if the Zoom property is set.

Include Document Markups

Determines whether document markups are retained.

When this property is False (the default), document markups are omitted.

When this property is True, markups are included.
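
The rules given for the Bookmark Depth string in the table above (style name and level pairs separated by "|", levels of one or more, no repeated style names, no leading or trailing delimiter, and an empty string meaning the default Heading 1 to 9 mapping) can be checked with a short Python sketch. The function name is hypothetical and this is not how Autobahn DX itself parses the value; it simply restates the rules in executable form.

```python
def parse_bookmark_depth(mapping, delimiter="|"):
    """Parse "Style|level|Style|level..." into a {style: level} dictionary."""
    if not mapping:
        # Empty string: default mapping of Heading 1..9 to levels 1..9.
        return {f"Heading {n}": n for n in range(1, 10)}
    if mapping.startswith(delimiter) or mapping.endswith(delimiter):
        raise ValueError("the string must not start or end with the delimiter")
    fields = mapping.split(delimiter)
    if len(fields) % 2:
        raise ValueError("expected pairs of style name and level")
    result = {}
    for style, level_text in zip(fields[0::2], fields[1::2]):
        level = int(level_text)
        if level < 1:
            raise ValueError("bookmark levels must be equal to or larger than one")
        if style in result:
            raise ValueError(f"style name repeated: {style!r}")
        result[style] = level
    return result


print(parse_bookmark_depth("Title|1|Subtitle|2"))  # {'Title': 1, 'Subtitle': 2}
```

For a non-English version of Microsoft Word, the same check applies with the localized style names in place of "Heading".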

Extended Engine

Parameter Notes
General Settings
Output File Name Target file template, which can include %FILENAME (the original filename without the extension) and %DIRNAME (the directory name of the original file).
Create Directories if Required Force creation of any output directories if they do not already exist.
Continue on Error Continue processing TIFF files after an error occurs.
Overwrite Existing Overwrites the target document if it exists.
Advanced Flags Command line flags to be passed through to the underlying executable.
Password Files

This option specifies what Autobahn does when it encounters a password protected PDF file. The file will be copied to the password sub directory in the Error Folder.

  • Take no action.

  • Move to Error Folder

  • Copy to Error Folder

Maximum Cores

This specifies the number of parallel files you want to be processed at a given time.

Note: You need the multi-core license for this.

Debug Set this to true to execute the step in debug mode.
Extended OCR Settings
Output File Type

One or more of the following, separated by commas if more than one is required.

  • CSV *

  • DOCX

  • EPUB

  • EXCELML *

  • HTM

  • OPENTXT

  • PDF

  • RTF

  • TXT

  • WORDML

  • XLSX *

  • XPS

*These output formats are suitable for table-oriented pages that can be mapped onto a spreadsheet format.

OCR Engine The OCR engine to use. This must be set to use the IRIS engine.
OCR Language 1-8 You can set up to eight different languages for OCR recognition on one page, as long as they use the same character set.
Automatic language detection

Property that enables or disables the Auto Language Detection feature. The aim of this feature is to detect the most probable language of a single-language page.

If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through Language or Languages. If it fails to detect a language, recognition will be performed using the language(s) set through Language or Languages.

Auto rotate Detect page orientation and correct it if required.
Deskew Rotates the image to correct its skew angle.
Advanced Deskew Set this to true to define advanced deskew properties.
Force Deskew

Under certain circumstances, rotating the image to correct its skew angle may decrease the OCR accuracy. The extended engine is able to analyze the image and detect from an OCR accuracy point of view whether it's better to rotate the image or not. Because the skew angle may be visible in the output document (e.g. if KeepDeskew is set to 'true'), you can choose to force the deskew to rotate the image, even if it affects the accuracy.

If turned off, the image is analyzed before rotation and the engine may choose not to rotate the image depending on the analysis result.

If turned on, the image is rotated to correct skew angle.

Adjustment Mode Set the behavior regarding dimension adjustment for the deskew operation.
Despeckle Removes all groups of connected pixels whose pixel count is below the parameter value. Suggested range: 1-20.
Advanced Despeckle Set the advanced despeckle settings. Advanced despeckle provides additional image noise reduction features through the image despeckle filter.
Remove White Pixels By default, Advanced Despeckle removes black pixels. If this setting is set to 'true', white pixels will be removed instead of black pixels.
Dilate

Despeckle removes all groups of connected pixels whose pixel count is below the SpeckleSize parameter. Those connected pixels are not removed if the distance to a larger connected component is below this parameter. As a result, only the isolated pixels get deleted. The maximum value for this property is 20 pixels.

The default value is '0'.

Layout

The layout for the docx or rtf document

  • Standard

  • Flow

PDF Version

This determines the PDF version of the generated PDF:

  • 1.4

  • 1.5

  • 1.6

  • 1.7

  • 1.7 Extension Level 3

  • 1.7 Extension Level 5

  • 1.7 Extension Level 8

  • PDF/A-1a

  • PDF/A-1b

  • PDF/A-2a

  • PDF/A-2b

  • PDF/A-3a

  • PDF/A-3b

Remove Blank Page Set this to true to remove blank pages from TIFF or PDF documents. A value must also be set for Sensitivity (see below).
Sensitivity The sensitivity, from 1 to 100. With high sensitivity, fewer blank pages are detected.
Work Depth This parameter (0–255) defines how deeply the OCR engine will analyze a page, with 255 being the deepest. For poorer quality documents, higher values can give better recognition results.
JPEG Quality This parameter (0–255) determines the compression/quality of color JPEG images in generated PDFs. 0 gives the smallest file size whilst 255 gives the best quality. The default value is 128.
JPEG2000 Compression Enable or disable JPEG2000 compression.
JPEG2000 Compression Mode The JPEG2000 compression mode to use.
JPEG2000 Compression Value The value to set for the selected compression mode.
IHQC Compression Apply Intelligent High-Quality Compression.
IHQC Compression Level Level 1 is the basic compression level, while level 3 is the most advanced Intelligent High-Quality Compression mode.
IHQC Quality Factor The quality factor for IHQC.
No OCR Whether or not to skip OCR on the document (Yes to not perform OCR, No to perform OCR).
Binarization Whether or not to perform binarization on the document.
Brightness The brightness (higher values will make the result darker).
Contrast The contrast (lower values will make the result darker).
Smoothing Level Smoothing may be useful to binarize text with a colored background to avoid noisy pixels (0 disables smoothing, higher values smooth more).
Undithering

Whether or not to use automatic undithering while processing a page. NOTE: Automatic undithering will be applied only if smoothing is also activated (Smoothing Level)

Dithering is a scanning technique that consists of representing a color or grayscale image using only a limited color palette. This reduces file size while maintaining the general appearance of the image. However, dithered images are known to be more difficult for OCR technology to handle, so specific image preprocessing is needed to detect and revert the dithering.

Threshold Sets the threshold for fixed threshold binarization (0 for automatic threshold computation).
Remove Lines Whether or not to remove lines from an image (the image must be black and white).
Horizontal Clean X The parameter for cleaning noisy pixels attached to the horizontal lines.
Horizontal Clean Y The parameter for cleaning noisy pixels attached to the horizontal lines.
Vertical Clean X The parameter for cleaning noisy pixels attached to the vertical lines.
Vertical Clean Y The parameter for cleaning noisy pixels attached to the vertical lines.
Horizontal Dilate The dilate parameter that helps the detection of horizontal lines.
Vertical Dilate The dilate parameter that helps the detection of vertical lines.
Horizontal Max Gap The maximum horizontal line gap to close. It is useful to remove broken lines.
Vertical Max Gap The maximum vertical line gap to close. It is useful to remove broken lines.
Horizontal Max Thickness The maximum thickness of the horizontal lines to remove. It is useful to keep vertical lines larger than this parameter. It can also be useful to keep vertical letter strokes.
Vertical Max Thickness The maximum thickness of the vertical lines to remove. It is useful to keep horizontal lines larger than this parameter. It can also be useful to keep horizontal letter strokes.
Horizontal Min Length The minimum length of the horizontal lines to remove.
Vertical Min Length The minimum length of the vertical lines to remove.
Remove Dark Borders Removes the dark surround from bitonal, grayscale or color images. The dark surround of the image is whitened (Note: The dark border must be touching the edge of the page for this to work).
Punch Hole Removal

Attempts to remove punch holes from pages.

Note: The punch hole algorithm can be used on images with the following minimum dimensions width: 300px, height: 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.

Interpolation Interpolates the source image to the given resolution. This value (the target resolution) must be greater than the source image's resolution.
Interpolation Mode Sets the interpolation mode.
Keep Original Image

Set this to true if you want to use the pre-processed image for OCR but keep the original image in the output document. The default value is 'true'.

Note: This property only applies when processing image files, or when processing PDF files with Convert To TIFF set to Yes.

Keep Deskewed Image

Set this to true if you want to use the deskewed image in the output document.

Note: This property only applies when Keep Original Image is set to No

Keep Despeckled Image

Set this to true if you want to use the despeckled image in the output document. This requires the source image to be black and white.

Note: This property only applies when Keep Original Image is set to No

Keep Dark Border Removal

Set this to true if you want to use the image after dark borders have been removed, in the output document.

Note: This property only applies when Keep Original Image is set to No

Keep Punch Hole Removal

Set this to true if you want to use the image after punch holes have been removed, in the output document.

Note: This property only applies when Keep Original Image is set to No

Any File To PDF Conversion Settings
Conversion Timeout (ms)Limits the amount of time, in milliseconds, that can be spent on conversion. A value of zero means wait indefinitely.
Convert BookmarksFor MS Word, convert bookmarks
Bookmark DepthThis property will take effect only when the Convert Bookmarks property is set to True. Numbers defining bookmark levels must be equal to or larger than one. Word style names must not repeat in the string. The string must not start or end with the delimiter. When this property is empty, the default style mapping (Heading one through nine will be mapped to level one through nine) will be used. Therefore, an empty string is functionally equivalent to

Heading 1|1|Heading 2|2|Heading 3|3|Heading 4|4|Heading 5|5|Heading 6|6|Heading 7|7|Heading 8|8|Heading 9|9

Note: If you use a non-English version of Microsoft Word, then you may need to replace the word "Heading" with its localized version.
Convert HyperlinksSets the flag to indicate whether to convert Word hyperlinks to PDF hyperlinks.
Print All Sheets (Excel)The flag that indicates whether to print all Excel worksheets or not.
Print Background Color (IE)For files printed via IE, sets the flag that indicates whether to print the background color.
Print Scale % (Visio)For Visio files, sets the print scale
Header (IE)This property modifies Internet Explorer's header setting.
Footer (IE)This property modifies Internet Explorer's footer setting.
Image CompressionIf you want a lossless image compression, use PRN_IMAGE_COMPRESS_ZIP (ZIP compression).
Image DownsizingIf this property is set to Yes, then the resolution of images is reduced to the DPI value specified in the Downsize Resolution DPI property.
Downsize Resolution DPIIf the Image Downsizing property is set to True, then the resolution of images is reduced to the DPI value specified in this property.
Image JPEG QualityThe allowed value range is from 5 to 100 with 100 being the highest quality.
Font EmbeddingThe option PRN_FONT_EMBED_FULLSET (embedding a full set of fonts) causes a significant increase in PDF file size, especially for CJK fonts, and is therefore not recommended. If you need to embed fonts, PRN_FONT_EMBED_SUBSET (embedding a subset of fonts) is a better choice.
Font SubstitutionFor the PRN_FONT_SUBST_TABLE (use font substitution table) option, you need to configure the substitution table. The table is stored under the "Device Setting" section of the printer driver properties (can be accessed from the Control Panel).
Embed Fonts as Type 0This option is recommended if you use non-standard fonts such as barcode fonts.
Top MarginSets top margin. (Inches)
Bottom MarginSets bottom margin. (Inches)
Left MarginSets left margin. (Inches)
Right MarginSets right margin. (Inches)
Page WidthSets a custom page width. (Inches)
Page HeightSets a custom page height. (Inches)
Paper Orientation

Sets paper orientation to

  • Default (Maintain Source Orientation)

  • Landscape

  • Portrait

PDF Compliance

Allows the user to choose PDF/A-compliant or PDF/X-compliant output files.

  • None (No PDF/A Output)

  • PDF/A-1b (PDF/A-1b compliant)

  • PDF/X-1a (PDF/X-1a compliant)

  • PDF/X-3 (PDF/X-3 compliant)

Convert MSG AttachmentsIf you set this to true, Autobahn DX will convert both MSG files and their attachments to a single PDF file.
Attach MSG Attachments to PDF

If set to true, Autobahn DX will attach the converted MSG attachments to the PDF as PDF attachments.

If set to false, Autobahn DX will merge the converted MSG attachments into the PDF file generated from the message body.

Preserve Word Attachments

Determines whether embedded and linked files will be preserved during conversion. Default value: False (disabled).

Note: This will work with WordExtensionEX only

Convert PDF Attachments (PDF)Convert PDF Attachments to create a combined PDF file.
Merge PDF Attachments (PDF)Set this flag to true if you want to convert pdf attachments and merge them into the output pdf file. Otherwise, the converted files will be merged back to the pdf.
Retain PDF Attachment (PDF)Switch this on to Retain the Original PDF attachments if you set Merge PDF Attachments to true.
Retain Properties (Office)Set this flag if you want the MS Office properties to be transferred to the target pdf document.
Color Type (PowerPoint)Use this property to set PowerPoint to print with either color, grayscale, or black and white.
Handout Order (PowerPoint)

Sets the handout order. This flag only applies to PowerPoint jobs.

The possible values are:

  • Vertical First

  • Horizontal First

Output Type (PowerPoint)

Sets the output type. This only works with PowerPoint files. The possible values are:

  • Slides

  • Build slides

  • Two slides handouts

  • Three slides handouts

  • Four slides handouts

  • Six slides handouts

  • Nine slides handouts

  • Notes

  • Outline

Print Graphics (Publisher)

Sets the graphics setting for printing.

  • Print Full Resolution

  • Print Low Resolution

  • Print Graphics

Frame Slides (PowerPoint)Indicate whether to draw a frame around the border of the slides.
Zoom (Excel)

Sets printing zoom of the worksheet.

The allowed value range is from 10 to 400.

Fit to Pages Wide (Excel)

Sets number of pages wide the worksheet will be scaled to.

This property is ignored if the Zoom property is set.

Fit to Pages Tall (Excel)

Sets number of pages tall the worksheet will be scaled to.

This property is ignored if the Zoom property is set.

Include Document Markups

Determines whether document markups are retained.

When this property is False (the default), document markups are omitted.

When this property is True, markups are included.
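
The rules given above for the Bookmark Depth string can be checked before a job is saved. The following Python sketch is for illustration only and is not part of Autobahn DX. The function name parse_bookmark_depth is hypothetical, and the use of "|" as the delimiter between style names and levels is inferred from the default mapping shown in the Bookmark Depth description.

```python
# Illustrative check of a Bookmark Depth string (hypothetical helper,
# not an Autobahn DX API). The "|" delimiter is inferred from the
# default mapping shown in the manual.
DEFAULT_BOOKMARK_DEPTH = "|".join(f"Heading {n}|{n}" for n in range(1, 10))


def parse_bookmark_depth(value: str) -> dict:
    """Parse 'Style|Level|Style|Level...' into a {style: level} mapping."""
    if value == "":
        # An empty string is equivalent to the default Heading 1..9 mapping.
        value = DEFAULT_BOOKMARK_DEPTH
    if value.startswith("|") or value.endswith("|"):
        raise ValueError("string must not start or end with the delimiter")
    parts = value.split("|")
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating style names and levels")
    mapping = {}
    for style, level_text in zip(parts[0::2], parts[1::2]):
        if style in mapping:
            raise ValueError(f"style name repeated: {style!r}")
        if not level_text.isdecimal() or int(level_text) < 1:
            raise ValueError(f"level must be a whole number >= 1: {level_text!r}")
        mapping[style] = int(level_text)
    return mapping
```

For example, parse_bookmark_depth("Title|1|Heading 1|2") maps the Word style Title to level 1 and Heading 1 to level 2, while a string such as "Title|0" is rejected because the level is below one.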

Barcode TIFF/PDF

This step detects barcodes in TIFF/PDF files and either splits or renames the file based on the barcodes detected.

Screen Field / ButtonDescription
Output File Name

The output file path template where the split files will be saved.

  • %VALUE%: Replaced by the barcode value found.

  • %INDEX%: Replaced by the current split index.

  • %FILENAME%: Replaced by the file name.

Output File Name (No Barcode)

The renaming template to use for page ranges where no barcodes were identified.

Allowed templates:

  • %INDEX%: Replaced by the current split index.

  • %FILENAME%: Replaced by the filename of the source file.

Barcode Operation

Select either Split by Barcode or Rename by Barcode.

  • Split by Barcode: Choose this option to split the TIFF/PDF file by barcode.

  • Rename by Barcode: Choose this option to rename the TIFF/PDF file based on the barcode.

Split Mode

Options for splitting files by barcode:

  • Barcode on First Page

  • Barcode on Last Page

  • Remove Barcode Page

Barcode FormatBarcode formats supported.
Try HarderSpend more time to try to find a barcode; optimize for accuracy, not speed. The default is true.
Overwrite Existing

Overwrites any file that exists with the same name in the output folder.

Note: If you have the same barcode in different pages or files, they will be overwritten if this is set to true.

Metadata Name

Choose the Metadata field you want to set the ‘Metadata Value’ for. The named fields below will have the value added to them when set.

  • Author

  • Creator

  • Keywords

  • Producer

  • Subject

  • Title

  • Trapped

Any other entry will be used as the name for a new custom metadata item.

Metadata Value

Enter a value for the Metadata Value. You can use the following file naming variables here too.

  • %VALUE%: Replaced by the barcode value found.

  • %INDEX%: Replaced by the current split index.

  • %FILENAME%: Replaced by the file name

Note: ‘Trapped’ metadata accepts only ‘True’, ‘False’ or ‘Unknown’ as a value.

Perform Pre-processingDo not enable this option unless instructed by Aquaforest support.
BinarizeSet this to true to get better results from colored files.
DeskewStraighten the image.
Remove LinesWhether or not to remove lines from an image.
DespeckleRemove specks below the specified pixel size from the image.
Box SizeThis option is ideal for forms where boxes around text can sometimes cause an area to be identified as graphics. This option removes boxes from the temporary copy of the image used by the barcode reader. Technically, this option removes connected elements with a minimum area (in pixels, defined by this property). This option is currently only applied to bitonal images.
Zones

Only examine the region specified for barcode(s).

Note: To specify the zone, set the following in the step properties:

  • Left

  • Top

  • Width

  • Height

PDF DPIThe DPI of TIFF images generated from the source PDF file. These images are then used for barcode recognition.
TIFF CompressionThe compression applied to the TIFF images generated or converted from the source PDF file. These images are then used for barcode recognition.
Advanced FlagsAdditional advanced command-line flags may be entered here (see section 3).
Continue on ErrorContinue processing TIFF/PDF files after an error occurs.
Maximum CoresThe number of parallel files Autobahn DX will attempt to process at the same time.
DebugSet this to true to execute the step in debug mode.
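
The %VALUE%, %INDEX% and %FILENAME% variables used in the Output File Name and Metadata Value fields above behave like simple text substitutions. The following Python sketch illustrates the idea only; it is not how Autobahn DX is implemented. The function name expand_template and the example paths are hypothetical, and treating %FILENAME% as the source file name without its extension is an assumption.

```python
from pathlib import PureWindowsPath
from typing import Optional


def expand_template(template: str, source_file: str, index: int,
                    value: Optional[str] = None) -> str:
    """Substitute the template variables in an output file name.

    Assumption: %FILENAME% is the source file name without extension.
    """
    stem = PureWindowsPath(source_file).stem
    result = template.replace("%INDEX%", str(index))
    result = result.replace("%FILENAME%", stem)
    if value is not None:
        # %VALUE% is substituted last so that a barcode value containing
        # text such as "%INDEX%" is not expanded a second time.
        result = result.replace("%VALUE%", value)
    return result


# Hypothetical split of scan01.pdf where the second range has barcode INV-1001.
print(expand_template(r"C:\out\%VALUE%_%INDEX%.pdf", r"C:\in\scan01.pdf", 2, "INV-1001"))
# A range with no barcode, using the (No Barcode) template variables only.
print(expand_template(r"C:\out\%FILENAME%_%INDEX%.pdf", r"C:\in\scan01.pdf", 3))
```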

High Availability

The high availability step in Autobahn DX is designed to utilize two instances of the product running on separate hosts.

Screen Field / Button Description
Current Job ID The Job ID on the current host.
Default Status Select the Default status of the current host (Controller | Replica)
Shared Status File Enter the shared.txt file location – this needs to be on a shared network location accessible to both hosts.
Hostname Name of the paired host.
ADX Install Path Install path of Autobahn DX on the paired host.
Job ID The Job ID on the paired host

Distributed Polling

This step can be used to implement load balancing in Autobahn DX. It achieves this by copying a fraction of the files from a central input location to the local system where Autobahn DX is running. Multiple Autobahn DX servers can point to one input folder, so the files are shared across several servers and the processing load is spread between them.

See the Distributed Polling Section for more details.

Screen Field / ButtonDescription
Autobahn Job ID

The Job ID of the Job that will be processing your input files.

Note: The Source Folder of this job will be the Destination Folder of the Distributed Polling Job

LimitThe maximum number of files to be copied to the shared folder per run.
ExtensionsEnter the file extensions you want us to copy, separated by commas, e.g. “.pdf,.tif,.tiff”
Process Sub FolderSelect true if you want to copy subfolders.
DebugSelect true if you want to see more debug output.
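
The effect of the Extensions and Limit properties above can be pictured as a filter followed by a cap on the number of files taken per run. The Python sketch below is a simplified illustration and does not reproduce the actual Distributed Polling logic. The function name select_for_copy is hypothetical, and matching extensions without regard to case is an assumption.

```python
from pathlib import PurePath
from typing import Iterable, List


def select_for_copy(names: Iterable[str], extensions: str, limit: int) -> List[str]:
    """Return at most `limit` file names whose extension is in the list.

    `extensions` is a comma-separated string such as ".pdf,.tif,.tiff".
    Assumption: extension matching ignores case.
    """
    wanted = {ext.strip().lower() for ext in extensions.split(",") if ext.strip()}
    matching = [name for name in names if PurePath(name).suffix.lower() in wanted]
    return matching[:limit]


# Hypothetical contents of the central input folder.
pending = ["a.pdf", "b.txt", "c.TIF", "d.tiff", "e.pdf"]
print(select_for_copy(pending, ".pdf,.tif,.tiff", limit=3))
```

With a Limit of 3, each run takes only the first three matching files, which leaves the remainder available to the other servers polling the same folder.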

Kingfisher Job

This step allows a Kingfisher job to be integrated as an Autobahn step. See the Kingfisher Job Step Section for more details.

Screen Field / Button Description
Kingfisher Job ID The Kingfisher Job ID

PDF To PDFA Job

This step uses GdPicture libraries to convert a PDF document to PDF/A format.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on Error Set to true if the job should continue processing files after a file has failed.
PDF/A Output Type Select the type of PDF/A to output. The selection is: PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b, PDF/A-3u, PDF/A-4, PDF/A-4e, PDF/A-4f
Allow Vectorization If set to false, the job will attempt to create the PDF/A files without Vectorization
Allow Rasterization If set to false, the job will attempt to create the PDF/A files without Rasterization
Debug Select true if you want to see more debug output.

PDF Recognition to JSON Job

This step extracts important data from PDF files in the form of key/value pairs. Users can define their expected keys and easily retrieve the data from those fields. No templates are needed.

Screen Field / Button Description
Output Expected Key JSON Creates a JSON file of expected key-values as output.
Output Expected Key Values By Page JSON Creates a JSON file of expected key-values by page as output.
Output PDF Data Pages Text Creates a .txt file of the pdf data by page.
Output PDF Data Page Details Creates a .txt file listing each key with its bounding box and each value with its bounding box, by page.
Output PDF Data Pages As CSV Creates a CSV file containing page number, key, key bounding box, value, value bounding box and page dimensions.
Output PDF Data Pages As JSON Creates a JSON file containing page number, key, key bounding box, value, value bounding box and page dimensions.
List PDF Data Pages As JSON If true, the results of ‘Output PDF Data Pages As JSON’ will be included in the logging.
Date Format Set this to the input date format.
Use Currency Symbols Set to false if you want symbols and strings to be removed before returning currency values.
Page Limit Maximum number of pages to be processed.
Page Range A string representation of the page numbers you want to process. e.g., 1,3-4.
Current Culture Choose the expected format of date times if ambiguous e.g., 03/07/12
Expected Keys File Paths File paths of the text files containing expected keys. (use ‘|’ to separate multiple paths)
Ignore Case Expected Keys Choose if Casing is ignored when comparing recognition values to the Expected Keys set.
Custom Keys File Paths File path of the text files containing custom keys. (use ‘|’ to separate multiple paths)
Ignore Case Custom Keys Choose if Casing is ignored when comparing recognition values to the Custom Keys set.
Custom Keys Default File Path The default file path of the text file containing custom keys. (use ‘|’ to separate multiple paths)
Load Default Custom Keys Set to true if you want custom keys to be taken from the default path.
Skip Line Width This value will be multiplied by page width and any line with its width below this calculated value will NOT be skipped.
Skip Line Word Count Do not skip line if the number of words in the line is less than this value.
Skip Line Word Space Any line with an average space greater than this value will NOT be skipped.
Ignore Don’t Skip Space The only time special chunks are broken into smaller chunks is if the space between two adjacent words in the chunk is greater than this value.
Chunk Break Space Any chunk that has two adjacent words with a space between them greater than this value will be chunked.
Chunk Break Minimum If the average space of words in a chunk is smaller than this value, ‘Chunk break space’ will be used to break the chunk instead of this value.
Chunk Header Font Size Any chunk with an average font size below this value will not be considered as a header candidate.
Chunk Break Space Header Any header chunk that has two adjacent words with a space between them greater than this value will be chunked.
Break Words By Delimiter Switch this to true to break words by any of the Chunk Delimiters available (wordDelimiter, chunkDelimiter and chunkSpaceDelimiter).
Word Delimiter Enter one delimiter per index. If any series of characters match this pattern, we will break the word on that index.
Chunk Delimiter Enter one delimiter per line. If any word ends with any of these delimiters, they will be broken into chunks.
Chunk Space Delimiter Enter one delimiter per line.
Max Horizontal Space Skip analyzing key/value chunks that have a horizontal space greater than this value (points) between them.
Max Vertical Space Skip analyzing key/value chunks that have a vertical space greater than this value (points) between them.
Data Types To Split Choose the data types that the Chunker will attempt to split into smaller chunks.
Data Types To Check Choose the data types that will not be split once identified.
Data Types To Remove Choose the unwanted data types that will be removed in post processing.
Error On No Expected Keys When set to ‘Yes’, a file that does not contain any values for expected keys will be considered an error.
Regex Dictionary Terms File Path File path of a text file containing regex dictionary terms. (leave blank for default)
Plain Dictionary Terms File Path File path of a text file containing plain dictionary terms. (leave blank for default)
Debug Select true if you want to see more debug output.
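
The Page Range property above takes a string such as 1,3-4. The following Python sketch shows one way a string in that form could be expanded into individual page numbers. It is an illustration only: the function name parse_page_range is hypothetical, and the handling of spaces, duplicate pages and reversed ranges is an assumption rather than documented Autobahn DX behaviour.

```python
from typing import List


def parse_page_range(spec: str) -> List[int]:
    """Expand a page range string such as "1,3-4" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"range start is after range end: {part!r}")
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(parse_page_range("1,3-4"))  # [1, 3, 4]
```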

Modern Compress PDF

This step uses GdPicture libraries to compress PDF documents with various options.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on Error Set to true if the job should continue processing files after a file has failed.
Remove Annotations Select ‘Yes’ if you want to remove annotations.
Remove Blank Pages Select ‘Yes’ if you want to remove blank pages.
Remove Bookmarks Select ‘Yes’ if you want to remove bookmarks.
Remove Embedded Files Select ‘Yes’ if you want to remove embedded files.
Remove Form Fields Select ‘Yes’ if you want to remove form fields.
Remove Hyperlinks Select ‘Yes’ if you want to remove hyperlinks.
Remove JavaScript Select ‘Yes’ if you want to remove JavaScript.
Remove Metadata Select ‘Yes’ if you want to remove metadata.
Remove Page Thumbnails Select ‘Yes’ if you want to remove page thumbnails.
Pack Fonts Select ‘Yes’ if you want to pack fonts. This greatly optimizes output file size by focusing on fonts.
Pack Documents Select ‘Yes’ if you want to pack document content before saving.
Recompress Images Select ‘Yes’ if you want to recompress images.
Enable MRC Select ‘Yes’ if you want to enable MRC.
Downscale Resolution MRC Set the downscale resolution of the MRC compression. The default value is 100.
Preserve Smoothing Select ‘Yes’ if you want to preserve smoothing.
Image Quality Choose which Image Quality the output files will be. The default value is Medium.
Downscale Images Select ‘Yes’ if you want to downscale images.
Downscale Resolution Set the downscale resolution of the compression. The default value is 150.
Enable Color Detection Select ‘Yes’ if you want to enable automatic color detection.
Enable Char Repair Select ‘Yes’ if you want to enable character repair.
Enable JPEG2000 Select ‘Yes’ if you want to enable JPEG2000.
Enable JBIG2 Select ‘Yes’ if you want to enable JBIG2.
JBIG2 PMS Threshold Set the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Debug Select true if you want to see more debug output.

Validate PDFA

This step uses GdPicture libraries to validate whether the input PDF document conforms to the selected PDF/A version.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on Error Set to true if the job should continue processing files after a file has failed.
PDF/A Validation Type Choose which PDF/A version the files will be validated against.
Report Location Target folder to save reports for files that failed to validate. The location must already exist, or the report will not save.
Debug Select true if you want to see more debug output.

Linearize PDF

This step uses GdPicture libraries to optimize PDFs for web-viewing, rendering the document one page at a time.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on Error Set to true if the job should continue processing files after a file has failed.
Pack Document Select ‘Yes’ if you want the document to be packed before it is saved, reducing its size.
Enable Compression Select ‘Yes’ if you want to enable compression on the output pdf.
Debug Select true if you want to see more debug output.

Convert Any File To PDF (GdPicture)

This step uses GdPicture libraries to convert a large variety of file types to PDF. This step does not require an Office installation to process Office files.

Screen Field / ButtonDescription
Output File NameThe template for the output file, which can include %FILENAME to give the input file name without extensions.
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
AuthorSet the Author metadata field in the output PDF. This can include %FILENAME% (original filename without the extension) or %DIRNAME% (directory name of original file)
TitleSet the Title metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
SubjectSet the Subject metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
KeywordsSet the Keywords metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
ProducerSet the Producer metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
MetadataSet the Metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
Convert Email AttachmentsSelect 'Yes' if you want to convert email attachments to PDF.
Attach Email Attachments To PdfSelect 'Yes' if you want to attach the email attachments to the output PDF. If set to 'No', the files will be merged to the PDF if they have been converted to PDF, otherwise they will be removed.
Email Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Email file.
Email Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Email file.
Email Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Email file.
Email Prefer One PageSelect 'Yes' if you want the email to be converted to a single page PDF if possible.
Enable ICCSpecifies if the converter shall favor preserving the ICC profile, if present in the loaded document, during the conversion.
Html Emulation TypeSpecifies a type of a media to emulate.
Html Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Html file.
Html Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Html file.
Html Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Html file.
Html Prefer CSS Page SizeGive any CSS page size declared in the page priority over what is declared in Html Page Width and Html Page Height. If set to false, the renderer will scale the content to fit the paper size.
Html Prefer One PageSpecifies whether the output document should contain a single page.
Load Only First PageSpecifies that all executed actions with the loaded document will be processed using only the first page of the document.
Page RangeUse the string "1-5" for pages 1 to 5, or the string "1,5,6" for pages 1, 5 and 6. Single pages and ranges can be combined, for example "1,5,8-12" specifies pages 1 and 5 and all pages from page 8 to page 12.
Pdf Bitonal Image Compression

Sets the scheme to be used to compress bitonal image data when converting/saving the currently loaded document to PDF format.

ID Scheme
0 None
1 Flate
2 CCITT4
3 JPEG
4 JBIG2
5 JPEG2000

JBIG2 PMS Threshold Sets the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Pdf Color Image Compression Sets the scheme to be used to compress color image data when converting/saving the currently loaded document to PDF format.
Pdf Enable Color Detection Enables or disables the automatic color detection feature when converting/saving the currently loaded document to PDF format.
Pdf Image Quality Sets the level of quality used to compress images with a lossy compression scheme, which are embedded in the newly produced PDF document when converting/saving the currently loaded document to PDF format. It must be a value from 0 to 100. 0 means the worst quality and the best compression, 100 means the best quality and the worst compression.
Pdf Use Deflate On JPEG Specifies if the converter shall use additional Deflate compression for JPEG images in PDF output.
Rasterization DPI Sets the rendering resolution to be used when converting vector content to raster content, if any is included in the currently loaded document.
Tiff Enable Exif Rotate Specifies whether tiff encoder is using Exif rotate flag to handle page rotations.
Timeout Milliseconds Specifies the timeout of the subsequent conversion process, in milliseconds. Default value is -1, which means no timeout.
Txt Font Bold Specifies whether the font used for the resulting document when converting from the source txt file must have a bold style.
Txt Font Italic Specifies whether the font used for the resulting document when converting from the source txt file must have an italic style.
Txt Font Family Specifies the name of the font to be used for the resulting document when converting from the source txt file.
Txt Font Size Specifies the text size, in points, to be used for the resulting document when converting from the source txt file.
Txt Horizontal Text Alignment Specifies the horizontal text alignment of the resulting document when converting from the source txt file.
Txt Page Height Specifies the page height, in points, of the resulting document when converting from the source Txt file.
Txt Page Width Specifies the page width, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Bottom Specifies the bottom page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Left Specifies the left page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Right Specifies the right page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Top Specifies the top page margin, in points, of the resulting document when converting from the source Txt file.
Debug Select true if you want to see more debug output.
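
Several of the properties above (the Email, Html and Txt page sizes and margins) are entered in points. This manual does not define the unit, so the Python sketch below assumes the usual PDF point of 1/72 inch. The helper names are hypothetical and are shown only to help work out values for common paper sizes.

```python
# Assumption: one point is 1/72 inch, the standard PDF unit.
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    """Convert a length in inches to points."""
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    """Convert a length in millimetres to points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# US Letter is 8.5 x 11 inches.
print(inches_to_points(8.5), inches_to_points(11))
# A4 is 210 x 297 mm, rounded to whole points.
print(round(mm_to_points(210)), round(mm_to_points(297)))
```

Under that assumption, US Letter is 612 x 792 points and A4 is approximately 595 x 842 points.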

Combine Any File To PDF

This step uses GdPicture libraries to convert a large variety of file types to PDF, and then merges them to create a single output PDF. This step does not require an Office installation to process Office files.

Screen Field / ButtonDescription
Output File NameThe template for the output file, which can include %DIRNAME (original directory name)
Continue on ErrorSet to true if the job should continue processing files after a file has failed.
AuthorSet the Author metadata field in the output PDF. This can include %FILENAME% (original filename without the extension) or %DIRNAME% (directory name of original file)
TitleSet the Title metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
SubjectSet the Subject metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
KeywordsSet the Keywords metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
ProducerSet the Producer metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
MetadataSet the Metadata field in the output PDF. This can include %FILENAME% or %DIRNAME%
Convert Email AttachmentsSelect 'Yes' if you want to convert email attachments to PDF.
Attach Email Attachments To PdfSelect 'Yes' if you want to attach the email attachments to the output PDF. If set to 'No', the files will be merged to the PDF if they have been converted to PDF, otherwise they will be removed.
Email Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Email file.
Email Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Email file.
Email Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Email file.
Email Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Email file.
Email Prefer One PageSelect 'Yes' if you want the email to be converted to a single page PDF if possible.
Enable ICCSpecifies if the converter shall favor preserving the ICC profile, if present in the loaded document, during the conversion.
Html Emulation TypeSpecifies a type of a media to emulate.
Html Page HeightSpecifies the page height, in points, of the resulting document when converting from the source Html file.
Html Page WidthSpecifies the page width, in points, of the resulting document when converting from the source Html file.
Html Page Margin BottomSpecifies the bottom page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin LeftSpecifies the left page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin RightSpecifies the right page margin, in points, of the resulting document when converting from the source Html file.
Html Page Margin TopSpecifies the top page margin, in points, of the resulting document when converting from the source Html file.
Html Prefer CSS Page SizeGive any CSS page size declared in the page priority over what is declared in Html Page Width and Html Page Height. If set to false, the renderer will scale the content to fit the paper size.
Html Prefer One PageSpecifies whether the output document should contain a single page.
Load Only First PageSpecifies that all executed actions with the loaded document will be processed using only the first page of the document.
Page RangeUse the string "1-5" for pages 1 to 5, or the string "1,5,6" for pages 1, 5 and 6. Single pages and ranges can be combined, for example "1,5,8-12" specifies pages 1 and 5 and all pages from page 8 to page 12.
Pdf Bitonal Image Compression

Sets the scheme to be used to compress bitonal image data when converting/saving the currently loaded document to PDF format.

ID Scheme
0 None
1 Flate
2 CCITT4
3 JPEG
4 JBIG2
5 JPEG2000

JBIG2 PMS Threshold Sets the threshold of the JBIG2 pattern matching and substitution. The default value is 0.85.
Pdf Color Image Compression Sets the scheme to be used to compress color image data when converting/saving the currently loaded document to PDF format.
Pdf Enable Color Detection Enables or disables the automatic color detection feature when converting/saving the currently loaded document to PDF format.
Pdf Image Quality Sets the level of quality used to compress images with a lossy compression scheme, which are embedded in the newly produced PDF document when converting/saving the currently loaded document to PDF format. It must be a value from 0 to 100. 0 means the worst quality and the best compression, 100 means the best quality and the worst compression.
Pdf Use Deflate On JPEG Specifies if the converter shall use additional Deflate compression for JPEG images in PDF output.
Rasterization DPI Sets the rendering resolution to be used when converting vector content to raster content, if any is included in the currently loaded document.
Tiff Enable Exif Rotate Specifies whether tiff encoder is using Exif rotate flag to handle page rotations.
Timeout Milliseconds Specifies the timeout of the subsequent conversion process, in milliseconds. Default value is -1, which means no timeout.
Txt Font Bold Specifies whether the font used for the resulting document when converting from the source txt file must have a bold style.
Txt Font Italic Specifies whether the font used for the resulting document when converting from the source txt file must have an italic style.
Txt Font Family Specifies the name of the font to be used for the resulting document when converting from the source txt file.
Txt Font Size Specifies the text size, in points, to be used for the resulting document when converting from the source txt file.
Txt Horizontal Text Alignment Specifies the horizontal text alignment of the resulting document when converting from the source txt file.
Txt Page Height Specifies the page height, in points, of the resulting document when converting from the source Txt file.
Txt Page Width Specifies the page width, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Bottom Specifies the bottom page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Left Specifies the left page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Right Specifies the right page margin, in points, of the resulting document when converting from the source Txt file.
Txt Page Margin Top Specifies the top page margin, in points, of the resulting document when converting from the source Txt file.
Debug Select true if you want to see more debug output.

Combine PDFs

This step uses GDPicture libraries to convert a large variety of file types to PDF, and then merges them to create a single output PDF. This step does not require an Office installation to process Office files.

Screen Field / Button Description
Output File Name The template for the output file, which can include %DIRNAME (original directory name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Enable Numerical Ordering When enabled, documents are merged in numerical order, e.g. file1, file3, file11, file20, file101. Otherwise they are ordered lexicographically, e.g. file1, file101, file11, file20, file3.
Debug Select true if you want to see more debug output.
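
The difference between the two orderings can be illustrated with a short Python sketch. This is not the product's implementation; the key function numerical_key is a hypothetical helper that compares runs of digits as integers, which is one common way to obtain the numerical order described above.

```python
import re

def numerical_key(name):
    # re.split with a capturing group alternates text and digit runs,
    # so odd positions always hold digits and can be compared as integers.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

files = ["file1", "file101", "file11", "file20", "file3"]

numerical = sorted(files, key=numerical_key)   # numbers compared by value
lexicographic = sorted(files)                  # plain character-by-character order

print(numerical)
print(lexicographic)
```

With the sample names from the table, the first list comes out as file1, file3, file11, file20, file101 and the second as file1, file101, file11, file20, file3.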

PDF To JPEG / PDF To PNG / PDF To TIFF

These steps use GDPicture libraries to convert PDF files into the JPEG, PNG or TIFF format.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Tiff Compression (PDF to TIFF only) Specifies the TIFF compression when saving images in TIFF format.
DPI The DPI resolution to be used for rendering. A value of 72 gives the same result as Acrobat at a zoom level of 100%. Values over 300 will cause excessive memory usage.
Brightness Adjusts the brightness of the output image. Value must be between -100 and 100.
Contrast Adjusts the contrast of the output image. Value must be between -100 and 100.
Saturation Adjusts the saturation of the output image. Value must be between -100 and 100.
Gamma Adjusts the gamma of the output image. Value must be between -100 and 100.
Auto Deskew Select 'Yes' to try to deskew the image by up to about 15 degrees. Deskewing an image can greatly help OCR, OMR and barcode detection, or simply improve the readability of an image.
Crop Black Borders Detects and removes black margins around the image.
Crop Black Borders Ex Detects black margins around the image and sets them to white. Unlike Crop Black Borders, the black borders are not removed but blanked, so the image dimensions stay the same.
Crop Area Height Specifies the page height, in pixels, of the resulting document when cropping.
Crop Area Width Specifies the page width, in pixels, of the resulting document when cropping.
Crop Location Left Specifies the distance, in pixels, to crop from the left of the resulting document.
Crop Location Bottom Specifies the distance, in pixels, to crop from the bottom of the resulting document.
Despeckle Performs a 3x3 despeckle filter. It can remove black noise pixels from white backgrounds and vice versa. It can also remove random noise from multicolored backgrounds.
Despeckle More Performs a 5x5 despeckle filter. It can remove black noise pixels from white backgrounds and vice versa. It can also remove random noise from multicolored backgrounds.
Enable ICM Specifies whether color correction is used for images that embed an ICC profile. Enabling ICM results in automatic pixel transformation when opening an image that includes an ICC profile.
Remove Hole Punch Removes all punch holes situated on the margins of the image.
Remove Lines Performs line removal on the image in the direction specified.
Resize New Height New image height, in pixels, of the resulting document when resizing.
Resize New Width New image width, in pixels, of the resulting document when resizing.
Resize Interpolation Mode The interpolation mode to use when resizing the image.
Rotate By Angle Selects whether to rotate by a specified angle or by a preset type of rotation.
Rotation Angle The angle of rotation for the image.
Rotation Type The method of rotation to apply to the image.
Page Range Use the string "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5 and 6. Single pages and ranges can be combined: "1,5,8-12" specifies pages 1 and 5 and all pages from page 8 to page 12.
Debug Select true if you want to see more debug output.
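
The Page Range syntax used by this and several other steps can be read as a comma-separated list of single pages and inclusive ranges. The Python sketch below shows one way such a string could be expanded; parse_page_range is a hypothetical helper written for illustration and does not reflect how Autobahn parses the value internally.

```python
def parse_page_range(spec):
    """Expand a string such as "1,5,8-12" into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # "8-12" means every page from 8 to 12 inclusive.
            first, last = (int(x) for x in part.split("-", 1))
            pages.extend(range(first, last + 1))
        else:
            pages.append(int(part))
    return pages

print(parse_page_range("1-5"))
print(parse_page_range("1,5,8-12"))
```

For "1,5,8-12" the sketch yields pages 1, 5, 8, 9, 10, 11 and 12.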

PDF To Text

This step uses GDPicture libraries to extract the searchable text from the pages of a PDF file, and creates an output text file. If the page is non-searchable, there is the option to enable OCR.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Page Range Use the string “1-5” for pages 1 to 5, or “1,5,6” for pages 1, 5 and 6. Single pages and ranges can be combined: “1,5,8-12” specifies pages 1 and 5 and all pages from page 8 to page 12.
Page Separator A text separator that is placed between the text of consecutive pages.
Page Separator Placement The placement of the Page Separator. It can go above or below each page of text.
Copy Input PDF To Target Folder Set to true to copy the input PDF to the output location after the text is extracted.
Preserve Paragraph Specifies that the text extraction engine must preserve text paragraphs.
Paragraph Separator Specifies the separator to be used for splitting paragraphs. It only takes effect when the Preserve Paragraph property is set to Yes.
Enable OCR Enables the use of the GdPicture OCR engine if the page is non-searchable.
OCR Dictionary Add the code of languages for OCR, separated by ‘+’. For example, ‘eng+deu+fra’ would add English, German, and French.
Debug Select true if you want to see more debug output.
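
As an illustration of the Page Separator and Page Separator Placement options, the following Python sketch joins per-page text with a separator placed either above or below each page. The function join_pages and the placement values "above" and "below" are assumptions made for this example; the exact layout of the product's output file may differ.

```python
def join_pages(page_texts, separator, placement="below"):
    """Combine page texts, putting the separator above or below each page."""
    lines = []
    for text in page_texts:
        if placement == "above":
            lines.extend([separator, text])
        else:
            lines.extend([text, separator])
    return "\n".join(lines)

pages = ["First page text", "Second page text"]
print(join_pages(pages, "----", placement="above"))
```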

PDF To Searchable PDF (GdPicture)

This step uses GDPicture libraries to carry out Optical Character Recognition on the input PDF, creating an invisible searchable text layer over the document.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
OCR Dictionary Add the code of any additional languages for OCR, separated by ‘+’. For example, ‘eng+deu+fra’ would add English, German and French. Codes can be found in the OCR Language Codes section.
DPI DPI of TIFF images generated or converted from the source PDF File. These images are then OCRed to create the searchable PDF.
Page Range Use the string “1-5” for pages 1 to 5, or “1,5,6” for pages 1, 5 and 6. Single pages and ranges can be combined: “1,5,8-12” specifies pages 1 and 5 and all pages from page 8 to page 12.
Thread Limit The GdPicture OCR engine processes multiple pages concurrently for optimal performance. This can take a heavy toll on the CPU. If needed, this option limits the number of pages that are processed concurrently.
Debug Select true if you want to see more debug output.
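
The effect of a thread limit can be pictured with a generic worker pool. The Python sketch below is not GdPicture code: ocr_page is a placeholder standing in for the real OCR call, and the pool size plays the role of the Thread Limit setting.

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_page(page_number):
    # Placeholder for the real OCR work on one page.
    return f"text of page {page_number}"

def ocr_pages(pages, thread_limit):
    # At most `thread_limit` pages are processed at the same time;
    # pool.map returns the results in the original page order.
    with ThreadPoolExecutor(max_workers=thread_limit) as pool:
        return list(pool.map(ocr_page, pages))

print(ocr_pages([1, 2, 3, 4], thread_limit=2))
```

A lower limit reduces CPU load at the cost of longer processing time per document.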

PDF Portfolio

This step uses GDPicture libraries to combine a folder of files into an integrated PDF unit. A wide range of file types can be used to create the PDF Portfolio.

Screen Field / Button Description
Output File Name The template for the output file, which can include %DIRNAME (original directory name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Pdf Portfolio Type The initial view mode for the PDF Portfolio. This affects the way the user views the component files after opening the PDF Portfolio file.
Debug Select true if you want to see more debug output.

Smart Redaction

This step uses GDPicture libraries to identify and redact selected sensitive information in the input document.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Redact Credit Card Numbers Set to true if you want to redact Credit Card Numbers.
Redact Email Addresses Set to true if you want to redact Email Addresses.
Redact IBANs Set to true if you want to redact IBANs.
Redact Phone Numbers Set to true if you want to redact Phone Numbers.
Redact URIs Set to true if you want to redact URIs.
Redact VAT IDs Set to true if you want to redact VAT IDs.
Redact Vehicle Identification Numbers Set to true if you want to redact Vehicle Identification Numbers.
Redact Social Security Numbers Set to true if you want to redact Social Security Numbers.
Redact Postal Addresses Set to true if you want to redact Postal Addresses.
Redaction Color Choose which color will be used for redacting.
OCR Dictionary Add the code of any additional languages for OCR, separated by ‘+’. For example, ‘eng+deu+fra’ would add English, German and French. Codes can be found in the OCR Language Codes section.
Detect Orientation Select ‘Yes’ if you want to auto detect orientation.
Page Range Use the string “1-5” for pages 1 to 5, or “1,5,6” for pages 1, 5 and 6. Single pages and ranges can be combined: “1,5,8-12” specifies pages 1 and 5 and all pages from page 8 to page 12.
Redaction Timeout (ms) Limits the amount of time in milliseconds that can be spent on a redaction. A value of zero means it will wait indefinitely.
Debug Select true if you want to see more debug output.

Detect Signatures

This step uses GDPicture libraries to identify PDF documents that contain digital signatures.

Any step that alters a digitally signed PDF will invalidate that PDF’s signature. This step allows signed files to be identified, and either copied or moved to a specified folder so the signature can be preserved.

If the Copy option is selected, the original signed file can also be attached to the copy that is processed, so that the signed original stays with the file through any subsequent processing.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Signed File Name Signed file name template which can include %FILENAME (original file name).
Signed File Path The full path (excluding file name) for the location to copy/move the signed file before processing.
Create Signed Path Setting this to 'Yes' will create the signed file path directory if it does not exist. The file processing will fail if a signed file is processed, the signed path does not exist, and this is set to 'No'.
Overwrite Signed Setting this to 'Yes' will automatically overwrite any file in the signed file path with the same name as the current signed file. The file processing will fail if the signed file already exists and this is set to 'No'.
Signed Action The action to take if a signed file is detected. It can either be copied or moved to the Signed File Path.
Attach Signed Document to Output Setting this to 'Yes' will attach a copy of the signed document to itself before being saved in the output location. This ensures a signed copy will remain with the copy that is processed.
Debug Select true if you want to see more debug information.
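
The interaction between Create Signed Path, Overwrite Signed and Signed Action can be summarized in a few lines of Python. This is only a sketch of the rules stated in the table; handle_signed_file and its parameter names are hypothetical and are not part of the product.

```python
import shutil
from pathlib import Path

def handle_signed_file(src, signed_dir, signed_name, action="copy",
                       create_path=False, overwrite=False):
    """Copy or move a signed file, failing in the cases the table describes."""
    src, signed_dir = Path(src), Path(signed_dir)
    if not signed_dir.exists():
        if not create_path:
            # Create Signed Path = 'No' and the directory is missing.
            raise FileNotFoundError(f"signed path does not exist: {signed_dir}")
        signed_dir.mkdir(parents=True)
    target = signed_dir / signed_name
    if target.exists():
        if not overwrite:
            # Overwrite Signed = 'No' and a file with this name is present.
            raise FileExistsError(f"signed file already exists: {target}")
        target.unlink()
    if action == "move":
        shutil.move(str(src), str(target))
    else:
        shutil.copy2(src, target)
    return target
```

For example, calling the function twice with the same target and overwrite left at its default raises FileExistsError on the second call, mirroring the failure described for Overwrite Signed.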

Key Value Pair Extraction

This step uses the GDPicture engine to extract information about key-value pairs in a PDF document. The extra information included can be the Key or Value Bounding Box, Page Number, Confidence, and Data Type.

The user can also use a JSON file to declare Expected Keys. These specific keys will be added to a separate output file if a value is found. Synonyms can also be declared for each Expected Key, so that a match for any of the synonyms will be counted as a match for the Expected Key.

In the example below, total and invoice number are the expected keys. grand total is a synonym for 'total', and invoice number has two synonyms: invoice no and inv no.

[
  {
    "expectedKey": "total",
    "synonyms": ["grand total"]
  },
  {
    "expectedKey": "invoice number",
    "synonyms": ["invoice no", "inv no"]
  }
]
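
To make the role of synonyms concrete, the Python sketch below loads the same structure and resolves a key found in a document back to its Expected Key. The lookup is case-insensitive here, which is an assumption for illustration; the manual does not state how the product compares keys, and build_lookup and match_key are hypothetical helpers.

```python
import json

EXPECTED_KEYS_JSON = """
[
  {"expectedKey": "total", "synonyms": ["grand total"]},
  {"expectedKey": "invoice number", "synonyms": ["invoice no", "inv no"]}
]
"""

def build_lookup(expected):
    # Map every expected key and each of its synonyms to the expected key.
    lookup = {}
    for entry in expected:
        canonical = entry["expectedKey"]
        for name in [canonical, *entry.get("synonyms", [])]:
            lookup[name.lower()] = canonical
    return lookup

lookup = build_lookup(json.loads(EXPECTED_KEYS_JSON))

def match_key(found_key):
    # Returns the expected key, or None when the key is not expected.
    return lookup.get(found_key.strip().lower())

print(match_key("Inv No"))
print(match_key("Grand Total"))
print(match_key("due date"))
```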

CSV Output Warning

CSV is a format commonly used by spreadsheet programs. These programs often transform numerical data or formulas and then save the transformed values, overwriting the original data. To prevent this, an apostrophe is added to the start of any value that could be transformed.

For example, the phone number +44 115 496 0999 will appear as '+44 115 496 0999, in the CSV output only.

The values that are protected in this way are listed below.

  • Formulas - values that begin with +, -, =, or @ are prefixed with an apostrophe in the CSV output. This prevents spreadsheet programs from interpreting these values as formulas or functions.

  • Dates/Times - this covers many date and time formats, as data can often be mistaken for a date or time and then irreversibly transformed.

  • Long Numbers - this covers numbers that are 11 digits or longer, as spreadsheet programs may convert them to a different notation (for example, scientific notation).

We recommend removing the apostrophes when extracting the data. This only affects CSV output, so it may be easier to extract data from the other formats if possible.
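
Removing the protective apostrophes when reading the CSV output can be done with a few lines of Python. The sketch below assumes a simple two-column layout with hypothetical sample data; the real column layout of the KVP CSV file is not described here. Note that a value which genuinely begins with an apostrophe would also lose it.

```python
import csv
import io

def strip_guard(value):
    # Drop a single leading apostrophe added to protect the value.
    return value[1:] if value.startswith("'") else value

# Hypothetical sample of CSV output containing guarded values.
sample = "key,value\nphone,'+44 115 496 0999\ntotal,'=1+2\n"

rows = [[strip_guard(cell) for cell in row]
        for row in csv.reader(io.StringIO(sample))]

for row in rows:
    print(row)
```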

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
OCR Language Add the codes of the languages for OCR and KVP extraction, separated by '+', e.g. eng+fra. Codes can be found in the OCR Language Codes section.
DPI DPI used when performing OCR on the file as part of the KVP extraction process.
KVP Output Format This setting determines the file output format(s). KVP data can be output in JSON, CSV and XML, e.g. json,csv,xml.
Page Range Use the string "1-5" for pages 1 to 5, or "1,5,6" for pages 1, 5 and 6. Single pages and ranges can be combined: "1,5,8-12" specifies pages 1 and 5 and all pages from page 8 to page 12.
Autorotate Automatically rotate the page if the text does not have the correct orientation.
Trim Symbols Setting this to 'Yes' will remove any symbols from the start/end of values, with the exception of the hash '#' or period '.' symbols.
Include Key Bounding Box Setting this to 'Yes' will include the bounding box values for the key in the output.
Include Value Bounding Box Setting this to 'Yes' will include the bounding box values for the value in the output.
Include Page Number Setting this to 'Yes' will include the page number of the key value pair in the output.
Include Confidence Setting this to 'Yes' will include the confidence score of the key value pair in the output. Confidence is measured between 0 (no confidence) and 100 (full confidence).
Confidence Threshold The value of confidence (0-100) that a KVP must reach to be included in the output. Results under this confidence threshold will be discarded.
Include Type Setting this to 'Yes' will include the data type of the key value pair in the output.
Expected Keys The path to a JSON file for the expected keys and synonyms.
Debug Select true if you want to see more debug information.
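
The Confidence Threshold and Trim Symbols options can be thought of as two small post-processing rules. The Python sketch below is an interpretation only: the record field names are hypothetical, and treating every character that is not a letter, digit, '#' or '.' as a symbol (including spaces and currency signs) is an assumption about what the product considers a symbol.

```python
def apply_threshold(pairs, threshold):
    # Keep only pairs whose confidence reaches the threshold.
    return [p for p in pairs if p["confidence"] >= threshold]

def trim_symbols(value, keep="#."):
    # Strip leading/trailing characters that are neither alphanumeric
    # nor one of the characters in `keep`.
    start, end = 0, len(value)
    while start < end and not (value[start].isalnum() or value[start] in keep):
        start += 1
    while end > start and not (value[end - 1].isalnum() or value[end - 1] in keep):
        end -= 1
    return value[start:end]

pairs = [
    {"key": "total", "value": "$42.00*", "confidence": 91},
    {"key": "ref", "value": ": #1234.", "confidence": 55},
]

kept = apply_threshold(pairs, 60)
print([p["key"] for p in kept])
print(trim_symbols("$42.00*"), trim_symbols(": #1234."))
```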

Pattern Redaction / Pattern Highlight

These steps use GDPicture libraries to identify and redact sensitive information (Redaction) or highlight important information (Highlight) in the input document based on a regular expression or terms list.

Screen Field / Button Description
Output File Name The template for the output file, which can include %FILENAME (original file name).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Pattern A Regex pattern. The input PDF will be searched for matches to this Regex pattern, and any matches will be redacted/highlighted.
Terms Filepath The path to a text file containing a list of terms to redact/highlight. Each line will be treated as a pattern, and any matches will be redacted/highlighted.
Case Sensitive Determines whether or not the regex pattern matching should be case sensitive.
Red The amount of red color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
Green The amount of green color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
Blue The amount of blue color to be used for the redaction/highlighted region color. Use a value between 0 and 255. Default is 0.
Alpha The transparency value of the resulting region color. Use the value between 0 (full transparency) and 255 (full opacity). Default is 255.
Debug Select true if you want to see more debug output.
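
The way a Pattern, a terms file and the Case Sensitive option work together can be sketched with Python's re module. This is a generic illustration and says nothing about the regex dialect used by the product; compile_patterns and find_matches are hypothetical helpers, and the terms are passed in as a list of lines rather than read from a file.

```python
import re

def compile_patterns(pattern, term_lines, case_sensitive):
    # Each non-empty line of the terms file is treated as its own pattern.
    flags = 0 if case_sensitive else re.IGNORECASE
    sources = ([pattern] if pattern else [])
    sources += [line.strip() for line in term_lines if line.strip()]
    return [re.compile(source, flags) for source in sources]

def find_matches(text, patterns):
    # Collect (start, end, matched text) for every pattern, ordered by position.
    spans = []
    for compiled in patterns:
        spans.extend((m.start(), m.end(), m.group()) for m in compiled.finditer(text))
    return sorted(spans)

text = "Contact Jane Doe, ref AB-1234."
patterns = compile_patterns(r"[A-Z]{2}-\d{4}", ["jane doe", ""], case_sensitive=False)
matches = find_matches(text, patterns)
print(matches)
```

With case sensitivity turned off, the term jane doe matches "Jane Doe" in the sample text; with it turned on, that term would not match.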

Split PDF (GdPicture)

This step uses GDPicture libraries to split PDF files by page ranges, by bookmarks, or into single pages.

Screen Field / Button Description
Output File Name Target file template which can include %UNIQUEn (unique number starting at 1, zero padded to n digits), %FILENAME (original filename without the extension) and %PAGEn (first page of split, zero padded to n digits).
Continue on Error Set to true if the job should continue processing files after a file has failed.
Retain Metadata Generated files will include metadata (such as Author and Title) from the original file.
Split Type Sets the way that the input file will be split. One of:
  • Split into single pages
  • Split by ranges (see Ranges below)
  • Split by repeating ranges (see Repeat Every (Pages) below)
  • Split by bookmarks
Ranges Set of page ranges, separated by commas, that defines which pages from the original should be extracted.
Repeat Every (Pages) Re-applies the page ranges to each consecutive block of this many pages within the document. For example, if 2-4 is specified for Ranges and 4 is specified for Repeat Every (Pages), then the range is re-applied every 4 pages.
Remove Unused Resources Removes unused resources from a PDF file to minimize file size.
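
One reading of the repeating-range example (range 2-4 repeated every 4 pages) is worked through in the Python sketch below. The function repeating_ranges is hypothetical, and clipping the final range at the last page of the document is an assumption; the product may treat an incomplete final block differently.

```python
def repeating_ranges(first, last, repeat_every, page_count):
    """Shift the range first-last by repeat_every pages until the document ends."""
    result = []
    for offset in range(0, page_count, repeat_every):
        start = first + offset
        end = min(last + offset, page_count)  # assumed: clip at the last page
        if start <= end:
            result.append((start, end))
    return result

# Range 2-4, repeated every 4 pages, in a 10-page document.
print(repeating_ranges(2, 4, 4, 10))
```

For a 10-page document this gives the page ranges 2-4, 6-8 and 10-10.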

Split by Barcode

This step uses GDPicture libraries to identify different barcode types in a PDF, and split the document at each instance of a barcode.

Screen Field / Button Description
Output File Name Target file template which can include %UNIQUEn or %INDEXn (unique number starting at 1, zero padded to n digits) %FILENAME (original filename without the extension) and %PAGEn (first page of split, zero padded to n digits)
Continue on Error Set to true if the job should continue processing files after a file has failed.
Read QRCode Set this to true to recognize QRCode barcodes.
Read MicroQR Set this to true to recognize MicroQR barcodes.
Read DataMatrix Set this to true to recognize DataMatrix barcodes.
Read PDF417 Set this to true to recognize PDF417 barcodes.
Read Aztec Set this to true to recognize Aztec barcodes.
Read MaxiCode Set this to true to recognize MaxiCode barcodes.
Read Industrial2of5 Set this to true to recognize Industrial2of5 barcodes.
Read Inverted2of5 Set this to true to recognize Inverted2of5 barcodes.
Read Interleaved2of5 Set this to true to recognize Interleaved2of5 barcodes.
Read Iata2of5 Set this to true to recognize Iata2of5 barcodes.
Read Matrix2of5 Set this to true to recognize Matrix2of5 barcodes.
Read Code39 Set this to true to recognize Code39 barcodes.
Read Codabar Set this to true to recognize Codabar barcodes.
Read BcdMatrix Set this to true to recognize BcdMatrix barcodes.
Read DataLogic2of5 Set this to true to recognize DataLogic2of5 barcodes.
Read Code128 Set this to true to recognize Code128 barcodes.
Read Code93 Set this to true to recognize Code93 barcodes.
Read EAN13 Set this to true to recognize EAN13 barcodes.
Read EAN8 Set this to true to recognize EAN8 barcodes.
Read UPCA Set this to true to recognize UPCA barcodes.
Read UPCE Set this to true to recognize UPCE barcodes.
Read ADD5 Set this to true to recognize ADD5 barcodes.
Read ADD2 Set this to true to recognize ADD2 barcodes.
Page Range Specifies the page range to be scanned for barcodes. A value of * will scan every page for barcodes.
Pattern A Regex pattern. The input pdf will be searched for matches to this Regex pattern, and any matches will be redacted.
DPI DPI of TIFF images generated or converted from the source PDF File. These images are then scanned for barcodes.
Retain Metadata Generated files will include metadata (such as Author and Title) from the original file.
Remove Unused Resources Removes unused resources from a PDF file to minimize file size.
Left X coordinate of the top-left point of the rectangle in which barcodes should be recognized.
Top Y coordinate of the top-left point of the rectangle in which barcodes should be recognized.
Width Width of the rectangle in which barcodes should be recognized.
Height Height of the rectangle in which barcodes should be recognized.
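
The Output File Name placeholders can be illustrated with a small template expander in Python. The sketch assumes that n is written as a digit directly after the placeholder name (for example %UNIQUE3), which is an interpretation of the description above; expand_template is a hypothetical helper, not the product's own logic.

```python
import re

def expand_template(template, filename, unique, page):
    """Expand %UNIQUEn / %INDEXn, %PAGEn and %FILENAME in a file name template."""
    def pad(match):
        value = unique if match.group(1) in ("UNIQUE", "INDEX") else page
        # Zero pad the number to the n digits given after the placeholder.
        return str(value).zfill(int(match.group(2)))

    expanded = re.sub(r"%(UNIQUE|INDEX|PAGE)(\d+)", pad, template)
    # %FILENAME is substituted last so its content is never re-expanded.
    return expanded.replace("%FILENAME", filename)

print(expand_template("%FILENAME_%UNIQUE3_p%PAGE2.pdf", "invoice", unique=7, page=12))
```

Under these assumptions, the seventh split of invoice.pdf starting at page 12 would be named invoice_007_p12.pdf.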

Step Type Properties

Each of the Step Types referred to in the previous section will have a set of properties such as that shown below for “Convert any File to PDF”. Each property has a description associated with it which is displayed when the property is highlighted.

To look for a property, you can either use the scroll bar on the right-hand side or the search bar at the top. The search bar looks for an exact match of the text that you type but will offer suggestions that start with the text you have currently typed. Selecting a suggestion will jump you to the property and select it for editing.