Upgrade your Document Automation server easily

Upgrading

Upgrade overview

Two versions of Document Automation Server (DAS) cannot be installed at the same time.

Upgrading DAS will involve:

  • Backing up your job definition files.

  • Uninstalling your current installation.

  • Downloading the latest version from our website.

  • Installing the new version, which should only take a few minutes.

When the DAS user interface (UI) starts, it checks the version number of each step type and makes the required updates to any step. A status bar shows the progress of this process. This happens only the first time DAS checks a job definition. Once it is complete, your job list appears in the new version exactly as it was before.

Updating from older versions (before 5.0) may require either manual opening, editing, and saving of a job, or recreating the job from a new one.

On the install server, DAS requires .NET Framework 4.7.2. The DAS installer will check this requirement before installation.

License keys

The license key from your previous version is retained. For upgrades between minor releases, the license key should still be valid for the newly installed version. If you’re upgrading to a new major version (5.0, 5.5, 6.0), the existing license key will no longer be valid. In this case, contact Support for a new license key.

Version 6.0.2411.11

New steps

New steps have been added to this release based on customer requests. The list of new steps is as follows:

  • Convert PDF To Office — Converts an input PDF file to the selected Office type. Options include .docx, .pptx, .xlsx, and .svg.

  • Convert Any File To Office — Converts many different input file types to the selected Office type. Email files can also have their attachments converted. Output file options include .docx, .pptx, .xlsx, and .svg.

  • Get Document Information — Creates a report on the searchability of a PDF document. The report includes information about page count, searchable page count, and image page count. Output options include .txt, .csv, .json, and .xml.

  • Pattern Enumeration — Searches the input PDF for a selected pattern and/or list of terms. A .csv report that lists the occurrences of each term in the input PDF is created.
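
The idea behind the Pattern Enumeration report can be sketched in a few lines. This is only an illustration of counting terms and writing the counts as CSV: it assumes the PDF text has already been extracted to a string, and the function names and the column headings `term` and `occurrences` are hypothetical, not the step's actual output format.

```python
import csv
import io
import re


def count_terms(text: str, terms: list[str]) -> dict[str, int]:
    """Count case-insensitive occurrences of each literal term in the text."""
    return {
        term: len(re.findall(re.escape(term), text, flags=re.IGNORECASE))
        for term in terms
    }


def to_csv(counts: dict[str, int]) -> str:
    """Render the counts as CSV text with a header row (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "occurrences"])
    for term, occurrences in counts.items():
        writer.writerow([term, occurrences])
    return buffer.getvalue()


# Sample text standing in for the text extracted from a PDF.
sample_text = "Invoice 42. The invoice total is due."
counts = count_terms(sample_text, ["invoice", "total", "refund"])
report = to_csv(counts)
print(report)
```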

Improvements

Add an option to force a dots per inch (DPI) value when converting an image to PDF

Ref: ADX-774

Previously, if a TIFF file had the incorrect DPI, users faced an issue when converting to PDF. The dimensions of the output PDF file were warped based on the DPI.

With this release, we fixed the issue by adding the ConvertTiffForceDPI option. The option lets users select a sensible DPI so that the output has reasonable dimensions.
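
The effect of a wrong DPI value on page size comes down to simple arithmetic: physical size is pixel size divided by DPI. The sketch below is illustrative only; the function name and the sample pixel dimensions are assumptions and are not part of the product.

```python
def page_size_inches(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Physical page size implied by pixel dimensions at a given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


# A Letter-sized scan at 300 DPI is 2550 x 3300 pixels.
correct = page_size_inches(2550, 3300, 300)

# If the TIFF metadata wrongly reports 72 DPI, the same pixels imply a
# page more than four times wider and taller.
warped = page_size_inches(2550, 3300, 72)

print(correct, warped)
```

Forcing the DPI to the value the image was actually scanned at restores the intended page size.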

KVP expected keys now gives all data points

Ref: ADX-773

Previously, when retrieving an expected key from the Key Value Pair Extraction step, only the key and value would be retrieved.

With this release, it’s possible to retrieve other data points (alongside the key-value pair output) such as:

  • Bounding box for the key and/or the value

  • Page number

  • Confidence score

  • Estimated data type (for example, email address, credit card number, etc.)

Improve license naming when opening ADX for the first time and entering a new key

Ref: ADX-760

When opening Autobahn DX for the first time, you’re required to enter your license key on a prompt that appears. However, the message displayed on that prompt was unclear and confusing.

With this release, we reworded the message to remove the ambiguity.

Bug fixes

GdPicture HTML/EML/MSG conversion issue

Ref: ADX-820 and ADX-818

An update to Google Chrome and Microsoft Edge caused an issue with conversions from .html, .eml, and .msg files when using GdPicture steps. Although we updated the steps to fix the issue for Google Chrome in a previous release, the equivalent fix for Microsoft Edge was still outstanding.

With this release, we introduced a new configuration setting called WebBrowserPath that can point toward a substitute for Portable Chromium, such as blink binaries, to resolve the issue for Microsoft Edge.

Incorrect default value for GdPicture conversion settings

Ref: ADX-811 and ADX-808

In GdPicture conversion settings, the default value for both TrackOfficeDocumentRevisions and SpreadsheetRenderOnlyPrintArea should be No. However, the default value for both of these parameters was set to Yes.

With this release, we updated the default value to No for both of these parameters for all the newly created GdPicture conversion settings.

Page Range does not work for Smart Redaction

Ref: ADX-793

The Smart Redaction step has a Page Range property that will determine on which pages the redaction will be applied. However, there was an issue with the Page Range property, which resulted in all pages being redacted.

With this release, we fixed the issue so that only the pages specified in the Page Range property will be redacted.

Changing the order of steps does not get saved automatically

Ref: ADX-768

Previously, if an existing job had multiple steps, and the steps were moved around, Autobahn DX didn’t recognize the move as a change in the job. As a result, when users left the Designer tab, Autobahn DX didn’t trigger a prompt to save the changes.

With this release, we fixed this issue so that moving steps within a job is seen as a change to the job and hence, Autobahn DX triggers the save prompt.

Version 6.0.2404.10

Improvements

Update documentation on Kingfisher Jobs

Ref: ADX-746

When using Kingfisher jobs, a user requires a separate instance of Kingfisher with a job set up in that program for Autobahn to call. It is important that this job is set to single core, so this point has now been emphasized in the Reference Guide for users.

Designer tab can be slow to load

Ref: ADX-738

When switching to the Designer tab, Autobahn has to load the data from the selected job. This data now loads much faster, which makes switching between tabs smoother.

Skip previously OCRed pages in GdPicture OCR step

Ref: ADX-737

Previously, the GdPicture OCR step would OCR every page that it processed. In some cases, customers prefer that pages with previous OCR or hidden text layers are skipped instead. We have now added this choice as the options OCR Searchable Pages and OCR Hidden Text. These settings can prevent text from being overwritten with a new OCR text layer on previously OCRed pages.

Start time of each job process is now shown in logging

Ref: ADX-736

Autobahn provided the user with a start time for the entire job, and an end time once all jobs were complete. However, some customers requested more information about when each individual process starts. We have now added this as the Job Start time in the logs.

Custom step now provided with the error folder as an extra argument

Ref: ADX-733

When using a custom step, Autobahn provides the input and output file locations to the script file. We have now included the error folder as an extra flag sent to the script, giving users more options when dealing with files that do not process in the custom script as expected.
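
A custom script that makes use of the error folder might look like the sketch below. It assumes the input file, output file, and error folder arrive as three positional arguments in that order; these notes do not specify the argument order or any flag names, so check the Reference Guide before relying on it. The `process` function is a hypothetical stand-in for real work.

```python
import shutil
import sys
from pathlib import Path


def process(input_file: Path, output_file: Path) -> None:
    """Hypothetical stand-in for real processing: copy the file unchanged."""
    shutil.copyfile(input_file, output_file)


def run(input_file: Path, output_file: Path, error_folder: Path) -> int:
    """Process one file; on failure, move the input to the error folder."""
    try:
        process(input_file, output_file)
        return 0
    except Exception as exc:
        error_folder.mkdir(parents=True, exist_ok=True)
        if input_file.exists():
            shutil.move(str(input_file), str(error_folder / input_file.name))
        print(f"Failed to process {input_file.name}: {exc}", file=sys.stderr)
        return 1


def main(argv: list[str]) -> int:
    """Assumed argument order: input file, output file, error folder."""
    if len(argv) != 4:
        print("usage: script.py <input_file> <output_file> <error_folder>",
              file=sys.stderr)
        return 2
    return run(Path(argv[1]), Path(argv[2]), Path(argv[3]))
```

When saved as a standalone script, the last line would be `sys.exit(main(sys.argv))` so that the exit code reflects success or failure.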

Custom script step now warns users if custom script file field left blank

Ref: ADX-732

The custom script step needs the name of the script file that it will run. If this field is left empty, a helpful message box will now appear to warn the user that there needs to be a value for the step to function.

Remove reliance on license.txt

Ref: ADX-729

In previous versions of Autobahn, license.txt was checked to see if jobs needed upgrading. It also supplied the version number displayed in the Help tab. Both now use the version number in the program itself. The license.txt file will still be maintained for legacy purposes.

Update PDF To Text step to new GdPicture class

Ref: ADX-728

The underlying GdPicture library added a specific class for text extraction that fits the requirements of the current PDF To Text step, while improving performance and adding new settings and features. Moving to this class will help keep the step performing well through future GdPicture upgrades.

Add option to replace invalid characters found in barcodes

Ref: ADX-727

When using the Split By Barcode step, it is very common to use the barcode value in the output filename. This can cause problems if the barcode contains characters that are invalid for filenames. In this situation, we now find and replace these characters with an underscore by default, but this can be customized in the step by changing the “Replace Invalid Characters With” field.

Update terms in the High Availability step

Ref: ADX-388

The High Availability step is designed to use two separate instances of Autobahn running on separate hosts. The first host processes the job, and if it ever runs into issues and stops processing, the second host takes over. The terms referring to the two hosts have been updated to Controller for the processing host, and Replica for the host that monitors and takes over from a Controller that stops processing.

Bug Fixes

When “Remove Hidden Text” is set to “True”, pages with “visible text” are re-OCRed

Ref: SDK-210

An issue was introduced with the previous SDK update that caused pages to be re-OCRed when they should have been skipped. Remove Hidden Text should not have affected visible text, but these pages were treated as needing OCR again. This has now been fixed and visible text is no longer re-OCRed.

Temporary files not deleted when converting email files

Ref: ADX-752

There was an uncommon issue when converting email files with attachments to PDF. A temporary file would be created when processing the attachments, but this file would not be cleaned up after the conversion. It is now deleted as expected.

SharePoint step displaying incorrect license information

Ref: ADX-751

The SharePoint Download and Upload steps display the expiration date of the current Autobahn license. This information was incorrect in previous versions, but the steps will now display the correct date.

Stamp PDF step not returning error messages

Ref: ADX-741

There was an issue with the Stamp PDF step where error messages were not returned if the job failed due to invalid job settings. The validation for the settings has now been improved and will return relevant error messages if needed. Also, Page Range will now accept * as a valid input, to bring this in line with other steps.

Send Documents step renaming prevents moving original files after processing

Ref: ADX-740

The Autobahn Send Documents email step has a feature that allows the file to be sent using its original filename. This renaming prevented the file from being moved out of the input folder if that option was also set. This has now been fixed, and you can also use %EMAILNAME% in the Rename Input File settings to set the input filename to its original name before moving.

Version 6.0.2311.16

Bug Fixes

Command line returns before command has completed.

Ref: ADX-722

The command line returned control before the command had completed. This has been fixed in the latest build.

Autobahndx.exe Exit Code returns early before files are in output location

Ref: ADX-720

Sometimes, the job will return an exit code saying that the job has succeeded, but the output has not been copied to the output location. This has now been fixed and exit codes will return after the step is completed.

Version 6.0.2311.03

New Steps

Five new steps have been added to Autobahn DX in this release, which all utilize the GdPicture toolkit’s capabilities, including the new Key-Value Pairs processing engine.

Key Value Pair Extraction

Extracts key-value pairs from PDFs alongside information about their location, data type etc. The step outputs the information in the choice of JSON, CSV, and XML output. The user can also specify Expected Keys, and the closest match for each Expected Key will be output in a separate file.

Pattern Redaction

Redacts text in the input PDF based on a regular expression or terms listed in a text file. The user can also customize the color of the redaction zone.

Pattern Highlight

Adds a highlight to text in the input PDF based on a regular expression or terms listed in a text file. The user can also customize the color and transparency of the highlight.

Split PDF (GdPicture)

Splits a PDF based on the criteria specified. The available options are to split by page ranges, by bookmarks, or into single pages.

Split By Barcode

Splits a PDF based on the barcodes found. The output files can be named based on the value of each barcode found. After specifying the barcode types, page ranges and regular expressions can further specify the barcodes to split the document on.

Improvements

JSON and XML files can now be converted to PDF

Ref: ADX-673

The ‘Convert Any File To PDF (GdPicture)’ step previously could not process JSON or XML files as they were not supported. Now, they are treated as text files when converted to PDF, so they will now process successfully.

Updated license key messages to better reflect license status

Ref: ADX-661 and ADX-625

In various areas, generic messages were returned that did not fully reflect the characteristics of the license being used. The messages will now depend on the license, and so will give a clearer and more individual message in these cases.

Add SSN and Postal Address support to Smart Redaction

Ref: ADX-599

Improvements to the GdPicture engine have increased the possibilities of the Smart Redaction step, adding two more redaction categories. You can now choose to redact both Social Security Numbers and Postal Addresses from input PDFs.

Source / Destination folder warning popup for SP and Azure Connectors

Ref: ADX-584

Connector steps in Autobahn are unique in the fact that they ignore either the input or output folder of the job. This is because they instead connect to email, SharePoint, or Azure to send/retrieve their files. When adding an email job, a pop-up would mention that the unused input/output folder would be a dummy folder. To avoid confusion, this warning pop-up message has now been extended to both the SharePoint and the Azure steps.

Users can now control threads used in GdPicture OCR

Ref: ADX-565

The GdPicture OCR step is unique in that it uses multiple concurrent threads to process multiple pages simultaneously, drastically increasing speed. However, running these threads also uses more CPU. We have added the option to limit the number of threads so that the step does not use all the available CPU power. If a limit is needed, we recommend between 2 and 8 threads.
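
The trade-off between speed and CPU use can be illustrated with a generic worker pool. This is a conceptual sketch rather than the step's implementation: `ocr_page` is a hypothetical placeholder for per-page OCR, and `max_threads` merely mirrors the idea of the new thread limit.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number: int) -> str:
    """Hypothetical placeholder for per-page OCR work."""
    return f"text of page {page_number}"


def ocr_document(page_count: int, max_threads: int = 4) -> list[str]:
    """Process pages concurrently, never using more than max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in page order even if pages finish out of order.
        return list(pool.map(ocr_page, range(1, page_count + 1)))


results = ocr_document(page_count=10, max_threads=4)
print(len(results), results[0])
```

A lower limit leaves CPU capacity free for other jobs at the cost of longer processing time per document.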

Implement fileProcessOrder for merge steps

Ref: ADX-563

We are adding features to the GdPicture merge steps that will bring them in line with the old merge step. This feature allows some control over the order in which files are merged. By default, all files are processed alphabetically. This option allows files to be merged in numerical order instead.

Bug Fixes

Fields in the designer not being recognized as changed

Ref: ADX-624

There were a few fields in the designer that would not register the job as changed if they were edited. This would mean that the job would not ask you to save changes when changing tabs without saving. This has now been fixed, and the job will correctly recognize that a job has been changed.

Jobs stuck in ‘Finishing job execution’ after file error

Ref: ADX-591

In very specific circumstances, Autobahn had an intermittent issue when a file had failed during processing. The issue would leave the job in the running state, and so it would be unable to run again until the service was restarted. This issue was investigated and has now been fixed in the latest version.

PDF File fails Compression with PDFInvalidContent status

Ref: ADX-587

This issue affected only files with a very specific structure. When ‘Remove Formfields’ and ‘Remove JavaScript’ were both enabled in the Compression step, the file failed to process. This was due to leftover data when the formfields were removed, and has now been fixed.

Generic Error when running Email steps in ADX

Ref: ADX-585

Running Autobahn email steps with Modern Authentication was giving a ‘Generic Error’ for specific setups. The cause of this issue has been resolved and email steps should now run as normal.

SharePoint Download fails to download nested files from SPO

Ref: ADX-580

The SharePoint Download step had an issue that prevented it from successfully downloading files stored in subfolders of the target SharePoint location. Only files at the top level would successfully download. This issue has been resolved, and files in all subfolders will now download successfully.

OCR Removing visible text during processing

Ref: ADX-578 (SDK-199)

For very specific files, visible text was being removed in addition to hidden text during the OCR process when in native mode, damaging the OCRed file. This issue was found in the Aquaforest SDK, but a fix was developed and has been deployed to all products using this SDK for OCR processing.

Version 6.0.2304.10

Improvements

Improve properties in PDF to Image steps

Ref: ADX-540

The PDF to TIFF/PNG/JPEG steps contain many properties for user customization. We have added a default value of 0 to the following properties: Brightness, Contrast, Saturation, Gamma, Crop Left, Crop Top. We have also renamed the cropping properties to better reflect their purpose. Crop Height and Crop Width determine the output area of the crop, and so have been renamed to Crop Area Height and Crop Area Width. Crop Left and Crop Top determine the location of the crop area from the top left corner, and so have been renamed to Crop Location Left and Crop Location Top.

Improve error messages for invalid page ranges

Ref: ADX-491

In the PDF to TIFF/PNG/JPEG steps, if a document was not in range for the given page range, the error message would simply state that the file failed to save. This has been improved so that the user will be told that the file does not fall within the specified page range, giving a much clearer reason for failure.

Add ‘Order by Int’ option to Combine PDFs step

Ref: ADX-490

A common issue that users come across with sorting is that, by default, file11 comes before file2 alphabetically. This can be frustrating when trying to merge files in a numerical order. The new Combine PDFs GdPicture step now has the option to sort files by number. This sorts files with the same prefix by a number comparison of their numbers, putting file2 before file11.
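
The difference between alphabetical and numerical ordering can be sketched with a "natural sort" key. This is a generic technique shown for illustration, not the step's actual code; `natural_key` is a name chosen here.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so numbers compare numerically."""
    return [
        int(chunk) if chunk.isdecimal() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


files = ["file11.pdf", "file2.pdf", "file1.pdf"]

# Plain string comparison puts file11 before file2.
alphabetical = sorted(files)

# Comparing the numeric chunks as integers puts file2 before file11.
numerical = sorted(files, key=natural_key)

print(alphabetical)
print(numerical)
```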

Improvement for PDF to TIFF (GdPicture) to align with other PDF to Image steps

Ref: ADX-451

Our previous implementation of the ‘PDF to TIFF (GdPicture)’ step did not allow many options for customization to the user. The step has now been rewritten to be more in line with the other PDF to Image steps and now has access to many more properties that users can change depending on their use case.

General Improvements and Maintenance to Autobahn code

Ref: ADX-444

As a part of keeping our products up to date, we review existing code and aim to make improvements when necessary. This change focused on improving the steptype checks at the start of each decision tree, making them slightly faster and easier to update when new steps are added.

Bug Fixes

Extensions filter in Azure download step not working as expected

Ref: ADX-544

The extensions property filters out any files that do not have one of the given extensions so that they are not downloaded. The extensions to be picked up are entered as a comma-separated string, for example ‘docx,xlsx,pptx’. However, files whose extension was a substring of any extension in the list would also be picked up. In the example, doc, xls, and ppt files would all be picked up incorrectly. This has now been fixed and only exact matches are picked up.
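
The difference between a substring check and an exact match can be sketched as follows. The buggy variant is only a guess at how this kind of behavior typically arises, and all function names are hypothetical.

```python
from pathlib import PurePath


def parse_extensions(spec: str) -> set[str]:
    """Turn 'docx,xlsx,pptx' into a set of lowercase extensions without dots."""
    return {ext.strip().lower().lstrip(".") for ext in spec.split(",") if ext.strip()}


def is_wanted(filename: str, allowed: set[str]) -> bool:
    """Exact match: the file's extension must be one of the listed extensions."""
    return PurePath(filename).suffix.lower().lstrip(".") in allowed


def is_wanted_substring(filename: str, spec: str) -> bool:
    """Substring check (assumed bug pattern): 'doc' matches because it is inside 'docx'."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return ext != "" and ext in spec


spec = "docx,xlsx,pptx"
allowed = parse_extensions(spec)

print(is_wanted_substring("report.doc", spec))  # picked up incorrectly
print(is_wanted("report.doc", allowed))         # correctly skipped
print(is_wanted("report.DOCX", allowed))        # correctly picked up
```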

Azure download step throws error for some special chars

Ref: ADX-539

Azure Blob storage allows the following special characters to be included in file names: " * : < > ? \ |
However, these characters are not allowed in Windows file names. To allow users to download these files, we replace these characters with a replacement character, which can be set in the new Replace Invalid Characters With property. The default value is an underscore.
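
The replacement described above can be sketched as a simple character substitution. The function name is hypothetical, and the character set is taken directly from the list above.

```python
# Characters listed above that Azure Blob storage allows but Windows does not.
INVALID_CHARS = '"*:<>?\\|'


def sanitize_filename(name: str, replacement: str = "_") -> str:
    """Replace every invalid character in a file name with the replacement."""
    if any(ch in INVALID_CHARS for ch in replacement):
        raise ValueError("replacement must not contain an invalid character")
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


print(sanitize_filename("report:2023?.pdf"))
print(sanitize_filename("a|b.txt", replacement="-"))
```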

Periods removed from file name of Cloud steps

Ref: ADX-533

The Cloud steps had an issue where filenames containing multiple periods would have the last period removed along with the subsequent text. For example, file1.2.3.pdf would be output as file1.2.pdf. The final text was being treated as a file extension and was replaced with .pdf. This issue has now been fixed for all future versions.
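
One plausible way for a bug like this to arise is stripping the extension twice. The sketch below is an assumption for illustration, not the product's actual code, and both function names are hypothetical.

```python
from pathlib import PurePath


def output_name_double_strip(input_name: str) -> str:
    """Strips the real extension, then treats '.3' as an extension too."""
    stem = PurePath(input_name).stem              # 'file1.2.3'
    return PurePath(stem).with_suffix(".pdf").name


def output_name_fixed(input_name: str) -> str:
    """Strips only the real extension and appends '.pdf' to the full stem."""
    return PurePath(input_name).stem + ".pdf"


print(output_name_double_strip("file1.2.3.pdf"))
print(output_name_fixed("file1.2.3.pdf"))
```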

Folder created as file when downloaded from Azure

Ref: ADX-500

A new issue came from a new setting in Azure Blob Storage. The ‘HNS Enabled’ setting was causing folders to be downloaded as if they were files when set to true. This was due to a change in behavior of methods in the code. This has now been fixed and only files will be downloaded, regardless of the blob settings.

Standard OCR DPI property only available when ‘Convert To TIFF’ set to ‘Yes’

Ref: ADX-477

There was an unintended restriction set on the DPI property in the ‘PDF to Searchable PDF (Standard)’ step. It was only selectable if the ‘Convert to TIFF’ property was set to ‘Yes’. This was not intended as they are independent properties, and so this issue has now been fixed and DPI can always be changed.

Create XML Property File does not escape special XML characters

Ref: ADX-386

There was an issue with the Create XML Property File step where metadata fields were not being escaped. This was an issue if the metadata fields of the input PDF contained any of the following five characters: < > & ' " These characters would cause the output XML to be invalid. This has now been addressed and the characters are properly escaped before XML creation.
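
Escaping those five characters can be sketched with Python's standard library. The `property` element and its `name` attribute are a hypothetical layout chosen for illustration and do not reflect the step's real output schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quotes are added explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def property_element(name: str, value: str) -> str:
    """Build one XML element with the name and value safely escaped."""
    safe_name = escape(name, QUOTE_ENTITIES)
    safe_value = escape(value, QUOTE_ENTITIES)
    return f'<property name="{safe_name}">{safe_value}</property>'


raw_title = 'Q&A <draft> "v2" by O\'Neil'
xml_text = property_element("Title", raw_title)

# The escaped output parses cleanly and round-trips to the original value.
parsed = ET.fromstring(xml_text)
print(xml_text)
```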

Version 6.0.2302.08

Improvements

Add option to Overwrite signed files with same name

Ref: ADX-448

The new ‘Detect Signatures’ step copies or moves digitally signed files to a user-defined folder to preserve the signatures. This step would fail if a file with the same name already existed in the defined folder. The user now has the option to overwrite these files.

Add option to create the signed file path folder if it doesn’t exist

Ref: ADX-447

With the new ‘Detect Signatures’ step, users could copy/move signed files to a specific folder to prevent the signing from being broken. Previously, processing the file would fail if the folder did not exist. There is now the option to create the folder automatically if it does not exist.

Review the ‘Used by another process…’ job error’s severity

Ref: ADX-323

Autobahn has the option to move or delete input files after processing. However, if the input file is forced to stay open by another process, Autobahn will be unable to move the file. Previously in this case, the job would error out and would be unable to restart without manual intervention. A new config option ‘ErrorIfDeleteInputFails’ has now been added. If set to true, the individual file will fail but the job will continue running for other files.

Version 6.0.2301.18

New Steps

We have continued to improve the variety of steps Autobahn can offer by utilizing the functions available in the GdPicture toolkit. We have added a step that performs automatic Smart Redaction, and a step that detects signatures in PDF files.

Smart Redaction

The Smart Redaction step has many options for redacting a variety of sensitive data, including credit card numbers, email addresses, and phone numbers. If a section of text matches the criteria, the visible text will be covered and searchable text removed.

Detect Signatures

The Detect Signatures step will analyze pdf files and will act if it detects a signature in a file. It has the option to copy or move the signed file from the processing location to a custom location. If the file was copied, then there is an option to attach the signed copy to the processing file. The signature in the attachment will remain intact even when the base file is processed.

Improvements

Allow users to disable SSL in email steps

Ref: ADX-429

SSL is a security feature that helps establish secure connections when sending emails. However, sometimes emails are only sent internally, so an SSL certificate may not be required. Though we recommend keeping SSL enabled in most cases, it can now be disabled by setting the config option “secureoption” to None. It is set to Auto by default.

Add step to detect Digital Signatures

Ref: ADX-409

PDF documents containing digital signatures are becoming more common. A signed file will lose its signature when processed by jobs in Autobahn. The new Detect Signatures step can be used at the start of a job to filter these files. They can then be copied/moved to a new location before processing.

Normalize the OCR Dictionary properties

Ref: ADX-402

In the previous release of Autobahn, the GdPicture OCR step had two properties that related to the OCR language option. It used a drop-down box for common languages, and a text box to enter any additional languages. The drop-down box was redundant as you could add as many languages as you wanted into the text box below. We decided to remove the drop-down box for clarity and conciseness.

DPI is not a changeable property for GdPicture PDF to Image steps

Ref: ADX-401

The new GdPicture PDF to Image steps (PDF to JPEG and PDF to PNG) did not include an option to set the DPI. This option was set to 300 by default, which is a higher quality than some people need. It is now a changeable property in the step’s settings, allowing greater flexibility and control of the output.

Add a Search bar for Step Properties

Ref: ADX-385

Some of the available steps in Autobahn have many properties, particularly the ‘Any File to PDF’ steps which have settings for multiple file types. We have now added a search bar above the properties which can be used to jump to a specific property without scrolling.

Bug Fixes

Inconsistency in naming of PDF to Image step property value

Ref: ADX-411

The ‘PDF to JPEG’ and ‘PDF to PNG’ steps both had a small visual issue where the Remove Lines property had a default value of No, but the options only contained Horizontal, Vertical, and None. The value None has now been changed to No for consistency.

%JOBSTATUS% does not return job status in email alerts

Ref: ADX-408

Autobahn jobs can be set to send out email alerts after jobs are finished. These emails contain information specific to the job such as the name, source, and target. However, the status of the job was not being correctly replaced with the specific job value. Instead, it would always return as ‘Status’. This has now been fixed and the emails will give a proper status. Test email alerts now give a ‘Testing’ status.
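
Placeholder substitution of this kind can be sketched as a simple find-and-replace over the template. Only %JOBSTATUS% appears in these notes; the `%JOBNAME%` token, the template text, and the function name are hypothetical.

```python
def render_alert(template: str, values: dict[str, str]) -> str:
    """Replace each %TOKEN% in the template with its job-specific value."""
    for token, value in values.items():
        template = template.replace(f"%{token}%", value)
    return template


# %JOBNAME% is a hypothetical token used only for this illustration.
template = "Job %JOBNAME% finished with status: %JOBSTATUS%"
message = render_alert(template, {"JOBNAME": "Nightly OCR", "JOBSTATUS": "Testing"})
print(message)
```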

Version 6.0.2211.18

Upgrading

License Key

This is a major release of Autobahn and will require a new license key to use. If you have current Support and Maintenance Cover (SMC) for a perpetual license or a current subscription license, please contact [email protected] to request a new key. From 31 March 2022 Autobahn DX is only available as a subscription product. Existing permanent Autobahn DX 5.5 licenses will remain valid (and function as permanent licenses). Additional SMC for existing permanent licenses will continue to be available for version 6.0.

.NET Framework

This release of Autobahn requires .NET Framework 4.7.2 or higher. The Autobahn installer will check this requirement before installation.

New Steps

The main improvement of this Autobahn release is the inclusion of 11 new steps. These steps all use the GdPicture toolkit, which includes a wide variety of functions and will be constantly improved and updated. A brief description of the new job steps is below:

  • Validate PDFA
    Checks PDFs against a PDFA version and gives an error for the file if it does not conform.

  • Linearize PDF
    Optimizes PDFs for web-viewing, rendering the document one page at a time.

  • Convert Any File To PDF (GdPicture)
    Able to convert a large variety of file types to PDF. This step does not require an Office installation to process Office files.

  • Combine Any File To PDF
    Converts a folder of files into PDF and then merges them to create a single output PDF. This step uses GdPicture and does not require an Office installation to process Office files.

  • Combine PDFs
    Merges a folder of PDF files to create a single output PDF.

  • PDF To JPEG
    Converts an input PDF to a JPEG file using the GdPicture toolkit.

  • PDF To PNG
    Converts an input PDF to a PNG file using the GdPicture toolkit.

  • PDF To TIFF (GdPicture)
    Converts an input PDF to a TIFF file using the GdPicture toolkit.

  • PDF To Text
    Extracts the searchable text from the pages of a PDF file and creates an output text file.

  • PDF To Searchable PDF (GdPicture)
    Carries out Optical Character Recognition on the input PDF using the GdPicture toolkit, creating an invisible searchable text layer over the document.

  • Create Pdf Portfolio
    Combines a folder of files into an integrated PDF unit. A wide range of file types can be used to create the PDF Portfolio.

GdPicture input file types

The following file types can be used with the Convert Any File to PDF and Combine Any File to PDF steps.

Description                                             Suffix
Windows bitmap format                                   BMP
Microsoft Word (.doc) binary file format                DOC
Microsoft Word OpenXML                                  DOCX
Microsoft Word Macro-Enabled OpenXML format             DOCM
Enhanced Windows Meta-format                            EMF
Graphics Interchange Format                             GIF
HTML format                                             HTML
Icon and cursor format (single or multi page)           ICO
Joint Photographic Expert Group                         JPEG
Portable Gray-map File                                  PGM
Portable Network Graphics Format                        PNG
Portable Pix-map File                                   PPM
Microsoft PowerPoint Presentation format                PPTX
Microsoft PowerPoint Macro-Enabled Presentation format  PPTM
Rich Text File Format                                   RTF
Tagged Image Format                                     TIFF
Plain text file                                         TXT
Standard Windows Meta-format                            WMF
Microsoft Excel (.xls) binary file format               XLS
Microsoft Excel Spreadsheet format                      XLSX
Electronic Mail format                                  EML
Outlook Item File Format                                MSG
Scalable Vector Graphics File                           SVG
Device Independent Bitmap format                        DIB
24-bit compressed JPEG Graphic format                   JPE
MIME HTML format                                        MHTML
OpenDocument Text file format                           ODT
Portable Bitmap Image file format                       PBM
Picture Exchange image file format                      PCX
TARGA raster graphics format                            TGA

Improvements

The job upgrader tool could be automatic

Ref: ADX-390

A previous issue in Autobahn was upgrading jobs between versions. This was solved with an upgrading tool created for version 5.5. However, the visibility for the tool was low and it was a very manual process. All jobdef files need upgrading if they are not the correct version, so it made sense to make this process automatic. When opening Autobahn, the jobdef folder will now be checked for jobs that need upgrading. The process will be shown in a brief pop-up, but this will be very quick if you only have a few jobs that need to be upgraded.

The BCL Service being turned off gave unhelpful errors

Ref: ADX-353

The Any File To PDF steps that use BCL previously had no check to see if the BCL service was running. This would be an issue if the service was not running, as the job fails, and the error returned would not explain the cause of the issue. The service is now checked to see if it is running, and an informative error message is returned if it is not running.

Custom script examples are not easy to use

Ref: ADX-352

The Custom Script Step allowed users to run their own scripts as part of an Autobahn job. However, the custom folder we provide had example scripts that were old and not easy to use. We recently created new scripts with simpler functions. The examples were added to the custom folder, and there are examples for these new scripts in the Reference Guide.

Custom script documentation needs updating

Ref: ADX-347

The Custom Script section of the Autobahn Reference Guide had become outdated. The information was confusing and linked to sites that no longer exist. The documentation has now been updated with example scripts that use more straightforward functions. Any information that was no longer useful has been removed.

When first opening ADX with no license key set, the license prompt did not appear at the front

Ref: ADX-338

When first opening Autobahn, a prompt will appear asking for a license key. However, this prompt did not have priority and would often appear behind other windows. This has now been improved so that the prompt will be brought to the front of all the current windows, so users can more easily see it.

Bug Fixes

Warnings sometimes displayed when switching between jobs

Ref: ADX-362

If the settings of a job require the job to use work folders, a warning is displayed to alert the user. This alert was incorrectly appearing when switching to a job that does not require work folders, because the check was carried out while the settings for the new job were still loading. This has been fixed and will no longer occur.

Non-Latin characters are not rendered correctly in logging

Ref: ADX-360

Files that contained non-Latin characters were not displayed as expected in the logs and appeared as question marks instead. This was caused by the character encoding, which has been updated so that most common characters now display correctly in the logs.
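
As an illustration of this kind of failure (not the Autobahn logging code, which these notes do not show), the Python sketch below passes a made-up file name through an ASCII-only encoding and through UTF-8. The file name and variable names are assumptions for the example.

```python
# Hypothetical file name containing non-Latin characters.
file_name = "报告_отчет.pdf"

# An encoding that cannot represent a character substitutes "?" for it.
ascii_logged = file_name.encode("ascii", errors="replace").decode("ascii")

# UTF-8 can represent every Unicode character, so the name survives intact.
utf8_logged = file_name.encode("utf-8").decode("utf-8")

print(ascii_logged)
print(utf8_logged)
```

If your own log viewer still shows question marks, it is worth checking which encoding the viewer uses to open the file.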

Single clicking a job did not properly select the job

Ref: ADX-354 and ADX-345

In the Autobahn Job Manager tab, single clicking a job would highlight that job but would not select it. This caused confusion when the user then clicked the Design tab, as it opened the previously selected job. The issue could also create blank jobs if creating a new job was canceled before saving it. In this version, jobs are selected after a single click, as expected. New jobs that are canceled are also cleared from the selection, so blank jobs are no longer an issue.

Job Name label not updating correctly

Ref: ADX-351

The Job Name label in the top right-hand corner of Autobahn keeps track of the currently selected job. However, in certain situations, the label was not updating correctly. When creating new jobs or renaming old jobs, the label would display the previous job name instead, causing confusion. This has now been fixed and the label updates as expected.

Convert PDF To TIFF step incorrectly giving exit code 0 with specific error

Ref: ADX-342

A file with a broken page was creating a memory issue with the ‘Convert PDF To TIFF’ step and was not producing an output tiff. However, the step was reporting back that the conversion was a success. This has now been changed so that the output file is checked after processing. If the output file does not exist, an error is reported correctly.
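
The same idea can be applied in your own wrapper scripts: do not trust a success code alone, and confirm that the expected output exists. The sketch below is a minimal Python illustration with hypothetical names and a hypothetical non-zero code; it is not the check that Autobahn itself runs.

```python
import os
import tempfile


def conversion_exit_code(output_path: str) -> int:
    """Return 0 only if the expected output file exists."""
    if os.path.isfile(output_path):
        return 0
    return 1  # hypothetical non-zero code meaning "no output produced"


with tempfile.TemporaryDirectory() as work_dir:
    missing = os.path.join(work_dir, "broken.tif")
    present = os.path.join(work_dir, "good.tif")
    with open(present, "wb") as handle:
        handle.write(b"II*\x00")  # placeholder bytes, not a complete TIFF
    missing_code = conversion_exit_code(missing)
    present_code = conversion_exit_code(present)
```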

SharePoint Downloads with path length over 260 causes error

Ref: ADX-335

When running the ‘SharePoint download’ step in Autobahn, if a download would create a file with an output path above 260 characters, the download failed with the message: “The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.” This error has now been addressed and the restriction should no longer occur.
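
If you need to find which output paths would have hit the old limits, for example when reviewing failures from an earlier version, a check along these lines may help. It is a sketch that uses only the two limits quoted in the error message; the function name and sample paths are made up.

```python
import ntpath

MAX_PATH = 260       # limit on the fully qualified file name, from the message
MAX_DIRECTORY = 248  # limit on the directory name, from the message


def exceeds_legacy_limits(path: str) -> bool:
    """Return True if a Windows path is at or over either quoted limit."""
    directory = ntpath.dirname(path)
    return len(path) >= MAX_PATH or len(directory) >= MAX_DIRECTORY


short_path = "C:\\Output\\report.pdf"
long_file = "C:\\Output\\" + "a" * 260 + ".pdf"
long_directory = "C:\\" + "d" * 250 + "\\f.pdf"
```

`ntpath` is used instead of `os.path` so that the backslash-separated paths are parsed the same way on any operating system.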

SharePoint steps missing a ‘Continue on Error’ property

Ref: ADX-334

For most Autobahn steps, the Continue on Error property exists to allow the job to continue processing files if a single file fails, which is generally desirable behavior. However, this property was missing in the SharePoint steps, meaning that if a single file failed to transfer to/from SharePoint, the job would error. This useful property has now been added to align with other steps.
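
The behavior this property controls can be pictured with a short Python sketch. The function names and the failing file are hypothetical, and this is only a model of the described behavior, not Autobahn's implementation.

```python
from typing import Callable, Iterable, List, Tuple


def run_step(files: Iterable[str],
             process: Callable[[str], None],
             continue_on_error: bool) -> Tuple[List[str], List[str]]:
    """Process files in order and return (succeeded, failed) lists."""
    succeeded: List[str] = []
    failed: List[str] = []
    for name in files:
        try:
            process(name)
            succeeded.append(name)
        except Exception:
            failed.append(name)
            if not continue_on_error:
                raise  # stop the whole job on the first failure
    return succeeded, failed


def fake_transfer(name: str) -> None:
    # Hypothetical transfer that fails for one specific file.
    if name == "b.pdf":
        raise IOError("transfer failed")


done, errors = run_step(["a.pdf", "b.pdf", "c.pdf"], fake_transfer, True)
```

With `continue_on_error` set to `False`, the same call would raise on `b.pdf` and `c.pdf` would never be attempted.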

Pause function not working as intended

Ref: ADX-333

The option to pause a job was intended to allow a job to be stopped temporarily, and it would start from the last file when restarted. However, the pause would instead start the job from the beginning. This was due to a file incorrectly recording the progress. This has now been fixed and restarting a job after a pause will start from the last file processed previously.
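
A minimal Python sketch of resuming from recorded progress is shown below. The JSON progress file, its layout, and the `stop_after` parameter (which simulates a pause) are assumptions made for the example; the notes do not describe the format Autobahn uses.

```python
import json
import os
import tempfile
from typing import List


def run_with_checkpoint(files: List[str], progress_path: str,
                        stop_after: int = -1) -> List[str]:
    """Process files, skipping any already recorded in the progress file."""
    done: List[str] = []
    if os.path.exists(progress_path):
        with open(progress_path, "r", encoding="utf-8") as handle:
            done = json.load(handle)
    processed_now: List[str] = []
    for name in files:
        if name in done:
            continue  # finished before the pause
        if stop_after >= 0 and len(processed_now) >= stop_after:
            break  # simulate the user pausing the job
        processed_now.append(name)
        done.append(name)
        # Record progress after every file so a restart knows where to resume.
        with open(progress_path, "w", encoding="utf-8") as handle:
            json.dump(done, handle)
    return processed_now


with tempfile.TemporaryDirectory() as folder:
    progress = os.path.join(folder, "progress.json")
    batch = ["1.pdf", "2.pdf", "3.pdf"]
    first_run = run_with_checkpoint(batch, progress, stop_after=2)
    second_run = run_with_checkpoint(batch, progress)
```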

SharePoint Upload - first file in batch fails when folder threshold reached.

Ref: ADX-331

This issue is related to ADX-320. When uploading a batch of files to SharePoint Online, if too many folders already exist on SharePoint, the upload operation would be prohibited.

Deleting steps does not update the ‘Step Properties’

Ref: ADX-327

Whenever a user deleted a job step in the ‘Designer’ tab, Autobahn would automatically select the next step (if any), but the step properties of the deleted step would still show. The designer now refreshes properly so that the newly selected step’s properties are displayed.

Canceling creating a new job causes strange behavior

Ref: ADX-326

If a user canceled creating a new job, and then immediately selected an existing job, Autobahn would mistake the job ID with the canceled job, loading a job that doesn’t exist and causing a UI error. This has now been fixed and the correct job ID will be read, and the job loaded successfully.

PDF/A Option in Standard OCR steps not working

Ref: ADX-324

The PDF/A conversion for Standard OCR steps had an issue and was being skipped during the processing. This has now been fixed, and files should now be output as a PDF/A file if this property is set.

Output File Name %DIRNAME not working in ‘Merge Image to Searchable PDF (Extended)’

Ref: ADX-321

Users can use the %DIRNAME template to name the output file from a merge step after the folder. Unfortunately, there was a bug with this specific merge step that replaced the template with the folder name of the parent folder instead. This has now been fixed and %DIRNAME is now replaced by the correct folder name.

SharePoint Upload - first file in batch fails when threshold reached.

Ref: ADX-320

When uploading a batch of files to SharePoint Online, if too many files already exist on SharePoint, the first file in the batch would fail to upload. All other files would upload successfully. This issue has been addressed and all files should successfully upload.

SharePoint - many file entries are created when accessing SPO sites

Ref: ADX-317

Autobahn was creating many certificate key files when accessing SPO sites, found in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder. Some files are critical, but many others could be removed after the accessing is complete. These files are now removed automatically at the end of a SharePoint job step when they are no longer needed.

Version 5.5.2111.16

Enhancements

Set the emails sent out from Autobahn back to their original filename

Ref: ADX-315

Previously all files emailed by the Autobahn ‘Send Documents’ step were received in the format “name@timestamp@[email protected]”. This format is important as it holds information about the email that the file needs to be sent to. However, the file can now be renamed after this information is extracted by setting the new “Use Original Filename” property to true. The received file will have the timestamp, email and any extra @ signs removed.

Bug Fixes

Files with # in name not uploading to SharePoint correctly

Ref: ADX-319

When using the SharePoint Upload step in Autobahn, any file that contained a # symbol in its name would lose all text after this symbol, including the extension, when it was uploaded to SharePoint. This has been fixed in the latest version and is no longer an issue for files named this way.
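
These notes do not state the root cause. One general reason this class of bug appears is that an unescaped # in a URL starts the fragment, so everything after it is dropped from the path. The Python sketch below shows that effect with a made-up site URL and file name; it is not taken from the Autobahn code.

```python
from urllib.parse import quote, urlsplit

file_name = "report#final.pdf"
base = "https://example.invalid/sites/docs/"  # hypothetical site URL

# Unescaped: the parser treats "#final.pdf" as a fragment, not part of the path.
raw_path = urlsplit(base + file_name).path

# Percent-encoded: "#" becomes "%23" and the full name stays in the path.
encoded_path = urlsplit(base + quote(file_name)).path

print(raw_path)
print(encoded_path)
```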

Emails over 3MB failing to send

Ref: ADX-318

Emails that had an attachment size greater than 3MB were getting a general exception error, which would cause the file to not be sent successfully. The issue causing this was fixed and the email attachment limit size now depends more on the email provider. Any file over 50MB will fail in Autobahn with a “file too large” error, but this limit is higher than most providers allow, so please be aware of these limits for your own provider.

Any File to Searchable PDF (Extended) does not produce a CSV log

Ref: ADX-314

For Any File to Searchable PDF (Extended) job steps specifically, the CSV log was not being written to when adding the run data. This has been fixed in this version of Autobahn, so the step now produces a CSV log like every other OCR step.

Pages processed in CSV not correct when processing in multicore

Ref: ADX-313

The new feature of counting the pages processed for each file and adding this information to the CSV was giving incorrect data when used in multicore. This issue was fixed in the latest version, so Autobahn will now correctly output the pages processed in the CSV logs as long as the config option “verboseLogs” is enabled.

Console logging during Email steps does not output to the logs

Ref: ADX-312

The way that the email steps in Autobahn were previously called caused some unintended behavior, including the console output not being saved to the log files. After some reworking, these email steps are now called the same way as every other step. The logging is now saved to the log file as expected, and this has also fixed all the other unintended behavior.

DynamicRasterizer and pdfbox version mismatch

Ref: ADX-311

In earlier versions of the Autobahn release that included PDF Compression, some internal components were mismatched, causing certain jobs to fail. This has been fixed in the current version, and we encourage anyone experiencing these issues to upgrade to the latest version.

Version 5.5.2109.30

Enhancements

Add a tool to update Autobahn 5.01 jobdef files

Ref: ADX-309

When moving jobs from Autobahn 5.01 to 5.5, jobdefs relating to jobs that added new fields or changed field IDs will have missing information. This can cause issues, especially with the new field “Embed Font Subset” that is present in a few common steps.

An ‘Upgrade Jobs’ tool has been added to the ‘Help’ tab that will update 5.01 jobs in the ‘jobdef’ folder when run.

Add a new compression step in Autobahn DX

Ref: ADX-306

We have added the new step ‘Modern Compress PDF’. This step has many great options for compression of both image and searchable PDFs. It is a big upgrade on our old compression step, which is now deprecated and will be removed in a later release.

Include total page count per job as additional logging

Ref: ADX-305

A new config option has been added to Autobahn. With ‘verboseLogs’ set to true, the total processed page count will be added to the output log as extra information. There are a few conditions for this to work:

  • This option is for single step jobs, as pages may be processed multiple times in multistep jobs

  • This information will not work with every steptype, as not every step processes in pages

  • The log file path should contain %TIMESTAMP% otherwise data from previous runs may affect the output

Add PowerShell Scripts capability

Ref: ADX-303

Previously, the ‘Custom Script Step’ was unable to run PowerShell scripts directly. This has been addressed in the latest release, and there should be no issues running a PowerShell script as part of a job in Autobahn.

Allow Autobahn email steps to send files in bulk with ‘Send Email’ step

Ref: ADX-299

In the previous version, the Autobahn ‘Send Email’ step would send each file in a separate email. For jobs that included splitting or big volumes, this created a lot of emails. We have now added the option to have multiple attachments in one email. This can be controlled with limits to the total file size or file number attached to the emails that Autobahn sends.
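
The grouping can be pictured as a simple greedy batching pass, sketched below in Python. The function name, parameter names, and the rule that an oversized file goes into an email on its own are assumptions for illustration; they are not the step's documented behavior.

```python
from typing import List, Tuple


def batch_attachments(files: List[Tuple[str, int]],
                      max_files: int,
                      max_total_bytes: int) -> List[List[str]]:
    """Group (name, size) pairs into emails without exceeding either limit."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in files:
        too_many = len(current) >= max_files
        too_big = current_bytes + size > max_total_bytes
        # Start a new email when adding this file would break a limit.
        if current and (too_many or too_big):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


emails = batch_attachments(
    [("a.pdf", 4), ("b.pdf", 4), ("c.pdf", 4), ("d.pdf", 9)],
    max_files=2,
    max_total_bytes=10,
)
```

Here the first email is closed by the file-count limit and the second by the size limit, which shows that either limit can end a batch.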

Bug Fixes

PDF page rasterization causes Runtime error

Ref: SDK-180

In rare cases with specific documents, an exception was being thrown intermittently during rasterization of an Extended OCR step. This has been fixed in the latest Aquaforest SDK version, and incorporated into Autobahn.

PDF files containing both visible and hidden text in a page not processed correctly

Ref: SDK-177

The combination of processing documents that contain both visible and hidden text using settings that set ‘RemoveExistingPDFText = true’ and ‘PdfToImageIncludeText = false’ caused the resulting files to be damaged. This rare issue has been fixed completely and should no longer cause any problems in output documents.

Files not moved to error folder if not able to copy to output

Ref: ADX-307

In the rare case that the output folder became unavailable during processing in Autobahn, an error would occur when attempting to copy the completed files over. However, there was no process of sending the original files to the error folder.

Autobahn now attempts to send the contents of the first work folder (or source folder) to the error folder. If this fails, it will disable the deletion of files if set.

Exception thrown when uploading large document to SharePoint On-Prem

Ref: ADX-302, PS-223

An issue was found in the SharePoint Upload step when users used Modern Authentication to upload a very large document, causing the upload to fail. This has been fixed in the latest version and should no longer occur.

The ‘&’ symbol in a SharePoint address causes the job to fail

Ref: ADX-301

SharePoint now allows & symbols to be used in its naming system. However, an error in Autobahn caused jobs to fail when they contained these addresses, as it was unable to find the location. This is no longer a problem in the latest version: any location containing a ‘&’ symbol will now be found correctly.

Some PDF steps gave error messages when processing ‘.PDF’ files

Ref: ADX-298

Previously, some Autobahn PDF steps were incorrectly identifying ‘.PDF’ files. The steps would only accept files with a lowercase extension. This has been fixed in the latest version, and the steps are no longer case-sensitive when verifying the file extension.
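
A case-insensitive extension check of the kind described can be sketched in Python as follows. The function name is illustrative and this is not the code used by the steps.

```python
import os


def is_pdf(path: str) -> bool:
    """Return True for any capitalization of the .pdf extension."""
    extension = os.path.splitext(path)[1]
    return extension.lower() == ".pdf"


print(is_pdf("SCAN.PDF"), is_pdf("scan.pdf"), is_pdf("scan.tiff"))
```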

Version 5.5.2105.28

Bug Fixes

Convert PDF to PDFA step grayed out with Standard license.

Ref: ADX-297

The Convert PDF To PDFA step was newly added in Version 5.5. However, it was only available for users with extended licenses. This was not intended, and it is now available for both standard and extended licenses in the latest version.

%DIRNAME gives the file name and not the directory name for extended jobs

Ref: ADX-296

When using the %DIRNAME template for the output file name for extended jobs, the template would be replaced with the file name instead of the directory name in the output file. This has now been fixed.

Version 5.5.2104.30

Upgrading

License Key – This is a major release of Autobahn, and will require a new license key to use. Please contact [email protected] to request a new key.

.Net Framework – This release of Autobahn requires a .Net Framework of 4.7.2 or higher. The Autobahn installer will check this requirement before installation.

Cloud OCR Upgrade – The Cloud OCR steps have been split into Google and Microsoft counterparts. Upgrading jobs with this step will require remaking the job. This affects the following steps:

Image to Searchable PDF (Cloud), PDF to Searchable PDF (Cloud)

Note – Some jobs have been improved and have changed from previous versions. When upgrading, property values may be missing from these steps. We recommend viewing the step in the designer and making sure all the properties are correctly filled out. If you still have issues, deleting and re-adding the step will update the properties to their default values. This affects the following steps:

Read Mailbox, Send Documents, SharePoint Download, SharePoint Upload, Barcode TIFF/PDF

Enhancements

  • Autobahn DX 5.5 contains new OCR engines, including for Extended OCR, which will help improve the accuracy of the output from all OCR steps

  • Added “PDF Recognition to JSON” step

    • This step will automatically extract important data from PDF files in the form of Key/Value pairs

    • No need for training or specifying extraction zones

    • Example document types that work well with this step

      • Invoices, shipping documents, etc.

  • Added “PDF to PDFA” step

  • Added new improved barcode library with enhanced accuracy when reading certain barcode types e.g. QR codes

  • Email and SharePoint libraries updated

    • Now supports OAuth2 Authentication

Give ability to use templates for metadata settings in “Set PDF Properties” step

Ref: ADX-284

Please note that these config options have now been replaced by the new Detect Signatures step. See ADX-468 for more details on this change.

Previously, users could only use static data for metadata input for the “Set PDF Properties” step. Users are now able to use %FILENAME% or %DIRNAME% in the Author, Title, Subject, Creator and Keywords field, and they will be replaced by the input file name or directory name for the output PDF.
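
The substitution can be sketched in Python as below. This is an illustration only: the function name is made up, and it assumes that %FILENAME% means the input file name without its extension and that %DIRNAME% means the folder that directly contains the file.

```python
import ntpath


def expand_templates(value: str, input_path: str) -> str:
    """Replace %FILENAME% and %DIRNAME% using a Windows-style input path."""
    directory, file_name = ntpath.split(input_path)
    stem = ntpath.splitext(file_name)[0]   # assumption: extension is dropped
    dir_name = ntpath.basename(directory)  # assumption: immediate parent folder
    return value.replace("%FILENAME%", stem).replace("%DIRNAME%", dir_name)


title = expand_templates("%DIRNAME% - %FILENAME%",
                         "C:\\Input\\Invoices\\march.pdf")
print(title)
```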

Added Encrypted PDF Handling options

Ref: ADX-273

Two config options have been added to Autobahn. The first option, ‘securedPdfHandling’, allows you to ‘pass’ through secured files, or you can ‘move’ or ‘copy’ these files to another location. The second option, ‘securePdfOutputLocation’, defines the location to move/copy the secured files to.

Allow users to sort input files by date

Ref: ADX-270

The input file order can now be set with the new ‘File Order’ option in the ‘Properties’ tab of the Designer. The date options come in UTC and local time variants, giving nine options in total: Alphabetically, plus UTC and local time versions of Created Date (Ascending), Created Date (Descending), Modified Date (Ascending), and Modified Date (Descending). Note: this setting does not work for “Merge Image to PDF…” steps; the merge and OCR must be done in two separate job steps.
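
To preview the order a date-based sort would produce for a folder, a small script along these lines can be used. It is a sketch with made-up names that sorts by last-modified time only; created dates are left out because not every platform exposes them in the same way.

```python
import os
import tempfile
from typing import List


def sort_by_modified(paths: List[str], descending: bool = False) -> List[str]:
    """Order files by last-modified time, using the path to break ties."""
    return sorted(paths, key=lambda p: (os.path.getmtime(p), p),
                  reverse=descending)


with tempfile.TemporaryDirectory() as folder:
    stamps = {"b.pdf": 300, "a.pdf": 100, "c.pdf": 200}
    paths = []
    for name, mtime in stamps.items():
        path = os.path.join(folder, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))  # set a known modified time
        paths.append(path)
    oldest_first = [os.path.basename(p) for p in sort_by_modified(paths)]
    newest_first = [os.path.basename(p)
                    for p in sort_by_modified(paths, descending=True)]
```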

Default input delay set to 5 seconds

Ref: ADX-269

A common issue that users came across when using Autobahn as part of a workflow is that the input file would not be fully uploaded from the last process when it is picked up by Autobahn. This would cause the incomplete file to error when processed. This extra delay should give the last process time to finish properly uploading the file.
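
If you build a similar safeguard into the process that feeds Autobahn, one option is a quiet-period check: treat a file as ready only when it has not been modified for a set time. The Python sketch below shows that idea. It is an assumption about one way to apply a delay, not a description of how Autobahn applies its own, and the names are hypothetical.

```python
import os
import tempfile

INPUT_DELAY_SECONDS = 5  # default delay stated in this release note


def is_ready(path: str, now: float,
             delay: float = INPUT_DELAY_SECONDS) -> bool:
    """Return True once the file has been unmodified for `delay` seconds."""
    return now - os.path.getmtime(path) >= delay


with tempfile.TemporaryDirectory() as folder:
    incoming = os.path.join(folder, "upload.pdf")
    open(incoming, "w").close()
    os.utime(incoming, (1000, 1000))  # pretend the last write was at t=1000
    ready_after_3s = is_ready(incoming, now=1003)
    ready_after_5s = is_ready(incoming, now=1005)
```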

Support for conversion of .eml files

Ref: ADX-253

The latest version of BCL, the third-party library we use for conversions, now supports the conversion of .eml files. We updated the library in Autobahn DX version 5.5 and added the filetype to the configuration, so the .eml filetype should now successfully convert.

Improve license key error messages for Cloud steps

Ref: ADX-242

With the implementation of the new Cloud OCR library, we have also updated the error messages returned to users when there are errors with the Google/Microsoft license key input. This should make it easier for the user to diagnose the problem with the job.

Update Email libraries

Ref: ADX-226

The email library has been updated with a new version that addresses many of the issues with the previous library. It also includes support for Modern Authentication, which can be used in both email alerts and the individual email job steps.

Add Modern Authentication to SharePoint libraries

Ref: ADX-225

The update to the SharePoint library includes support for Modern Authentication, which has been a feature that many users have requested. The jobs have been updated so users can customize their authentication type to one that suits their needs.

Renamed ‘Extract PDF Image via’ option

Ref: ADX-219

This small update to the naming of a job option was made to provide more clarity on the way that the PDF Image would be extracted. The option is now ‘Convert to TIFF’, which will convert a PDF File to a TIFF file before extracting the image. The option is false by default.

Select up to 8 languages in Extended OCR

Ref: ADX-214

The Extended OCR jobs have been updated to allow users to input up to 8 languages, given that they are from the same character set. This change will help users that process multilanguage documents.

Post Job Completion Alerts

Ref: ADX-209

The feature to send an email when a job completes existed in previous versions of Autobahn, but it has now been updated with the new email library changes, including the addition of Modern Authentication. On the ‘Modules and Options’ tab, users can select their preferred authentication, and these details will be used for each job that has alerts set in the ‘alerts’ tab.

Bug Fixes

  • The deployment executables, including TIFF Junction, can no longer be called directly

  • The support tool has been removed from the ‘Help’ tab

  • “PDF To Searchable PDF (Standard)” can no longer process TIFF files

Subfolders not created in output

Ref: ADX-294

In an early version of 5.5, Autobahn was not creating any subfolders in the output location. This meant that any folder tree that was processed would be written directly to the output location without its subfolders. This bug has now been fixed for future releases of Autobahn.

Barcode zones not being used

Ref: ADX-275

In the previous barcode step, the zonal information was not correctly sent to the barcode executable, so the whole page was always used as the zone. This has now been fixed with the new barcode step, so declaring zones will ignore the other areas of the page when searching for barcodes.

Merge PDF properties controlling processing encrypted documents

Ref: ADX-274

When processing encrypted files in the Merge PDF step, the job was expected to fail. However, some settings would allow the step to succeed. The step has been improved so it can now process encrypted files. Due to the new encrypted file handling implemented in version 5.5, encrypted files are handled by Autobahn before the merge step.

Email attachments containing the ‘;’ character in their name causes error

Ref: ADX-272

In the rare case that an email attachment’s name contained the ‘;’ character, the file would fail to download. This was a limitation of the previous email library, and is no longer an issue with the new email library.

Version 5.01 200316

Bug Fixes

Office documents always output as Portrait

Ref: ADX-265

When processing Office documents, the output would always be portrait, regardless of the input orientation or the “Paper Orientation” setting in Autobahn. This has been fixed so that both orientations can now be produced as output.

Cloud OCR option ‘Extract PDF Images Via’ default value invalid

Ref: ADX-258

The PDF to Searchable PDF (Cloud OCR) step had an invalid default value of ‘No’ for this setting. This has now been changed to the valid value of ‘Native’.

‘Keep original image’ setting available when processing in Native

Ref: ADX-257

In the PDF to Searchable PDF (Extended) step, the setting of ‘Keep original image’ will not take effect if processing with the ‘Native’ method, but the setting could still be changed. This could be confusing to a user, so this option is now grayed out unless processing using the ‘Convert to Tiff’ method.

Binarization Mode property not keeping value

Ref: ADX-256

When saving a PDF to Searchable PDF (Extended) step, the Binarization Mode property would store invalid values, due to the property referencing values of another property instead. This has been fixed in the latest version of Autobahn, so the property saves the correct value.

Contents of ‘temp’ folder not being properly removed

Ref: ADX-255

The main ‘temp’ folder holds job definitions for quick jobs, but extra definitions were generated every time Autobahn was reopened and these were not removed. This has been fixed so only necessary definitions are created, and these definitions are removed after use.

‘Run Continuously’ checkbox not keeping value

Ref: ADX-252

In the previous release of Autobahn, the “Run Continuously” checkbox would not keep its value when switching between jobs. This did not affect the schedule of jobs where the value was switched. This UI bug has been fixed in the new release.

Option “Keep original image” does not save changes and remains blank

Ref: ADX-251

When editing an ‘Any File to Searchable PDF (extended)’ job step, the value for ‘Keep original image’ would reset to false and appear blank in the UI. This has been fixed in the latest version of Autobahn.

Image to Searchable PDF (standard) with text output fails to save

Ref: ADX-246

When processing Image files with the step Image to Searchable PDF (standard), using “OCR to TextFile” and “Output File = Plain Text (no PDF)”, the file would be processed but would fail to save. This has now been fixed.

Convert to TIFF returning blank PDF forms

Ref: ADX-243

When converting PDF Forms to TIFF files, output files would lose their data, or return blank. This was an issue in a third-party component, and has been fixed in their latest build, which has been included in this build of Autobahn.

OCR Any file to PDF not keeping overwrite setting in GUI

Ref: ADX-240

The previous release of Autobahn DX 5.0 had a GUI bug that did not retain the overwrite setting in the Any file to PDF OCR job. This has been fixed.

Job analyzer not deleting temporary files it creates

Ref: ADX-238

In the previous release of Autobahn DX 5.0, the Job analyzer was not deleting the files it generated in a temp location. This no longer occurs, and the location of the generated files has been changed (see ADX-239).

Job API displays inconsistent behavior with concurrent jobs

Ref: ADX-233

In the previous release of Autobahn DX 5.0, the Job API encountered a problem if two or more jobs attempted to start at the same time and would instead show one job starting twice. We have fixed this issue.

‘Include unprocessed PDFs Only’ generates a PDFBox error

Ref: ADX-229

In the previous release of Autobahn DX 5.0, including this filter setting in a job would cause the error to occur. This was due to a PDFBox version mismatch. This issue has been resolved.

Hangs when processing certain types of PDFs in Native mode

Ref: SDK-135

In the previous release of Autobahn DX 5.0, PDFs that contained recursive code were causing hangs when processed in native mode. This was fixed in the latest PDFBox version, which has been implemented in this build.

Enhancements

SharePoint Step does not support SP2013 OR SP2010

Ref: ADX-261

The SharePoint Upload and Download steps were previously unsupported in SP2013 and SP2010. These steps have now been improved and can function with these versions of SharePoint.

Job summary tool uses job temp for its file processing

Ref: ADX-239

In the previous release of Autobahn DX 5.0, the job summary tool used the User Temp location to store files. These files are now stored in the job temp location.

Improved CPU Core throttling

Ref: ADX-234

Changes have been made in the latest version to improve CPU Core throttling.

Added ‘forcecores’ to allow setting cores manually

There now exists the config option ‘forcecores’, allowing users to tell Autobahn how many cores their machine has directly. The user will still be restricted by the cores allowed on their license and will need to restart the service for the config option to take effect.

Moved ‘Save Options’ in Module & Options tab

Ref: ADX-231

‘Save Options’ has been moved closer to the settings that it relates to, and the text has been changed to ‘Update’ instead.

Merge Image to Searchable PDF (extended)

This OCR step has now been updated to support PNG and BMP files.

Version 5.0.190905

Bug Fixes

Removed WIF 3.5 as a Prerequisite

Ref: ADX-227

The previous release of Autobahn DX 5.0 failed during installation on a Windows Server 2019 system because of the WIF 3.5 prerequisite. This prerequisite has been replaced.

Fixed the PDF to Searchable PDF (Cloud OCR) Step properties.

Ref: ADX-228

In the previous release of Autobahn DX 5.0, the Convert to TIFF option was not getting passed to the OCR engine when using the new Cloud OCR step.

Version 5.0.190805

Bug Fixes

SharePoint connector license check failure

Ref: ADX-222

The previous release of Autobahn DX 5.0 failed when executing the SharePoint steps because of a bug in the license key validation.

Version 5.0.190715

Bug Fixes

Any File to PDF fails when generating PDF/A files from text documents

Ref: ADX-218

The previous release of Autobahn DX 5.0 failed when generating PDF/A files from text documents when using GenericExtension, AutoExtension and AutoExtensionEx. Updating to the latest version of BCL fixes this issue.

Batch Size gets dropped after the first iteration of job from the service

Ref: ADX-220

The previous release of Autobahn DX 5.0 was setting the job filter limit to zero after the job executed for the first time under the service.

Version 5.0.190605

Bug Fixes

CSV Log Files for Paths with Comma(s)

In the previous version of Autobahn DX, the presence of commas (,) in file paths added unwanted columns to the CSV log file. We have fixed this issue.
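
The general remedy for this kind of problem is to quote any field that contains the delimiter, which standard CSV writers do automatically. The Python sketch below shows the effect with a made-up path and hypothetical column names; it does not reproduce the Autobahn log layout.

```python
import csv
import io

path = "C:\\Input\\Smith, John\\scan.pdf"  # hypothetical path with a comma

# Naive joining lets the comma in the path create an extra column.
naive_columns = (path + ",Success").split(",")

# csv.writer quotes the field, so the path stays in a single column.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["File", "Status"])
writer.writerow([path, "Success"])
csv_text = buffer.getvalue()

rows = list(csv.reader(io.StringIO(csv_text)))
```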

Merging PDF Files with Acroforms

In the previous version of Autobahn DX, there was a bug that caused merging of PDF files with Acroforms to fail. We have fixed this issue.

Version 5.0.190430

Upgrading from earlier Versions

Preserving Existing Job Definitions when Upgrading

When upgrading to a new version of Autobahn DX, your old jobs will not have all the new step properties added. To rectify this, open each of your old jobs from the Job Manager and save it.

License Key

Autobahn DX 5.0 uses different license keys from the previous versions of Autobahn DX. You will need to request a new license key from Aquaforest: [email protected].

Removed Files

Autobahn DX no longer makes use of the file called ‘emimap4.dll’, which was used in previous versions. If you have upgraded, this file may still exist in the ‘bin’ folder and we recommend that it is deleted.

Enhancements in v5.0

We have made a lot of changes in this version of Autobahn DX; we will discuss these enhancements in this section.

Pause Job

We have now added the ability to resume jobs in Autobahn DX if:

  1. The job is interrupted by a service crash or power failure.

  2. You paused the job from the Autobahn DX GUI.

Note: If you make any changes to the Job when it is in a Paused state the job will start from the beginning.

New Job Steps

In Autobahn DX 5.0, we have added to our long list of job steps. This is to give the user more value and options. For more details, check the section 5.7.2 in the Autobahn DX 5.0 reference guide.

Cloud OCR

The optional Cloud OCR module extends Autobahn DX with additional OCR engines from Microsoft and Google. The main advantage of these OCR engines is their handwriting recognition capabilities. Both vendors provide these engines under a SaaS model, so you will need a subscription before you can start using these steps in Autobahn DX. See chapter 18 of the reference guide for more details.

We have added two step types to the Advanced section of the Job Designer tab of Autobahn DX:

  • Image to Searchable PDF (Cloud OCR)

  • PDF to Searchable PDF (Cloud OCR)

Stamp PDF Files

This step can be used to add stamps to PDF pages. Stamps can be customized extensively in a simple manner.

Autobahn DX has several ways to apply stamps to a page, which gives the user some flexibility.

  • StampTextAsString: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as text.

  • StampPDFText: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as an image.

  • StampPageNumber: When this operation is selected, every page in the PDF file will be stamped with a page number, starting from the start number. E.g. if StartNumber = 6, the first page number will be 6.

  • StampPageNumberBates: When this operation is selected, every page in the PDF file will be stamped with a Bates number, starting from the start number. E.g. if StartNumber = 6, the first page number will be 000006.

  • StampVariable: This option allows a user to specify a variable like a date, filename or time. The variable specified by the StampObject will be stamped on the document. Check the table below for different Stamp variables provided.

  • StampPDFImage: When this operation is selected the text passed as the StampObject is the address of the image to be stamped on the PDF document.
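The difference between plain page numbers and Bates numbers can be illustrated with a small sketch. This is not Autobahn DX code: the function name bates_numbers is hypothetical, the six-digit width is inferred only from the 000006 example above, and the assumption that the number increases by one per page is ours.

```python
def bates_numbers(start_number: int, page_count: int, width: int = 6) -> list[str]:
    """Return one zero-padded label per page, beginning at start_number.

    The width of 6 is an assumption taken from the "000006" example;
    the real step may use a different or configurable width.
    """
    return [str(start_number + offset).zfill(width) for offset in range(page_count)]


# Labels for a three-page document with StartNumber = 6.
labels = bates_numbers(start_number=6, page_count=3)
print(labels)
```

With StartNumber = 6, the first label produced is 000006, matching the example in the StampPageNumberBates description.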

Any File to Searchable PDF (Extended)

In previous versions of Autobahn DX, the OCR Any File to PDF step (now renamed Any File to Searchable PDF (Standard)) converted Office files to PDF and performed OCR on image-based files. This step used to be available only for the Standard OCR engine. Version 5.0 adds a similar step that uses the Extended engine to OCR image-based files.

Azure Storage Download

This new step downloads files from an Azure Storage Container to your local machine. It can be used as part of a workflow in Autobahn DX.

Azure Storage Upload

This new step uploads files from your local machine to an Azure Storage Container. It can be used as part of a workflow in Autobahn DX.

Using these two steps, you can download files from Azure, process them and upload the outputs back to Azure in a single job.

Distributed Polling

This step can be used to implement load balancing in Autobahn DX. It does this by copying a fraction of the files from a central input location to the local system where Autobahn DX is running. Multiple Autobahn DX servers can point to one input folder, so the files are shared across several servers and the processing load is spread between them.

Job API Changes

Remote API Enhancement

Previously, you had to install Autobahn DX on both the client and the server machine in order to call a remote API in Autobahn DX. Now you only need to install Autobahn DX on the server computer.

GetLastRunDate

We have added the method below to the Job API:

public string GetLastRunDate();

Returns the date and time at which the job last ran.

New Alerts Method

We have changed the way alerts are set up to give the user more control over when to send alerts and what to include in them.

Note: If you are upgrading your jobs from a previous version of Autobahn DX and you have alerts set up for a job, you will have to go to the Alerts tab in the Job Designer and set up the alerts in your jobs again.

See section 5.2.4 of the reference guide for more details.

OCR Updates

Extended Engine

Autobahn DX 5.0 now has the latest version of the iDRS engine (iDRS 15.4.2) in the Extended OCR module.

Default Values

The default values for a few settings have been changed so that they give good OCR results for different types of documents. These are shown below:

Setting              Changed to
Binarize             true
Binarization Mode    Adaptive
Brightness           128
Smoothing Level      248
Threshold            0
Work Depth           255
Remove Lines         true

New High-Quality OCR engine

The iDRS™ is updated with I.R.I.S.’ brand-new High-Quality OCR: a new OCR engine developed using state-of-the-art concepts from the artificial intelligence research domain.

This new technology considerably improves OCR accuracy, especially for poor-quality scans, camera images or low-resolution documents, which are affected by common issues such as:

  • Touching characters

  • Broken characters

  • Distorted characters

An example of document text where characters are distorted and stretched.

It is also well suited to the recognition of Arabic and Farsi, due to the cursive nature of these languages:

An example of Arabic text.

The first release uses the High-Quality OCR engine for English, Arabic and Farsi; further languages will be added in future releases.

  • For Latin, Cyrillic, Greek, Hebrew and Asian languages, High-Quality OCR will be combined with the existing OCR engine to use the strengths of both engines.

  • For Arabic and Farsi, it fully replaces the previous engine and reaches an unparalleled level of accuracy.

Note that processing time with the High-Quality OCR engine is expected to increase for low-quality documents: more time will be spent, but better accuracy will be reached.

Recognition of images scanned with dithering

This release exposes an option that improves the recognition of color or greyscale images scanned with dithering:

An example of a greyscale document with dithering. A small section has been enlarged to show the dithering more clearly.

Previous releases would not have processed such images properly: in most cases, the text would simply not have been detected during the page analysis step.

How to use

Undithering can be enabled by setting the Undithering property in the Binarization object. Note that you also need to enable smoothing by setting SmoothingLevel to a value greater than 0 for undithering to be performed.

Automatic language detection of a single-language page

Extended OCR can now automatically detect the language of an input document.

The aim of this feature is to detect the most probable language of a single-language page.

Supported languages

This release can reliably detect the following scripts/languages:

  • Latin script

English, German, French, Spanish, Italian, Swedish, Danish, Norwegian, Dutch, Portuguese, Galician, Icelandic, Czech, Hungarian, Polish, Romanian, Slovak, Croatian, Slovenian, Finnish, Turkish, Estonian, Lithuanian, Latvian, Albanian, Catalan, Irish Gaelic, Scottish Gaelic, Basque, Indonesian, Malay, Swahili, Tagalog, Haitian Creole, Kurdish, Cebuano, Ganda, Kinyarwanda, Malagasy, Maltese, Nyanja, Sotho, Sundanese, Welsh, Javanese, Azeri (Latin), Uzbek, Bosnian (Latin), Afrikaans

  • Cyrillic script

Serbian, Russian, Byelorussian, Ukrainian, Macedonian, Bulgarian, Kazakh

  • Greek script

Greek

  • Hebrew script

Hebrew

Future releases will extend the support to Arabic and Asian scripts.

Note:

  • If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through the OCR Language x property.

  • If it fails to detect a language, recognition will be performed using the language(s) set through the OCR Language x property.
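The selection rule in the note above can be summarized in a short sketch. This is an illustration only, not the engine's implementation: the function name recognition_languages and the use of plain string lists are assumptions made for the example.

```python
def recognition_languages(detected: list[str], configured: list[str]) -> list[str]:
    """Pick the language(s) used for recognition.

    detected:   language candidates from automatic detection, best first
                (empty if detection failed).
    configured: language(s) set through the OCR Language x property.
    """
    if detected:
        # Detection succeeded: only the first candidate is used,
        # and the configured languages are ignored.
        return [detected[0]]
    # Detection failed: fall back to the configured languages.
    return list(configured)


print(recognition_languages(["French", "Italian"], ["English"]))
print(recognition_languages([], ["English", "German"]))
```

The practical consequence is that the configured languages act only as a fallback once automatic detection is in use.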

Punch-hole removal

A new feature has been added to the Extended engine that attempts to remove punch holes from pages. This feature only works when converting images to PDFs or when OCRing PDFs with Extract Images Method set to Convert to TIFF and with either Keep Original Image set to false or Keep Punch Hole Removal set to true.

Note: The punch-hole algorithm can be used on images with the following minimum dimensions: width 300px, height 100px (computed for 300 DPI). The minimum height and width can vary with the image resolution.

Retain pre-processing settings

You can now retain specific pre-processing in the output PDF documents. For instance, if de-speckling is enabled, speckles are removed from each page to improve OCR recognition, but by default this is only done internally and is not reflected in the output PDF document.

In this release, if you want to retain the de-speckling in the output document, set Keep Despeckled Image to true. Other pre-processing settings that can be preserved are deskew, dark border removal and punch-hole removal. These can be enabled using Keep Deskewed Image, Keep Dark Border Removal and Keep Punch Hole Removal respectively.

This feature only works when converting images to PDFs or when OCRing PDFs with Extract Image Method set to Convert to TIFF and with Keep Original Image set to false.

Advanced pre-processing settings

This release has new advanced settings for some existing pre-processing settings of the Extended module. These are:

  • AdvancedDeskew

  • AdjustmentMode

  • ForceDeskew

  • AdvancedDespeckle

  • Dilate

New languages available with High-Quality OCR engine

The brand-new High-Quality OCR technology now includes the following three languages:

  • Italian

  • Spanish

  • Portuguese

Note also that variants of already existing High-Quality OCR languages are now supported as well: Afrikaans, Brazilian Portuguese, British, Corsican, Frisian, Luxembourgish, Mexican Spanish, Sardinian, and Swiss-German.

Performance improved for page orientation detection on Korean documents

The algorithm used for page orientation detection with the Korean language has been reviewed, drastically reducing processing time while slightly improving accuracy.

On a set of 132 Korean documents, taken in all possible orientations for a total of 528 test cases:

  • Older versions:

    • Total time for orientation detection: 5,864 seconds

    • Orientation detection accuracy: 96.0%

  • This version:

    • Total time for orientation detection: 971 seconds (reduced by a factor of about 6)

    • Orientation detection accuracy: 97.3%
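The figures above can be checked with a few lines of arithmetic. The sketch below uses only the totals quoted in this section; the per-case averages are derived values, not measurements reported by the vendor.

```python
# Totals quoted above for 528 test cases (132 documents x 4 orientations).
test_cases = 528
old_seconds = 5864
new_seconds = 971

# Ratio of old total time to new total time.
speedup = old_seconds / new_seconds

# Derived average time per test case, before and after.
old_per_case = old_seconds / test_cases
new_per_case = new_seconds / test_cases

print(f"speed-up factor: {speedup:.2f}")
print(f"average per test case: {old_per_case:.2f} s -> {new_per_case:.2f} s")
```

The speed-up works out to roughly 6.04, which is where the "factor of about 6" comes from.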

Memory consumption reduced for document conversion

The document output engine includes several optimizations regarding memory consumption when creating an output document. Those changes impact mostly the creation of PDF Image-Text and especially PDF iHQC documents.

In terms of peak memory consumption, for an A4 input image at 600 DPI:

  • Older versions:

    • PDF Image-Text: 343 MB

    • PDF iHQC: 568 MB

  • This version:

    • PDF Image-Text: 238 MB

    • PDF iHQC: 359 MB

Turn off PDF/A validation

In previous versions, PDF/A validation was always performed after converting to PDF/A. However, validating a PDF/A document adds a small performance penalty in terms of the overall processing time. This version allows you to turn off PDF/A validation.

Standard Engine

Default Values

The default values for a few settings have been changed so that they give good OCR results for different types of documents. These are shown below:

Setting             Changed to
SavePreDespeckle    true

Step Types that have changed name

For clarity we have changed the names and groupings of our OCR steps in Autobahn DX to represent more clearly what they do. The table below shows the old step names and the corresponding new step.

Old Step Name                   New Step Name
Convert TIFF to PDF             Image To Searchable PDF (Standard)
Extended Convert TIFF to PDF    Image To Searchable PDF (Extended)
OCR Image-Only PDF              PDF to Searchable PDF (Standard)
Extended OCR Image PDF          PDF to Searchable PDF (Extended)
OCR Any File to PDF             Any File to Searchable PDF (Standard)
Merge TIFFs to PDF              Merge Image to Searchable PDF (Standard)
Extended Merge TIFF to PDF      Merge Image to Searchable PDF (Extended)

Delete Empty Input Folders

When users select Delete Input Files or Move to Archive after Processing as the input file post-processing action, many empty folders are often left behind in the input folder tree. To delete these empty folders, you can use this new setting provided in Autobahn DX 5.0.

A screenshot of the Properties tab, highlighting the tick box that enables the deletion of empty input folders after processing.

CPU Core Licensing and Job Control

Your license key will support a specific number of CPU cores. The product will limit the number of concurrent file processing operations to this number and will “throttle” jobs accordingly.

For example, if a 4-core licensed server is currently running a 2-core job and a new job starts that is configured for 4 cores, the number of cores allocated to the second job will be reduced to the 2 cores that remain:

Autobahn DX using 2 cores out of 4 allowed.

We will reduce the number of cores in this job from 4 to 2 allowed.

As another example, if a 4-core licensed server is currently running a 4-core job and a new job starts that is configured for 2 cores, the second job will not be able to start until cores are freed up:

Autobahn DX using 4 processors out of 4 allowed.

We will attempt to start the job 18 time(s) over the next 180 seconds.

The retry interval and number of tries are determined by these two settings in Autobahn.config (by default, this file is in C:\Aquaforest\Autobahn DX\config):

<add key="jobqueuetimeout" value="180" />

<add key="jobqueueinterval" value="10" />
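The throttling behavior and retry count described in this section can be modeled with a short sketch. It is an illustration under stated assumptions, not the product's scheduler: the function names are hypothetical, both settings are assumed to be in seconds, and the retry count is assumed to be jobqueuetimeout divided by jobqueueinterval, which is consistent with the "18 time(s) over the next 180 seconds" message above.

```python
def allocate_cores(licensed: int, in_use: int, requested: int) -> int:
    """Cores granted to a new job; 0 means the job must wait and retry."""
    free = max(licensed - in_use, 0)
    # A job never gets more than it asked for, nor more than is free.
    return min(requested, free)


def retry_attempts(timeout_seconds: int, interval_seconds: int) -> int:
    """Number of start attempts made before the queue timeout expires."""
    return timeout_seconds // interval_seconds


# First example: 2 of 4 licensed cores busy, new job asks for 4.
print(allocate_cores(licensed=4, in_use=2, requested=4))

# Second example: all 4 cores busy, new job asks for 2, so it has to wait.
print(allocate_cores(licensed=4, in_use=4, requested=2))

# Default settings: jobqueuetimeout=180, jobqueueinterval=10.
print(retry_attempts(180, 10))
```

Under these assumptions, raising jobqueuetimeout gives queued jobs longer to wait for free cores, while lowering jobqueueinterval makes them check more often.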

Autobahn DX Directory Changes

We have added a distribution directory to the installation directory of Autobahn DX. This directory contains the components needed for Autobahn DX to function. As a result, we have moved some folders from the top-level folder to the distribution folder and created new folders for other components. The table below shows the details.

Application                   Old Directory Path    New Directory Path
Extended OCR                  extendedocr           distribution/extendedocr
TIFF Junction                 tj                    distribution/tj
PDF Junction                  pj                    distribution/pj
Cloud OCR (new)               -                     distribution/cloudocr
SharePoint Connector (new)    -                     distribution/sharepoint
Azure Connector (new)         -                     distribution/azure
Support Tool                  support               distribution/support

Bug Fixes

[SDK-120] Graphics state

The graphics state was not being restored when processing pages that require rotation in the Standard OCR engine. This caused issues when other applications manipulated the PDF after it had been OCRed by Aquaforest. This has now been fixed.

Known Issues

Recognition of accented characters with High-Quality OCR engine (Extended OCR module)

The new Extended OCR module currently has an issue that impacts Latin languages processed with the High-Quality OCR engine.

When a character with an accent (like é, è, à, ñ, etc.) is recognized but is not present in the character set (for instance if recognition is performed in English), the OCR engine will output a reject character (U+FFFD).

This is a regression compared to previous versions, where the “base” character would be output instead (e.g. ‘e’ instead of ‘é’).

This issue will be fixed with the next release.
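Until the fix is available, it may be useful to check OCR output for reject characters. The sketch below is not part of Autobahn DX; it assumes you have already extracted the recognized text as a string, and the helper names are hypothetical. It also shows, using Unicode decomposition, what the "base" character of an accented letter is, which is what previous versions output.

```python
import unicodedata

# U+FFFD, the reject character described above.
REJECT = "\ufffd"


def count_rejects(text: str) -> int:
    """Count how many reject characters appear in recognized text."""
    return text.count(REJECT)


def base_character(ch: str) -> str:
    """Return the base letter of an accented character (e.g. 'e' for 'é').

    NFD normalization splits a precomposed letter into its base letter
    followed by combining marks; the first code point is the base.
    """
    return unicodedata.normalize("NFD", ch)[0]


sample = "caf\ufffd cr\ufffdme"  # text as it might come back from the engine
print(count_rejects(sample))
print(base_character("é"), base_character("ñ"))
```

A non-zero reject count on documents recognized in English may indicate accented characters affected by this issue.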