Upgrade your Document Automation server easily
Upgrading
Upgrade overview
Two versions of Document Automation Server (DAS) cannot be installed at the same time.
Upgrading DAS will involve:
- Backing up your job definition files.
- Uninstalling your current installation.
- Downloading the latest version from our website.
- Installing the new version, which should only take a few minutes.
When the DAS user interface (UI) starts, it checks the version number of each of the step types and makes the required updates to any step. A status bar shows the progress of this process. This only happens the first time the job definition is checked by DAS. Once completed, your previous job list will appear in the new version as it was previously.
Upgrading from older versions (before 5.0) may require you to manually open, edit, and save each job, or to recreate the job from scratch.
On the install server, DAS requires .NET Framework 4.7.2. The DAS installer will check this requirement before installation.
License keys
The license key from your previous version is retained. For upgrades between minor releases, the license key should still be valid for the newly installed version. If you’re moving to a new major version (5.0, 5.5, 6.0), the license key will no longer be valid. In this case, contact Support for a new license key.
Version 6.0.2411.11
New steps
New steps have been added to this release based on customer requests. The list of new steps is as follows:
- Convert PDF To Office: Converts an input PDF file to the selected Office type. Options include .docx, .pptx, .xlsx, and .svg.
- Convert Any File To Office: Converts many different input file types to the selected Office type. Email files can also have their attachments converted. Output file options include .docx, .pptx, .xlsx, and .svg.
- Get Document Information: Creates a report on the searchability of a PDF document. The report includes the page count, searchable page count, and image page count. Output options include .txt, .csv, .json, and .xml.
- Pattern Enumeration: Selects a pattern and/or list of terms, and creates a .csv report file that lists the occurrences of each term in the input PDF.
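As a rough illustration of the kind of report Pattern Enumeration produces, the sketch below counts how often each term appears in text that has already been extracted from a PDF, then renders the counts as CSV. It is not the step's implementation: the function names, the column names, and the case-insensitive whole-word matching rule are all assumptions made for the example.

```python
import csv
import io
import re


def enumerate_terms(text, terms):
    """Count case-insensitive whole-word occurrences of each term (assumed matching rule)."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


def counts_to_csv(counts):
    """Render the counts as CSV text using hypothetical column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term", "occurrences"])
    for term, count in counts.items():
        writer.writerow([term, count])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Invoice 42 was paid. The invoice total matched the purchase order."
    print(counts_to_csv(enumerate_terms(sample, ["invoice", "order", "refund"])))
```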
Improvements
Add an option to force a dots per inch (DPI) value when converting an image to PDF
Ref: ADX-774
Previously, if a TIFF file had the incorrect DPI, users faced an issue when converting to PDF. The dimensions of the output PDF file were warped based on the DPI.
With this release, we fixed the issue by adding the ConvertTiffForceDPI option. It lets users select a sensible DPI so that the output has reasonable dimensions.
KVP expected keys now gives all data points
Ref: ADX-773
Previously, when retrieving an expected key from the Key Value Pair Extraction step, only the key and value would be retrieved.
With this release, it’s possible to retrieve other data points (alongside the key-value pair output) such as:
- Bounding box for the key and/or the value
- Page number
- Confidence score
- Estimated data type (for example, email address or credit card number)
Improve license naming when opening ADX for the first time and entering a new key
Ref: ADX-760
When opening Autobahn DX for the first time, you’re required to enter your license key on a prompt that appears. However, the message displayed on that prompt was unclear and confusing.
With this release, we reworded the message to remove the ambiguity.
Bug fixes
GdPicture HTML/EML/MSG conversion issue
Ref: ADX-820 and ADX-818
An update to Google Chrome and Microsoft Edge caused an issue with conversions from .html, .eml, and .msg files when using GdPicture steps. Although we updated the steps to fix the issue for Google Chrome in a previous release, the corresponding fix for Microsoft Edge was still outstanding.
With this release, we introduced a new configuration setting called WebBrowserPath that can point toward a substitute for Portable Chromium, such as blink binaries, to resolve the issue for Microsoft Edge.
Incorrect default value for GdPicture conversion settings
Ref: ADX-811 and ADX-808
In GdPicture conversion settings, the default value for both TrackOfficeDocumentRevisions and SpreadsheetRenderOnlyPrintArea should be No. However, the default value for both of these parameters was set to Yes.
With this release, we updated the default value to No for both of these parameters for all the newly created GdPicture conversion settings.
Page Range does not work for Smart Redaction
Ref: ADX-793
The Smart Redaction step has a Page Range property that will determine on which pages the redaction will be applied. However, there was an issue with the Page Range property, which resulted in all pages being redacted.
With this release, we fixed the issue so that only the pages specified in the Page Range property will be redacted.
Changing the order of steps does not get saved automatically
Ref: ADX-768
Previously, if an existing job had multiple steps, and the steps were moved around, Autobahn DX didn’t recognize the move as a change in the job. As a result, when users left the Designer tab, Autobahn DX didn’t trigger a prompt to save the changes.
With this release, we fixed this issue so that moving steps within a job is seen as a change to the job and hence, Autobahn DX triggers the save prompt.
Version 6.0.2404.10
Improvements
Update documentation on Kingfisher Jobs
Ref: ADX-746
When using Kingfisher jobs, a user requires a separate instance of Kingfisher with a job set up in that program for Autobahn to call. It is important that this job is set to single core, so this point has now been emphasized in the Reference Guide for users.
Designer tab can be slow to load
Ref: ADX-738
When switching to the Designer tab, Autobahn has to load the data from the selected job. This data now loads much faster, which improves the experience of switching between tabs.
Skip previously OCRed pages in GdPicture OCR step
Ref: ADX-737
Previously, the GdPicture OCR step would OCR every page that it processed. In some cases, customers prefer that pages with existing OCR or hidden text layers are skipped instead. We have now added this choice as the options OCR Searchable Pages and OCR Hidden Text. These settings can prevent text from being overwritten with a new OCR text layer on previously OCRed pages.
Start time of each job process is now shown in logging
Ref: ADX-736
Autobahn provided the user with a start time for the entire job, and an end time once all jobs were complete. However, some customers requested more information about when each individual process starts. We have now added this as the Job Start time in the logs.
Custom step now provided with the error folder as an extra argument
Ref: ADX-733
When using a custom step, Autobahn provides the input and output file locations to the script file. We have now included the error folder as an extra flag sent to the script, giving users more options for handling files that do not process as expected in the custom script.
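The sketch below shows one way a custom script might use the error folder. These notes do not specify how the arguments are passed, so the sketch assumes three positional arguments in the order input file, output file, error folder; check the Reference Guide for the real format. The `process` function is a hypothetical placeholder for your own logic.

```python
import shutil
import sys
from pathlib import Path


def process(input_file: Path, output_file: Path) -> None:
    """Placeholder processing: reject empty files, otherwise copy input to output."""
    if input_file.stat().st_size == 0:
        raise ValueError("input file is empty")
    output_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(input_file, output_file)


def main(argv):
    if len(argv) < 4:
        print("usage: script <input file> <output file> <error folder>", file=sys.stderr)
        return 2
    # Assumed argument order: input file, output file, error folder.
    input_file, output_file, error_folder = (Path(arg) for arg in argv[1:4])
    try:
        process(input_file, output_file)
    except Exception as exc:
        # Park a copy of the problem file in the error folder for later inspection.
        error_folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(input_file, error_folder / input_file.name)
        print(f"Failed to process {input_file.name}: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Returning a non-zero exit code on failure is a common convention for scripts; whether Autobahn inspects it is not stated here.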
Custom script step now warns users if custom script file field left blank
Ref: ADX-732
The custom script step needs the name of the script file that it will run. If this field is left empty, a helpful message box will now appear to warn the user that there needs to be a value for the step to function.
Remove reliance on license.txt
Ref: ADX-729
In previous versions of Autobahn, license.txt was checked to see if jobs needed upgrading. It also supplied the version number displayed in the Help tab. Both now use the version number in the program itself. The license.txt file will still be maintained for legacy purposes.
Update PDF To Text step to new GdPicture class
Ref: ADX-728
The underlying GdPicture library added a dedicated class for text extraction that fits the requirements of the PDF To Text step, while improving performance and adding new settings and features. Moving to this class will help keep the step performing well through future GdPicture upgrades.
Add option to replace invalid characters found in barcodes
Ref: ADX-727
When using the Split By Barcode step, it is very common to use the barcode value in the output filename. This can cause problems in the case that the barcode contains characters that are invalid for filenames. In this situation, we now find and replace these characters with an underscore by default, but this can be customised in the step by changing the “Replace Invalid Characters With” field.
Update terms in the High Availability step
Ref: ADX-388
The High Availability step is designed to use two separate instances of Autobahn running on separate hosts. The first host processes the job and, if it runs into issues and stops processing, the second host takes over. The terms referring to the two hosts have been updated to Controller for the processing host, and Replica for the host that monitors and takes over from a Controller that stops processing.
Bug Fixes
When “Remove Hidden Text” is set to “True”, pages with “visible text” are re-OCRed
Ref: SDK-210
An issue was introduced with the previous SDK update that caused pages to be re-OCRed when they should have been skipped. Remove Hidden Text should not have affected visible text, but these pages were treated as needing OCR again. This has now been fixed and visible text is no longer re-OCRed.
Temporary files not deleted when converting email files
Ref: ADX-752
There was an uncommon issue when converting email files with attachments to PDF. A temporary file would be created when processing the attachments, but this file would not be cleaned up after the conversion. It is now deleted as expected.
SharePoint step displaying incorrect license information
Ref: ADX-751
The SharePoint Download and Upload steps display the expiration date of the current Autobahn license. This information was incorrect in previous versions, but will now display the correct date.
Stamp PDF step not returning error messages
Ref: ADX-741
There was an issue with the Stamp PDF step where error messages were not returned if the job failed due to invalid job settings. The validation for the settings has now been improved and will return relevant error messages if needed. Also, Page Range will now accept * as a valid input, to bring this in line with other steps.
Send Documents step renaming prevents moving original files after processing
Ref: ADX-740
The Autobahn Send Documents email step has a feature that allows the file to be sent using its original filename. This renaming prevented the file from moving out of the input folder if that option was also set. This has now been fixed, and you can also use %EMAILNAME% in the Rename Input File settings to set the input filename to its original name before moving.
Version 6.0.2311.16
Bug Fixes
Command line returns before command has completed.
Ref: ADX-722
Previously, the command line could return before the command had completed. This has been fixed in the latest build.
Autobahndx.exe Exit Code returns early before files are in output location
Ref: ADX-720
Sometimes, the job will return an exit code saying that the job has succeeded, but the output has not been copied to the output location. This has now been fixed and exit codes will return after the step is completed.
Version 6.0.2311.03
New Steps
Five new steps have been added to Autobahn DX in this release, which all utilize the GdPicture toolkit’s capabilities, including the new Key-Value Pairs processing engine.
Key Value Pair Extraction
Extracts key-value pairs from PDFs alongside information about their location, data type etc. The step outputs the information in the choice of JSON, CSV, and XML output. The user can also specify Expected Keys, and the closest match for each Expected Key will be output in a separate file.
Pattern Redaction
Redacts text in the input PDF based on a regular expression or terms listed in a text file. The user can also customize the color of the redaction zone.
Pattern Highlight
Adds a highlight to text in the input PDF based on a regular expression or terms listed in a text file. The user can also customize the color and transparency of the highlight.
Split PDF (GdPicture)
Splits a PDF based on the criteria specified. The available options are to split by page ranges, by bookmarks, or into single pages.
Split By Barcode
Splits a PDF based on the barcodes found. The output files can be named based on the value of each barcode found. After specifying the barcode types, you can use page ranges and regular expressions to narrow down which barcodes split the document.
Improvements
JSON and XML files can now be converted to PDF
Ref: ADX-673
The ‘Convert Any File To PDF (GdPicture)’ step previously could not process JSON or XML files as they were not supported. Now, they are treated as text files when converted to PDF, so they will now process successfully.
Updated license key messages to better reflect license status
Ref: ADX-661 and ADX-625
In various areas, generic messages were returned that did not fully reflect the characteristics of the license being used. The messages will now depend on the license, and so will give a clearer and more individual message in these cases.
Add SSN and Postal Address support to Smart Redaction
Ref: ADX-599
Improvements to the GdPicture engine have increased the possibilities of the Smart Redaction step, adding two more redaction categories. You can now choose to redact both Social Security Numbers and Postal Addresses from input PDFs.
Source / Destination folder warning popup for SP and Azure Connectors
Ref: ADX-584
Connector steps in Autobahn are unique in the fact that they ignore either the input or output folder of the job. This is because they instead connect to email, SharePoint, or Azure to send/retrieve their files. When adding an email job, a pop-up would mention that the unused input/output folder would be a dummy folder. To avoid confusion, this warning pop-up message has now been extended to both the SharePoint and the Azure steps.
Users can now control threads used in GdPicture OCR
Ref: ADX-565
The GdPicture OCR step is unique in that it uses multiple concurrent threads to process multiple pages simultaneously, drastically increasing speed. However, running these threads also uses more CPU. We have added the option to limit the number of threads so that the step does not use all available CPU power. If a limit is needed, we recommend between 2 and 8 threads.
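The trade-off behind a thread limit can be sketched generically. The example below is not GdPicture code: `ocr_page` and `process_pages` are hypothetical names, and the per-page work is a stand-in. It only shows how capping the worker count bounds how many pages are processed at once.

```python
from concurrent.futures import ThreadPoolExecutor


def ocr_page(page_number):
    """Stand-in for per-page OCR work; returns a label instead of real text."""
    return f"page {page_number} processed"


def process_pages(page_numbers, max_threads=4):
    """Process pages concurrently, never using more than max_threads workers."""
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() returns results in the same order as the input pages.
        return list(pool.map(ocr_page, page_numbers))


if __name__ == "__main__":
    for line in process_pages(range(1, 6), max_threads=2):
        print(line)
```

A lower limit leaves more CPU for other work on the host at the cost of longer processing time per document.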
Implement fileProcessOrder for merge steps
Ref: ADX-563
We are adding features to the GdPicture merge steps that will bring them in line with the old merge step. This feature allows some control over the order in which files are merged. Without this option, all files are processed alphabetically. This option allows files to be merged in numerical order instead.
Bug Fixes
Fields in the designer not being recognized as changed
Ref: ADX-624
There were a few fields in the designer that would not register the job as changed if they were edited. This would mean that the job would not ask you to save changes when changing tabs without saving. This has now been fixed, and the job will correctly recognize that a job has been changed.
Jobs stuck in ‘Finishing job execution’ after file error
Ref: ADX-591
In very specific circumstances, Autobahn had an intermittent issue when a file had failed during processing. The issue would leave the job in the running state, and so it would be unable to run again until the service was restarted. This issue was investigated and has now been fixed in the latest version.
PDF File fails Compression with PDFInvalidContent status
Ref: ADX-587
This issue affected only files with a very specific structure. When ‘Remove Formfields’ and ‘Remove JavaScript’ were enabled in the Compression step, the file failed to process. This was due to leftover data when the formfields were removed, and has now been fixed.
Generic Error when running Email steps in ADX
Ref: ADX-585
Running Autobahn email steps with Modern Authentication was giving a ‘Generic Error’ for specific setups. The cause of this issue has been resolved and email steps should now run as normal.
SharePoint Download fails to download nested files from SPO
Ref: ADX-580
The SharePoint Download step had an issue that prevented it from successfully downloading files stored in subfolders of the target SharePoint location. Only files at the top level would successfully download. This issue has been resolved, and all folders will now download successfully.
OCR Removing visible text during processing
Ref: ADX-578 (SDK-199)
For very specific files, visible text was being removed in addition to hidden text during the OCR process when in native mode, damaging the OCRed file. This issue was found in the Aquaforest SDK, but a fix was developed and has been deployed to all products using this SDK for OCR processing.
Version 6.0.2304.10
Improvements
Improve properties in PDF to Image steps
Ref: ADX-540
The PDF to TIFF/PNG/JPEG steps contain many properties for user customization. We have added a default value of 0 to the following properties: Brightness, Contrast, Saturation, Gamma, Crop Left, Crop Top. We have also renamed the cropping properties to better reflect their purpose. Crop Height and Crop Width determine the output area of the crop, and so have been renamed to Crop Area Height and Crop Area Width. Crop Left and Crop Top determine the location of the crop area from the top left corner, and so have been renamed to Crop Location Left and Crop Location Top.
Improve error messages for invalid page ranges
Ref: ADX-491
In the PDF to TIFF/PNG/JPEG steps, if a document was not in range for the given page range, the error message would simply state that the file failed to save. This has been improved so that the user will be told that the file does not fall within the specified page range, giving a much clearer reason for failure.
Add ‘Order by Int’ option to Combine PDFs step
Ref: ADX-490
A common issue that users come across with sorting is that, by default, file11 comes before file2 alphabetically. This can be frustrating when trying to merge files in numerical order. The new Combine PDFs GdPicture step now has the option to sort files by number. This sorts files with the same prefix by comparing their numbers as integers, putting file2 before file11.
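The difference between the two orderings is easy to reproduce. The sketch below is not the step's implementation; `numeric_key` is a hypothetical helper that splits each name into text and digit chunks so that digit runs compare as integers.

```python
import re


def numeric_key(filename):
    """Split a name into text and integer chunks so digit runs compare by value."""
    parts = re.split(r"(\d+)", filename)
    # With a capturing group, odd positions hold the digit runs.
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


if __name__ == "__main__":
    files = ["file11.pdf", "file2.pdf", "file1.pdf"]
    print(sorted(files))                   # alphabetical order
    print(sorted(files, key=numeric_key))  # numerical order
```

Alphabetical sorting compares character by character, so the "1" in file11 sorts ahead of the "2" in file2; comparing the digit runs as integers gives 2 before 11.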
Improvement for PDF to TIFF (GdPicture) to align with other PDF to Image steps
Ref: ADX-451
Our previous implementation of the ‘PDF to TIFF (GdPicture)’ step did not allow many options for customization to the user. The step has now been rewritten to be more in line with the other PDF to Image steps and now has access to many more properties that users can change depending on their use case.
General Improvements and Maintenance to Autobahn code
Ref: ADX-444
As a part of keeping our products up to date, we review existing code and aim to make improvements when necessary. This change focused on improving the steptype checks at the start of each decision tree, making them slightly faster and easier to update when new steps are added.
Bug Fixes
Extensions filter in Azure download step not working as expected
Ref: ADX-544
The extensions property filters out any files that do not have one of the given extensions, so that they are not downloaded. The extensions to pick up are entered as a comma-separated string, for example ‘docx,xlsx,pptx’. However, files whose extension was only part of an extension in the list would also be picked up. In the example, doc, xls, and ppt files would all be picked up incorrectly. This has now been fixed and only exact matches are picked up.
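The behavior described above is what a substring check produces, compared with an exact match against a parsed list. The sketch below illustrates that difference only; the actual cause in Autobahn is not documented here, and all function names are hypothetical.

```python
from pathlib import PurePath


def parse_extensions(setting):
    """Turn 'docx,xlsx,pptx' into a set of lower-case extensions without dots."""
    return {item.strip().lstrip(".").lower() for item in setting.split(",") if item.strip()}


def is_allowed(filename, allowed):
    """Exact match on the file's final extension."""
    return PurePath(filename).suffix.lstrip(".").lower() in allowed


def is_allowed_substring(filename, setting):
    """Substring check against the raw setting; over-matches partial extensions."""
    return PurePath(filename).suffix.lstrip(".").lower() in setting


if __name__ == "__main__":
    setting = "docx,xlsx,pptx"
    allowed = parse_extensions(setting)
    for name in ["report.docx", "report.doc", "sheet.xls"]:
        print(name, is_allowed(name, allowed), is_allowed_substring(name, setting))
```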
Azure download step throws error for some special chars
Ref: ADX-539
Azure Blob storage allows for the following special characters to be included in file names: " * : < > ? \ |
However, these symbols are not allowed in Windows file names. To allow users to download these files, we replace these symbols with a replacement symbol, which can be set with the new Replace Invalid Characters With property. The default value is an underscore.
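A minimal sketch of this kind of substitution, using the characters listed above and an underscore as the default. The function name is hypothetical and this is not the step's own code.

```python
# Characters from the list above that cannot appear in Windows file names.
INVALID_CHARS = '"*:<>?\\|'


def replace_invalid_characters(name, replacement="_"):
    """Swap each invalid character for the replacement symbol."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


if __name__ == "__main__":
    print(replace_invalid_characters("report:2024?.pdf"))
    print(replace_invalid_characters("a|b.pdf", replacement="-"))
```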
Periods removed from file name of Cloud steps
Ref: ADX-533
The Cloud steps had an issue where filenames containing multiple periods would have the last period removed along with the subsequent text. For example, _file1.2.3.pdf_ would be output as _file1.2.pdf_. The final text was being treated as a file extension and was replaced with .pdf. This issue has now been fixed for all future versions.
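The effect can be reproduced with a short sketch. Replacing only the final extension leaves earlier periods alone, but doing so on a name whose extension has already been stripped treats the last dotted segment as the extension. The helper name is hypothetical, and the second call is only one plausible way to arrive at the output described above.

```python
from pathlib import PurePath


def to_pdf_name(filename):
    """Swap only the final extension for .pdf, leaving earlier periods alone."""
    return PurePath(filename).with_suffix(".pdf").name


if __name__ == "__main__":
    # Intended behavior: earlier periods in the name are preserved.
    print(to_pdf_name("file1.2.3.docx"))
    # Applying the same swap to an already-stripped name loses ".3".
    print(to_pdf_name(PurePath("file1.2.3.pdf").stem))
```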
Folder created as file when downloaded from Azure
Ref: ADX-500
A new issue came from a new setting in Azure Blob Storage. The ‘HNS Enabled’ setting was causing folders to be downloaded as if they were files when set to true. This was due to a change in behavior of methods in the code. This has now been fixed and only files will be downloaded, regardless of the blob settings.
Standard OCR DPI property only available when ‘Convert To TIFF’ set to ‘Yes’
Ref: ADX-477
There was an unintended restriction set on the DPI property in the ‘PDF to Searchable PDF (Standard)’ step. It was only selectable if the ‘Convert to TIFF’ property was set to ‘Yes’. This was not intended as they are independent properties, and so this issue has now been fixed and DPI can always be changed.
Create XML Property File does not escape special XML characters
Ref: ADX-386
There was an issue with the Create XML Property File step where metadata fields were not being escaped. This was an issue if the metadata fields of the input PDF contained any of the following five characters: < > & ' " These would cause the output XML to be invalid. This has now been addressed and the characters are properly escaped before XML creation.
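For reference, escaping these five characters with Python's standard library looks like the sketch below. It illustrates the general technique rather than the step's own code; `escape_metadata` is a hypothetical name.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > itself; the two quote characters are added here.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_metadata(value):
    """Escape the five special XML characters in a metadata value."""
    return escape(value, QUOTE_ENTITIES)


if __name__ == "__main__":
    print(escape_metadata('Tom & "Jerry" <draft>'))
```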
Version 6.0.2302.08
Improvements
Add option to Overwrite signed files with same name
Ref: ADX-448
The new ‘Detect Signatures’ step copies or moves digitally signed files to a user-defined folder to preserve the signatures. This step would fail if a file with the same name already existed in the defined folder. The user now has the option to overwrite these files.
Add option to create the signed file path folder if it doesn’t exist
Ref: ADX-447
With the new ‘Detect Signatures’ step, users could copy/move signed files to a specific folder to prevent the signing from being broken. Previously, processing the file would fail if the folder did not exist. There is now the option to create the folder automatically if it does not exist.
Review the ‘Used by another process…’ job error’s severity
Ref: ADX-323
Autobahn has the option to move or delete input files after processing. However, if the input file is forced to stay open by another process, Autobahn will be unable to move the file. Previously in this case, the job would error out and would be unable to restart without manual intervention. A new config option ‘ErrorIfDeleteInputFails’ has now been added. If set to true, the individual file will fail but the job will continue running for other files.
Version 6.0.2301.18
New Steps
We have continued to improve the variety of steps Autobahn can offer by utilizing the functions available in the GdPicture toolkit. We have added a step that allows for automatic Smart Redaction, and a step that detects signatures in PDF files.
**Smart Redaction**
The Smart Redaction step has many options for redacting a variety of sensitive data, including credit card numbers, email addresses, and phone numbers. If a section of text matches the criteria, the visible text will be covered and searchable text removed.
**Detect Signatures**
The Detect Signatures step will analyze PDF files and will act if it detects a signature in a file. It has the option to copy or move the signed file from the processing location to a custom location. If the file was copied, then there is an option to attach the signed copy to the processing file. The signature in the attachment will remain intact even when the base file is processed.
Improvements
Allow users to disable SSL in email steps
Ref: ADX-429
SSL is a security feature that helps establish secure connections when sending emails. However, sometimes emails are only sent internally, so an SSL certificate may not be required. Though we recommend keeping SSL enabled in most cases, it can now be disabled by setting the config option “secureoption” to None. It is set to Auto by default.
Add step to detect Digital Signatures
Ref: ADX-409
PDF documents containing digital signatures are becoming more common. A signed file will lose its signature when processed by jobs in Autobahn. The new Detect Signatures step can be used at the start of a job to filter these files. They can then be copied/moved to a new location before processing.
Normalize the OCR Dictionary properties
Ref: ADX-402
In the previous release of Autobahn, the GdPicture OCR step had two properties that related to the OCR language option. It used a drop-down box for common languages, and a text box to enter any additional languages. The drop-down box was redundant as you could add as many languages as you wanted into the text box below. We decided to remove the drop-down box for clarity and conciseness.
DPI is not a changeable property for GdPicture PDF to Image steps
Ref: ADX-401
The new GdPicture PDF to Image steps (PDF to JPEG and PDF to PNG) did not include an option to set the DPI. This option was set to 300 by default, which is a higher quality than some people need. It is now a changeable property in the step’s settings, allowing greater flexibility and control of the output.
Add a Search bar for Step Properties
Ref: ADX-385
Some of the available steps in Autobahn have many properties, particularly the ‘Any File to PDF’ steps which have settings for multiple file types. We have now added a search bar above the properties which can be used to jump to a specific property without scrolling.
Bug Fixes
Inconsistency in naming of PDF to Image step property value
Ref: ADX-411
The ‘PDF to JPEG’ and ‘PDF to PNG’ steps both had a small visual issue where the Remove Lines property had a default value of No, but the options only contained Horizontal, Vertical, and None. The value None has now been changed to No for consistency.
%JOBSTATUS% does not return job status in email alerts
Ref: ADX-408
Autobahn jobs can be set to send out email alerts after jobs are finished. These emails contain information specific to the job such as the name, source, and target. However, the status of the job was not being correctly replaced with the specific job value. Instead, it would always return as ‘Status’. This has now been fixed and the emails will give a proper status. Test email alerts now give a ‘Testing’ status.
Version 6.0.2211.18
Upgrading
License Key
This is a major release of Autobahn and will require a new license key to use. If you have current Support and Maintenance Cover (SMC) for a perpetual license or a current subscription license, please contact Support to request a new key. From 31 March 2022, Autobahn DX is only available as a subscription product. Existing permanent Autobahn DX 5.5 licenses will remain valid (and function as permanent licenses). Additional SMC for existing permanent licenses will continue to be available for version 6.0.
.NET Framework
This release of Autobahn requires .NET Framework 4.7.2 or higher. The Autobahn installer will check this requirement before installation.
New Steps
The main improvement of this Autobahn release is the inclusion of 11 new steps. These steps all use the GdPicture toolkit, which includes a wide variety of functions and will be constantly improved and updated. A brief description of the new job steps is below:
- Validate PDFA: Checks PDFs against a PDFA version and gives an error for the file if it does not conform.
- Linearize PDF: Optimizes PDFs for web-viewing, rendering the document one page at a time.
- Convert Any File To PDF (GdPicture): Able to convert a large variety of file types to PDF. This step does not require an Office installation to process Office files.
- Combine Any File To PDF: Converts a folder of files into PDF and then merges them to create a single output PDF. This step uses GdPicture and does not require an Office installation to process Office files.
- Combine PDFs: Merges a folder of PDF files to create a single output PDF.
- PDF To JPEG: Converts an input PDF to a JPEG file using the GdPicture toolkit.
- PDF To PNG: Converts an input PDF to a PNG file using the GdPicture toolkit.
- PDF To TIFF (GdPicture): Converts an input PDF to a TIFF file using the GdPicture toolkit.
- PDF To Text: Extracts the searchable text from the pages of a PDF file and creates an output text file.
- PDF To Searchable PDF (GdPicture): Carries out Optical Character Recognition on the input PDF using the GdPicture toolkit, creating an invisible searchable text layer over the document.
- Create Pdf Portfolio: Combines a folder of files into an integrated PDF unit. A wide range of file types can be used to create the PDF Portfolio.
GdPicture input file types
The following file types can be used with the Convert Any File to PDF and Combine Any File to PDF steps.
Description | Suffix |
---|---|
Windows bitmap format | BMP |
Microsoft Word (.doc) binary file format | DOC |
Microsoft Word OpenXML | DOCX |
Microsoft Word Macro-Enabled OpenXML format | DOCM |
Enhanced Windows Meta-format | EMF |
Graphics Interchange Format | GIF |
HTML format | HTML |
Icon and cursor format (single or multi page) | ICO |
Joint Photographic Expert Group | JPEG |
Portable Gray-map File | PGM |
Portable Network Graphics Format | PNG |
Portable Pix-map File | PPM |
Microsoft PowerPoint Presentation format | PPTX |
Microsoft PowerPoint Macro-Enabled Presentation format | PPTM |
Rich Text File Format | RTF |
Tagged Image Format | TIFF |
Plain text file | TXT |
Standard Windows Meta-format | WMF |
Microsoft Excel (.xls) binary file format | XLS |
Microsoft Excel Spreadsheet format | XLSX |
Electronic Mail format | EML |
Outlook Item File Format | MSG |
Scalable Vector Graphics File | SVG |
Device Independent Bitmap format | DIB |
24-bit compressed JPEG Graphic format | JPE |
MIME HTML format | MHTML |
OpenDocument Text file format | ODT |
Portable Bitmap Image file format | PBM |
Picture Exchange image file format | PCX |
TARGA raster graphics format | TGA |
Improvements
The job upgrader tool could be automatic
Ref: ADX-390
A previous issue in Autobahn was upgrading jobs between versions. This was solved with an upgrading tool created for version 5.5. However, the visibility for the tool was low and it was a very manual process. All jobdef files need upgrading if they are not the correct version, so it made sense to make this process automatic. When opening Autobahn, the jobdef folder will now be checked for jobs that need upgrading. The process will be shown in a brief pop-up, but this will be very quick if you only have a few jobs that need to be upgraded.
The BCL Service being turned off gave unhelpful errors
Ref: ADX-353
The Any File To PDF steps that use BCL previously had no check to see if the BCL service was running. This would be an issue if the service was not running, as the job fails, and the error returned would not explain the cause of the issue. The service is now checked to see if it is running, and an informative error message is returned if it is not running.
Custom script examples are not easy to use
Ref: ADX-352
The Custom Script Step allowed users to run their own scripts as part of an Autobahn job. However, the custom folder we provide had example scripts that were old and not easy to use. We recently created new scripts with simpler functions. The examples were added to the custom folder, and there are examples for these new scripts in the Reference Guide.
Custom script documentation needs updating
Ref: ADX-347
The Custom Script section of the Autobahn Reference Guide had become outdated: the information was confusing and linked to sites that no longer existed. The documentation has now been updated with example scripts that have more straightforward functions. Any information that was no longer beneficial has been removed.
When first opening ADX with no license key set, the license prompt did not appear at the front
Ref: ADX-338
When first opening Autobahn, a prompt will appear asking for a license key. However, this prompt did not have priority and would often appear behind other windows. This has now been improved so that the prompt will be brought to the front of all the current windows, so users can more easily see it.
Bug Fixes
Warnings sometimes displayed when switching between jobs
Ref: ADX-362
If the settings of a job required the job to use work folders, a warning will be displayed to alert the user. This alert was incorrectly popping up when switching to a job that does not require work folders, as the check was being carried out while loading the settings for the new job. This has been fixed and will no longer occur.
Non-Latin characters are not rendered correctly in logging
Ref: ADX-360
Files that contained non-Latin characters were not displayed as expected in the logs, instead appearing as question marks. This was due to the character encoding, and this has now been updated so that most common characters will now correctly display in the logs.
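The question-mark symptom is typical of text being written through an encoding that cannot represent the characters. The Python sketch below only illustrates that general effect; it is not Autobahn's logging code, and the file name is a made-up example.

```python
# Illustration only: how a narrow encoding turns non-Latin characters into "?".
name = "報告書.pdf"  # hypothetical file name

# Encoding to ASCII with replacement substitutes "?" for every non-ASCII character.
lossy = name.encode("ascii", errors="replace").decode("ascii")

# Encoding to UTF-8 round-trips the name unchanged.
safe = name.encode("utf-8").decode("utf-8")

print(lossy)         # ???.pdf
print(safe == name)  # True
```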
Single clicking a job did not properly select the job
Ref: ADX-354 and ADX-345
In the Autobahn Job Manager tab, single clicking a job would highlight that job, but would not select it. This would cause confusion when then clicking the Design tab, as it would take the user to the previously selected job. This issue also had potential to create blank jobs if creating a new job was canceled before saving it. In this version, the jobs now are selected after a single click, as expected. New jobs that are canceled are also cleared from being selected, so blank jobs are no longer an issue.
Job Name label not updating correctly
Ref: ADX-351
The Job Name label in the top right-hand corner of Autobahn keeps track of the currently selected job. However, in certain situations, the label was not updating correctly. When creating new jobs or renaming old jobs, the label would display the previous job name instead, causing confusion. This has now been fixed and the label updates as expected.
Convert PDF To TIFF step incorrectly giving exit code 0 with specific error
Ref: ADX-342
A file with a broken page was creating a memory issue with the ‘Convert PDF To TIFF’ step and was not producing an output tiff. However, the step was reporting back that the conversion was a success. This has now been changed so that the output file is checked after processing. If the output file does not exist, an error is reported correctly.
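The check described above can be pictured with a short Python sketch. It assumes a converter started as an external command; `run_and_verify` and its parameters are hypothetical names for illustration, not part of Autobahn.

```python
import subprocess
from pathlib import Path


def run_and_verify(command: list[str], expected_output: Path) -> int:
    """Run a converter command; return non-zero if it fails or leaves no output file."""
    result = subprocess.run(command)
    if result.returncode != 0:
        return result.returncode
    # The tool reported success, so confirm that the output file really exists.
    if not expected_output.is_file():
        return 1
    return 0
```

The point of the design is that the exit code of the tool alone is not trusted: a run only counts as a success when the expected file is on disk.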
SharePoint Downloads with path length over 260 causes error
Ref: ADX-335
When running the ‘SharePoint download’ step in Autobahn, if a download would create a file with an output path longer than 260 characters, the download failed with the message: “The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.” This error has now been addressed and the restriction should no longer occur.
SharePoint steps missing a ‘Continue on Error’ property
Ref: ADX-334
For most Autobahn steps, the Continue on Error property exists to allow the job to continue processing files if a single file fails, which is generally desirable behavior. However, this property was missing in the SharePoint steps, meaning that if a single file failed to transfer to/from SharePoint, the job would error. This useful property has now been added to align with other steps.
Pause function not working as intended
Ref: ADX-333
The option to pause a job was intended to allow a job to be stopped temporarily, and it would start from the last file when restarted. However, the pause would instead start the job from the beginning. This was due to a file incorrectly recording the progress. This has now been fixed and restarting a job after a pause will start from the last file processed previously.
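As a rough picture of resuming from recorded progress, here is a minimal Python sketch. The JSON progress file and the function name are assumptions made for illustration; the release note does not describe how Autobahn stores its progress.

```python
import json
from pathlib import Path


def process_with_resume(files, progress_file: Path, handle) -> list:
    """Process files in order, skipping any already recorded in progress_file."""
    done = set()
    if progress_file.exists():
        done = set(json.loads(progress_file.read_text(encoding="utf-8")))
    processed_now = []
    for name in files:
        if name in done:
            continue  # finished before the pause
        handle(name)
        done.add(name)
        processed_now.append(name)
        # Record progress after every file so a restart repeats at most the file in progress.
        progress_file.write_text(json.dumps(sorted(done)), encoding="utf-8")
    return processed_now
```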
SharePoint Upload - first file in batch fails when folder threshold reached.
Ref: ADX-331
This issue is related to ADX-320. When uploading a batch of files to SharePoint Online, if too many folders already exist on SharePoint, the upload operation would be prohibited.
Deleting steps does not update the ‘Step Properties’
Ref: ADX-327
Whenever a user deleted a job step in the ‘Designer’ tab, Autobahn would automatically select the next step (if any), but the step properties of the deleted job step would still show. The designer now refreshes properly so that the newly selected step’s properties are displayed.
Canceling creating a new job causes strange behavior
Ref: ADX-326
If a user canceled creating a new job, and then immediately selected an existing job, Autobahn would mistake the job ID with the canceled job, loading a job that doesn’t exist and causing a UI error. This has now been fixed and the correct job ID will be read, and the job loaded successfully.
PDF/A Option in Standard OCR steps not working
Ref: ADX-324
The PDF/A conversion for Standard OCR steps had an issue and was being skipped during the processing. This has now been fixed, and files should now be output as a PDF/A file if this property is set.
Output File Name %DIRNAME not working in ‘Merge Image to Searchable PDF (Extended)’
Ref: ADX-321
Users can use the %DIRNAME template to name the output file from a merge step after the folder. Unfortunately, there was a bug with this specific merge step that replaced the template with the folder name of the parent folder instead. This has now been fixed and %DIRNAME is now replaced by the correct folder name.
SharePoint Upload - first file in batch fails when threshold reached.
Ref: ADX-320
When uploading a batch of files to SharePoint Online, if too many files already exist on SharePoint, the first file in the batch would fail to upload. All other files would upload successfully. This issue has been addressed and all files should successfully upload.
SharePoint - many file entries are created when accessing SPO sites
Ref: ADX-317
Autobahn was creating many certificate key files when accessing SPO sites, found in the C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys folder. Some files are critical, but many others could be removed after the accessing is complete. These files are now removed automatically at the end of a SharePoint job step when they are no longer needed.
Version 5.5.2111.16
Enhancements
Set the emails sent out from Autobahn back to their original filename
Ref: ADX-315
Previously all files emailed by the Autobahn ‘Send Documents’ step were received in the format “name@timestamp@[email protected]”. This format is important as it holds information about the email that the file needs to be sent to. However, the file can now be renamed after this information is extracted by setting the new “Use Original Filename” property to true. The received file will have the timestamp, email and any extra @ signs removed.
Bug Fixes
Files with # in name not uploading to SharePoint correctly
Ref: ADX-319
When using the SharePoint Upload step in Autobahn, any file that contained a # symbol in its name will lose all text after this symbol, including the extension, when it is uploaded to SharePoint. This has been fixed in the latest version and will no longer be an issue with files named this way.
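The release note does not state the root cause, but a common source of this symptom is that an unescaped # in a URL starts the fragment part, so everything after it is dropped from the path. The Python sketch below shows that general URL behavior; the site URL and file name are hypothetical.

```python
from urllib.parse import quote, urlsplit

base = "https://example.invalid/sites/docs/"  # hypothetical site URL
name = "invoice#42.pdf"                        # hypothetical file name

# Unescaped, everything after "#" is parsed as a fragment, not as part of the path.
raw = urlsplit(base + name)
print(raw.path)      # /sites/docs/invoice
print(raw.fragment)  # 42.pdf

# Percent-encoding the name keeps it intact in the path.
escaped = urlsplit(base + quote(name))
print(escaped.path)  # /sites/docs/invoice%2342.pdf
```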
Emails over 3MB failing to send
Ref: ADX-318
Emails that had an attachment size greater than 3MB were getting a general exception error, which would cause the file to not be sent successfully. The issue causing this was fixed and the email attachment limit size now depends more on the email provider. Any file over 50MB will fail in Autobahn with a “file too large” error, but this limit is higher than most providers allow, so please be aware of these limits for your own provider.
Any File to Searchable PDF (Extended) does not produce a CSV log
Ref: ADX-314
It was discovered that for Any File to PDF (Extended) job steps specifically, the CSV log was not being written to when adding the run data. This has been fixed for this version of Autobahn, so it will produce a CSV log like every other OCR step.
Pages processed in CSV not correct when processing in multicore
Ref: ADX-313
The new feature of counting the pages processed for each file and adding this information to the CSV was giving incorrect data when used in multicore. This issue was fixed in the latest version, so Autobahn will now correctly output the pages processed in the CSV logs as long as the config option “verboseLogs” is enabled.
Console logging during Email steps do not output to the logs
Ref: ADX-312
The way that the email steps in Autobahn were previously called caused some unintended behavior, including the console output not being saved to the log files. After some reworking, these email steps are now called the same way as every other step. The logging is now saved to the log file as expected, and this has also fixed all the other unintended behavior.
DynamicRasterizer and pdfbox version mismatch
Ref: ADX-311
In earlier versions of the Autobahn release that included PDF Compression, some internal components were mismatched, causing certain jobs to fail. This has been fixed in the current version, and we encourage anyone experiencing these issues to upgrade to the latest version.
Version 5.5.2109.30
Enhancements
Add a tool to update Autobahn 5.01 jobdef files
Ref: ADX-309
When moving jobs from Autobahn 5.01 to 5.5, jobdefs relating to jobs that added new fields or changed field IDs will have missing information. This can cause issues, especially with the new field “Embed Font Subset” that is present in a few common steps.
An ‘Upgrade Jobs’ tool has been added to the ‘Help’ tab that will update 5.01 jobs in the ‘jobdef’ folder when run.
Add a new compression step in Autobahn DX
Ref: ADX-306
We have added the new step ‘Modern Compress PDF’. This step has many great options for compression of both image and searchable PDFs. It is a big upgrade on our old compression step, which is now deprecated and will be removed in a later release.
Include total page count per job as additional logging
Ref: ADX-305
A new config option has been added to Autobahn. With ‘verboseLogs’ set to true, the total processed page count will be added to the output log as extra information. There are a few conditions for this to work:
-
This option is for single step jobs, as pages may be processed multiple times in multistep jobs
-
This information will not work with every steptype, as not every step processes in pages
-
The log file path should contain %TIMESTAMP% otherwise data from previous runs may affect the output
Add PowerShell Scripts capability
Ref: ADX-303
Previously, the ‘Custom Script Step’ was unable to run PowerShell scripts directly. This has been addressed in the latest release, and there should be no issues running a PowerShell script as part of a job in Autobahn.
Allow Autobahn email steps to send files in bulk with ‘Send Email’ step
Ref: ADX-299
In the previous version, the Autobahn ‘Send Email’ step would send each file in a separate email. For jobs that included splitting or big volumes, this created a lot of emails. We have now added the option to have multiple attachments in one email. This can be controlled with limits to the total file size or file number attached to the emails that Autobahn sends.
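Grouping attachments under a file-count limit and a total-size limit can be sketched as follows. This is an illustrative Python sketch under assumed names (`batch_attachments`, `max_files`, `max_bytes`), not the logic Autobahn itself uses.

```python
def batch_attachments(files, max_files, max_bytes):
    """Group (name, size) pairs into batches that respect both limits.

    A single file larger than max_bytes is placed in a batch of its own.
    """
    batches, current, current_bytes = [], [], 0
    for name, size in files:
        # Start a new email when adding this file would break either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

For example, six files of sizes 4, 4, 4, 1, 1, 1 with `max_files=3` and `max_bytes=10` are split into three emails: two files, then three files, then one file.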
Bug Fixes
PDF page rasterization causes Runtime error
Ref: SDK-180
In rare cases with specific documents, an exception was being thrown intermittently during rasterization of an Extended OCR step. This has been fixed in the latest Aquaforest SDK version, and incorporated into Autobahn.
PDF files containing both visible and hidden text in a page not processed correctly
Ref: SDK-177
The combination of processing documents that contain both visible and hidden text using settings that set ‘RemoveExistingPDFText = true’ and ‘PdfToImageIncludeText = false’ caused the resulting files to be damaged. This rare issue has been fixed completely and should no longer cause any problems in output documents.
Files not moved to error folder if not able to copy to output
Ref: ADX-307
In the rare case that the output folder became unavailable during processing in Autobahn, an error would occur when attempting to copy the completed files over. However, there was no process of sending the original files to the error folder.
Autobahn now attempts to send the contents of the first work folder (or source folder) to the error folder. If this fails, it will disable the deletion of files if set.
Exception thrown when uploading large document to SharePoint On-Prem
Ref: ADX-302, PS-223
An issue was found in the SharePoint Upload step when users used Modern Authentication to upload a very large document, causing the upload to fail. This has been fixed in the latest version and should no longer occur.
The ‘&’ symbol in a SharePoint address causes the job to fail
Ref: ADX-301
SharePoint now allows & symbols to be used in its naming system. However, an error in Autobahn caused jobs to fail when they contained these addresses, as it was unable to find the location. This is no longer a problem in the latest version: any location containing a ‘&’ symbol will now be found correctly.
Some PDF steps gave error messages when processing ‘.PDF’ files
Ref: ADX-298
Previously, some Autobahn PDF steps were incorrectly identifying ‘.PDF’ files. The steps would only accept files with a lowercase extension. This has been fixed in the latest version, and the steps are no longer case-sensitive when verifying the file extension.
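A case-insensitive extension check of the kind described can be as small as the Python sketch below (`is_pdf` is a hypothetical helper name used only for illustration).

```python
from pathlib import PurePath


def is_pdf(file_name: str) -> bool:
    """Return True for any casing of the .pdf extension."""
    return PurePath(file_name).suffix.lower() == ".pdf"


print(is_pdf("scan.PDF"))   # True
print(is_pdf("scan.tiff"))  # False
```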
Version 5.5.2105.28
Bug Fixes
Convert PDF to PDFA step grayed out with Standard license.
Ref: ADX-297
The Convert PDF To PDFA step was newly added in Version 5.5. However, it was only available for users with extended licenses. This was not intended, and it is now available for both standard and extended licenses in the latest version.
%DIRNAME gives the file name and not the directory name for extended jobs
Ref: ADX-296
When using the %DIRNAME template for the output file name for extended jobs, the template would be replaced with the file name instead of the directory name in the output file. This has now been fixed.
Version 5.5.2104.30
Upgrading
License Key – This is a major release of Autobahn, and will require a new license key to use. Please contact [email protected] to request a new key.
.NET Framework – This release of Autobahn requires .NET Framework 4.7.2 or higher. The Autobahn installer will check this requirement before installation.
Cloud OCR Upgrade – The Cloud OCR steps have been split into Google and Microsoft counterparts. Upgrading jobs with this step will require remaking the job. This affects the following steps:
Image to Searchable PDF (Cloud), PDF to Searchable PDF (Cloud)
Note - Some jobs have been improved and have changed from previous versions. When upgrading, property values may be missing from these steps. We recommend viewing the step in the designer and making sure all the properties are correctly filled out. If you still have issues, deleting and re-adding the step will update the properties to their default values. This affects the following steps:
Read Mailbox, Send Documents, SharePoint Download, SharePoint Upload, Barcode TIFF/PDF
Enhancements
-
Autobahn DX 5.5 contains new OCR engines, including for Extended OCR, which will help improve the accuracy of the output from all OCR steps
-
Added “PDF Recognition to JSON” step
-
This step will automatically extract important data from PDF files in the form of Key/Value pairs
-
No need for training or specifying extraction zones
-
Example document types that work well with this step
-
Invoices, shipping documents, etc.
-
Added “PDF to PDFA” step
-
Added new improved barcode library with enhanced accuracy when reading certain barcode types e.g. QR codes
-
Email and SharePoint libraries updated
-
Now supports OAuth2 Authentication
-
Give ability to use templates for metadata settings in “Set PDF Properties” step
Ref: ADX-284
Please note that these config options have now been replaced by the new Detect Signatures step. See ADX-468 for more details on this change.
Previously, users could only use static data for metadata input for the “Set PDF Properties” step. Users are now able to use %FILENAME% or %DIRNAME% in the Author, Title, Subject, Creator and Keywords field, and they will be replaced by the input file name or directory name for the output PDF.
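The template substitution can be pictured with a minimal Python sketch. It assumes that %FILENAME% expands to the file name without its extension and %DIRNAME% to the folder that directly contains the file; `expand_templates` and the sample path are hypothetical.

```python
from pathlib import PureWindowsPath


def expand_templates(value: str, input_path: str) -> str:
    """Replace %FILENAME% and %DIRNAME% using the input file's path."""
    path = PureWindowsPath(input_path)
    file_name = path.stem        # assumed: name without extension
    dir_name = path.parent.name  # folder that directly contains the file
    return value.replace("%FILENAME%", file_name).replace("%DIRNAME%", dir_name)


print(expand_templates("%FILENAME% from %DIRNAME%", r"C:\input\invoices\acme-001.pdf"))
# acme-001 from invoices
```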
Added Encrypted PDF Handling options
Ref: ADX-273
Two config options have been added to Autobahn. The first option, ‘securedPdfHandling’, allows you to ‘pass’ through secured files, or you can ‘move’ or ’copy’ these files to another location. The second option, ‘securePdfOutputLocation’, defines the location to move/copy the secured files to.
Allow users to sort input files by date
Ref: ADX-270
The input file order can now be changed with the new option ‘File Order’ in the ‘Properties’ tab of the Designer. The date options have UTC and local time variants, giving nine options in total: Alphabetically, plus Created Date (Ascending), Created Date (Descending), Modified Date (Ascending), and Modified Date (Descending), each in UTC and local time. Note: this setting does not work for “Merge Image to PDF…” steps; the merge and OCR must be done in two separate job steps.
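To show what ordering input files by name or date means in practice, here is a small Python sketch. The function and its `order` values are hypothetical and do not correspond to Autobahn's option names.

```python
from pathlib import Path


def sort_files(folder: Path, order: str = "alphabetical", descending: bool = False):
    """Return the files in folder sorted by name, created time, or modified time."""
    keys = {
        "alphabetical": lambda p: p.name.lower(),
        # st_ctime is the creation time on Windows and the metadata-change time on Unix.
        "created": lambda p: p.stat().st_ctime,
        "modified": lambda p: p.stat().st_mtime,
    }
    files = [p for p in folder.iterdir() if p.is_file()]
    return sorted(files, key=keys[order], reverse=descending)
```

Calling `sort_files(Path("input"), "modified", descending=True)` would return the most recently modified file first.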
Default input delay set to 5 seconds
Ref: ADX-269
A common issue that users came across when using Autobahn as part of a workflow is that the input file would not be fully uploaded from the last process when it is picked up by Autobahn. This would cause the incomplete file to error when processed. This extra delay should give the last process time to finish properly uploading the file.
Support for conversion of .eml files
Ref: ADX-253
The latest version of BCL, the third-party library we use for conversions, now supports the conversion of .eml files. We updated the library in Autobahn DX version 5.5 and added the filetype to the configuration, so the .eml filetype should now successfully convert.
Improve license key error messages for Cloud steps
Ref: ADX-242
With the implementation of the new Cloud OCR library, we have also updated the error messages returned to users when there are errors with the Google/Microsoft license key input. This should make it easier for the user to diagnose the problem with the job.
Update Email libraries
Ref: ADX-226
The email library has been updated with a new version that addresses many of the issues with the previous library. It also includes support for Modern Authentication, which can be used in both email alerts and the individual email job steps.
Add Modern Authentication to SharePoint libraries
Ref: ADX-225
The update to the SharePoint library includes support for Modern Authentication, which has been a feature that many users have requested. The jobs have been updated so users can customize their authentication type to one that suits their needs.
Renamed ‘Extract PDF Image via’ option
Ref: ADX-219
This small update to the naming of a job option was made to provide more clarity on the way that the PDF image would be extracted. The option is now ‘Convert to TIFF’, which will convert a PDF file to a TIFF file before extracting the image. The option is false by default.
Select up to 8 languages in Extended OCR
Ref: ADX-214
The Extended OCR jobs have been updated to allow users to input up to 8 languages, given that they are from the same character set. This change will help users that process multilanguage documents.
Post Job Completion Alerts
Ref: ADX-209
The feature to send an email when a job completes existed in previous versions of Autobahn, but it has now been updated with the new email library changes, including the addition of Modern Authentication. On the ‘Modules and Options’ tab, users are able to select their preferred authentication, and these details will be used for each job that has alerts set in the ‘alerts’ tab.
Bug Fixes
-
The deployment executables, including TIFF Junction, can no longer be called directly
-
The support tool has been removed from the ‘Help’ tab
-
“PDF To Searchable PDF (Standard)” can no longer process TIFF files
Subfolders not created in output
Ref: ADX-294
In an early version of 5.5, Autobahn was not creating any subfolders in the output location. This meant that any folder tree that was processed would be output directly to the source file. This bug has now been fixed for future releases of Autobahn.
Barcode zones not being used
Ref: ADX-275
In the previous barcode step, the zonal information was not correctly sent to the barcode executable, so the whole page was always used as the zone. This has now been fixed with the new barcode step, so declaring zones will ignore the other areas of the page when searching for barcodes.
Merge PDF properties controlling processing encrypted documents
Ref: ADX-274
When processing encrypted files in the Merge PDF step, the job was expected to fail. However, some settings would allow the step to succeed. The step has been improved so it can now process encrypted files. Due to the new encrypted file handling implemented in version 5.5, encrypted files are handled by Autobahn before the merge step.
Email attachments containing the ‘;’ character in their name causes error
Ref: ADX-272
In the rare case that an email attachment’s name contained the ‘;’ character, the file would fail to download. This was a limitation of the previous email library, and is no longer an issue with the new email library.
Version 5.01 200316
Bug Fixes
Office documents always output as Portrait
Ref: ADX-265
When processing Office documents, the output would always be portrait, regardless of the input orientation or the “Paper Orientation” setting in Autobahn. This has been fixed so that both orientations can now be produced as output.
Cloud OCR option ‘Extract PDF Images Via’ default value invalid
Ref: ADX-258
The PDF to Searchable PDF (Cloud OCR) step had an invalid default value of ‘No’ for this setting. This has now been changed to the valid value of ‘Native’.
‘Keep original image’ setting available when processing in Native
Ref: ADX-257
In the PDF to Searchable PDF (Extended) step, the setting of ‘Keep original image’ will not take effect if processing with the ‘Native’ method, but the setting could still be changed. This could be confusing to a user, so this option is now grayed out unless processing using the ‘Convert to Tiff’ method.
Binarization Mode property not keeping value
Ref: ADX-256
When saving a PDF to Searchable PDF (Extended) step, the Binarization Mode property would store invalid values, due to the property referencing values of another property instead. This has been fixed in the latest version of Autobahn, so the property saves the correct value.
Contents of ‘temp’ folder not being properly removed
Ref: ADX-255
The main ‘temp’ folder holds job definitions for quick jobs, but extra definitions were generated every time Autobahn was reopened and these were not removed. This has been fixed so only necessary definitions are created, and these definitions are removed after use.
‘Run Continuously’ checkbox not keeping value
Ref: ADX-252
In the previous release of Autobahn, the “Run Continuously” checkbox would not keep its value when switching between jobs. This did not affect the schedule of jobs where the value was switched. This UI bug has been fixed in the new release.
Option “Keep original image” does not save changes and remains blank
Ref: ADX-251
When editing an ‘Any File to Searchable PDF (extended)’ job step, the value for ‘Keep original image’ will reset its value to false and appear blank in the UI. This has been fixed in the latest version of Autobahn.
Image to Searchable PDF (standard) with text output fails to save
Ref: ADX-246
When processing Image files with the step Image to Searchable PDF (standard), using “OCR to TextFile” and “Output File = Plain Text (no PDF)” the file will be processed, but the file will fail to save. This has now been fixed.
Convert to TIFF returning blank PDF forms
Ref: ADX-243
When converting PDF Forms to TIFF files, output files would lose their data, or return blank. This was an issue in a third-party component, and has been fixed in their latest build, which has been included in this build of Autobahn.
OCR Any file to PDF not keeping overwrite setting in GUI
Ref: ADX-240
The previous release of Autobahn DX 5.0 had a GUI bug that did not retain the overwrite setting in the Any file to PDF OCR job. This has been fixed.
Job analyzer not deleting temporary files it creates
Ref: ADX-238
In the previous release of Autobahn DX 5.0, the Job analyzer was not deleting the files it generated in a temp location. This no longer occurs, and the location of the generated files has been changed (see ADX-239).
Job API displays inconsistent behavior with concurrent jobs
Ref: ADX-233
In the previous release of Autobahn DX 5.0, the Job API encountered a problem if two or more jobs attempted to start at the same time and would instead show one job starting twice. We have fixed this issue.
‘Include unprocessed PDFs Only’ generates a PDFBox error
Ref: ADX-229
In the previous release of Autobahn DX 5.0, including this filter setting in a job would cause the error to occur. This was due to a PDFBox version mismatch. This issue has been resolved.
Hangs when processing certain types of PDFs in Native mode
Ref: SDK-135
In the previous release of Autobahn DX 5.0, PDFs that contained recursive code were causing hangs when processed in native mode. This was fixed in the latest PDFBox version, which has been implemented in this build.
Enhancements
SharePoint Step does not support SP2013 OR SP2010
Ref: ADX-261
The SharePoint Upload and Download steps were previously unsupported in SP2013 and SP2010. These steps have now been improved and can function with these versions of SharePoint.
Job summary tool uses job temp for its file processing
Ref: ADX-239
In the previous release of Autobahn DX 5.0, the job summary tool used the User Temp location to store files. These files are now stored in the job temp location.
Improved CPU Core throttling
Ref: ADX-234
Changes have been made in the latest version to improve CPU Core throttling.
Added ‘forcecores’ to allow setting cores manually
There now exists the config option ‘forcecores’, allowing users to tell Autobahn how many cores their machine has directly. The user will still be restricted by the cores allowed on their license and will need to restart the service for the config option to take effect.
Moved ‘Save Options’ in Module & Options tab
Ref: ADX-231
‘Save Options’ has been moved closer to the settings that it relates to, and the text has been changed to ‘Update’ instead.
Merge Image to Searchable PDF (extended)
This OCR step has now been updated to support PNG and BMP files.
Version 5.0.190905
Bug Fixes
Removed WIF 3.5 as a Prerequisite
Ref: ADX-227
The previous release of Autobahn DX 5.0 failed during installation on a Windows Server 2019 system because of the WIF 3.5 prerequisite. This has been replaced.
Fixed the PDF to Searchable PDF (Cloud OCR) Step properties.
Ref: ADX-228
In the previous release of Autobahn DX 5.0, the Convert to TIFF option was not getting passed to the OCR engine when using the new Cloud OCR step.
Version 5.0.190805
Bug Fixes
SharePoint connector license check failure
Ref: ADX-222
The previous release of Autobahn DX 5.0 failed when executing the SharePoint steps because of a bug in the license key validation.
Version 5.0.190715
Bug Fixes
Any File to PDF fails when generating PDF/A files from text documents
Ref: ADX-218
The previous release of Autobahn DX 5.0 failed when generating PDF/A files from text documents when using GenericExtension, AutoExtension and AutoExtensionEx. Updating to the latest version of BCL fixes this issue.
Batch Size gets dropped after the first iteration of job from the service
Ref: ADX-220
The previous release of Autobahn DX 5.0 was setting the job filter limit to zero after the job executes for the first time under the service.
Version 5.0.190605
Bug Fixes
CSV Log Files for Paths with Comma(s)
In the previous version of Autobahn DX, commas (,) in file paths added unwanted columns to the CSV log file. We have fixed this issue.
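The usual remedy for this class of problem is to quote fields that contain the delimiter. The Python sketch below shows that general CSV technique with a made-up log row; the release note does not say how Autobahn implements its fix.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)  # the default dialect quotes fields that contain commas
writer.writerow(["C:\\input\\Smith, John\\scan.pdf", "OK"])  # hypothetical log row

line = buffer.getvalue().strip()
print(line)  # "C:\input\Smith, John\scan.pdf",OK

# Reading the line back yields two columns, not three.
row = next(csv.reader(io.StringIO(line)))
print(len(row))  # 2
```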
Merging PDF Files with Acroforms
In the previous version of Autobahn DX, there was a bug that caused merging of PDF files with Acroforms to fail. We have fixed this issue.
Version 5.0.190430
Upgrading from earlier Versions
-
This release requires version 4.5.2 of the .NET Framework. The setup will check whether it is installed on your system and, if not, will take you to the appropriate Microsoft site to download and install it.
-
To upgrade from earlier versions, request a new license key from Aquaforest: [email protected].
-
Upgrade blog: http://www.aquaforest.com/wp/index.php/upgrading-autobahn-dx-server/
Preserving Existing Job Definitions when Upgrading
When Upgrading to a new version of Autobahn DX, your old jobs will not have all the new step properties added. To rectify this issue, open all your old jobs from the Job Manager and save them.
License Key
Autobahn DX 5.0 uses different license keys from the previous versions of Autobahn DX. You will need to request a new license key from Aquaforest: [email protected].
Removed Files
Autobahn DX no longer makes use of the file called ‘emimap4.dll’, which was used in previous versions. If you have upgraded, this file may still exist in the ‘bin’ folder and we recommend that it is deleted.
Enhancements in v5.0
We have made a lot of changes in this version of Autobahn DX; we will discuss these enhancements in this section.
Pause Job
We have now added the ability to resume jobs in Autobahn DX if:
-
The job is interrupted by a service crash or power failure.
-
You paused the job from the Autobahn DX GUI.
Note: If you make any changes to the job while it is in a paused state, the job will start from the beginning.
New Job Steps
In Autobahn DX 5.0, we have added to our long list of job steps. This is to give the user more value and options. For more details, check the section 5.7.2 in the Autobahn DX 5.0 reference guide.
Cloud OCR
The optional Cloud OCR module extends Autobahn DX with additional OCR engines from Microsoft and Google. The main advantage of these OCR engines is their handwriting recognition capabilities. These OCR engines are available as a SaaS model provided by both vendors. Before you can start using these steps in Autobahn DX, you will need a subscription. See chapter 18 of the reference guide for more details.
We have added two step types to the Advanced section of the Job Designer tab of Autobahn DX, the steps are named:
-
Image to Searchable PDF (Cloud OCR)
-
PDF to Searchable PDF (Cloud OCR)
Stamp PDF Files
This step can be used to add stamps to PDF pages. Stamps can be customized extensively in a simple manner.
Autobahn DX has different ways to apply stamps to a page, which gives the user some flexibility:
-
StampTextAsString: When this operation is selected, the text passed as the StampObject will be stamped on the PDF document as text.
-
StampPDFText: When this operation is selected the text passed as the StampObject will be stamped on the PDF document as an image.
-
StampPageNumber: When this operation is selected, every page in the PDF file will be stamped with a page number, starting from the start number. E.g. if StartNumber = 6 the first-page number will start from 6.
-
StampPageNumberBates: When this operation is selected, every page in the PDF file will be stamped with a Bates number, starting from the start number. E.g. if StartNumber = 6, the first page number will be 000006.
-
StampVariable: This option allows a user to specify a variable like a date, filename or time. The variable specified by the StampObject will be stamped on the document. Check the table below for different Stamp variables provided.
-
StampPDFImage: When this operation is selected, the text passed as the StampObject is the address of the image to be stamped on the PDF document.
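The difference between StampPageNumber and StampPageNumberBates is mainly in how the number is rendered. The Python sketch below reproduces the StartNumber = 6 examples. It assumes a fixed six-digit, zero-padded width for Bates numbers (inferred from the 000006 example), and the function names are hypothetical rather than part of Autobahn DX.

```python
def page_numbers(page_count, start_number=1):
    """Plain page numbers, one per page, counting up from start_number."""
    return [str(start_number + i) for i in range(page_count)]


def bates_numbers(page_count, start_number=1, width=6):
    """Zero-padded Bates-style numbers. width=6 is an assumption."""
    return [str(start_number + i).zfill(width) for i in range(page_count)]


# A three-page document stamped with StartNumber = 6.
print(page_numbers(3, start_number=6))
print(bates_numbers(3, start_number=6))
```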
Any File to Searchable PDF (Extended)
Previous versions of Autobahn DX provided the OCR Any File to PDF step, which is now named Any File to Searchable PDF (Standard). This step converts Office files to PDF and performs OCR on image-based files. It used to be available only for the Standard OCR engine. Version 5.0 adds a similar step that uses the Extended engine to OCR image-based files.
Azure Storage Download
This new step downloads files from an Azure Storage container to your local machine. It can be used as part of a workflow in Autobahn DX.
Azure Storage Upload
This new step uploads files from your local machine to an Azure Storage container. It can be used as part of a workflow in Autobahn DX.
Using these two steps, you can download files from Azure, process them and upload the outputs back to Azure in a single job.
Distributed Polling
This step can be used to implement load balancing in Autobahn DX. It works by copying a fraction of the files from a central input location to the local system where Autobahn DX is running. Multiple Autobahn DX servers can point to the same input folder, so the files are shared across the servers and the processing load is spread between them.
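The idea behind distributed polling can be sketched in Python as follows. This is only an illustration of the technique, not how Autobahn DX implements the step: the function name, the `fraction` parameter, and the choice to move files (so that two servers never take the same file) are all assumptions.

```python
import math
import shutil
from pathlib import Path


def claim_batch(central_dir, local_dir, fraction=0.25):
    """Move roughly `fraction` of the waiting files to this server's folder."""
    central, local = Path(central_dir), Path(local_dir)
    local.mkdir(parents=True, exist_ok=True)
    waiting = sorted(p for p in central.iterdir() if p.is_file())
    batch_size = math.ceil(len(waiting) * fraction)
    claimed = []
    for src in waiting[:batch_size]:
        try:
            # Moving rather than copying is this sketch's way of claiming a file.
            dest = shutil.move(str(src), str(local / src.name))
        except FileNotFoundError:
            continue  # another server took this file first
        claimed.append(Path(dest))
    return claimed
```

Each server would call `claim_batch` on the shared input folder at its own polling interval and then process only the files in its local folder.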
Job API Changes
Remote API Enhancement
Previously, you had to install Autobahn DX on both the client and the server machine in order to call a remote API in Autobahn DX. Now you only need to install Autobahn DX on the server computer.
GetLastRunDate
The following method has been added to the Job API:
public string GetLastRunDate();
Returns the date and time the job was last executed.
New Alerts Method
We have changed the way alerts are set up to give you more control over when alerts are sent and what they include.
Note: If you are upgrading jobs from a previous version of Autobahn DX and a job has alerts set up, you will have to go to the Alerts tab in the Job Designer and set up the alerts for that job again.
See section 5.2.4 of the reference guide for more details.
OCR Updates
Extended Engine
Autobahn DX 5.0 now has the latest version of the iDRS engine (iDRS 15.4.2) in the Extended OCR module.
Default Values
The default values for a few settings have been changed so that they give good OCR results across different types of documents. The new defaults are shown below:
Setting | Changed to |
---|---|
Binarize | true |
Binarization Mode | Adaptive |
Brightness | 128 |
Smoothing Level | 248 |
Threshold | 0 |
Work Depth | 255 |
Remove Lines | true |
New High-Quality OCR engine
iDRS™ now includes I.R.I.S.’ brand-new High-Quality OCR, a new OCR engine developed using state-of-the-art concepts from artificial intelligence research.
This technology considerably improves OCR accuracy, especially for poor-quality scans, camera images, and low-resolution documents, which are affected by common issues such as:
-
Touching characters
-
Broken characters
-
Distorted characters
It is also well suited to recognizing Arabic and Farsi, due to the cursive nature of their script.
The first release uses the High-Quality OCR engine for English, Arabic and Farsi; further languages will be added in future releases.
-
For Latin, Cyrillic, Greek, Hebrew and Asian languages, High-Quality OCR is combined with the existing OCR engine to use the strengths of both engines.
-
For Arabic and Farsi, it fully replaces the previous engine and reaches an unparalleled level of accuracy.
Note that processing time with the High-Quality OCR engine is expected to increase for low-quality documents: more time is spent, but better accuracy is achieved.
Recognition of images scanned with dithering
This release exposes an option that improves recognition of color or greyscale images scanned with dithering.
Previous releases did not process such images properly: in most cases, the text was simply not detected during the page analysis step.
How to use
Enable it by setting the Undithering property on the Binarization object. Undithering also requires smoothing, so set SmoothingLevel to a value greater than 0.
Automatic language detection of a single-language page
Extended OCR can now automatically detect the language of an input document.
The aim of this feature is to detect the most probable language of a single-language page.
Supported languages
This release can reliably detect the following scripts and languages:
-
Latin script
English, German, French, Spanish, Italian, Swedish, Danish, Norwegian, Dutch, Portuguese, Galician, Icelandic, Czech, Hungarian, Polish, Romanian, Slovak, Croatian, Slovenian, Finnish, Turkish, Estonian, Lithuanian, Latvian, Albanian, Catalan, Irish Gaelic, Scottish Gaelic, Basque, Indonesian, Malay, Swahili, Tagalog, Haitian Creole, Kurdish, Cebuano, Ganda, Kinyarwanda, Malagasy, Maltese, Nyanja, Sotho, Sundanese, Welsh, Javanese, Azeri (Latin), Uzbek, Bosnian (Latin), Afrikaans
-
Cyrillic script
Serbian, Russian, Byelorussian, Ukrainian, Macedonian, Bulgarian, Kazakh
-
Greek script
Greek
-
Hebrew script
Hebrew
Future releases will extend the support to Arabic and Asian scripts.
Note:
-
If at least one language has been detected, recognition will be performed in the first language candidate that has been detected, and not in the language(s) set through the OCR Language x property.
-
If no language is detected, recognition is performed using the language(s) set through the OCR Language x property.
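The two rules in this note amount to a simple fallback. The Python sketch below expresses that decision; the function and parameter names are hypothetical and do not correspond to any Autobahn DX or iDRS API.

```python
def choose_ocr_languages(detected_candidates, configured_languages):
    """Pick the language(s) that recognition will run in.

    detected_candidates: languages found by automatic detection, best first.
    configured_languages: languages set through the OCR Language x properties.
    """
    if detected_candidates:
        # Detection succeeded: only the first candidate is used,
        # and the configured languages are ignored.
        return [detected_candidates[0]]
    # Detection failed: fall back to the configured languages.
    return list(configured_languages)


print(choose_ocr_languages(["German", "Dutch"], ["English"]))
print(choose_ocr_languages([], ["English", "French"]))
```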
Punch-hole removal
A new feature has been added to the Extended engine that attempts to remove punch holes from pages. This feature only works when converting images to PDFs or when OCRing PDFs with Extract Images Method set to Convert to TIFF and with either Keep Original Image set to false or Keep Punch Hole Removal set to true.
Note: The punch-hole algorithm can be used on images with the following minimum dimensions: width 300 px, height 100 px (computed for 300 DPI). The minimum height and width vary with the image resolution.
Retain pre-processing settings
You can now retain specific pre-processing in the output PDF documents. For instance, if de-speckling is enabled, speckles are removed from each page to improve OCR recognition, but this is only done internally and is not reflected in the output PDF document.
In this release, if you want to retain the de-speckling in the output document, set Keep Despeckled Image to true. Other pre-processing settings that can be preserved are deskew, dark border removal and punch-hole removal. These can be enabled using Keep Deskewed Image, Keep Dark Border Removal and Keep Punch Hole Removal respectively.
This feature only works when converting images to PDFs or when OCRing PDFs with Extract Image Method set to Convert to TIFF and with Keep Original Image set to false.
Advanced pre-processing settings
This release has new advanced settings for some existing pre-processing settings of the Extended module. These are:
-
AdvancedDeskew
-
AdjustmentMode
-
ForceDeskew
-
AdvancedDespeckle
-
Dilate
New languages available with High-Quality OCR engine
The High-Quality OCR engine now supports the following three additional languages:
-
Italian
-
Spanish
-
Portuguese
Variants of existing High-Quality OCR languages are now supported as well: Afrikaans, Brazilian Portuguese, British, Corsican, Frisian, Luxembourgish, Mexican Spanish, Sardinian, and Swiss-German.
Performance improved for page orientation detection on Korean documents
The algorithm used for page orientation detection on Korean documents has been revised, drastically reducing processing time while slightly improving accuracy.
On a set of 132 Korean documents, taken in all possible orientations for a total of 528 test cases:
Metric | Older versions | This version |
---|---|---|
Total time for orientation detection | 5,864 seconds | 971 seconds (about 6 times faster) |
Orientation detection accuracy | 96.0% | 97.3% |
Memory consumption reduced for document conversion
The document output engine includes several optimizations that reduce memory consumption when creating an output document. These changes mostly affect the creation of PDF Image-Text documents and especially PDF iHQC documents.
In terms of peak memory consumption, for an A4 input image at 600 DPI:
Output format | Older versions | This version |
---|---|---|
PDF Image-Text | 343 MB | 238 MB |
PDF iHQC | 568 MB | 359 MB |
Turn off PDF/A validation
In previous versions, PDF/A validation was always performed after converting to PDF/A. However, validating a PDF/A document adds a small performance penalty in terms of the overall processing time. This version allows you to turn off PDF/A validation.
Standard Engine
Default Values
The default values for a few settings have been changed so that they give good OCR results across different types of documents. The new defaults are shown below:
Setting | Changed to |
---|---|
SavePreDespeckle | true |
Step Types that have changed name
For clarity, we have changed the names and groupings of the OCR steps in Autobahn DX so that they describe more clearly what each step does. The table below shows the old step names and the corresponding new step names.
Old Step Name | New Step Name |
---|---|
Convert TIFF to PDF | Image To Searchable PDF (Standard) |
Extended Convert TIFF to PDF | Image To Searchable PDF (Extended) |
OCR Image-Only PDF | PDF to Searchable PDF (Standard) |
Extended OCR Image PDF | PDF to Searchable PDF (Extended) |
OCR Any File to PDF | Any File to Searchable PDF (Standard) |
Merge TIFFs to PDF | Merge Image to Searchable PDF (Standard) |
Extended Merge TIFF to PDF | Merge Image to Searchable PDF (Extended) |
Delete Empty Input Folders
When Delete Input Files or Move to Archive after Processing is selected as the input file post-processing action, many empty folders are often left behind in the input folder tree. Autobahn DX 5.0 provides a new setting to delete these empty folders.
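For readers who want to see what this kind of clean-up involves, here is a minimal Python sketch of removing empty folders from a tree. It is an illustration only and not the Autobahn DX implementation; the function name is hypothetical, and the sketch assumes the top-level input folder itself should always be kept.

```python
import os


def delete_empty_folders(input_root):
    """Remove empty sub-folders below input_root and return the removed paths."""
    root = os.fspath(input_root)
    removed = []
    # Walk bottom-up so a folder is checked after its children have been removed.
    for folder, _subdirs, _files in os.walk(root, topdown=False):
        if folder == root:
            continue  # keep the input folder itself
        if not os.listdir(folder):
            os.rmdir(folder)
            removed.append(folder)
    return removed
```

Because the walk is bottom-up, a chain of nested folders that contain no files is removed in a single pass.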
CPU Core Licensing and Job Control
Your license key will support a specific number of CPU cores. The product will limit the number of concurrent file processing operations to this number and will “throttle” jobs accordingly.
For example, if a 4-core licensed server is currently running a 2-core job and a new job starts that is configured for 4 cores, the number of cores allocated to the second job is reduced accordingly:
Autobahn DX using 2 cores out of 4 allowed.
We will reduce the number of cores in this job from 4 to 2 allowed.
As another example, if a 4-core licensed server is currently running a 4-core job and a new job starts that is configured for 2 cores, the second job cannot start until cores are freed up:
Autobahn DX using 4 processors out of 4 allowed.
We will attempt to start the job 18 time(s) over the next 180 seconds.
The retry interval and the number of tries are determined by these two settings in Autobahn.config (by default, this file is in C:\Aquaforest\Autobahn DX\config):
<add key="jobqueuetimeout" value="180" />
<add key="jobqueueinterval" value="10" />
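The two examples and the retry message can be reproduced with a little arithmetic. The Python sketch below is a model of the behavior described here, not product code: the function names are hypothetical, and it assumes that both settings are in seconds and that the number of attempts is the timeout divided by the interval, which matches the "18 time(s) over the next 180 seconds" message.

```python
def allocate_cores(licensed_cores, cores_in_use, requested_cores):
    """Cores granted to a new job. A result of 0 means the job has to wait."""
    free = max(licensed_cores - cores_in_use, 0)
    return min(requested_cores, free)


def retry_attempts(job_queue_timeout=180, job_queue_interval=10):
    """Number of start attempts made before a waiting job gives up (assumed)."""
    return job_queue_timeout // job_queue_interval


# 4-core license, 2 cores busy, new job asks for 4: throttled to 2.
print(allocate_cores(4, 2, 4))
# 4-core license, 4 cores busy, new job asks for 2: has to wait.
print(allocate_cores(4, 4, 2))
# Default settings: 180 / 10 attempts.
print(retry_attempts())
```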
Autobahn DX Directory Changes
We have added a distribution directory to the Autobahn DX installation directory. This directory contains the components needed for Autobahn DX to function. As a result, some folders have moved from the top-level folder to the distribution folder, and new folders have been created for other components. The table below shows the details.
Application | Old Directory Path | New Directory Path |
---|---|---|
Extended OCR | extendedocr | distribution/extendedocr |
TIFF Junction | tj | distribution/tj |
PDF Junction | pj | distribution/pj |
Cloud OCR (new) | - | distribution/cloudocr |
SharePoint Connector (new) | - | distribution/sharepoint |
Azure Connector (new) | - | distribution/azure |
Support Tool | support | distribution/support |
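If you have scripts that reference the old folder locations, the table above can be turned into a small lookup. The following Python sketch is one way to do that. The helper name and the `extendedocr\bin` sub-path in the example are hypothetical, and the installation directory is assumed from the default config location mentioned earlier.

```python
from pathlib import PureWindowsPath

# Top-level folders that moved under "distribution" in 5.0 (from the table).
MOVED_FOLDERS = {"extendedocr", "tj", "pj", "support"}


def new_component_path(install_dir, old_relative_path):
    """Translate a pre-5.0 path (relative to the install dir) to its 5.0 location."""
    parts = PureWindowsPath(old_relative_path).parts
    if parts and parts[0].lower() in MOVED_FOLDERS:
        return PureWindowsPath(install_dir, "distribution", *parts)
    # Folders not listed in the table stay where they were.
    return PureWindowsPath(install_dir, *parts)


install_dir = r"C:\Aquaforest\Autobahn DX"  # assumed default location
print(new_component_path(install_dir, r"extendedocr\bin"))
print(new_component_path(install_dir, "config"))
```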
Bug Fixes
[SDK-120] Graphics state
The graphics state was not being restored when processing pages that require rotation in the Standard OCR engine. This caused issues when other applications manipulated the PDF after it had been OCRed by Aquaforest. This has now been fixed.
Known Issues
Recognition of accented characters with High-Quality OCR engine (Extended OCR module)
The new Extended OCR module currently has an issue that affects Latin languages processed with the High-Quality OCR engine.
When a character with an accent (like é, è, à, ñ, etc.) is recognized but is not present in the character set (for instance if recognition is performed in English), the OCR engine will output a reject character (U+FFFD).
This is a regression compared to previous versions, where the “base” character would be output instead (e.g. ‘e’ instead of ‘é’).
This issue will be fixed in the next release.