Product Overview and Concepts
Overview
Autobahn DX is a Document Processing product designed to fit into an organization’s document workflow.
Autobahn DX provides high-performance automated OCR and conversion of a variety of input document formats, including TIFF images, PDF files, Microsoft Office documents and HTML pages.
The Server editions expand on this capability by allowing multi-step jobs, which combine different steps together, to be created and scheduled to run automatically.
This can provide a “hot folder” capability in Windows File System, SharePoint (Online/Office365/On-Premises) or Azure Blob storage.
The output files can then be saved to Windows File System, SharePoint (Online/Office365/On-Premises) or Azure Blob storage.
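As a rough illustration of the “hot folder” idea, the sketch below polls a directory and reports files that have not been seen before. It is a generic Python example and not how Autobahn DX is implemented; the function name `find_new_files` and the in-memory `already_seen` set are assumptions made for illustration only.

```python
from pathlib import Path


def find_new_files(hot_folder: Path, already_seen: set[str]) -> list[Path]:
    """Return files in hot_folder not seen on earlier scans, sorted by name.

    already_seen is updated in place so the next scan skips these files.
    (Hypothetical helper, not part of Autobahn DX.)
    """
    new_files = []
    for entry in sorted(hot_folder.iterdir()):
        if entry.is_file() and entry.name not in already_seen:
            already_seen.add(entry.name)
            new_files.append(entry)
    return new_files
```

Calling the function repeatedly on a timer would give a simple watched-folder loop: each call returns only the documents that arrived since the previous call.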
Edition Comparison
The table below shows the differences between the editions of Autobahn DX.
Feature | Workstation | Server with Standard OCR | Server with Extended OCR |
---|---|---|---|
Convert TIFF, JPEG, BMP, PNG, GIF to PDF + OCR | ✓ | ✓ | ✓ |
Convert MS Office & Open Files to PDF | ✓ | ✓ | ✓ |
OCR Image Only PDF | ✓ | ✓ | ✓ |
Merge TIFF, JPEG, BMP, PNG, GIF to TIFF/PDF | ✓ | ✓ | ✓ |
Split/Merge PDF Files | ✓ | ✓ | ✓ |
Extract Text from PDF Files | ✓ | ✓ | ✓ |
Set PDF Properties | ✓ | ✓ | ✓ |
Pre-Processing (Deskew, Despeckle, Line Removal) and Auto-Rotation | ✓ | ✓ | ✓ |
OCR Language support | 23 | 23 | 129 |
TXT, RTF & HTML Output options | ✓ | ✓ | ✓ |
Advanced MRC & JBIG2 Compression on output PDFs | ✓ | ✓ | ✓ |
Processing Source PDFs with Passwords | ✓ | ✓ | ✓ |
Setting PDF Security on Output Files | ✓ | ✓ | ✓ |
Split/Rename by Barcode | ✓ | ✓ | ✓ |
Handwriting recognition via Google and Microsoft Cloud APIs | | ✓ | ✓ |
Load Balancing/Distributed Polling across Multiple Autobahn DX Instances | | ✓ | ✓ |
Pause and Restart Jobs | | ✓ | ✓ |
Azure Storage Support | | ✓ | ✓ |
Multicore (up to a maximum of 64 cores) | | ✓ | ✓ |
Watched Folders, Scheduled Tasks, and Windows Service Support | | ✓ | ✓ |
Multistep Jobs, allowing the creation of workflows | | ✓ | ✓ |
SharePoint/Office 365 Upload/Download | | ✓ | ✓ |
Read/Send Emails using IMAP4 and Basic or OAUTH2 authentication | | ✓ | ✓ |
Use of Custom Scripts | | ✓ | ✓ |
High Availability | | ✓ | ✓ |
Integration with Aquaforest Kingfisher | | ✓ | ✓ |
XML-based Job Definitions | | ✓ | ✓ |
.NET API | | ✓ | ✓ |
Run Jobs via Command Line | | ✓ | ✓ |
GDPicture OCR engine | | ✓ | ✓ |
IRIS Extended OCR engine | | | ✓ |
Improved recognition of poorer quality documents | | | ✓ |
Support for multiple languages within a single document from the same character set | | | ✓ |
Asian Language Support | | | ✓ |
Arabic Language Support | | | ✓ |
Intelligent High-Quality Compression | | | ✓ |
Multiple document output formats: PDF, DOCX, WORDML, RTF, CSV, XLSX, EXCELML, TXT, HTML and XPS | | | ✓ |
Autobahn DX Architecture and Concepts
Autobahn Administrator
This is a Windows application that provides the primary administration interface.
This application has the following sections.
Quick Job
Jobs can be defined and run interactively, which provides a convenient method for testing the product’s capabilities and running simple jobs.
Job Manager
This provides a method of managing the scheduled jobs previously defined in the Designer. They can be selected and then deleted, copied, edited (in the Designer) or scheduled.
Designer
This provides a graphical interface to allow the creation of a series of steps that make up a job. The process specification can then be saved (as an XML Job File) and run or scheduled via the Job Manager section of the application.
Monitor
This shows the current service status and displays the Job Status of currently executing jobs. It also allows the service to be stopped or started.
Running Job
This displays the logging for the currently selected job. Multicore jobs will display logging after the completion of each step.
Modules & Options
This is used to enter the license key, display the licensed options and configure the email settings.
Help
This displays various help options and advice.
User Application
User applications can communicate with Autobahn either through the .NET API or the Command Line.
The .NET API allows a user application to create and execute ad-hoc jobs. See the Autobahn DX .NET Job API section for more details.
The Command Line Interface can run any job that could be run within the Quick Job section of the product. Multi-step jobs can be created by making consecutive calls to the Command Line Interface. See the Autobahn Command Line Interface section for more details.
Autobahn Windows Service
This is the heart of the product and controls the execution of both scheduled jobs and ad-hoc jobs whether submitted via the Autobahn Manager or via the Autobahn Job API. The service analyses the XML Job Definition files on start-up and when new files are created in the Job Definition directory by the Autobahn Manager, or via the Job API. The Job Definition Files describe the steps to be carried out to complete the job and the Autobahn Service will spawn sub-jobs (such as TIFF Junction or PDF Junction) where required. Job Status records and logs are maintained and can be reviewed in the Job Monitor and Job Manager.
Job Definition Files
Each Job Definition file contains the settings for the job, including where it looks for input files, where output files are saved, intermediate work folders, log files, scheduling information plus settings for each Job Step.
A Job Definition contains one or more Job Steps, each of which contains the settings for that step. The Step Types include:
- OCR
- Conversion
- Splitting and Merging
- Barcodes
- PDF operations
- Advanced
See the Job Definition XML Files section for more details.
Debug mode
Most step types have an option for setting Debug Mode to Yes.
This will log additional information about the job step, including settings passed to sub-processes and intermediate results. This can be useful for the user to debug problem steps or their interaction with particular files and locations.
When Debug is set to Yes, the temporary files and folders created by that step are also not deleted. The temporary location should be monitored, as large quantities of files can be left there, especially if large numbers of files are processed while in Debug mode.
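One way to keep an eye on leftover Debug files is to measure the temporary location periodically. The following is a minimal Python sketch, assuming you know the path of the temporary folder for your job; the function name `folder_usage` is hypothetical and not part of the product.

```python
from pathlib import Path


def folder_usage(folder: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for all files under folder, recursively.

    Hypothetical helper for checking how much a Debug run has left behind.
    """
    file_count = 0
    total_bytes = 0
    for path in folder.rglob("*"):
        if path.is_file():
            file_count += 1
            total_bytes += path.stat().st_size
    return file_count, total_bytes
```

Comparing the result before and after a Debug run gives a quick indication of whether the temporary location needs to be cleared.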
Document Folders
Each job will have a set of directories containing the source documents (In), output documents (Out) and Work directories (including temporary, error, log, and work step folders).
Document Processing
Each step in a Job involves a separate Job Element process being spawned by the Autobahn Service. Details of each step are stored as elements of the Job Definition file. See Job Definition XML files and Step Types for more details.
Job Status
All jobs have an associated status file. This contains the name, state, progress, log file name, CSV log file name, last run time and cores.
Autobahn DX .NET Job API
BCL EasyPDF service
The BCL EasyPDF service is used to convert some file types. It relies on an installation of the relevant product for the input file type, and that product must have been set up for the service’s login user.
See Convert Any File and File Access Permissions (below).
File Access Permissions
Quick Jobs
Administrators of Autobahn DX should be aware that “Quick Job” operations run in the context of the currently logged-on user, so they rely on the permissions granted to that user. When processing files on remote file systems, they can therefore use either UNC paths or mapped drives visible to that user.
Job Manager (Ad-Hoc or Scheduled)
Conversely, jobs in the Job Manager are run by the Autobahn DX Windows service (and in some cases the BCL easyPDF service), so when accessing remote file systems, UNC paths should be used rather than mapped drives.
The job will run as the user specified in the “Log On” property page of the respective service, so it is recommended that the service user is changed to an account with the required file access permissions. See the Autobahn and BCL easyPDF Windows Services for more details.
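Before scheduling a job, it can help to check that its folder paths are UNC paths rather than mapped drive letters. The sketch below is a standalone Python check, not an Autobahn DX feature; the function name `is_unc_path` and the example server and share names are made up for illustration.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if path begins with a UNC root (server and share).

    PureWindowsPath parses Windows paths on any operating system; for a UNC
    path its drive attribute is the server-and-share prefix, which starts
    with two backslashes. For a drive-letter path it is e.g. "Z:".
    """
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


# Hypothetical paths: the first is suitable for a service-run job,
# the second depends on a drive mapping that the service user may not have.
print(is_unc_path(r"\\fileserver\scans\in"))
print(is_unc_path(r"Z:\scans\in"))
```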
What is the definition of a Core?
The number of cores licensed determines how many CPU cores the software can use concurrently. As a rough guide, the software can process 1,000 pages per hour per CPU core although this will vary according to various factors.
Autobahn DX Server can be licensed as single core, four cores or multiples of four cores. The maximum number of cores that Autobahn DX can use on an installation is the smaller of:
- the number of cores licensed
- the number of logical processors (not physical cores) in the CPU.
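The rule above, combined with the rough guide of 1,000 pages per hour per core, can be sketched in Python as follows. This is an illustrative calculation only: the function names are hypothetical, and the throughput figure is the approximate guide value from the text, so actual results will vary.

```python
import os
from typing import Optional

# Rough guide figure quoted in this section; real throughput varies.
PAGES_PER_HOUR_PER_CORE = 1_000


def usable_cores(licensed_cores: int, logical_processors: Optional[int] = None) -> int:
    """Return the smaller of the licensed cores and the logical processors.

    If logical_processors is not given, it is taken from os.cpu_count(),
    which reports logical processors (falling back to 1 if unknown).
    """
    if logical_processors is None:
        logical_processors = os.cpu_count() or 1
    return min(licensed_cores, logical_processors)


def estimated_pages_per_hour(licensed_cores: int, logical_processors: Optional[int] = None) -> int:
    """Estimate hourly throughput from the usable cores and the guide figure."""
    return usable_cores(licensed_cores, logical_processors) * PAGES_PER_HOUR_PER_CORE
```

For example, an eight-core license on a machine with four logical processors can use four cores, giving an estimate of about 4,000 pages per hour.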