Streamline document processing with automation

Overview

Document Automation Server (DAS) is a Document Processing product designed to fit into an organization’s document workflow.

DAS provides high-performance automated OCR and conversion of a variety of input document formats, including TIFF images, PDF files, Microsoft Office documents and HTML pages.

The Server editions expand on this capability by allowing you to create multi-step jobs, which combine different steps and can be scheduled to run automatically.

This can provide a “hot folder” capability in the Windows File System, SharePoint (Online/Office365/On-Premises) or Azure Blob storage.

The output files can then be saved to the Windows File System, SharePoint (Online/Office365/On-Premises) or Azure Blob storage.
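The “hot folder” idea can be illustrated with a minimal sketch. This is not how DAS implements it; it only shows the general pattern of polling an input folder and writing results to an output folder. The function name, the folder names and the use of a plain file copy as a stand-in for the OCR or conversion step are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def poll_hot_folder(in_dir: Path, out_dir: Path, seen: set) -> list:
    """Process files in in_dir that have not been seen yet.

    Returns the list of output paths produced by this pass.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    produced = []
    for src in sorted(in_dir.iterdir()):
        if src.is_file() and src.name not in seen:
            dst = out_dir / src.name
            # Stand-in for the real processing step (OCR, conversion, ...).
            shutil.copy2(src, dst)
            seen.add(src.name)
            produced.append(dst)
    return produced
```

A scheduler would call a function like this repeatedly (for example once a minute), keeping the same `seen` set between calls so that each file is processed only once.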

Edition Comparison

The table below shows the differences between the editions of DAS.

Edition Comparison Workstation Server with Standard OCR Server with Extended OCR
Convert TIFF, JPEG, BMP, PNG, GIF to PDF + OCR
Convert MS Office & Open Files to PDF
OCR Image Only PDF
Merge TIFF, JPEG, BMP, PNG, GIF to TIFF/PDF
Split/Merge PDF Files
Extract Text from PDF Files
Set PDF Properties
Pre-Processing (Deskew, Despeckle, Line Removal) and Auto-Rotation
OCR Language support: 23 (Workstation), 23 (Server with Standard OCR), 129 (Server with Extended OCR)
TXT, RTF & HTML Output options
Advanced MRC & JBIG2 Compression on output PDFs
Processing Source PDFs with Passwords
Setting PDF Security on Output Files
Split/Rename by Barcode
Handwriting recognition via Google and Microsoft Cloud APIs
Load Balancing/ Distributed Polling across Multiple DAS Instances
Pause and Restart Jobs
Azure Storage Support
Multicore (up to a maximum of 64 cores)
Watched Folders, Scheduled Tasks, and Windows Service Support
Multistep Jobs, allowing the creation of workflows
SharePoint/Office 365 Upload/Download
Read/Send Emails using IMAP4 and Basic or OAUTH2 authentication
Use of Custom Scripts
High Availability
Integration with Aquaforest Kingfisher (Nutrient DAS Content Extraction)
XML-based Job Definitions
.NET API
Run Jobs via Command Line
GDPicture OCR engine
IRIS Extended OCR engine
Improved recognition of poorer quality documents
Support for multiple languages within a single document from the same character set
Asian Language Support
Arabic Language Support
Intelligent High-Quality Compression
Multiple document output formats: PDF, DOCX, WORDML, RTF, CSV, XLSX, EXCELML, TXT, HTML and XPS

DAS Architecture and Concepts

DAS Administrator

This is a Windows application that provides the primary administration interface.

This application has the following sections.

Quick Job

Jobs can be defined and run interactively, which provides a convenient method for testing the product’s capabilities and running simple jobs.

Job Manager

This provides a method of managing the scheduled jobs previously defined in the Designer. They can be selected and then deleted, copied, edited (in the Designer) or scheduled.

Designer

This provides a graphical interface to allow the creation of a series of steps that make up a job. The process specification can then be saved (as an XML Job File) and run or scheduled via the Job Manager section of the application.

Monitor

This shows the current service status and displays the Job Status of currently executing jobs. It also allows the service to be stopped or started.

Running Job

This displays the logging for the currently selected job. Multicore jobs will display logging after the completion of each step.

Modules & Options

This section is used to enter the license key, view the licensed options and configure the email settings.

Help

This displays various help options and advice.

User Application

User applications can communicate with DAS through either the .NET API or the Command Line interface.

The .NET API allows a user application to create and execute ad-hoc jobs. See the DAS .NET API section for more details.

The Command Line interface can run any job that could be run within the Quick Job section of the product. Multi-step jobs can be created by making consecutive calls to the Command Line interface. See the DAS Command Line interface for more details.
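Chaining consecutive command-line calls into a multi-step job could be sketched as follows. The commands shown are placeholders (harmless Python one-liners), not real DAS command lines; the actual syntax is described in the DAS Command Line interface section. The sketch also assumes that a failed step is signalled by a non-zero exit code, which should be confirmed before relying on it.

```python
import subprocess
import sys


def run_steps(commands: list) -> int:
    """Run each command in order and stop at the first failure.

    Returns the number of steps that completed successfully.
    """
    completed = 0
    for cmd in commands:
        result = subprocess.run(cmd)
        # Assumption: a non-zero exit code means the step failed.
        if result.returncode != 0:
            break
        completed += 1
    return completed


if __name__ == "__main__":
    # Placeholder commands; substitute the real command lines for each step.
    steps = [
        [sys.executable, "-c", "print('step 1')"],
        [sys.executable, "-c", "print('step 2')"],
    ]
    print(run_steps(steps))
```

Stopping at the first failure avoids running a later step against output that an earlier step never produced.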

DAS Windows Service

This is the heart of the product and controls the execution of both scheduled jobs and ad-hoc jobs whether submitted via the DAS Manager or via the DAS Job API. The service analyses the XML Job Definition files on start-up and when new files are created in the Job Definition directory by the DAS Manager, or via the Job API. The Job Definition Files describe the steps to be carried out to complete the job and the DAS Service will spawn sub-jobs (such as TIFF Junction or PDF Junction) where required. Job Status records and logs are maintained and can be reviewed in the Job Monitor and Job Manager.

Job Definition Files

Each Job Definition file contains the settings for the job, including where it looks for input files, where output files are saved, intermediate work folders, log files, scheduling information plus settings for each Job Step.

A Job Definition contains one or more Job Steps, each of which contains the settings for that step. The Step Types include:

  • OCR

  • Conversion

  • Splitting and Merging

  • Barcodes

  • PDF operations

  • Advanced

See the Job Definition XML Files section for more details.
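To make the structure concrete, the sketch below reads a job definition and lists its steps. The XML layout here is entirely hypothetical: the element and attribute names (`Job`, `InputFolder`, `OutputFolder`, `Step`, `Setting`) are invented for illustration and are not the real DAS schema, which is documented in the Job Definition XML Files section.

```python
import xml.etree.ElementTree as ET

# Hypothetical job definition; NOT the real DAS schema.
SAMPLE = """\
<Job name="InvoiceOCR">
  <InputFolder>In</InputFolder>
  <OutputFolder>Out</OutputFolder>
  <Steps>
    <Step type="Conversion"><Setting name="Target" value="PDF"/></Step>
    <Step type="OCR"><Setting name="Language" value="English"/></Step>
  </Steps>
</Job>
"""


def summarise_job(xml_text: str) -> dict:
    """Return the job name, folders and ordered (type, settings) steps."""
    root = ET.fromstring(xml_text)
    steps = []
    for step in root.iter("Step"):
        settings = {s.get("name"): s.get("value") for s in step.iter("Setting")}
        steps.append((step.get("type"), settings))
    return {
        "name": root.get("name"),
        "input": root.findtext("InputFolder"),
        "output": root.findtext("OutputFolder"),
        "steps": steps,
    }
```

Calling `summarise_job(SAMPLE)` returns a dictionary whose `steps` entry preserves the order in which the steps appear in the file.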

Debug mode

Most step types have an option for setting Debug Mode to Yes.

This will log additional information about the job step, including settings passed to sub-processes and intermediate results. This can be useful for the user to debug problem steps or their interaction with particular files and locations.

When Debug is set to Yes, the temporary files and folders created by that step are not deleted. Monitor the temporary location, because large quantities of files can accumulate there, especially when processing large numbers of files in Debug mode.
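One way to keep an eye on the temporary location while Debug mode is enabled is a small script that reports how many files it holds and how much space they use. This is a generic sketch, not a DAS feature; the function name is an assumption, and the folder to check is whichever temporary location the job is configured to use.

```python
import os
from pathlib import Path


def folder_usage(folder: Path) -> tuple:
    """Return (file_count, total_bytes) for every file under folder."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                size = (Path(dirpath) / name).stat().st_size
            except OSError:
                # The file may have been removed while we were scanning.
                continue
            count += 1
            total += size
    return count, total
```

Running this on a schedule and alerting when either number passes a chosen threshold gives early warning before the disk fills up.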

Document Folders

Each job will have a set of directories containing the source documents (In), output documents (Out) and Work directories (including temporary, error, log, and work step folders).

Document Processing

Each step in a Job involves a separate Job Element process being spawned by the DAS Service. Details of each step are stored as elements of the Job Definition file. See Job Definition XML files and Step Types for more details.

Job Status

All jobs have an associated status file. This contains the name, state, progress, log file name, CSV log file name, last run time and cores.

DAS .NET Job API

BCL EasyPDF service

The BCL EasyPDF service is used to convert some file types. It requires that the relevant product for the input file type is installed and that the product has been set up for the service’s login user.

See Convert Any File and File Access Permissions (below).

File Access Permissions

Quick Jobs

Administrators of DAS should be aware that “Quick Job” operations run in the context of the currently logged-on user and so rely on the permissions granted to that user. Quick Jobs that process files on remote file systems may therefore use either UNC paths or mapped drives visible to that user.

Job Manager (Ad-Hoc or Scheduled)

Conversely, jobs in the Job Manager are run by the DAS Windows service (and in some cases the BCL easyPDF service), so when accessing remote file systems, UNC paths should be used rather than mapped drives.

The job will run as the user specified in the “Log On” property page of the respective service, so it is recommended that the service user is changed to an account with access to the locations the job uses. See the DAS and BCL easyPDF Windows Services for more details.
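Because service-run jobs should use UNC paths rather than mapped drives, it can help to check folder settings before scheduling a job. The sketch below is a generic check, not part of DAS; the function name and the example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """True for UNC paths (server and share), False for drive letters or relative paths."""
    # For a UNC path, the "drive" part is the server and share, starting with two backslashes.
    return PureWindowsPath(path).drive.startswith("\\\\")
```

For example, `is_unc_path(r"\\fileserver\scans\In")` is true, while `is_unc_path(r"Z:\scans\In")` is false because `Z:` is a drive letter that may only be mapped for the interactive user. `PureWindowsPath` applies Windows path rules on any operating system, so the check can also run on a non-Windows machine.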

What is the definition of a Core?

The number of cores licensed determines how many CPU cores the software can use concurrently. As a rough guide, the software can process 1,000 pages per hour per CPU core, although actual throughput will vary.
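The rough guide can be turned into a quick sizing estimate. The sketch below assumes throughput scales linearly with the number of cores, which the figure above does not guarantee, so treat the result as an approximation only. The function name and parameters are illustrative.

```python
def estimated_hours(pages: int, cores: int, pages_per_core_hour: float = 1000.0) -> float:
    """Estimate processing time in hours, assuming linear scaling across cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return pages / (pages_per_core_hour * cores)
```

For example, 40,000 pages on 4 cores gives 40,000 / (1,000 × 4) = 10 hours under these assumptions.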