Optimizing load balancing with distributed polling
The Distributed Polling step distributes processing load across a number of servers.
The Distributed Polling job copies a number of files, capped by the Limit step property, from a shared location to a local folder. The local folder then serves as the input folder for another job to process.
This step can be used to implement load balancing in Document Automation Server (DAS). Multiple DAS servers can point to one input folder, so the files are shared across several servers and processing is more efficient.
The next section walks through setting up a Distributed Polling job.
Job Setup
Before we start setting up the jobs, make sure you perform the steps below.
- Open “C:\Aquaforest\Autobahn DX\config\Autobahn.config” and change the value of “inputdelay” to 30.
- Make sure the DAS service is configured to run as a user with enough privileges to access all the shared locations.
- Restart the DAS service.
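Before restarting the service, it can help to confirm that the service account really can reach each folder. The following is a minimal Python sketch and not part of DAS: `check_folder_access` is a hypothetical helper, and it only proves access for whichever user runs it, so it should be run while logged in as the service account.

```python
import os
import tempfile
from pathlib import Path


def check_folder_access(folder: str) -> dict:
    """Report whether the current user can list and write to a folder.

    Hypothetical helper for illustration; DAS does not provide it.
    """
    path = Path(folder)
    result = {"exists": path.is_dir(), "can_list": False, "can_write": False}
    if not result["exists"]:
        return result

    try:
        # Reading the first directory entry proves list permission.
        with os.scandir(path) as entries:
            next(entries, None)
        result["can_list"] = True
    except OSError:
        pass

    try:
        # Creating a throwaway file (removed on close) proves write and delete rights.
        with tempfile.NamedTemporaryFile(dir=path, prefix="das_access_"):
            pass
        result["can_write"] = True
    except OSError:
        pass

    return result
```

Calling `check_folder_access` once for the central shared folder and once for the local input folder should return `True` for all three keys before the jobs are scheduled.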
How effectively documents are distributed from the central location to the local servers, and how efficiently they are then OCRed, depends on the schedule intervals of the jobs below, the “Limit” step property of the Distributed Polling Job, and the “inputdelay” value. You may need some trial and error to find the optimum settings. Do not set the two jobs to run at the same time, because the first job needs to run at least once before the second job has any input files to process.
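As a starting point for that trial and error, a rough upper bound on how quickly files leave the shared folder can be worked out from the Limit and the polling interval. The sketch below is plain arithmetic and not a DAS feature. The function name is hypothetical, and it assumes that every run copies a full batch of Limit files and that all servers use the same settings.

```python
def estimated_files_per_hour(limit: int, poll_interval_seconds: float, servers: int = 1) -> float:
    """Upper bound on files pulled from the shared folder per hour.

    Assumes each run copies exactly `limit` files and every server
    polls on the same interval (illustrative assumptions only).
    """
    if limit < 0 or poll_interval_seconds <= 0 or servers < 1:
        raise ValueError("limit must be >= 0, interval > 0, servers >= 1")
    runs_per_hour = 3600 / poll_interval_seconds
    return limit * runs_per_hour * servers


# Example: Limit = 10, polling every 60 seconds, 3 servers.
print(estimated_files_per_hour(10, 60, servers=3))  # 1800.0
```

If the processing job on each server cannot keep up with this rate, the local input folder will fill up, which suggests lowering the Limit or lengthening the polling interval.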
To set up distributed polling on a DAS server, you will have to create two jobs:
- Distributed Polling Job: copies the files from a shared folder to a local folder.
- Autobahn Job: processes the files that were copied by the job above. The Destination Folder of the first job serves as the Source Folder of this job.
Distributed Polling Job
The Distributed Polling Job will copy files from the central shared location to the local computer’s input location to be processed by an Autobahn Job.
After the files have been copied to the local location (b), they are deleted from the central location (a).
- (a) The central shared location containing all the input files.
- (b) The folder to which the files will be copied.
- Do not use work folders.
- Set the scheduler to run every x number of seconds/minutes. This must be different from the schedule in 1(e).
Screen Field | Description |
---|---|
Autobahn Job ID | The Job ID of the Job that will be processing your input files. Note: The Source Folder of this job will be the Destination Folder of the Distributed Polling Job. |
Limit | The maximum number of files to be copied from the shared location to the local folder per run. |
Extensions | The file extensions to copy, separated by commas. For example, “.pdf,.tif,.tiff” |
Process Sub Folder | Select true if you want to copy subfolders. Note: If this is set to true, then “Process Sub-folders” needs to be ticked in the OCR job. |
Debug | Select true if you want to see more debug output. |
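To make the effect of these properties concrete, the following Python sketch imitates a single polling run. It is an illustration only and not how DAS is implemented: `poll_once` and its parameter names are hypothetical and simply mirror the Limit, Extensions and Process Sub Folder fields above, and the locking a production multi-server setup would need is not handled.

```python
import shutil
from pathlib import Path


def poll_once(shared: Path, local: Path, limit: int, extensions: str,
              process_sub_folders: bool = False) -> list[Path]:
    """Copy up to `limit` matching files from `shared` to `local`,
    deleting each one from `shared` after it has been copied."""
    # ".pdf,.tif,.tiff" -> {".pdf", ".tif", ".tiff"}
    wanted = {ext.strip().lower() for ext in extensions.split(",") if ext.strip()}
    candidates = shared.rglob("*") if process_sub_folders else shared.glob("*")

    moved = []
    for src in sorted(candidates):
        if len(moved) >= limit:
            break
        if not src.is_file() or src.suffix.lower() not in wanted:
            continue
        dst = local / src.relative_to(shared)  # keep the sub-folder layout
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            shutil.copy2(src, dst)
            src.unlink()  # remove from the central location
        except OSError:
            # The file vanished or is locked (for example, taken by
            # another server): discard our copy and move on.
            dst.unlink(missing_ok=True)
            continue
        moved.append(dst)
    return moved
```

With `limit=2` and `extensions=".pdf,.tif,.tiff"`, a run against a shared folder holding `a.pdf`, `b.tif` and `c.txt` would move the first two files and leave `c.txt` in place.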
Autobahn Job
This section shows how to set up an Autobahn Job to process the files that were copied by the Distributed Polling Job.
- Set the source folder to a local location on the server.
- The destination folder can either be local to the server or a shared location.
- Set “Input Files” to either “Delete Input Files” or “Move to Archive after Processing”.
- Make a note of the job ID. This is used in the Distributed Polling step.
- Set the scheduler to run every x number of seconds/minutes.