How to Merge and Batch OCR Multiple Files
Autobahn allows users to set up and customize workflows with ease and run them automatically. It also works well when processing large volumes of documents and can merge many documents to form one output PDF. Here’s an example of how you can merge and batch OCR multiple documents.

1. Create a new job.
Click Create New. Fill in the Source Folder and Destination Folder fields by clicking the magnifying glass to the right of these fields. The source (input) folder is where all the files you want to merge should go. The destination (output) folder will initially be empty, but it’s where all the merged files will end up.
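If you stage input files with a script rather than by hand, a minimal sketch might look like the following. It assumes one subfolder per set of files to merge, which is the layout this walkthrough relies on once Process Sub-Folders is enabled. The function `stage_batch`, the folder paths, and the batch name are hypothetical and are not part of Autobahn.

```python
import shutil
from pathlib import Path


def stage_batch(source_root: Path, batch_name: str, pdfs: list[Path]) -> Path:
    """Copy one set of PDFs into its own subfolder of the source folder.

    batch_name becomes the subfolder name. Per the article's note, the
    merged PDF is expected to take its name from this folder (assumption:
    nothing here calls Autobahn; it only prepares the folder layout).
    """
    batch_dir = source_root / batch_name
    batch_dir.mkdir(parents=True, exist_ok=True)
    for pdf in pdfs:
        # Reject anything that is not a PDF so the batch stays clean.
        if pdf.suffix.lower() != ".pdf":
            raise ValueError(f"not a PDF: {pdf}")
        shutil.copy2(pdf, batch_dir / pdf.name)
    return batch_dir
```

For example, `stage_batch(Path("Input"), "Invoices", [Path("a.pdf"), Path("b.pdf")])` would create `Input/Invoices` containing both files, ready for the job to pick up as one set.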
This will be a multi-step process because you need to first merge the input PDFs and then OCR those merged files so that they’re searchable.

2. Select the Combine PDFs step.
Go to the Split and Merge tab in the sidebar and click Combine PDFs.

3. Add a second step.
Go to the OCR tab and add any of the PDF to Searchable PDF steps. The **PDF to Searchable PDF (Extended)** option uses the IRIS OCR Engine. For best results, we recommend choosing PDF To Searchable PDF (GdPicture).

A notification will prompt you to add work folders. Click OK. The work folder is an intermediate location: once the files are merged, they’re put in the work folder. Then, the second step picks up the merged files and adds the OCR text layer.

Note: All files placed in one input folder are merged into one PDF, and that PDF is given the name of the folder. So be sure to name each folder to match what you want the merged PDF to be called.

4. Enable the subfolder option.
Because you want to merge and batch OCR multiple files, you need to use subfolders. For example, one subfolder will contain a set of files you want to merge, the second subfolder will contain another set of files you want to merge, and so on. To enable this option, check the Process Sub-Folders box. You can find it under the Destination Folder field, next to the Use Work Folders box. Save the job and go to the Job Manager tab.

5. Set up the job to run automatically.
By default, the job type is set to ad hoc, which means the job needs to be run manually. But in this instance, you want it to be processed automatically. Go back to the Designer and then to the Schedule tab. You can choose the once-per-day option, or you can use the **Continuous (Watched Folder)** option and set this job to run by the hour, minute, or second. This example sets the job to run once every minute. In practice, however, if you have larger sets of files, we recommend using longer intervals.

6. Set the input files to move to an archive.
A common issue people encounter when first setting up a job is that their input files stay in the input location. As a result, when the job continuously watches that folder, the same files are processed again on every run. To prevent this, move the files to an archive once they’ve been picked up the first time, so that they’re no longer processed. Go to the Processing tab under the Designer menu to set your archive location.

7. Delete empty input folders.
As you’re using subfolders for this merge step and the files in them are moved to an archive after being processed, you’ll be left with empty input folders. To clean up this input location, check the Delete Empty Input Folders box. In this way, all subfolders will be removed after the output files are produced.

Now, save the job and see how Autobahn automatically merges and batch OCRs multiple documents that you submitted.

Next, if you want to process another set of files, you can move them to the input folder. Then, the next time that the job runs, it will pick up this new folder and add merged and OCRed files to that new folder.

If you want to try these steps yourself, download the free trial of Autobahn and make your documents searchable. Or, if you prefer to see these steps in action, check out our video tutorial below.
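To see why steps 6 and 7 matter for a watched folder, here is a toy model of one scheduled pass, written in Python. It is not Autobahn's implementation: `run_pass` and its folder arguments are hypothetical, and the merge and OCR work is replaced by simply recording the subfolder name. The sketch only shows that moving inputs to an archive stops them from being picked up again, and that the emptied subfolders can then be removed.

```python
import shutil
from pathlib import Path


def run_pass(source: Path, archive: Path, delete_empty: bool = True) -> list[str]:
    """Simulate one scheduled pass over the watched source folder.

    Returns the names of the subfolders picked up on this pass.
    """
    picked_up = []
    for sub in sorted(p for p in source.iterdir() if p.is_dir()):
        pdfs = sorted(sub.glob("*.pdf"))
        if not pdfs:
            continue  # nothing to process in this subfolder
        picked_up.append(sub.name)  # stand-in for the merge + OCR steps

        # Step 6: move processed inputs to the archive so the next
        # pass does not see them again.
        target = archive / sub.name
        target.mkdir(parents=True, exist_ok=True)
        for pdf in pdfs:
            shutil.move(str(pdf), str(target / pdf.name))

        # Step 7: remove the subfolder once it is empty.
        if delete_empty and not any(sub.iterdir()):
            sub.rmdir()
    return picked_up
```

Calling `run_pass` twice on the same source folder returns the subfolder names the first time and an empty list the second time. Without the archive move, every pass would return the same names, which mirrors the repeated processing described in step 6.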