Extract pages from PDFs on Android

PdfProcessor can export pages from one document into another document. You can choose to extract a single page, a range of pages, or even multiple page ranges:

// Page numbers start at 0, so this set selects only the fifth page of the document.
val task = PdfProcessorTask.fromDocument(document).keepPages(setOf(4))

// Keep pages 5, 6, and 7.
val task = PdfProcessorTask.fromDocument(document).keepPages(setOf(4, 5, 6))

// Remove the first page.
val task = PdfProcessorTask.fromDocument(document).removePages(setOf(0))

// Page numbers start at 0, so this set selects only the fifth page of the document.
PdfProcessorTask task = PdfProcessorTask.fromDocument(document).keepPages(new HashSet<Integer>(Arrays.asList(4)));

// Keep pages 5, 6, and 7.
PdfProcessorTask task = PdfProcessorTask.fromDocument(document).keepPages(new HashSet<Integer>(Arrays.asList(4, 5, 6)));

// Remove the first page.
PdfProcessorTask task = PdfProcessorTask.fromDocument(document).removePages(new HashSet<Integer>(Arrays.asList(0)));
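
Page selections that come from user input are often written as 1-based ranges such as "5-7", while the snippets above use zero-based page numbers. The following is a minimal sketch of one way to bridge the two. The PageRanges class and its toPageIndexes method are hypothetical helpers written for this illustration and are not part of the PdfProcessor API.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the SDK.
final class PageRanges {
    // Matches a single page ("7") or an inclusive range ("5-7").
    private static final Pattern RANGE = Pattern.compile("(\\d+)(?:\\s*-\\s*(\\d+))?");

    /**
     * Converts 1-based, inclusive page ranges such as "1, 3-4, 9"
     * into a sorted set of zero-based page numbers.
     */
    public static Set<Integer> toPageIndexes(String ranges) {
        Set<Integer> indexes = new TreeSet<>();
        for (String part : ranges.split(",")) {
            String range = part.trim();
            if (range.isEmpty()) {
                continue; // Tolerate stray commas and whitespace.
            }
            Matcher matcher = RANGE.matcher(range);
            if (!matcher.matches()) {
                throw new IllegalArgumentException("Invalid page range: " + range);
            }
            int first = Integer.parseInt(matcher.group(1));
            int last = matcher.group(2) != null ? Integer.parseInt(matcher.group(2)) : first;
            if (first < 1 || last < first) {
                throw new IllegalArgumentException("Invalid page range: " + range);
            }
            // Shift each 1-based page to its zero-based page number.
            for (int page = first; page <= last; page++) {
                indexes.add(page - 1);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(toPageIndexes("5-7"));       // prints [4, 5, 6]
        System.out.println(toPageIndexes("1, 3-4, 9")); // prints [0, 2, 3, 8]
    }
}
```

The resulting set could then be passed to keepPages or removePages in place of the hand-written sets shown above, for example wrapped in a HashSet as in those snippets.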

After creating a PdfProcessorTask, start the page extraction by calling either PdfProcessor#processDocumentAsync or PdfProcessor#processDocument. By default, all annotations are preserved. You can queue multiple operations on a document by calling multiple methods on a PdfProcessorTask object before starting processing; the operations are executed in the same order as your method calls:

val outputFile = File(getFilesDir(), "extracted-pages.pdf")

// Keep pages 5, 6, and 7.
val task = PdfProcessorTask.fromDocument(document).keepPages(setOf(4, 5, 6))
PdfProcessor.processDocumentAsync(task, outputFile)
            // Run processing on the background thread.
            .subscribeOn(Schedulers.io())
            // Publish results on the main thread so we can update the UI.
            .observeOn(AndroidSchedulers.mainThread())
            .subscribe(
                { progress: PdfProcessor.ProcessorProgress -> Toast.makeText(context, "Processing page ${progress.pagesProcessed}/${progress.totalPages}", Toast.LENGTH_SHORT).show() },
                { error: Throwable -> Toast.makeText(context, "Processing has failed: ${error.message}", Toast.LENGTH_SHORT).show() },
                { Toast.makeText(context, "Processing has been completed successfully.", Toast.LENGTH_SHORT).show() }
            )

final File outputFile = new File(getFilesDir(), "extracted-pages.pdf");

// Keep pages 5, 6, and 7.
PdfProcessorTask task = PdfProcessorTask.fromDocument(document).keepPages(new HashSet<Integer>(Arrays.asList(4, 5, 6)));
PdfProcessor.processDocumentAsync(task, outputFile)
            // Run processing on the background thread.
            .subscribeOn(Schedulers.io())
            // Publish results on the main thread so we can update the UI.
            .observeOn(AndroidSchedulers.mainThread())
            .subscribe(new DefaultSubscriber<PdfProcessor.ProcessorProgress>() {
                @Override
                public void onComplete() {
                    Toast.makeText(context, "Processing has been completed successfully.", Toast.LENGTH_SHORT).show();
                }

                @Override
                public void onError(Throwable e) {
                    Toast.makeText(context, "Processing has failed: " + e.getMessage(), Toast.LENGTH_SHORT).show();
                }

                @Override
                public void onNext(PdfProcessor.ProcessorProgress processorProgress) {
                    Toast.makeText(context, "Processing page " + processorProgress.getPagesProcessed() + "/" + processorProgress.getTotalPages(), Toast.LENGTH_SHORT).show();
                }
            });

💡 Tip: You can use page extraction to merge pages from two or more documents. To do so, load a compound PdfDocument, for example by using PSPDFKit#openDocuments or any of the PdfActivity#showDocuments methods. See DocumentProcessingExample in the Catalog app for a demo.