Efficient backup strategies for Document Engine

Information

PSPDFKit Server has been deprecated and replaced by Document Engine. To start using Document Engine, refer to the migration guide. With Document Engine, you’ll have access to robust new capabilities (read the blog for more information).

Before you deploy PSPDFKit Server in a production environment, you should set up a backup strategy with scheduled automated backups and a tested disaster recovery plan.

PSPDFKit Server uses PostgreSQL as its data store, while binary assets, including PDFs, are stored in a Docker volume. You need to back up both.

PostgreSQL

The easiest way to take a snapshot of your PostgreSQL database is to run pg_dump inside the PSPDFKit container and redirect the dump to a file you can back up using your existing backup infrastructure.

Using the recommended docker-compose.yml configuration, you can dump a snapshot to the pspdfkit_db_dump.sql file:

docker-compose run --rm pspdfkit pg_dump > pspdfkit_db_dump.sql

You can then restore the dump into a fresh PostgreSQL container:

docker-compose run --rm pspdfkit pg_isready --timeout=15 # Wait until PostgreSQL has started.
docker-compose run --rm pspdfkit psql < pspdfkit_db_dump.sql
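
For scheduled runs, it can help to wrap the dump command in a small script. The following is a minimal sketch rather than an official tool. It assumes a POSIX shell and the pspdfkit service name from the recommended docker-compose.yml. The BACKUP_DIR directory, the timestamped file-name pattern, and the COMPOSE override variable are illustrative assumptions. The dump is written to a temporary file first, so a failed pg_dump never leaves a truncated file that looks like a valid backup.

```shell
#!/bin/sh
# Sketch: write a timestamped PostgreSQL dump and keep it only if pg_dump succeeded.
# BACKUP_DIR, the file-name pattern, and COMPOSE are illustrative assumptions.
set -eu

COMPOSE="${COMPOSE:-docker-compose}"
BACKUP_DIR="${BACKUP_DIR:-./backups}"

backup_db() {
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  target="${BACKUP_DIR}/pspdfkit_db_dump_${stamp}.sql"
  tmp="${target}.partial"

  mkdir -p "$BACKUP_DIR"

  # Same dump command as above. -T disables pseudo-TTY allocation,
  # which suits non-interactive runs such as cron jobs.
  if $COMPOSE run --rm -T pspdfkit pg_dump > "$tmp"; then
    mv "$tmp" "$target"
    echo "$target"   # Print the path of the finished dump.
  else
    rm -f "$tmp"     # Discard the incomplete dump.
    echo "pg_dump failed; no dump written" >&2
    return 1
  fi
}
```

Calling backup_db prints the path of the finished dump, which a scheduled job could then hand to your existing backup infrastructure.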

Assets

Built-In Storage

When you use the built-in storage option, all assets are backed up with the PostgreSQL backup.

S3

If you use the S3-compatible backend, you need a separate backup routine for the assets. Keep the following in mind:

  • Because PSPDFKit Server stores files by their SHA checksums, the content stored under a given name doesn't change, so a daily incremental backup is usually sufficient.

  • You should schedule the asset storage backup right after the PostgreSQL database backup to prevent data from drifting between the two.
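
The ordering described above can be sketched as a single script that dumps the database and then immediately syncs the assets. This is only an illustration under several assumptions: the AWS CLI is installed and configured, the bucket names are hypothetical placeholders, and the COMPOSE and AWS override variables are additions made for this example. The aws s3 sync command copies only objects that are new or changed in the source, so repeated runs behave like an incremental backup.

```shell
#!/bin/sh
# Sketch: dump the database first, then sync the assets right afterward,
# so the two backups stay close together in time.
# Bucket names are hypothetical; using the AWS CLI is an assumption.
set -eu

COMPOSE="${COMPOSE:-docker-compose}"
AWS="${AWS:-aws}"
ASSET_BUCKET="${ASSET_BUCKET:-my-pspdfkit-assets}"
BACKUP_BUCKET="${BACKUP_BUCKET:-my-pspdfkit-assets-backup}"

backup_db_then_assets() {
  # Step 1: database snapshot (same command as in the PostgreSQL section).
  $COMPOSE run --rm -T pspdfkit pg_dump > pspdfkit_db_dump.sql || return 1

  # Step 2: copy new or changed objects from the asset bucket to the backup bucket.
  $AWS s3 sync "s3://${ASSET_BUCKET}" "s3://${BACKUP_BUCKET}" || return 1
}
```

If the database dump fails, the function returns before touching the assets, so a partial run is easy to spot in the job's exit status.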

Large Installations

While these approaches are easy to set up and automate via cron or a similar scheduler, for larger installations we recommend a log-shipping backup tool such as Barman or WAL-E for PostgreSQL, and an incremental backup tool such as BorgBackup or duplicity for the asset volume.