Migrate documents from other locations to Document Engine

You can migrate documents stored in your existing infrastructure to Document Engine using the remote document URL API. This enables Document Engine to fetch the documents directly from the provided URL when required, which means you won’t incur any additional storage costs.

For this to work correctly, the URLs you provide must not require authentication, and they must stay valid for as long as the document exists. Furthermore, the file served at the URL must always be identical to the file returned when the document was first added. If, at any point, the URL for a document becomes invalid, you’ll have to delete the document and reupload it. There’s no way to update the URL for an existing document.
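Because the file behind a URL has to stay byte-for-byte identical, it can help to record a checksum in your own system when you register a document and recheck it periodically. The following is a minimal sketch of that idea using only the Python standard library. The function names are hypothetical, and the checksum comparison is an assumption about how you might monitor your own storage; it isn't a description of how Document Engine validates files.

```python
import hashlib
import urllib.request


def sha256_of_chunks(chunks):
    """Return the hex SHA-256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_url(url, chunk_size=64 * 1024, timeout=30):
    """Download `url` without any credentials and hash the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Read in fixed-size chunks so large PDFs aren't held in memory at once.
        return sha256_of_chunks(iter(lambda: response.read(chunk_size), b""))


def url_still_matches(url, recorded_digest):
    """Return True if the URL is reachable anonymously and serves the same bytes."""
    try:
        return sha256_of_url(url) == recorded_digest
    except OSError:
        # URLError and HTTPError are OSError subclasses: treat an unreachable
        # URL or an HTTP error status (for example 401 or 404) as a mismatch.
        return False
```

A document whose check fails is a candidate for the delete-and-reupload step described above.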

You can either upload all your documents in one go or upload each document on demand.

Uploading all documents

To upload all your documents in one go, use the API to add a document from a URL. Call this API once for each document URL to register the document with Document Engine. If your application already assigns unique IDs to documents, you can include them in the request to maintain consistency across systems:

POST /api/documents
Content-Type: application/json
Authorization: Token token="<secret token>"

{
  "url": "http://file.example.com/sample.pdf",
  "document_id": "my_document_id_1"
}

curl http://127.0.0.1:5000/api/documents \
    -X POST \
    -H "Authorization: Token token=<secret token>" \
    -H "Content-type: application/json" \
    -d '{"url": "http://file.example.com/sample.pdf", "document_id": "my_document_id_1"}'
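To register many documents, the same request can be issued in a loop. The sketch below assumes you can list your documents as (document_id, url) pairs. The host and token are the placeholders from the curl example, and `build_add_request` and `register_all` are hypothetical helper names, not part of any Document Engine client library.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # placeholder host from the curl example
API_TOKEN = "<secret token>"        # placeholder, replace with your API token


def build_add_request(document_id, url, base_url=BASE_URL, token=API_TOKEN):
    """Build the POST /api/documents request for one remote document URL."""
    body = json.dumps({"url": url, "document_id": document_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/documents",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Same header value as in the curl example.
            "Authorization": f"Token token={token}",
        },
    )


def register_all(documents, send=urllib.request.urlopen):
    """Register each (document_id, url) pair and return the failures.

    `send` is injectable so the loop can be exercised without a server.
    """
    failed = []
    for document_id, url in documents:
        try:
            with send(build_add_request(document_id, url)):
                pass  # a 2xx response means the document was registered
        except urllib.error.URLError as error:
            # HTTPError (non-2xx status) is a subclass of URLError.
            failed.append((document_id, str(error)))
    return failed
```

Collecting failures instead of stopping at the first error lets you retry only the documents that weren't registered.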

Uploading documents on demand

When a user requests a document, first check if it already exists in Document Engine. If the document is available, serve it immediately. Otherwise, upload it using the same ID the user provided.

For example, if your application has a route like /documents/:id and a user requests my_document_id_1, you can check if the document exists using the document info endpoint:

GET /api/documents/my_document_id_1/document_info

If the document doesn’t exist, the request returns a 404 error. In that case, upload the document using the endpoint for adding a document from a URL:

POST /api/documents
Content-Type: application/json
Authorization: Token token="<secret token>"

{
  "url": "http://file.example.com/sample.pdf",
  "document_id": "my_document_id_1"
}

curl http://127.0.0.1:5000/api/documents \
    -X POST \
    -H "Authorization: Token token=<secret token>" \
    -H "Content-type: application/json" \
    -d '{"url": "http://file.example.com/sample.pdf", "document_id": "my_document_id_1"}'
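The check-then-upload flow can be combined into a single helper that your /documents/:id route calls before serving a document. This is a sketch under stated assumptions: `ensure_document` and `_request` are hypothetical names, the host and token are the placeholders from the curl example, and sending the Authorization header on the document info request is assumed rather than shown above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # placeholder host from the curl example
API_TOKEN = "<secret token>"        # placeholder, replace with your API token


def _request(method, path, payload=None):
    """Build a request; the Authorization header on GET is an assumption."""
    headers = {"Authorization": f"Token token={API_TOKEN}"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, method=method, headers=headers
    )


def ensure_document(document_id, source_url, send=urllib.request.urlopen):
    """Return "exists" if the document is already known, else upload it.

    `send` is injectable so the flow can be exercised without a server.
    """
    quoted_id = urllib.parse.quote(document_id, safe="")
    try:
        with send(_request("GET", f"/api/documents/{quoted_id}/document_info")):
            return "exists"
    except urllib.error.HTTPError as error:
        if error.code != 404:
            raise  # only a 404 means "not registered yet"
    payload = {"url": source_url, "document_id": document_id}
    with send(_request("POST", "/api/documents", payload)):
        return "uploaded"
```

For the example above, the route handler would call `ensure_document("my_document_id_1", "http://file.example.com/sample.pdf")` and then serve the document in either case.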