Migrate Documents from Other Locations to Document Engine
You can migrate documents stored in your existing infrastructure into Document Engine using the remote document URL API. This enables Document Engine to fetch the documents directly from the provided URL when required, which means you won’t incur any additional storage costs.
For this to work correctly, the URLs you provide must not require any authentication, and they need to stay valid for as long as the document exists. Furthermore, the file returned by the URL must always be identical to the file that was returned at the time of the initial upload. If, at any point, the URL for a document becomes invalid, you’ll have to delete the document and reupload it. There’s no way to update the URL for an existing document.
You can either upload all your documents in one go, or upload each document on demand.
Uploading All Documents
To upload all your documents, use the API to add a document from a URL. You can call this with each of your documents’ URLs to let Document Engine know about all your documents. If your documents already have IDs your application knows about, you can also supply them here so you have one consistent ID everywhere:
POST /api/documents
Content-Type: application/json
Authorization: Token token="<secret token>"

{
  "url": "http://file.example.com/sample.pdf",
  "document_id": "my_document_id_1"
}

curl http://127.0.0.1:5000/api/documents \
  -X POST \
  -H "Authorization: Token token=<secret token>" \
  -H "Content-type: application/json" \
  -d '{"url": "http://file.example.com/sample.pdf", "document_id": "my_document_id_1"}'
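Calling the endpoint once per document can be scripted. The following is a minimal sketch rather than an official tool. It assumes a hypothetical mapping file with one `document_id url` pair per line, and the `DOCUMENT_ENGINE_URL` and `API_TOKEN` variables and all function names are illustrative. It prints the HTTP status code returned for each document so you can inspect failures yourself.

```shell
#!/usr/bin/env bash
# Sketch: register many remote documents with Document Engine.
# Assumptions (not part of the guide): the variable names below and a
# whitespace-separated "document_id url" mapping file.

DOCUMENT_ENGINE_URL="${DOCUMENT_ENGINE_URL:-http://127.0.0.1:5000}"
API_TOKEN="${API_TOKEN:-<secret token>}"

# Build the JSON body for one document ($1 = ID, $2 = URL).
# Assumes neither value contains characters that need JSON escaping.
build_payload() {
  printf '{"url": "%s", "document_id": "%s"}' "$2" "$1"
}

# Register one document and print the HTTP status code of the response.
add_document() {
  curl -s -o /dev/null -w '%{http_code}' \
    -X POST "$DOCUMENT_ENGINE_URL/api/documents" \
    -H "Authorization: Token token=$API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$(build_payload "$1" "$2")"
}

# Read "document_id url" pairs from the file in $1 and register each one,
# printing "<document_id> <http_status>" per line.
upload_all() {
  local id url status
  while read -r id url || [ -n "$id" ]; do
    if [ -z "$id" ]; then continue; fi
    status="$(add_document "$id" "$url")"
    echo "$id $status"
  done < "$1"
}
```

After sourcing the script, `upload_all documents.txt` would register every entry in `documents.txt`. Keeping the payload builder separate from the request makes it easy to swap in a proper JSON encoder if your IDs or URLs contain special characters.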
Uploading Documents on Demand
When a user requests one of the documents, you can check with Document Engine whether a document with this ID already exists. If it doesn’t, you can upload the document using the same ID your user requested. Otherwise, you can serve them the document immediately.
As an example, let’s say you have a route like /documents/:id and your user requests a specific document with the ID my_document_id_1, which you internally have mapped to a URL.
Now you can use the document info endpoint on Document Engine, GET /api/documents/my_document_id_1/document_info, to see if the document already exists. If it doesn’t, you’ll receive a 404 error, and you can then call the endpoint for adding a document from a URL:
POST /api/documents
Content-Type: application/json
Authorization: Token token="<secret token>"

{
  "url": "http://file.example.com/sample.pdf",
  "document_id": "my_document_id_1"
}

curl http://127.0.0.1:5000/api/documents \
  -X POST \
  -H "Authorization: Token token=<secret token>" \
  -H "Content-type: application/json" \
  -d '{"url": "http://file.example.com/sample.pdf", "document_id": "my_document_id_1"}'
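The check-then-upload flow described above can be combined into a single helper. This is a minimal sketch under stated assumptions: the function names and the `DOCUMENT_ENGINE_URL` and `API_TOKEN` variables are hypothetical, it assumes the document info endpoint accepts the same `Authorization` header as the upload endpoint, and it treats any 2xx status as "document exists".

```shell
#!/usr/bin/env bash
# Sketch: make sure a document exists on Document Engine before serving it.
# Assumptions (not part of the guide): variable and function names, and that
# the document info endpoint uses the same token authentication.

DOCUMENT_ENGINE_URL="${DOCUMENT_ENGINE_URL:-http://127.0.0.1:5000}"
API_TOKEN="${API_TOKEN:-<secret token>}"

# Print the HTTP status code of the document info endpoint for ID $1.
document_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    "$DOCUMENT_ENGINE_URL/api/documents/$1/document_info" \
    -H "Authorization: Token token=$API_TOKEN"
}

# $1 = document ID, $2 = remote URL mapped to that ID.
# Prints "exists" or "uploaded"; returns non-zero on any other outcome.
ensure_document() {
  local id="$1" url="$2" status
  status="$(document_status "$id")"
  case "$status" in
    2*)
      echo "exists"
      ;;
    404)
      # Unknown to Document Engine: register it under the same ID.
      # --fail makes curl return non-zero on HTTP error responses.
      if curl -s --fail -o /dev/null \
        -X POST "$DOCUMENT_ENGINE_URL/api/documents" \
        -H "Authorization: Token token=$API_TOKEN" \
        -H "Content-Type: application/json" \
        -d "$(printf '{"url": "%s", "document_id": "%s"}' "$url" "$id")"; then
        echo "uploaded"
      else
        echo "upload failed for $id" >&2
        return 1
      fi
      ;;
    *)
      echo "unexpected status $status for $id" >&2
      return 1
      ;;
  esac
}
```

A request handler for /documents/:id could call `ensure_document my_document_id_1 http://file.example.com/sample.pdf` and serve the document once the call succeeds. Statuses other than 2xx and 404 are treated as errors so that an authentication or server problem is not mistaken for a missing document.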