Convert PDFs to Word, Excel, or PowerPoint
Document Engine includes the ability to convert any supported file type into Word, Excel, or PowerPoint. This technology applies a unique hybrid machine learning approach to detect structural elements in the source documents such as paragraphs, tables, and columns.
The PDF-to-Office API license is required to access PDF-to-Office capabilities.
Converting a File to an Office Document
To convert a file to an Office document, send a POST request to the /api/build endpoint. In the instructions, set the type parameter of output to one of the following:
- docx: convert to Word
- xlsx: convert to Excel
- pptx: convert to PowerPoint
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
      "parts": [
        { "file": "document" }
      ],
      "output": { "type": "docx" }
    }' \
  -o result.docx
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    { "file": "document" }
  ],
  "output": { "type": "docx" }
}
--customboundary--
For more information on the build instructions, refer to the API Reference.
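The same request can be sketched in Python using only the standard library. This is a minimal illustration that mirrors the curl example above, not an official client: the helper names (build_instructions, encode_multipart, convert_file) are hypothetical, and the base URL and token format are taken from the example request.

```python
import json
import os
import urllib.request
import uuid

# Output types listed in this guide.
OFFICE_TYPES = {"docx", "xlsx", "pptx"}


def build_instructions(output_type: str) -> str:
    """Return Build instructions that convert the uploaded "document" part."""
    if output_type not in OFFICE_TYPES:
        raise ValueError(f"unsupported output type: {output_type!r}")
    return json.dumps({
        "parts": [{"file": "document"}],
        "output": {"type": output_type},
    })


def encode_multipart(instructions: str, pdf_bytes: bytes,
                     filename: str, boundary: str) -> bytes:
    """Encode the PDF and the instructions as a multipart/form-data body."""
    delimiter = b"--" + boundary.encode("ascii")
    return b"\r\n".join([
        delimiter,
        f'Content-Disposition: form-data; name="document"; '
        f'filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/pdf",
        b"",
        pdf_bytes,
        delimiter,
        b'Content-Disposition: form-data; name="instructions"',
        b"Content-Type: application/json",
        b"",
        instructions.encode("utf-8"),
        delimiter + b"--",
        b"",
    ])


def convert_file(pdf_path: str, output_path: str, output_type: str,
                 token: str, base_url: str = "http://localhost:5000") -> None:
    """POST the file to /api/build and write the converted result to disk.

    Assumes a Document Engine instance is reachable at base_url, as in the
    curl example.
    """
    boundary = uuid.uuid4().hex
    with open(pdf_path, "rb") as source:
        pdf_bytes = source.read()
    body = encode_multipart(build_instructions(output_type), pdf_bytes,
                            os.path.basename(pdf_path), boundary)
    request = urllib.request.Request(
        f"{base_url}/api/build",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token token={token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(request) as response, \
            open(output_path, "wb") as target:
        target.write(response.read())
```

Calling convert_file("/path/to/example-document.pdf", "result.docx", "docx", "<API token>") would correspond to the curl request above. Validating the output type locally is a design choice of this sketch; it rejects an unsupported type before any data is uploaded.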
Converting a Document Engine Document to an Office Document
Build API instructions can also be used to process documents managed by Document Engine. To reference an existing document, use the following part:
{
  "document": {
    "id": "<document_id>",
    "layer_name": "<optional_layer_name>"
  }
}
For example, to convert a document with the ID my_document to Excel, send the following request:
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
      "parts": [
        { "file": { "document": { "id": "my_document" } } }
      ],
      "output": { "type": "xlsx" }
    }' \
  -o result.xlsx
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    { "file": { "document": { "id": "my_document" } } }
  ],
  "output": { "type": "xlsx" }
}
--customboundary--
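As a rough sketch, the instructions for a stored document could be assembled in Python as follows. The function names are hypothetical. The part is nested the same way as in the example request above, and the optional layer_name key follows the part definition in this guide; check the API Reference for the exact schema before relying on either.

```python
import json
from typing import Optional


def document_reference(document_id: str,
                       layer_name: Optional[str] = None) -> dict:
    """Build the reference to a document managed by Document Engine."""
    reference = {"id": document_id}
    # The layer name is optional, so include it only when provided.
    if layer_name is not None:
        reference["layer_name"] = layer_name
    return {"document": reference}


def build_instructions_for_document(document_id: str, output_type: str,
                                    layer_name: Optional[str] = None) -> str:
    """Return Build instructions that convert a stored document."""
    return json.dumps({
        # Nesting mirrors the example request in this guide (an assumption).
        "parts": [{"file": document_reference(document_id, layer_name)}],
        "output": {"type": output_type},
    })


if __name__ == "__main__":
    print(build_instructions_for_document("my_document", "xlsx"))
```

The resulting string can be sent as the instructions form field; no file upload is needed because the document already lives in Document Engine.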