Convert PDFs to Word, Excel, or PowerPoint
Document Engine can convert any supported file type into a Word, Excel, or PowerPoint document. The conversion uses a hybrid machine learning approach to detect structural elements in the source document, such as paragraphs, tables, and columns.
The PDF-to-Office API license is required to access PDF-to-Office capabilities.
Converting a File to an Office Document
To convert a file to an Office document, send a POST request to the /api/build endpoint. In the instructions, set the type parameter of the output to one of the following:
docx: convert to Word
xlsx: convert to Excel
pptx: convert to PowerPoint
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
  "parts": [
    { "file": "document" }
  ],
  "output": { "type": "docx" }
}' \
  -o result.docx
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="document"; filename="example-document.pdf"
Content-Type: application/pdf

<PDF data>
--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    { "file": "document" }
  ],
  "output": { "type": "docx" }
}
--customboundary--
For more information on the build instructions, refer to the API Reference.
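The upload request shown earlier in this section can be parameterized by output type. The following is a minimal sketch for a bash-compatible shell with curl installed. The function names build_instructions and convert_to_office, and the DE_URL and DE_TOKEN variables, are hypothetical names chosen for illustration and are not part of Document Engine.

```shell
# Hypothetical helpers (not part of Document Engine).

# Print the instructions JSON for one uploaded file and the given output type.
# Returns 1 if the type is not one of docx, xlsx, or pptx.
build_instructions() {
  local type="$1"
  case "$type" in
    docx|xlsx|pptx) ;;
    *) echo "unsupported output type: $type" >&2; return 1 ;;
  esac
  printf '{ "parts": [ { "file": "document" } ], "output": { "type": "%s" } }\n' "$type"
}

# Upload a file and save the result next to it with the new extension.
# Assumes the caller has set DE_URL (for example http://localhost:5000)
# and DE_TOKEN (the API token).
convert_to_office() {
  local input="$1" type="$2" instructions
  instructions="$(build_instructions "$type")" || return 1
  # --fail makes curl exit non-zero on HTTP errors instead of saving the error body.
  curl --fail -X POST "${DE_URL}/api/build" \
    -H "Authorization: Token token=${DE_TOKEN}" \
    -F "document=@${input}" \
    -F "instructions=${instructions}" \
    -o "${input%.*}.${type}"
}
```

With these definitions loaded, calling convert_to_office /path/to/example-document.pdf pptx would write /path/to/example-document.pptx. Validating the type locally avoids a round trip to the server for an obviously unsupported value.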
Converting a Document Engine Document to an Office Document
Build API instructions can also be used to process documents managed by Document Engine. To reference an existing document, use the following part:
{
  "document": {
    "id": "<document_id>",
    "layer_name": "<optional_layer_name>"
  }
}
For example, to convert a document with the ID my_document to Excel, send the following request:
curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
  "parts": [
    { "document": { "id": "my_document" } }
  ],
  "output": { "type": "xlsx" }
}' \
  -o result.xlsx
POST /api/build HTTP/1.1
Content-Type: multipart/form-data; boundary=customboundary
Authorization: Token token=<API token>

--customboundary
Content-Disposition: form-data; name="instructions"
Content-Type: application/json

{
  "parts": [
    { "document": { "id": "my_document" } }
  ],
  "output": { "type": "xlsx" }
}
--customboundary--
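The document part described earlier in this section takes an optional layer name. The following is a minimal sketch of a shell helper that prints that part with or without the layer. The function name document_part is hypothetical, and the sketch assumes that document IDs and layer names contain no characters that need JSON escaping.

```shell
# Hypothetical helper (not part of Document Engine): print the part that
# references a document managed by Document Engine.
# $1 = document ID, $2 = optional layer name.
# Assumes neither value contains quotes or backslashes.
document_part() {
  local id="$1" layer="${2:-}"
  if [ -n "$layer" ]; then
    printf '{ "document": { "id": "%s", "layer_name": "%s" } }\n' "$id" "$layer"
  else
    # Omit layer_name entirely when no layer is given.
    printf '{ "document": { "id": "%s" } }\n' "$id"
  fi
}
```

For example, document_part my_document my_layer prints { "document": { "id": "my_document", "layer_name": "my_layer" } }, which can then be placed in the parts array of the instructions.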