Convert PDFs to Word, Excel, or PowerPoint

Document Engine can convert any supported file type to Word, Excel, or PowerPoint. The conversion uses a hybrid machine learning approach to detect structural elements in the source document, such as paragraphs, tables, and columns.

PDF-to-Office conversion requires the PDF-to-Office API license.

Converting a File to an Office Document

To convert a file to an Office document, post a request to the /api/build endpoint. In the instructions, set the type parameter of output to one of the following:

  • docx — convert to Word
  • xlsx — convert to Excel
  • pptx — convert to PowerPoint

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F document=@/path/to/example-document.pdf \
  -F instructions='{
    "parts": [
      {
        "file": "document"
      }
    ],
    "output": {
      "type": "docx"
    }
  }' \
  -o result.docx

For more information on the build instructions, refer to the API Reference.
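
Because only the output type changes between the three conversions, the instructions can be generated rather than typed by hand. The following is a minimal sketch, not part of Document Engine: office_instructions is a hypothetical helper name, and it assumes the uploaded file is sent in a form field named document, as in the request above.

```shell
# Sketch: emit Build API instructions for one Office output type.
# `office_instructions` is a hypothetical helper, not a Document Engine command.
# Argument: <output_type>, one of docx, xlsx, pptx.
office_instructions() {
  case "$1" in
    docx|xlsx|pptx) ;;
    *) echo "unsupported output type: $1" >&2; return 1 ;;
  esac
  # The part refers to the multipart form field named "document".
  printf '{"parts":[{"file":"document"}],"output":{"type":"%s"}}\n' "$1"
}

# Possible usage (commented out because it needs a running server and a token):
# for type in docx xlsx pptx; do
#   curl -X POST http://localhost:5000/api/build \
#     -H "Authorization: Token token=<API token>" \
#     -F document=@/path/to/example-document.pdf \
#     -F instructions="$(office_instructions "$type")" \
#     -o "result.$type"
# done
```

Rejecting unknown types in the helper keeps a typo such as doc from reaching the server; whether the server would reject it with a useful message is not covered here.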

Converting a Document Engine Document to an Office Document

Build API instructions can also be used to process documents managed by Document Engine. To reference an existing document, use the following part:

{
  "document": {
    "id": "<document_id>",
    "layer_name": "<optional_layer_name>"
  }
}

For example, to convert a document with the ID my_document to Excel, perform the following request:

curl -X POST http://localhost:5000/api/build \
  -H "Authorization: Token token=<API token>" \
  -F instructions='{
    "parts": [
      {
        "document": {
          "id": "my_document"
        }
      }
    ],
    "output": {
      "type": "xlsx"
    }
  }' \
  -o result.xlsx
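
The document part and its optional layer name can be assembled the same way. This is a sketch under stated assumptions rather than an official tool: document_instructions is a hypothetical helper, it uses the document part shape described earlier in this section, and it assumes the ID and layer name contain no characters that need JSON escaping.

```shell
# Sketch: emit Build API instructions for a document managed by Document Engine.
# `document_instructions` is a hypothetical helper, not a Document Engine command.
# Arguments: <document_id> <output_type> [layer_name]
# Assumption: the ID and layer name need no JSON escaping (no quotes or backslashes).
document_instructions() {
  case "$2" in
    docx|xlsx|pptx) ;;
    *) echo "unsupported output type: $2" >&2; return 1 ;;
  esac
  if [ -n "${3:-}" ]; then
    # Include layer_name only when a layer was requested.
    part=$(printf '{"document":{"id":"%s","layer_name":"%s"}}' "$1" "$3")
  else
    part=$(printf '{"document":{"id":"%s"}}' "$1")
  fi
  printf '{"parts":[%s],"output":{"type":"%s"}}\n' "$part" "$2"
}

# Possible usage (commented out because it needs a running server and a token):
# curl -X POST http://localhost:5000/api/build \
#   -H "Authorization: Token token=<API token>" \
#   -F instructions="$(document_instructions my_document xlsx)" \
#   -o result.xlsx
```

No file is uploaded in this case, so the request carries only the instructions field.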