Extract text from PDFs using JavaScript

Extracting text from a PDF can be a complex task, so we offer several abstractions to make this simpler. In a PDF, text usually consists of glyphs that are absolutely positioned. Nutrient heuristically splits these glyphs up into words and blocks of text. Our user interface leverages this information to allow users to select and annotate text. You can read more about this in our text selection guide.

Use textLinesForPageIndex to extract the text lines of the page at a given index. Page indexes are zero-based, so 0 refers to the first page:

const lines = await instance.textLinesForPageIndex(0);
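The call above resolves to a list of text lines rather than a single string. The following is a minimal sketch of how those lines might be joined into plain text. It assumes each text line exposes its string in a `contents` property and that the returned list is iterable; `extractPageText` and `extractDocumentText` are hypothetical helper names, and the page count is passed in by the caller instead of being read from the instance.

```javascript
// Sketch: collect the text of one page as a single string.
// `instance` is assumed to be a loaded instance exposing
// textLinesForPageIndex, as in the snippet above.
async function extractPageText(instance, pageIndex) {
  const lines = await instance.textLinesForPageIndex(pageIndex);
  // Array.from accepts arrays and other iterable lists alike.
  // Assumption: each text line stores its string in `contents`.
  return Array.from(lines, (line) => line.contents).join("\n");
}

// Sketch: concatenate the text of pages 0..pageCount-1, one after another.
// Assumption: the caller knows the page count and passes it in.
async function extractDocumentText(instance, pageCount) {
  const pages = [];
  for (let pageIndex = 0; pageIndex < pageCount; pageIndex++) {
    pages.push(await extractPageText(instance, pageIndex));
  }
  // Separate pages with a blank line.
  return pages.join("\n\n");
}
```

Pages are read sequentially here to keep the sketch simple; for large documents it may be worth requesting several pages concurrently with Promise.all instead.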

For server-based deployments, use the /pages/:page_index/text endpoint to fetch all text contained in a page.
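As a rough sketch, a request to that endpoint could look like the following. Only the /pages/:page_index/text path segment comes from this guide. The `documentUrl` prefix, the authentication headers, and the assumption that the response body is JSON are all placeholders that depend on your server setup, and `fetchPageText` is a hypothetical helper name.

```javascript
// Sketch: fetch the text of one page from a server-based deployment.
// Assumptions: `documentUrl` is the URL prefix your server uses for the
// document, `options.headers` carries whatever authentication the server
// requires, and the response body is JSON.
async function fetchPageText(documentUrl, pageIndex, options = {}) {
  const { headers = {}, fetchImpl = globalThis.fetch } = options;
  // Drop trailing slashes so the joined URL has exactly one separator.
  const base = documentUrl.replace(/\/+$/, "");
  const response = await fetchImpl(`${base}/pages/${pageIndex}/text`, { headers });
  if (!response.ok) {
    throw new Error(`Text request failed with status ${response.status}`);
  }
  return response.json();
}
```

The `fetchImpl` option exists only so the request function can be swapped out, for example in tests or in environments without a global fetch.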
