Extract Text from PDFs Using JavaScript
Extracting text from a PDF can be complex, so PSPDFKit offers several abstractions to simplify it. In a PDF, text usually consists of absolutely positioned glyphs. PSPDFKit uses heuristics to group these glyphs into words and blocks of text, and its user interface relies on this information to let users select and annotate text. You can read more about this in our text selection guide.
Use textLinesForPageIndex to extract the text lines of the page at a given (zero-based) page index:
const lines = await instance.textLinesForPageIndex(0);
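The call above returns the text lines of a single page. The following is a minimal sketch, assuming instance is a loaded PSPDFKit for Web instance whose text lines each expose a contents string. It joins each page's lines into one string and repeats this for every page using instance.totalPageCount. The helper names extractPageText and extractDocumentText are hypothetical.

```javascript
// Hedged sketch: `instance` is assumed to be a loaded PSPDFKit for Web instance.
// extractPageText and extractDocumentText are hypothetical helper names.

// Returns the text of one page, one text line per row.
async function extractPageText(instance, pageIndex) {
  const lines = await instance.textLinesForPageIndex(pageIndex);
  // Each text line is assumed to carry its text in `contents`;
  // map() and join() work on both Immutable.List and plain arrays.
  return lines.map((line) => line.contents).join("\n");
}

// Returns an array with the text of every page, in page order.
async function extractDocumentText(instance) {
  const pages = [];
  for (let pageIndex = 0; pageIndex < instance.totalPageCount; pageIndex++) {
    // Pages are read one after another to keep the example simple.
    pages.push(await extractPageText(instance, pageIndex));
  }
  return pages;
}
```

For example, await extractPageText(instance, 0) returns the first page's text as a single string. Reading pages sequentially keeps the sketch simple; mapping the page indexes through Promise.all would issue the requests concurrently instead.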
For a Server-based deployment, use the /pages/:page_index/text endpoint to fetch all the text contained in a page.
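A minimal sketch of calling that endpoint with fetch follows. The helper fetchPageText and its documentBaseUrl and headers parameters are hypothetical: the URL prefix in front of /pages/:page_index/text and any authentication headers depend on your Server deployment, so check them against the Server API reference. The sketch assumes the endpoint responds with JSON and returns the parsed payload unmodified rather than guessing at its shape.

```javascript
// Hedged sketch: fetches the text of one page from a Server-based deployment.
// `documentBaseUrl` (the URL prefix before /pages/:page_index/text) and
// `headers` (e.g. authentication) are hypothetical, deployment-specific inputs.
async function fetchPageText(documentBaseUrl, pageIndex, headers = {}) {
  // Drop any trailing slashes so the path segments join cleanly.
  const base = documentBaseUrl.replace(/\/+$/, "");
  const url = `${base}/pages/${encodeURIComponent(pageIndex)}/text`;

  const response = await fetch(url, { headers });
  if (!response.ok) {
    throw new Error(`Text request failed with HTTP ${response.status}`);
  }
  // Assumption: the endpoint returns JSON; the payload is passed through as-is.
  return response.json();
}
```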