Create Text Highlight Annotations from Text Extraction
Extracting text from a PDF file is a common task, but it isn’t always as straightforward as it should be. For that reason, PSPDFKit offers APIs to retrieve text from a document. On PSPDFKit for Web, you can extract the text lines of a page using `instance.textLinesForPageIndex` and then create annotations from them with `instance.create`.
The first step is to extract the text from a page of the PDF document:
```javascript
// Getting all text lines from page `0`.
const textLines = await instance.textLinesForPageIndex(0);
textLines.forEach((textLine) => console.log(textLine.contents));
```
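If you need the page text as a single string rather than logged line by line, the `contents` of each text line can be joined. This is a minimal sketch; `pageTextFromLines` is a hypothetical helper, not a PSPDFKit API, and it assumes only that the collection supports `map` and `join`, which holds for both plain arrays and the Immutable.js `List` that `textLinesForPageIndex` resolves to.

```javascript
// Hypothetical helper (not part of PSPDFKit): joins the text of every line
// on a page into one string. Works on any collection with `map` and `join`,
// such as a plain array or an Immutable.js List of PSPDFKit.TextLine.
function pageTextFromLines(textLines, separator = "\n") {
  return textLines.map((textLine) => textLine.contents).join(separator);
}
```

For example, `pageTextFromLines(await instance.textLinesForPageIndex(0))` returns the text of page `0` with one line of text per line of output.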
Then, retrieve each text line’s bounding box using `PSPDFKit.TextLine#boundingBox`:
```javascript
const boundingBoxes = textLines.map((textLine) => textLine.boundingBox);
```
This returns one `PSPDFKit.Geometry.Rect` record for each `textLine` on that page, collected in a `PSPDFKit.Immutable.List`. In this example, the list contains two records because page `0` of the document has two lines of text.
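Each `PSPDFKit.Geometry.Rect` describes one line by its `left`, `top`, `width`, and `height`. A highlight annotation’s overall bounding box should enclose all of its rectangles, which is what `PSPDFKit.Geometry.Rect.union` computes. The following plain-JavaScript sketch illustrates that geometry only: `unionOfRects` is a hypothetical helper that works on plain objects with the same field names, not part of the PSPDFKit API; in real code, use `Rect.union`.

```javascript
// Hypothetical helper (not part of PSPDFKit): returns the smallest
// axis-aligned rectangle containing every input rectangle. Inputs are plain
// objects with the same fields as PSPDFKit.Geometry.Rect. Accepts any
// iterable (array, Immutable.js List).
function unionOfRects(rects) {
  const list = Array.from(rects);
  if (list.length === 0) {
    throw new Error("unionOfRects needs at least one rectangle");
  }
  // Outermost edges across all rectangles.
  const left = Math.min(...list.map((r) => r.left));
  const top = Math.min(...list.map((r) => r.top));
  const right = Math.max(...list.map((r) => r.left + r.width));
  const bottom = Math.max(...list.map((r) => r.top + r.height));
  return { left, top, width: right - left, height: bottom - top };
}
```

For two lines at `{ left: 10, top: 20, width: 100, height: 12 }` and `{ left: 5, top: 40, width: 50, height: 12 }`, the enclosing rectangle is `{ left: 5, top: 20, width: 105, height: 32 }`.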
The final step is to create a highlight annotation from `boundingBoxes`, passing the individual rectangles as `rects` and their union as the annotation’s `boundingBox`:
```javascript
// One highlight annotation that covers every text line on page 0.
await instance.create(
  new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex: 0,
    rects: boundingBoxes,
    boundingBox: PSPDFKit.Geometry.Rect.union(boundingBoxes),
  })
);
```
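The three steps can be combined into one reusable function. This is a sketch under stated assumptions: `instance` is a loaded PSPDFKit for Web instance, and the SDK module (for example, the default export of the `pspdfkit` package) is passed in as a parameter so the function can be exercised with stand-ins outside the browser. The name `highlightAllTextOnPage` and the empty-page check are illustrative additions, not part of the original guide.

```javascript
// Hypothetical helper: highlights every text line on one page with a single
// HighlightAnnotation. `PSPDFKit` is the SDK module and `instance` a loaded
// instance; both are supplied by the caller.
async function highlightAllTextOnPage(PSPDFKit, instance, pageIndex) {
  // Step 1: the page's text lines (an Immutable.List of PSPDFKit.TextLine).
  const textLines = await instance.textLinesForPageIndex(pageIndex);
  // Skip pages without extractable text; there is nothing to highlight.
  if (textLines.size === 0) {
    return null;
  }
  // Step 2: one PSPDFKit.Geometry.Rect per text line.
  const rects = textLines.map((textLine) => textLine.boundingBox);
  // Step 3: one annotation whose bounding box encloses all line rectangles.
  const annotation = new PSPDFKit.Annotations.HighlightAnnotation({
    pageIndex,
    rects,
    boundingBox: PSPDFKit.Geometry.Rect.union(rects),
  });
  // Resolves to whatever instance.create resolves to.
  return instance.create(annotation);
}
```

For example, `await highlightAllTextOnPage(PSPDFKit, instance, 0)` performs the same steps as the snippet above for page `0`, and returns `null` instead of creating an empty annotation when a page has no text.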