Find and Convert PDF Coordinates with JavaScript
By specification, PDF documents have their own coordinate space, which is different from the coordinate space used by Nutrient Web SDK. While Web APIs have their origin in the top-left corner, with y coordinates increasing in a downward direction, the y coordinates in PDF documents increase upward, starting in the bottom-left corner.
Additionally, the PDF coordinate space can be offset from the visible bottom-left corner of a page due to a crop box, and it can also be rotated due to the page being rotated. This is efficient because it means cropping or rotating a page is just setting a value: You don’t need to change the content stream or any of the annotations on the page. However, this can also be confusing to work with. Therefore, Nutrient exposes a normalized page coordinate space, which always puts the origin in the top-left corner of the visible area of the page.
ℹ️ Note: Our Instant JSON format uses a coordinate space where the origin is the top-left corner of the page, with the y-axis increasing downward.
Obtaining page dimensions
The getPageInfoForIndex() function can be used to fetch the dimensions of a page. Note that some PDF files can also have rotation set on pages. You can get this value by checking the rotation property of a page. The value is always one of 0, 90, 180, or 270 degrees.
Understanding XFDF/PDF rects
In addition to having their own coordinate spaces, PDFs also represent the bounding box of an annotation differently than Nutrient. For example, consider the case where we export the XFDF of a rectangle annotation like so:
<!-- Other attributes omitted for clarity -->
<square rect="50.000000, 100.000000, 80.000000, 120.000000" />
The rect attribute contains the following information in this order:
- The left side of the rectangle is 50 units from the left of the page.
- The bottom side of the rectangle is 100 units from the bottom of the page.
- The right side of the rectangle is 80 units from the left of the page.
- The top side of the rectangle is 120 units from the bottom of the page.
The width of the rectangle annotation is 30 units (80-50) and the height is 20 units (120-100).
We already have the left value, but if we want to calculate the distance between the top side of the rectangle and the top of the page (Nutrient bounding boxes use width, height, left, and top values), we can subtract the XFDF top value from the height of the page. With a page height of 800, the adjusted top value would be 680 (800-120).
Thus, the equivalent Nutrient bounding box would be:
{ "top": 680, "left": 50, "width": 30, "height": 20 }
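The arithmetic above can be sketched as a pair of small helpers. This is a minimal illustration, not part of the Nutrient API: the function names xfdfRectToBoundingBox and boundingBoxToXfdfRect are hypothetical, and the sketch assumes an unrotated page whose crop box does not offset the origin.

```javascript
// Hypothetical helpers (not part of the Nutrient API).
// Assumption: the page is not rotated and has no crop box offset.

// Convert an XFDF rect string ("left, bottom, right, top", bottom-left origin)
// into a bounding box measured from the top-left corner of the page.
function xfdfRectToBoundingBox(rect, pageHeight) {
  const [left, bottom, right, top] = rect
    .split(",")
    .map((value) => Number.parseFloat(value));

  return {
    left,
    top: pageHeight - top, // distance from the top edge of the page
    width: right - left,
    height: top - bottom,
  };
}

// Inverse conversion: top-left-origin bounding box back to
// [left, bottom, right, top] in the bottom-left-origin space.
function boundingBoxToXfdfRect(box, pageHeight) {
  const top = pageHeight - box.top;
  return [box.left, top - box.height, box.left + box.width, top];
}

const box = xfdfRectToBoundingBox(
  "50.000000, 100.000000, 80.000000, 120.000000",
  800
);
console.log(box); // { left: 50, top: 680, width: 30, height: 20 }
```

Passing the page height explicitly keeps the helpers independent of any SDK call. In practice, the height would come from the page dimensions described earlier.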
How to convert between raster image pixels and points
The concept of resolution does not apply to PDF documents unless they have been converted into raster images, i.e. images whose dimensions are expressed in pixels. The default unit Nutrient returns for page sizes is the point, which is easily converted into inches because 1 inch is equal to 72 points. The inch separation is the result of dividing the size in points of a particular page by 72. Resolution, expressed in DPI (dots per inch), is thus the result of dividing the size of the raster image in pixels by the inch separation. To summarize, these are the relations you have to consider when you need to convert between points/inches and pixels in PDF:
1 inch = 72 points
Inch separation = points / 72
DPI (resolution) = pixels / inch separation
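As a rough illustration of these relations, the following sketch converts between points, inches, and pixels. The function names are hypothetical and not part of the Nutrient API, and the sketch assumes the default of 72 points per inch. The example values use a US Letter page, which is 612 points wide, rendered to an image 1275 pixels wide.

```javascript
// Hypothetical helpers (not part of the Nutrient API).
// Assumption: the default relation of 72 points per inch applies.
const POINTS_PER_INCH = 72;

// Inch separation = points / 72
function pointsToInches(points) {
  return points / POINTS_PER_INCH;
}

// DPI (resolution) = pixels / inch separation
function resolutionInDpi(pixels, points) {
  return pixels / pointsToInches(points);
}

// Size in pixels of a length in points when rendered at a given DPI.
function pointsToPixels(points, dpi) {
  return pointsToInches(points) * dpi;
}

// Length in points that a number of pixels covers at a given DPI.
function pixelsToPoints(pixels, dpi) {
  return (pixels / dpi) * POINTS_PER_INCH;
}

console.log(pointsToInches(612)); // 8.5
console.log(resolutionInDpi(1275, 612)); // 150
console.log(pointsToPixels(612, 150)); // 1275
console.log(pixelsToPoints(1275, 150)); // 612
```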
Note that, since PDF 1.6, the size of a user space unit may be specified as a multiple of 1⁄72 inch by means of the UserUnit entry of the page dictionary, so a unit can be larger than one point. See table 30 on page 79 of the PDF 1.7 specification for more information.