Extract metadata from PDFs on Android
Nutrient comes with DocumentPdfMetadata and DocumentXmpMetadata, which allow you to retrieve or modify a document’s metadata. This guide covers extracting metadata (to modify metadata, see our separate guide for editing metadata).
Dictionary-based metadata
Use DocumentPdfMetadata to work with the dictionary-based metadata in a PDF.
Values in the metadata dictionary are wrapped in PdfValue, which can represent the following types:
- Boolean
- long
- double
- String
- List<PdfValue>
- Map<String, PdfValue>
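Because lists and dictionaries can contain further values of any of these kinds, custom metadata values can be nested. The following minimal sketch walks such a value recursively and renders it as text. It assumes plain Java objects (Boolean, Long, Double, String, List, and Map) stand in for PdfValue; the class PdfValueSketch and its render method are hypothetical and are not part of Nutrient's API.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: plain Java objects stand in for PdfValue here.
// This is NOT Nutrient's PdfValue API; it only mirrors the six value kinds.
class PdfValueSketch {

    // Renders a metadata value as text, recursing into lists and dictionaries.
    static String render(Object value) {
        if (value instanceof Boolean || value instanceof Long || value instanceof Double) {
            return String.valueOf(value);
        }
        if (value instanceof String) {
            return "\"" + value + "\"";
        }
        if (value instanceof List<?>) {
            StringJoiner joiner = new StringJoiner(", ", "[", "]");
            for (Object item : (List<?>) value) {
                joiner.add(render(item)); // items may themselves be lists or maps
            }
            return joiner.toString();
        }
        if (value instanceof Map<?, ?>) {
            StringJoiner joiner = new StringJoiner(", ", "{", "}");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                joiner.add(entry.getKey() + ": " + render(entry.getValue()));
            }
            return joiner.toString();
        }
        // Anything else (for example, an Integer or null) is not a supported kind.
        throw new IllegalArgumentException("Unsupported metadata value: " + value);
    }

    public static void main(String[] args) {
        // A hypothetical custom entry: a dictionary holding a boolean and a list of strings.
        Object custom = Map.of("Reviewed", true, "Tags", List.of("draft", "internal"));
        System.out.println(render(custom));
    }
}
```

Recursion is the natural fit here because the two container kinds can hold any of the six kinds, including other containers, to arbitrary depth.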
By default, the dictionary metadata may contain the following information keys:
- Author
- CreationDate
- Creator
- Keywords
- ModDate
- Producer
- Title
You can also add custom key-value pairs to the metadata, as long as each value is one of the supported types listed above. For the predefined keys, we recommend using the DocumentPdfMetadata getters and setters, since they provide out-of-the-box conversions for types such as Date.
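For context on why the getters are recommended: inside the PDF file, CreationDate and ModDate are stored as strings in the PDF date format D:YYYYMMDDHHmmSSOHH'mm' (ISO 32000-1, §7.9.4). The sketch below shows roughly what a manual conversion with java.time would involve. PdfDateSketch is a hypothetical helper, not a Nutrient API, and treating a date with no offset as UTC is an assumption, since the PDF specification leaves that case unknown.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Nutrient API): parses a PDF date string such as
// "D:20230115103000+01'00'" into an OffsetDateTime.
class PdfDateSketch {

    // D:YYYY, then optional MM, DD, HH, mm, SS, then an optional Z or +/-HH'mm'.
    private static final Pattern PDF_DATE = Pattern.compile(
            "D:(\\d{4})(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?(\\d{2})?"
                    + "(?:Z|([+-])(\\d{2})'?(?:(\\d{2})'?)?)?");

    static OffsetDateTime parse(String raw) {
        Matcher m = PDF_DATE.matcher(raw);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a PDF date: " + raw);
        }
        // Missing fields default as in the PDF spec: month and day to 01, time fields to 00.
        int year = Integer.parseInt(m.group(1));
        int month = intOr(m.group(2), 1);
        int day = intOr(m.group(3), 1);
        int hour = intOr(m.group(4), 0);
        int minute = intOr(m.group(5), 0);
        int second = intOr(m.group(6), 0);

        // "Z" means UTC; a missing offset is also treated as UTC here (an assumption).
        ZoneOffset offset = ZoneOffset.UTC;
        if (m.group(7) != null) {
            int sign = m.group(7).equals("-") ? -1 : 1;
            offset = ZoneOffset.ofHoursMinutes(
                    sign * Integer.parseInt(m.group(8)), sign * intOr(m.group(9), 0));
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }

    private static int intOr(String digits, int fallback) {
        return digits == null ? fallback : Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(parse("D:20230115103000+01'00'"));
    }
}
```

Handling optional fields, several offset spellings, and the unknown-offset case by hand is exactly the kind of detail the built-in date getters and setters take care of for you.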
To get a predefined entry of the metadata dictionary (for example, Author), use the following code snippet:
// Kotlin
val document = ...
val pdfMetadata = document.pdfMetadata
val author = pdfMetadata.author
// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
String author = pdfMetadata.getAuthor();
To read a custom value, pass its key to get():
// Kotlin
val document = ...
val pdfMetadata = document.pdfMetadata
val value = pdfMetadata.get("Custom key")
// Java
PdfDocument document = ...
DocumentPdfMetadata pdfMetadata = document.getPdfMetadata();
PdfValue value = pdfMetadata.get("Custom key");
XMP metadata
Use DocumentXmpMetadata to work with the metadata stream containing XMP data.
Each key in the XMP metadata stream must have a namespace. You can define your own namespace or use an existing one. Nutrient exposes constants for two common namespaces:
- DocumentXmpMetadata#XMP_PDF_NAMESPACE / DocumentXmpMetadata#XMP_PDF_NAMESPACE_PREFIX: the XMP PDF namespace created by Adobe (§3.1)
- DocumentXmpMetadata#XMP_DC_NAMESPACE / DocumentXmpMetadata#XMP_DC_NAMESPACE_PREFIX: the Dublin Core namespace
When setting a value, you also have to pass along a suggested namespace prefix, as this can’t be generated automatically.
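The following sketch models why both a namespace and a prefix are involved. A property is identified by its key plus its namespace URI, so the same key in two namespaces refers to two different properties, while the prefix is only the short label used when the property is written out as XML. XmpNamespaceSketch is hypothetical and is not Nutrient's implementation; it also simplifies every value to a String, whereas real XMP properties can be structured. The two URIs are the standard ones for the Adobe PDF and Dublin Core schemas.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Nutrient's implementation) of XMP property addressing:
// a property is identified by namespace URI + key; the prefix is only a label
// needed when the property is serialized as XML.
class XmpNamespaceSketch {

    // Standard namespace URIs for the Adobe PDF and Dublin Core XMP schemas.
    static final String PDF_NAMESPACE = "http://ns.adobe.com/pdf/1.3/";
    static final String DC_NAMESPACE = "http://purl.org/dc/elements/1.1/";

    // namespace URI -> (key -> value); values simplified to plain strings.
    private final Map<String, Map<String, String>> values = new HashMap<>();
    // namespace URI -> suggested prefix (the first prefix supplied wins).
    private final Map<String, String> prefixes = new HashMap<>();

    void set(String key, String namespace, String prefix, String value) {
        values.computeIfAbsent(namespace, ns -> new HashMap<>()).put(key, value);
        prefixes.putIfAbsent(namespace, prefix);
    }

    // Returns null if the key doesn't exist in that namespace.
    String get(String key, String namespace) {
        Map<String, String> inNamespace = values.get(namespace);
        return inNamespace == null ? null : inNamespace.get(key);
    }

    // Writing requires a prefix: the URI alone doesn't say what to call the element.
    // (No XML escaping is done in this sketch.)
    String toElement(String key, String namespace) {
        String prefix = prefixes.get(namespace);
        if (prefix == null) {
            throw new IllegalStateException("No prefix known for " + namespace);
        }
        String qualifiedName = prefix + ":" + key;
        return "<" + qualifiedName + ">" + get(key, namespace) + "</" + qualifiedName + ">";
    }

    public static void main(String[] args) {
        XmpNamespaceSketch xmp = new XmpNamespaceSketch();
        xmp.set("Producer", PDF_NAMESPACE, "pdf", "Example Producer");
        xmp.set("format", DC_NAMESPACE, "dc", "application/pdf");

        System.out.println(xmp.get("Producer", PDF_NAMESPACE)); // prints "Example Producer"
        System.out.println(xmp.get("Producer", DC_NAMESPACE));  // prints "null": different namespace
        System.out.println(xmp.toElement("format", DC_NAMESPACE));
    }
}
```

Looking up "Producer" in the Dublin Core namespace finds nothing, even though the same key exists in the PDF namespace, which is why reads need the namespace too.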
Use the following code snippet to read a value from the XMP metadata by key and namespace:
// Kotlin
val xmpMetadata = document.xmpMetadata
val pdfValue = xmpMetadata.get("Key", NAMESPACE)
// Java
DocumentXmpMetadata xmpMetadata = document.getXmpMetadata();
PdfValue pdfValue = xmpMetadata.get("Key", NAMESPACE);