What is PDF/A? The complete guide to PDF/A‑1/2/3/4
PDF/A is an ISO standard for long-term document preservation. This guide covers all four PDF/A versions (PDF/A-1 through PDF/A-4), their conformance levels, when to use each version, conversion strategies, validation with veraPDF, and how to implement PDF/A conversion at scale.
This post covers the PDF/A standard: what it requires, which version to pick, and how to convert and validate documents at scale.
Already familiar with PDF/A? Jump straight to our configurable PDF/A library and supported conversion strategies.
What is PDF/A?
In PDF/A, the A stands for archival. PDF/A is an open ISO standard designed to preserve electronic documents for the long term, regardless of future technological changes. The standard evolves across versions to address new archival requirements, accessibility needs, and document formats.
History of PDF/A
PDF/A, Portable Document Format Archive, originated in the early 2000s as a response to the lack of a standard archival format for electronic documents. In 2005, ISO published the first version, PDF/A-1, which was based on the PDF 1.4 specification.
Later revisions followed: PDF/A-2 in 2011, PDF/A-3 in 2012, and PDF/A-4 in 2020. Each version added capabilities, such as JPEG2000 compression and transparency in PDF/A-2 and attachments of any file type in PDF/A-3. PDF/A is now the dominant archival standard for electronic documents, and it has been adopted by organizations and governments globally.
Who uses PDF/A and why?
Since its introduction in 2005, the PDF/A family of standards has gained wide adoption. Many organizations, especially in Europe, mandate PDF/A for electronic document preservation. National archives, libraries, and court systems, including NARA and PACER in the United States, also recommend PDF/A.
Regulatory compliance drives much of this adoption. Businesses facing audits must keep complete, accurate, and unchanged document records for decades — sometimes indefinitely. Those records need to remain usable across many generations of technology. Legal institutions mandate PDF/A for contracts and court documents because the format guarantees integrity regardless of future software changes.
PDF/A’s widespread acceptance and advantages over other archival formats simplify this process. Banks, libraries, and government bodies that need reliable long-term storage rely on it heavily.
PDF/A benefits
PDF/A is a platform-independent format containing all the necessary resources for universal rendering of the original intended visual appearance of documents.
Beyond visible information, PDF/A preserves hidden file elements like document and object metadata, which are critical to efficient document management. It also enforces strict file structure and text encoding standards, which protect a document’s long-term technical accessibility.
The standard PDF file format, unlike PDF/A, allows for various security features and the inclusion of multimedia content such as audio and video, which aren’t permitted in PDF/A.
PDF/A also inherits key benefits from the PDF format:
- PDF/A, governed by the PDF imaging model, itself based on PostScript, combines vector graphics and raster images for accurate display and printing.
- It uses sophisticated compression (FlateDecode) and, from PDF/A-2 onward, advanced image formats (JPEG2000) to reduce file sizes for storage while preserving quality in both scanned and image-heavy documents.
- You can open and view PDF/A consistently using any of the widely available PDF readers, including in any standard browser through simple drag and drop. The same can’t be said for alternatives like TIFF images or Word documents.
When to use PDF/A
PDF/A works well for digitally born documents — those created by desktop publishing programs — that include live text, complex formatting, embedded raster images, colors, vector graphics, and pagination. The PDF/A-2u conformance level adds reliable text searching and copying for long-term preservation.
PDF/A’s support for advanced compression and composition techniques also makes it suitable for scanned documents, producing smaller files that are easier to store.
PDF/A comes in many versions, or “flavors,” serving niche applications such as archiving hybrid electronic invoices, engineering documents with embedded 3D models, and emails with file attachments.
That said, PDF/A is less suited to content with dynamic or multimedia aspects, like websites and spreadsheets, as they require removing or “flattening” most dynamic parts into static images, which results in lost information.
Why can’t I just archive regular PDFs? PDF vs. PDF/A
Regular PDFs can contain external dependencies, such as non-embedded fonts, linked media, or references to outside resources, that may become unavailable over time. PDF/A eliminates this risk. Through PDF/A conversion and verification, the document becomes completely self-contained and device independent, so it remains readable for long-term archiving and regulatory submissions.
| Capability | Regular PDF | PDF/A |
|---|---|---|
| Font embedding | Recommended | Required |
| External content references | Allowed | Forbidden |
| JavaScript | Allowed | Forbidden |
| Audio/video | Allowed | Forbidden |
| Encryption | Allowed | Forbidden |
| XMP metadata | Optional | Required |
| Device-independent color | Optional | Required |
| Long-term readability | Not guaranteed | Guaranteed by ISO standard |
Why PDF doesn’t work for archiving — The problem of malformed files
The initial concept behind regular PDFs was to preserve the original look of documents: regardless of the system or software used, a PDF would render identically. In practice, that’s not always the case. There are many ways to create a PDF file, and two files that look the same can be built very differently internally.
This surface-level uniformity can be deceptive.
When viewed, a malformed PDF often looks identical to a well-formed one because the viewer’s built-in repair function silently fixes issues. Contemporary users see no difference, but problems surface years later when older repair heuristics no longer apply, making upfront verification critical to avoid issues with fonts and other preservation risks.
Example: Non-embedded fonts
Consider the example of fonts. The PDF specification highly recommends embedding font sets for rendering visible characters in a document. But not all PDF generators embed font information; instead, they reference a font that’s somewhere else, such as a user-installed font or a preinstalled system font.
When a modern viewer can’t recover font data, users encounter substituted fonts or synthesized fonts after auto-repair. Users can then encounter modified text or less-than-smooth performance, because synthesizing vector fonts eats up more memory. Even so, the text is (usually) still readable.
In an extreme scenario, however, text completely disappears or becomes garbled and thus illegible. Even today, a PDF viewer can fail where text data is unrecoverable.


Figure 1: The images above show an extreme example from a PDF.js community issue for non-embedded fonts. The parent issue was reported in 2014 and closed in 2021 without an official fix from the PDF.js contributor community. The recommended solution is for all PDF creators to embed their font sets, or at least a subset, which is something PDF/A requires.
How PDF/A works to safeguard your PDFs
The reliability of accessing external resources like fonts in PDFs diminishes over time, and developers’ ability to compensate for malformed files degrades with it. PDF/A addresses these preservation risks through additional restrictions and requirements.
Some examples:
- All fonts must be correctly embedded to ensure universal rendering.
- External content references are forbidden.
- LZW image compression is banned.
- Audio and video content are forbidden, meaning no audio or movie annotations are allowed.
- Use of a machine-readable and standards-based metadata is required via the Extensible Metadata Platform (XMP).
- JavaScript and executable files are forbidden.
- Color spaces must be specified in a device-independent manner to preserve color fidelity.
- File-level encryption is not allowed.
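As a rough illustration of the restrictions listed above, the following Python sketch scans a file's raw bytes for name tokens that usually signal a forbidden feature. The tokens are standard PDF names, but the function name and labels are hypothetical, and this is only a triage heuristic under stated assumptions: it cannot see inside compressed object streams, it may match unrelated binary data, and it is no substitute for a real validator.

```python
import re

# PDF name tokens that hint at features PDF/A forbids.
# Heuristic only: compressed object streams are invisible to a raw byte scan.
FORBIDDEN_MARKERS = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "JavaScript",
    b"/LZWDecode": "LZW compression",
    b"/Movie": "movie annotation",
    b"/Sound": "sound annotation",
    b"/Launch": "launch action",
}


def quick_preflight(pdf_bytes: bytes) -> list[str]:
    """Return sorted labels for forbidden-feature markers found in the raw bytes."""
    findings = []
    for marker, label in FORBIDDEN_MARKERS.items():
        # The name must end here, so /Sound does not match /SoundTrack.
        if re.search(re.escape(marker) + rb"(?![A-Za-z0-9])", pdf_bytes):
            findings.append(label)
    return sorted(findings)
```

A non-empty result is a reason to route the file through repair or conversion early. An empty result proves nothing, so full validation is still required.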
What is PDF/A verification and why is it necessary?
PDF/A verification (also called validation) checks whether a PDF actually meets the PDF/A standard. Our PDF-to-PDF/A converter performs validation to confirm compliance.
A PDF/A converter must produce files that pass a strict verification process. Developers need to confirm their converters work correctly, and users need to choose tools that follow the PDF/A and ISO specifications.
Automated verification is critical because you cannot distinguish an archive-safe PDF/A from an unsafe PDF by visual inspection alone, and manual object inspection doesn’t scale to high-throughput server environments.
How verification works
During PDF/A verification, a tool inspects every object in a file and reports errors, such as elements that need removal (encryption) or information that needs embedding (fonts).
Developers who produce well-formed PDFs will pass verification with fewer repairs. Converting files whose creation you cannot control or targeting a stricter conformance level increases the difficulty.
Once a file passes, it carries the PDF/A metadata flag (the pdfaid:part and pdfaid:conformance entries in its XMP metadata), indicating it’s an authentic PDF/A document suitable for archiving.
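The PDF/A flag is stored in the XMP packet as the pdfaid:part and pdfaid:conformance properties. The sketch below reads that claim from an XMP packet that has already been extracted from the PDF (extraction is out of scope here). The function name is hypothetical, and the namespace URI is the one we understand the PDF/A identification schema to use. Note that this reads only what the file claims about itself and verifies nothing.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"  # PDF/A identification schema


def read_pdfa_claim(xmp_packet: str):
    """Return (part, conformance) claimed in the XMP packet, or None if absent."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = desc.get(f"{{{PDFAID_NS}}}part") or desc.findtext(f"{{{PDFAID_NS}}}part")
        conf = (desc.get(f"{{{PDFAID_NS}}}conformance")
                or desc.findtext(f"{{{PDFAID_NS}}}conformance"))
        if part:
            # Base-level PDF/A-4 has no conformance letter, so conf may be empty.
            return part.strip(), (conf or "").strip().upper()
    return None
```

Comparing this claim with an independent validator's verdict is a cheap way to spot files that were flagged without ever being checked.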
PDF/A verification challenges
In the past, PDF/A files that passed verification in one tool would fail in another due to different vendor interpretations of the PDF/A specifications. These inconsistencies made the PDF/A flag an unreliable indicator of a file’s actual archive-readiness.
The veraPDF suite now provides a reliable, detailed PDF/A verification tool that resolved these past conflicts. Nutrient uses veraPDF’s precision to fine-tune its own PDF/A conversion and validation engine.
What is veraPDF and how can you use it?

The PDF industry formed the veraPDF consortium in 2014 with European Union PREFORMA project funding. The coalition, led by the Open Preservation Foundation (OPF) and the PDF Association, built an open source PDF/A validator intended as the reference tool for individuals, vendors, and archivists.
veraPDF performs a close, object-by-object examination of documents, positioning it as an ideal benchmarking tool. However, veraPDF’s analysis, while highly precise, isn’t suited for server conversions requiring validation in seconds, as these demand a more efficient approach.
Today, the veraPDF suite includes the following:
- A graphical user interface (GUI) desktop version to test a single file
- A command-line interface (CLI) for batch processing
- An online demo
- A Java library: an API for developers to embed veraPDF in their Java-based apps
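For batch use, the CLI can be driven from a script. The Python sketch below assumes a `verapdf` launcher is on the PATH and that it accepts the `--flavour` and `--format` options as we understand them from the veraPDF CLI help; check `verapdf --help` for your installed version. The function names are hypothetical.

```python
import shutil
import subprocess


def build_verapdf_command(pdf_path: str, flavour: str = "2b") -> list[str]:
    """Build a veraPDF CLI invocation for one file and one target profile."""
    return ["verapdf", "--flavour", flavour, "--format", "xml", pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "2b") -> tuple[int, str]:
    """Run veraPDF and return (exit code, report text)."""
    if shutil.which("verapdf") is None:
        raise FileNotFoundError("verapdf launcher not found on PATH")
    result = subprocess.run(
        build_verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
        timeout=300,  # arbitrary per-file limit; tune for your documents
    )
    return result.returncode, result.stdout
```

Keeping command construction separate from execution makes the invocation easy to log next to the stored report.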
What are the different PDF/A versions and conformances? Why do they matter?
PDF/A comes in different versions to support different archival uses. The conformance level you pick affects both how easily your files pass PDF/A verification and which features they can contain.
The PDF/A standard has four versions, each released in a different year and based on a newer version of the core PDF specification:
- PDF/A-1 (ISO 19005-1:2005), based on PDF 1.4 (an Adobe specification, not an ISO standard)
- PDF/A-2 (ISO 19005-2:2011), based on PDF 1.7 (ISO 32000-1:2008)
- PDF/A-3 (ISO 19005-3:2012), also based on PDF 1.7 (ISO 32000-1:2008), with minor changes from PDF/A-2
- PDF/A-4 (ISO 19005-4:2020), based on PDF 2.0 (ISO 32000-2:2020)
PDF 2.0 was first published by ISO in 2017 as 32000-2:2017. It was then updated in 2020 to 32000-2:2020.
PDF/A also supports several conformance levels; the available conformances vary with the selected version of PDF/A.
Here are the available conformances by PDF/A version:
- PDF/A-1a and PDF/A-1b
- PDF/A-2a, PDF/A-2b, and PDF/A-2u
- PDF/A-3a, PDF/A-3b, and PDF/A-3u
- PDF/A-4, PDF/A-4e, and PDF/A-4f
PDF/A version comparison
| Feature | PDF/A-1 (2005) | PDF/A-2 (2011) | PDF/A-3 (2012) | PDF/A-4 (2020) |
|---|---|---|---|---|
| Base PDF spec | PDF 1.4 | PDF 1.7 (ISO 32000-1) | PDF 1.7 (ISO 32000-1) | PDF 2.0 (ISO 32000-2) |
| Conformance levels | a, b | a, b, u | a, b, u | (base), e, f |
| JPEG2000 compression | No | Yes | Yes | Yes |
| Transparency support | No | Yes | Yes | Yes |
| Embed PDF/A files | No | Yes | Yes | Yes |
| Embed non-PDF/A files | No | No | Yes | Yes (level f) |
| 3D content (PRC/U3D) | No | No | No | Yes (level e) |
| Unicode required | Level a only | Levels a, u | Levels a, u | All levels |
| Tagged structure required | Level a only | Level a only | Level a only | Optional |
| Digital signatures (PAdES) | No | Yes | Yes | Yes (enhanced) |
| Recommended for | Legacy compliance | General archiving | Archiving with attachments | New projects, accessibility |
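If you need to pick a version programmatically, part of the comparison table can be encoded as data. The following Python sketch transcribes five feature rows from the table above and returns the PDF/A parts that support every feature you ask for. The feature keys and function name are hypothetical.

```python
# Feature support by PDF/A part, transcribed from the comparison table above.
FEATURES = {
    "jpeg2000": {2, 3, 4},
    "transparency": {2, 3, 4},
    "embed_pdfa_files": {2, 3, 4},
    "embed_any_file": {3, 4},  # in PDF/A-4 this needs level f
    "3d_content": {4},         # PDF/A-4 level e only
}


def parts_supporting(*needed: str) -> list[int]:
    """Return the PDF/A parts (1-4) that support every requested feature."""
    parts = {1, 2, 3, 4}
    for feature in needed:
        parts &= FEATURES[feature]  # KeyError signals an unknown feature name
    return sorted(parts)
```

For example, `parts_supporting("transparency", "embed_any_file")` narrows the choice to parts 3 and 4.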
What is the “right” PDF/A version and conformance?
Choosing the right PDF/A version and conformance during conversion is a common challenge, and the distinctions between types can be subtle. The following sections clarify the differences.
Choosing a PDF/A version and conformance
It’s tempting to treat PDF/A versions as straightforward replacements, like newer software releases. Newer specifications do bring clearer archival workflows and improvements from the underlying PDF versions.
However, the version and conformance you choose affect converted file sizes, the types of information preserved, and how easily files pass validation. Different PDF/A variants serve different purposes. Pick the version that best fits your requirements.
Which version should you use?
PDF/A-2b is the most compatible variant and the widely recommended default, especially for scanned documents.
In contrast, PDF/A-1 is considered out of date; the usual reason to use it is a policy or requirement set by the receiving party.
If your archival use case is more complex, consider PDF/A-4, which offers enhanced capabilities such as preserved logical structure, PDF’s interactive forms, 3D model annotations, and digital signatures.
All PDF/A standards are backward compatible, so even PDF/A-1 files will open correctly in any ISO-compliant PDF reader.
PDF/A-4 — The latest version
PDF/A-4 (ISO 19005-4:2020) reflects more than a decade of industry experience. It focuses on making PDF/A more cost-effective and flexible.
Beyond archiving-specific clarifications, PDF/A-4 inherits PDF 2.0 improvements: richer content support, better machine-readable text, dynamic user-fillable forms, and stronger logical structure.
These features lower the cost of producing accessible documents that can be reused for tasks such as:
- Text extraction and AI
- Conversion to responsive HTML for small devices
- Use within assistive technologies like screen readers and text-to-speech systems
PDF/A-4 also inherits PDF 2.0 support for PDF Advanced Electronic Signatures (PAdES), a digital signature type verifiable over long periods of time, along with several other improvements.
How to convert documents to PDF/A
Nutrient offers a configurable PDF/A library that integrates into any web or server app. It supports conversion and validation to any PDF/A variant, with individual files converting and validating in seconds during server processing.
You can also convert PDFs, Office documents, and images to PDF/A client-side. This offloads processing from the server, reduces load during peak traffic, and limits data transmission over the network.
Selecting your PDF/A conversion strategy
Not all PDF instructions are compatible with every conformance level, but our PDF/A converter can be configured to use any of the following strategies in support of your archival needs and policy goals.
- Page content stream conversion — Converts instructions within a PDF’s content stream to a compatible format. By preserving the original structure of the content, it ensures accurate rendering and fidelity during conversion. This is the preferred strategy, and when converting to the default (recommended) PDF/A-2b conformance, the success ratio is around 97 percent.
- Page content vectorization — Converts page content into scalable vector graphics, enabling high-quality scaling and smooth rendering across different devices and resolutions. Transforming instructions into a mathematical representation allows for precisely reproduced shapes, lines, and text.
- Page content rasterization — Converts page content into a grid of pixels, resulting in a raster image representation. This method may sacrifice some scalability, but it offers broader compatibility and ensures consistent visual appearance, regardless of the rendering environment. This approach is required chiefly as a fallback when converting modern documents to older PDF/A conformance formats like PDF/A-1b, which have limited page content composition features.
Dynamically selecting the most appropriate conversion strategy increases the success rate and reduces the manual effort needed to get files through validation.
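Dynamic strategy selection can be sketched as a simple fallback loop. The Python example below is a generic illustration, not the library's implementation: `convert` and `validate` are hypothetical callables that you would wire to your own converter and validator, and the strategy names are placeholders for the three strategies described above.

```python
from typing import Callable, Optional, Sequence

# Strategies in order of preference, as described above (names are placeholders).
STRATEGIES = ("content_stream", "vectorize", "rasterize")


def convert_with_fallback(
    source: bytes,
    convert: Callable[[bytes, str], bytes],
    validate: Callable[[bytes], bool],
    allowed: Sequence[str] = STRATEGIES,
) -> Optional[tuple[str, bytes]]:
    """Try each allowed strategy in order; return the first output that validates."""
    for strategy in allowed:
        try:
            output = convert(source, strategy)
        except Exception:
            continue  # this strategy could not handle the file; try the next one
        if validate(output):
            return strategy, output
    return None  # nothing passed; hand the file to manual review
```

Passing a shorter `allowed` sequence is one way to express a policy such as "never rasterize".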
The library also includes additional features for handling difficult conversions:
- Built-in repair — The engine repairs malformed or corrupted documents automatically during conversion, recovering data from problematic files without manual intervention.
- Built-in optimization — Font subsetting, resource deduplication, and PDF size reduction produce compact output documents during the conversion process.
How to convert and validate PDF/A at scale
If you operate high‑volume pipelines (from millions up to 100 million PDFs per month), prioritize an accuracy‑first flow with independent validation. A practical blueprint:
- Preflight and normalize
  - Detect encryption, broken cross‑references, missing fonts/ICC profiles, and malformed structure; repair when safe.
  - Choose the target conformance upfront.
- Convert deterministically
  - Embed fonts and ICC profiles; avoid DeviceRGB by default.
  - Handle transparency/patterns per conformance; fall back to vectorization or rasterization only when required by policy.
- Validate independently
  - Run veraPDF for the exact target profile and persist the full report (include validator version).
- Post‑process and attest
  - Linearize and losslessly optimize; stamp provenance metadata (tool and validator versions, strategy used, source hash).
- Route outcomes
  - PASS → deliver; recoverable FAIL → retry with alternate strategy; critical FAIL → quarantine with reason codes.
- Observe and scale
  - Export metrics and traces (success rate, common failure checks, latency, CPU/memory) and autoscale workers.
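Two of the steps above, stamping provenance and routing outcomes, are small enough to sketch in Python. The record fields follow the blueprint (source hash, strategy, tool and validator versions), but the class and function names are hypothetical. Treating a failure as recoverable whenever untried strategies remain is an assumption made for this example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class Provenance:
    source_sha256: str
    target_flavour: str
    strategy: str
    converter_version: str
    validator_version: str


def make_provenance(source: bytes, flavour: str, strategy: str,
                    converter_version: str, validator_version: str) -> str:
    """Serialize a provenance record to JSON for storage next to the output file."""
    record = Provenance(hashlib.sha256(source).hexdigest(), flavour, strategy,
                        converter_version, validator_version)
    return json.dumps(asdict(record), sort_keys=True)


def route(passed: bool, strategies_left: int) -> str:
    """Map a validation outcome to the next pipeline action."""
    if passed:
        return "deliver"
    # Assumption: a failure is recoverable while untried strategies remain.
    return "retry" if strategies_left > 0 else "quarantine"
```

Storing the record alongside the validator report lets you later answer which tool versions produced a given archived file.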
For guidance on selecting the right PDF/A version and conformance for your use case, refer to the section above on choosing a PDF/A version and conformance. In brief: Default to PDF/A‑2b for most workflows, use PDF/A‑2u when text extraction matters, use PDF/A‑3b/3u for attachments, use PDF/A‑4 for PDF 2.0 features, and use PDF/A‑1b only when mandated by policy.
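The rule of thumb above can be written as a small decision function. This Python sketch is one possible reading: the parameter names are hypothetical, and the precedence between rules (policy first, then PDF 2.0 features, then attachments) is an assumption you should adapt to your own archival policy.

```python
def choose_flavour(*, policy_requires_pdfa1: bool = False,
                   needs_pdf2_features: bool = False,
                   needs_attachments: bool = False,
                   needs_text_extraction: bool = False) -> str:
    """Pick a PDF/A flavour string such as '2b' from coarse requirements."""
    if policy_requires_pdfa1:
        return "1b"  # only when mandated by policy
    if needs_pdf2_features:
        # PDF/A-4 level f covers attachments of any format.
        return "4f" if needs_attachments else "4"
    if needs_attachments:
        return "3u" if needs_text_extraction else "3b"
    return "2u" if needs_text_extraction else "2b"
```

With no arguments, the function returns the default of PDF/A-2b.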
Implementation resources: See the Document Engine PDF/A overview, convert to PDF/A, and validate PDF/A guides. For product packaging, see the PDF/A conversion SDK page.
Implementation options (server-side)
- .NET library (in-process): Use the Nutrient .NET PDF SDK for high‑throughput, in‑process conversion and validation inside your own services or functions. See the Nutrient .NET conversion guides and the .NET PDF-to-PDF/A guide.
- Java library (JVM stacks): Build server‑side batch processors using the Nutrient Java SDK and its document processing APIs. Start with the Java guides.
- Document Engine (containers): Run conversion plus validation as containerized services with built‑in metrics and operational guidance. See Document Engine PDF/A.
- DWS Processor API (hosted): Convert PDFs, Office files, and images to PDF/A via a REST API without managing infrastructure. See the PDF-to-PDF/A API guide.
- Document Converter (SharePoint/Power Automate): Automate PDF/A conversion within Microsoft ecosystems. See the SharePoint PDF-to-PDF/A guide or Power Automate PDF-to-PDF/A guide.
Quick example — .NET convert to PDF/A
```csharp
using GdPicture14;

// Convert a PDF to PDF/A-2b with vectorization and rasterization fallbacks.
using GdPicturePDF pdf = new GdPicturePDF();
pdf.LoadFromFile(@"C:\in\source.pdf");
pdf.ConvertToPDFA(
    @"C:\out\archived.pdf",
    PdfConversionConformance.PDF_A_2b,
    true, // Allow vectorization when needed.
    true  // Allow rasterization as last resort.
);
pdf.CloseDocument();
```
Building on the JVM? Explore the Nutrient Java SDK and batch guides: Java guides.
Productivity and automation
PDF/A’s standardized format supports automated document workflows in industries like finance, healthcare, and government. Its consistency makes it suitable for automated processing and validation.
Software tools can check PDF/A files for compliance with regulatory requirements such as embedded metadata or digital signatures. Many tools also automate conversion from formats like Word or Excel into PDF/A, reducing manual effort.
Accessibility and collaboration
PDF/A supports assistive technologies, making documents usable by individuals with disabilities. Features like text-to-speech and screen reader compatibility can be built into PDF/A files.
Although PDF/A focuses on long-term preservation, it also supports secure sharing. When combined with document collaboration software, PDF/A documents can be shared via email or online platforms while maintaining their integrity across systems. PDF/A-aware tools enable collaborative workflows like document review and approval, with digital signatures to verify authenticity.
Next steps
To test PDF/A conversion, visit the downloads page for trial information and packages in your preferred language. Then see the PDF/A documentation for demos and samples.
Nutrient’s PDF/A conversion library supports integration across a range of environments. Key features:
- ISO-compliant and veraPDF-tested engine
- Highly configurable in support of your archival strategy and policies
- Built-in repair engine to recover information from malformed records
- Advanced compression for highly efficient storage without quality loss
- All PDF/A versions and conformances supported, including PDF/A-4, PDF/A-4e, and PDF/A-4f
- Fast and smooth bulk processing, including client-side conversion
For production use, speak with our Sales team. If you have questions, contact the engineers who built the product.
FAQ
This section covers some PDF/A alternatives for documents and, in contrast, the benefits of PDF/A.
PDF/A vs. Word
The Word format, while common, has notable drawbacks for archiving purposes compared to PDF/A:
- As a proprietary format, it lacks an ISO standard designated for archiving, and as a result, it doesn’t have the guarantees of international standardization in terms of interoperability and universal viewing.
- Word has limits for ensuring color consistency and preserving a fixed representation of documents with complex graphics and formatting.
- PDF has better compression than Word, leading to smaller files.
- Word’s font handling is less reliable for long-term preservation.
PDF/A vs. TIFF
Before PDF/A, archivists used fully rendered image formats such as JPEG and TIFF, which were invented when most archived documents still came in paper form.
TIFF documents might still exist in your archives, and specific business programs could continue generating TIFF imagery. Even files that look like PDFs may only wrap images in a PDF container.
But TIFF has its known drawbacks for business infrastructures that need consistent document searchability, universal viewing, and efficient storage.
Documents rasterized into TIFF create a bitmap, which is a file storing image data for every pixel. Thus, as a sizable static snapshot of the original, TIFF doesn’t natively support text searchability, and it consumes more disk space.
PDF/A vs. HTML
HTML isn’t inherently bad for archiving, but it has certain limits for long-term preservation compared to PDF/A. HTML depends on live web resources and is subject to changes in web technologies, which can break links or render certain features unusable over time.
What are the risks of archiving malformed PDFs?
Archiving malformed PDFs carries a range of preservation risks, with lasting consequences for the integrity and usability of your archived documents.
Here’s a look at some:
- Live PDF text becoming static or unreadable — Insufficient character encoding not only hinders text legibility, but it also impedes reliable machine processing, meaning text is also no longer accurately searchable or extractable.
- Issues with images, scanned documents, and drawings — Proprietary formats and obsolete compressions, like LZW, can result in content failing to display correctly or at all, because a future viewer won’t know how to correctly interpret the compressed data.
- Inaccurate color representation — The absence of any color management leads to incorrect renderings on different devices. For example, branded imagery will change shades when displayed on different screens or when printed.
- Lost document histories and keywords — Future users will face frustration if metadata is non-standard or missing, causing a loss of machine-readable keywords used in discovery and retrieval, along with loss of document origin details.
- Expiration of digital signatures — Digital signatures break or become invalidated when certificates expire or trusted authorities close shop, making long-term validation (LTV) a challenge.
- External content references that return a 404 error — A PDF, not unlike websites, may also embed a reference to an external resource such as a URL, leading to rendering failure as soon as a link breaks.
What do the PDF/A conformance levels mean?
Conformance levels in PDF/A-1 through PDF/A-3 add extra requirements on top of the basic level, such as Unicode (U) or logical structure (A).
Level B conformance stands for “basic” and is considered the easiest to work with, because it requires neither Unicode nor logical structure. Level U requires Unicode, and level A requires both Unicode and logical structure. Many documents lack these features, and adding logical structure through PDF tagging is a time-consuming, manual process.
The two new PDF/A-4 conformances (E and F) allow embedded content beyond the PDF/A files you’re otherwise restricted to:
E = engineering: lets you embed 3D data in PDFs using PRC or U3D formats as annotations.
F = file attachment: lets you embed files of any other format in your PDFs.
All PDF/A-4 conformances require Unicode for all text, as levels A and U did in earlier versions. PDF/A-4 also streamlines conformance by eliminating levels A, B, and U.
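The level rules above fit in a few lines of Python. This sketch parses a flavour string such as 2u or 4f, rejects combinations that do not exist, and reports whether Unicode and tagged logical structure are required. The function name and the returned keys are hypothetical.

```python
VALID_LEVELS = {1: {"a", "b"}, 2: {"a", "b", "u"}, 3: {"a", "b", "u"}, 4: {"", "e", "f"}}


def requirements(flavour: str) -> dict[str, bool]:
    """Return Unicode and tagging requirements for a flavour such as '2u' or '4f'."""
    if not flavour or not flavour[0].isdigit():
        raise ValueError(f"unknown PDF/A flavour: {flavour!r}")
    part, level = int(flavour[0]), flavour[1:].lower()
    if level not in VALID_LEVELS.get(part, set()):
        raise ValueError(f"unknown PDF/A flavour: {flavour!r}")
    if part == 4:
        # PDF/A-4 requires Unicode at every level; tagging is optional.
        return {"unicode": True, "tagged_structure": False}
    return {"unicode": level in {"a", "u"}, "tagged_structure": level == "a"}
```

A check like this is useful at the start of a pipeline, where a typo such as 1u should fail fast instead of reaching the converter.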
How do I validate a PDF/A file?
Run an independent validator such as veraPDF against the exact target profile (e.g. PDF/A‑2b). Persist the full validation report with the final file, including validator version and conformance. For server pipelines, see validate PDF/A.
Can PDF/A conversion and validation run at scale?
Yes. Use a two‑stage approach: Convert deterministically, and then validate independently at scale. Employ strategy fallbacks (convert in place → vectorize → rasterize), enforce timeouts and per‑job resource limits, and autoscale workers. Reference implementations are available in the Document Engine PDF/A guides.
How do I convert a PDF to PDF/A?
Use a PDF/A conversion tool that supports your target conformance level. Nutrient offers conversion through multiple channels: the .NET SDK for in-process conversion, Document Engine for containerized server workflows, and DWS Processor API for hosted REST API conversion. Each option supports all PDF/A versions (1 through 4) and conformance levels.
Is PDF/A the same as PDF?
No. PDF/A is a subset of PDF with stricter requirements for long-term archiving. Regular PDFs allow JavaScript, encryption, external references, and multimedia content, all of which are forbidden in PDF/A. PDF/A also requires font embedding, device-independent color, and XMP metadata. These restrictions ensure the document remains readable regardless of future software or hardware changes.
Which PDF/A version should I use?
For most use cases, PDF/A-2b is the recommended default. It supports JPEG2000 compression, transparency, and embedding other PDF/A files, while remaining broadly compatible. Use PDF/A-2u if you need reliable text extraction, PDF/A-3b if you need to embed non-PDF attachments, and PDF/A-4 for new projects that benefit from PDF 2.0 features like enhanced PAdES digital signatures, 3D content, or stronger logical structure. Use PDF/A-1b only when required by a specific policy.
Can I open PDF/A files in a regular PDF reader?
Yes. PDF/A files are valid PDF files and open in any standard PDF reader, including web browsers. The difference is internal: PDF/A files have passed verification against the ISO 19005 standard, confirming they are self-contained and suitable for long-term preservation.