Extract File Attachments from PDFs in C#
This guide explains how to extract files from PDF documents.
PDF documents can contain files in the following ways:
- A file is embedded in the PDF document.
- A file is added to the PDF document as a file attachment annotation.
The method for extracting the file is different in each case.
Extracting Files Embedded in a PDF
To extract files embedded in a PDF, follow these steps:
- Create a GdPicturePDF object.
- Load the source document by passing its path to the LoadFromFile method.
- Determine the number of embedded files with the GetEmbeddedFileCount method and loop through them.
- Determine the file name by passing the index of the file to the GetEmbeddedFileName method.
- Declare a byte array variable to receive the file data.
- Extract the file by passing the index of the file and the byte array variable (by reference) to the ExtractEmbeddedFile method.
- Write the file data to disk using the standard System.IO.Stream class.
The example below extracts all embedded files from a PDF document:
```csharp
using System.IO;
using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of embedded files and loop through them.
int embeddedFileCount = gdpicturePDF.GetEmbeddedFileCount();
for (int fileIndex = 0; fileIndex < embeddedFileCount; fileIndex++)
{
    // Determine the file name.
    string fileName = gdpicturePDF.GetEmbeddedFileName(fileIndex);
    // Declare the byte array that receives the file data.
    byte[] fileData = null;
    // Extract the file and write it only if extraction succeeded.
    if (gdpicturePDF.ExtractEmbeddedFile(fileIndex, ref fileData) == GdPictureStatus.OK)
    {
        // File.Create overwrites any existing file with the same name.
        using Stream file = File.Create(Path.Combine(@"C:\temp", fileName));
        file.Write(fileData, 0, fileData.Length);
    }
}
```
```vb
Imports System.IO
Imports GdPicture14

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of embedded files and loop through them.
    Dim embeddedFileCount As Integer = gdpicturePDF.GetEmbeddedFileCount()
    For fileIndex As Integer = 0 To embeddedFileCount - 1
        ' Determine the file name.
        Dim fileName As String = gdpicturePDF.GetEmbeddedFileName(fileIndex)
        ' Declare the byte array that receives the file data.
        Dim fileData As Byte() = Nothing
        ' Extract the file and write it only if extraction succeeded.
        If gdpicturePDF.ExtractEmbeddedFile(fileIndex, fileData) = GdPictureStatus.OK Then
            ' File.Create overwrites any existing file with the same name.
            Using outputStream As Stream = File.Create(Path.Combine("C:\temp", fileName))
                outputStream.Write(fileData, 0, fileData.Length)
            End Using
        End If
    Next
End Using
```
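Both examples build the output path from a file name stored inside the PDF. Because that name comes from the document rather than from your code, it may contain directory parts (for example `..\..\setup.exe`), characters that are invalid on the target file system, or nothing at all. The following plain .NET sketch, which makes no GdPicture calls, shows one way to reduce a stored name to a bare, safe file name before writing. The helper names `SanitizeFileName` and `WriteExtractedFile` and the `attachment.bin` fallback are hypothetical choices for illustration; they are not part of the GdPicture API.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of GdPicture): reduces a file name read from a PDF
// to a bare file name that is safe to combine with an output directory.
static string SanitizeFileName(string storedName, string fallback = "attachment.bin")
{
    if (string.IsNullOrWhiteSpace(storedName))
        return fallback;

    // Treat both '/' and '\' as separators on every OS and keep only the last segment,
    // so "..\..\setup.exe" or "docs/report.pdf" cannot point outside the output folder.
    string lastSegment = storedName.Split(new[] { '/', '\\' }).Last();

    // Replace characters that the current platform does not allow in file names.
    char[] invalid = Path.GetInvalidFileNameChars();
    string cleaned = new string(lastSegment.Select(c => invalid.Contains(c) ? '_' : c).ToArray()).Trim();

    // Fall back when nothing usable remains or the name refers to a directory.
    return cleaned.Length == 0 || cleaned == "." || cleaned == ".." ? fallback : cleaned;
}

// Hypothetical helper: writes the extracted bytes into outputDirectory and returns the full path.
// File.WriteAllBytes creates the file, or truncates an existing one, before writing.
static string WriteExtractedFile(string outputDirectory, string storedName, byte[] data)
{
    Directory.CreateDirectory(outputDirectory);
    string path = Path.Combine(outputDirectory, SanitizeFileName(storedName));
    File.WriteAllBytes(path, data);
    return path;
}
```

To use it in the examples above, you could replace the two lines that open the stream and write the data with a call such as `WriteExtractedFile(@"C:\temp", fileName, fileData);`. Which characters count as invalid depends on the operating system the code runs on, so the same stored name may be cleaned differently on Windows and Linux.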
Extracting Files from File Attachment Annotations
To extract files from file attachment annotations, follow these steps:
- Create a GdPicturePDF object.
- Load the source document by passing its path to the LoadFromFile method.
- Determine the number of pages with the GetPageCount method and loop through them, selecting each page with the SelectPage method. Page numbers start at 1.
- Determine the number of annotations on the selected page with the GetAnnotationCount method and loop through them.
- Determine the annotation subtype by passing the index of the annotation to the GetAnnotationSubType method.
- If the subtype is FileAttachment, determine the file name by passing the index of the annotation to the GetFileAttachmentAnnotFileName method.
- Declare a byte array variable to receive the file data.
- Extract the file by passing the index of the annotation and the byte array variable (by reference) to the GetFileAttachmentAnnotEmbeddedFile method.
- Write the file data to disk using the standard System.IO.Stream class.
The example below extracts all files added to a PDF document as file attachment annotations:
```csharp
using System.IO;
using GdPicture14;

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Load the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages and loop through them. Page numbers start at 1.
int pageCount = gdpicturePDF.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    gdpicturePDF.SelectPage(page);
    // Determine the number of annotations on the page and loop through them.
    int annotationCount = gdpicturePDF.GetAnnotationCount();
    for (int annotationIndex = 0; annotationIndex < annotationCount; annotationIndex++)
    {
        // Determine the annotation subtype.
        string annotationSubtype = gdpicturePDF.GetAnnotationSubType(annotationIndex);
        if (annotationSubtype == "FileAttachment")
        {
            // Determine the file name.
            string fileName = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex);
            // Declare the byte array that receives the file data.
            byte[] fileData = null;
            // Extract the file and write it only if extraction succeeded.
            if (gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, ref fileData) == GdPictureStatus.OK)
            {
                // File.Create overwrites any existing file with the same name.
                using Stream file = File.Create(Path.Combine(@"C:\temp", fileName));
                file.Write(fileData, 0, fileData.Length);
            }
        }
    }
}
```
```vb
Imports System.IO
Imports GdPicture14

Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Load the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages and loop through them. Page numbers start at 1.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    For page As Integer = 1 To pageCount
        gdpicturePDF.SelectPage(page)
        ' Determine the number of annotations on the page and loop through them.
        Dim annotationCount As Integer = gdpicturePDF.GetAnnotationCount()
        For annotationIndex As Integer = 0 To annotationCount - 1
            ' Determine the annotation subtype.
            Dim annotationSubtype As String = gdpicturePDF.GetAnnotationSubType(annotationIndex)
            If annotationSubtype = "FileAttachment" Then
                ' Determine the file name.
                Dim fileName As String = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex)
                ' Declare the byte array that receives the file data.
                Dim fileData As Byte() = Nothing
                ' Extract the file and write it only if extraction succeeded.
                If gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, fileData) = GdPictureStatus.OK Then
                    ' File.Create overwrites any existing file with the same name.
                    Using outputStream As Stream = File.Create(Path.Combine("C:\temp", fileName))
                        outputStream.Write(fileData, 0, fileData.Length)
                    End Using
                End If
            End If
        Next
    Next
End Using
```
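The same file is often attached on several pages, and two different attachments can also share a name. Because the examples write every attachment to `C:\temp\` plus its stored name, a later attachment with the same name is written over an earlier one. If you want to keep every copy, one option is to add a numeric suffix when the target path already exists. The following plain .NET sketch does that. `GetUniquePath` and the `name (n).ext` naming pattern are hypothetical choices, not GdPicture behavior, and the helper assumes `fileName` is already a bare file name without directory parts.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of GdPicture): returns a path in `directory` that does not exist yet.
// "report.pdf" becomes "report (1).pdf", then "report (2).pdf", and so on, until a free name is found.
static string GetUniquePath(string directory, string fileName)
{
    string candidate = Path.Combine(directory, fileName);
    string baseName = Path.GetFileNameWithoutExtension(fileName);
    string extension = Path.GetExtension(fileName); // includes the leading '.', or "" if there is none

    for (int counter = 1; File.Exists(candidate); counter++)
    {
        candidate = Path.Combine(directory, $"{baseName} ({counter}){extension}");
    }
    return candidate;
}
```

In the annotation loop, you could compute `string outputPath = GetUniquePath(@"C:\temp", fileName);` and write to `outputPath` instead. This check-then-write approach is not atomic: if another process creates the same file between the `File.Exists` check and the write, that file can still be overwritten. For a single extraction run into a dedicated folder, that is usually acceptable.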