Extract File Attachments from PDFs in C#

This guide explains how to extract files from PDF documents.

PDF documents can contain files in the following ways:

  • A file is embedded in the PDF document.

  • A file is added to the PDF document as a file attachment annotation.

The method for extracting the file is different in each case.

Extracting Files Embedded in a PDF

To extract files embedded in a PDF, follow these steps:

  1. Create a GdPicturePDF object.

  2. Select the source document by passing its path to the LoadFromFile method.

  3. Determine the number of embedded files with the GetEmbeddedFileCount method and loop through them using a zero-based index.

  4. Determine the file name by passing the index of the file to the GetEmbeddedFileName method.

  5. Declare a byte array variable, initially set to null, that will receive the file data.

  6. Extract the file by passing the index of the file and the byte array variable (by reference) to the ExtractEmbeddedFile method.

  7. Write the file using the standard System.IO.Stream class.
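
The last step deserves some care, because the file name comes from the PDF itself and may contain directory parts. The following is a minimal sketch of a helper for writing the extracted bytes. It does not call GdPicture at all; the class name AttachmentWriter, the method name SaveAttachment, and the fallback name "attachment.bin" are hypothetical and chosen only for illustration. It strips any directory part from the stored name and uses File.Create, which truncates an existing file. It does not replace characters that are invalid in file names on a given platform.

```csharp
using System;
using System.IO;

public static class AttachmentWriter
{
    // Hypothetical helper: writes extracted bytes under outputDirectory and
    // returns the full path, or null when there is nothing to write.
    public static string SaveAttachment(string outputDirectory, string storedName, byte[] data)
    {
        if (data == null) return null;

        // Keep only the last path segment so that a stored name such as
        // "..\..\report.txt" cannot point outside the output directory.
        string safeName = Path.GetFileName((storedName ?? "").Replace('\\', '/'));
        if (string.IsNullOrWhiteSpace(safeName) || safeName == "." || safeName == "..")
        {
            safeName = "attachment.bin"; // Hypothetical fallback name.
        }

        Directory.CreateDirectory(outputDirectory);
        string fullPath = Path.Combine(outputDirectory, safeName);

        // File.Create truncates an existing file, so no stale bytes remain.
        using (Stream file = File.Create(fullPath))
        {
            file.Write(data, 0, data.Length);
        }
        return fullPath;
    }
}
```

Inside the extraction loop, such a helper could be called with the output folder, the name returned by GetEmbeddedFileName, and the byte array filled by ExtractEmbeddedFile.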

The example below extracts all embedded files from a PDF document:

C#

using System.IO;
using GdPicture14; // Assumed namespace; adjust to your GdPicture.NET version.

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Select the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of embedded files and loop through them.
int embeddedFileCount = gdpicturePDF.GetEmbeddedFileCount();
for (int fileIndex = 0; fileIndex < embeddedFileCount; fileIndex++)
{
    // Determine the file name.
    string fileName = gdpicturePDF.GetEmbeddedFileName(fileIndex);
    // Declare the byte array that receives the file data.
    byte[] fileData = null;
    // Extract the file and skip it if no data was returned.
    gdpicturePDF.ExtractEmbeddedFile(fileIndex, ref fileData);
    if (fileData == null) continue;
    // Write the file. Path.GetFileName drops any directory part stored in the PDF,
    // and File.Create truncates an existing file so no stale bytes remain.
    using Stream file = File.Create(Path.Combine(@"C:\temp", Path.GetFileName(fileName)));
    file.Write(fileData, 0, fileData.Length);
}
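
VB.NET

' Requires Imports System.IO and the GdPicture namespace (assumed to be GdPicture14) at the top of the file.
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Select the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of embedded files and loop through them.
    Dim embeddedFileCount As Integer = gdpicturePDF.GetEmbeddedFileCount()

    For fileIndex As Integer = 0 To embeddedFileCount - 1
        ' Determine the file name.
        Dim fileName As String = gdpicturePDF.GetEmbeddedFileName(fileIndex)
        ' Declare the byte array that receives the file data.
        Dim fileData As Byte() = Nothing
        ' Extract the file and skip it if no data was returned.
        gdpicturePDF.ExtractEmbeddedFile(fileIndex, fileData)
        If fileData Is Nothing Then Continue For
        ' Write the file. The stream is named "output" because VB.NET is case-insensitive
        ' and a variable named "file" would hide the File class.
        Using output As Stream = File.Create(Path.Combine("C:\temp", Path.GetFileName(fileName)))
            output.Write(fileData, 0, fileData.Length)
        End Using
    Next
End Using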

Extracting Files from File Attachment Annotations

To extract files from file attachment annotations, follow these steps:

  1. Create a GdPicturePDF object.

  2. Select the source document by passing its path to the LoadFromFile method.

  3. Determine the number of pages with the GetPageCount method and loop through them, selecting each page with the SelectPage method.

  4. Determine the number of annotations on the page with the GetAnnotationCount method and loop through them.

  5. Determine the annotation subtype by passing the index of the annotation to the GetAnnotationSubType method.

  6. If the annotation is a file attachment annotation, determine the file name by passing the index of the annotation to the GetFileAttachmentAnnotFileName method.

  7. Declare a byte array variable, initially set to null, that will receive the file data.

  8. Extract the file by passing the index of the annotation and the byte array variable (by reference) to the GetFileAttachmentAnnotEmbeddedFile method.

  9. Write the file using the standard System.IO.Stream class.
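
Because annotations are collected page by page, two attachments on different pages can share the same file name, and writing both to one folder would make the later file replace the earlier one. One possible way to avoid this is sketched below. It uses only System.IO; the class name UniquePath, the method name Next, and the " (n)" suffix scheme are hypothetical choices, not part of GdPicture.

```csharp
using System.IO;

public static class UniquePath
{
    // Hypothetical helper: returns a path inside directory that does not exist yet,
    // appending " (1)", " (2)", ... before the extension when the name is taken.
    public static string Next(string directory, string fileName)
    {
        string candidate = Path.Combine(directory, fileName);
        string stem = Path.GetFileNameWithoutExtension(fileName);
        string extension = Path.GetExtension(fileName);

        for (int n = 1; File.Exists(candidate); n++)
        {
            candidate = Path.Combine(directory, $"{stem} ({n}){extension}");
        }
        return candidate;
    }
}
```

The returned path can then be passed to the call that opens the output stream. The check and the later write are separate operations, so this sketch is not safe when several processes write to the same folder at once.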

The example below extracts all files added to a PDF document as file attachment annotations:

C#

using System.IO;
using GdPicture14; // Assumed namespace; adjust to your GdPicture.NET version.

using GdPicturePDF gdpicturePDF = new GdPicturePDF();
// Select the source document.
gdpicturePDF.LoadFromFile(@"C:\temp\source.pdf");
// Determine the number of pages and loop through them.
int pageCount = gdpicturePDF.GetPageCount();
for (int page = 1; page <= pageCount; page++)
{
    // Select the page so that the annotation methods operate on it.
    gdpicturePDF.SelectPage(page);
    // Determine the number of annotations on the page and loop through them.
    int annotationCount = gdpicturePDF.GetAnnotationCount();
    for (int annotationIndex = 0; annotationIndex < annotationCount; annotationIndex++)
    {
        // Determine the annotation subtype.
        string annotationSubtype = gdpicturePDF.GetAnnotationSubType(annotationIndex);
        if (annotationSubtype == "FileAttachment")
        {
            // Determine the file name.
            string fileName = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex);
            // Declare the byte array that receives the file data.
            byte[] fileData = null;
            // Extract the file and skip it if no data was returned.
            gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, ref fileData);
            if (fileData == null) continue;
            // Write the file. Path.GetFileName drops any directory part stored in the PDF,
            // and File.Create truncates an existing file so no stale bytes remain.
            using Stream file = File.Create(Path.Combine(@"C:\temp", Path.GetFileName(fileName)));
            file.Write(fileData, 0, fileData.Length);
        }
    }
}

VB.NET

' Requires Imports System.IO and the GdPicture namespace (assumed to be GdPicture14) at the top of the file.
Using gdpicturePDF As GdPicturePDF = New GdPicturePDF()
    ' Select the source document.
    gdpicturePDF.LoadFromFile("C:\temp\source.pdf")
    ' Determine the number of pages and loop through them.
    Dim pageCount As Integer = gdpicturePDF.GetPageCount()
    For page As Integer = 1 To pageCount
        ' Select the page so that the annotation methods operate on it.
        gdpicturePDF.SelectPage(page)
        ' Determine the number of annotations on the page and loop through them.
        Dim annotationCount As Integer = gdpicturePDF.GetAnnotationCount()
        For annotationIndex As Integer = 0 To annotationCount - 1
            ' Determine the annotation subtype.
            Dim annotationSubtype As String = gdpicturePDF.GetAnnotationSubType(annotationIndex)
            If annotationSubtype = "FileAttachment" Then
                ' Determine the file name.
                Dim fileName As String = gdpicturePDF.GetFileAttachmentAnnotFileName(annotationIndex)
                ' Declare the byte array that receives the file data.
                Dim fileData As Byte() = Nothing
                ' Extract the file and skip it if no data was returned.
                gdpicturePDF.GetFileAttachmentAnnotEmbeddedFile(annotationIndex, fileData)
                If fileData Is Nothing Then Continue For
                ' Write the file. The stream is named "output" because VB.NET is case-insensitive
                ' and a variable named "file" would hide the File class.
                Using output As Stream = File.Create(Path.Combine("C:\temp", Path.GetFileName(fileName)))
                    output.Write(fileData, 0, fileData.Length)
                End Using
            End If
        Next
    Next
End Using