Working with PDFs in ASP.NET
ASP.NET has a long history, and it has changed its makeup drastically over the years. Today, the latest iteration, ASP.NET Core, gives us a highly flexible framework for creating web applications, RESTful APIs, microservices, and more. To top it off, it’s now cross-platform. That’s right, Linux: you’ve got some official .NET love.
In this post, I’ll show how to work with PDFs in ASP.NET using the PSPDFKit .NET Library. The post focuses on a simple mechanism for reading and writing PDF form fields in a web application, but there are many other features we could expose, such as document editing, redaction, and annotation manipulation.
Introducing the Use Case
In this use case, there are two goals we want to achieve.
First, the user should be able to upload a PDF, and the application should extract all of its form fields and show the user a list of field names and their values. Second, the user should be able to upload a PDF and supply values for named form fields; the resulting document is then saved and downloaded to the user’s file system.
These two goals show how to open a document sent by the user and then analyze and manipulate it. They also introduce the saving mechanism and the options we have when saving.
The code in this post is written in C#, but note that .NET supports a few different languages.
For the remainder of the post, I’ll walk through some of what is required to set up a project and what source code to add. Alternatively, the project setup and code are available to clone from our PSPDFKit-Labs repository. Doing so will allow you to jump straight to the Running the Application step.
Setting Up the Project
Lucky for us, ASP.NET is supported by the cross-platform dotnet command line application, which means it doesn’t matter which operating system we are working on. Win number one for ASP.NET.
You’ll find all the ways to download dotnet on the Microsoft .NET website.
In our use case, we’re going to create a web application. To do so, we use the following dotnet command in the terminal:
dotnet new WebApp -o MyWebApp --no-https
Next, we’ll need to add the PSPDFKit .NET Library as a NuGet dependency:
dotnet add package PSPDFKit.NET
Now we’re ready to jump into the code.
Initializing the PSPDFKit .NET Library
To initialize the PSPDFKit .NET Library, you’ll have to call the InitializeTrial method prior to calling any other PSPDFKit method:
PSPDFKit.Sdk.InitializeTrial();
Or if you have purchased a license key:
PSPDFKit.Sdk.Initialize("YOUR_LICENSE_KEY_GOES_HERE");
Replace YOUR_LICENSE_KEY_GOES_HERE with your license key.
I placed the above call in the ConfigureServices method of the Startup.cs file, but all that’s important is that it’s called once per instance and that it’s called before any other PSPDFKit API is called.
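One way to enforce the "once, and before anything else" requirement is a small guard around the initialization call. The sketch below is an illustration under assumptions, not part of the PSPDFKit API: SdkBootstrap and EnsureInitialized are hypothetical names, and the delegate stands in for whichever of the two initialization calls above applies to you.

```csharp
using System;

// Hypothetical helper (not part of PSPDFKit): runs an initialization
// delegate at most once per process, even when called from several threads.
public static class SdkBootstrap
{
    private static readonly object Gate = new object();
    private static bool _initialized;

    // Returns true if this call performed the initialization,
    // false if it had already been done earlier.
    public static bool EnsureInitialized(Action initialize)
    {
        if (initialize == null) throw new ArgumentNullException(nameof(initialize));

        lock (Gate)
        {
            if (_initialized) return false;

            // If the delegate throws, the flag stays false so a later call can retry.
            initialize();
            _initialized = true;
            return true;
        }
    }
}
```

Inside ConfigureServices, this could be called as SdkBootstrap.EnsureInitialized(() => PSPDFKit.Sdk.InitializeTrial()); so that repeated calls are harmless.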
Calling the PSPDFKit API
From here on out, we are going to be working on a new webpage. The new page is aptly named Read, as we are going to read the form field values. The page will have a Read.cshtml template, which describes the HTML that will be passed to the browser, and a Read.cshtml.cs C# model file to handle the actions involved with the page.
As we’ve already set up the PSPDFKit .NET Library, we can now call the API to start operating on the PDF.
In the example I’ve laid out, I set up a small web form that can take in a file with a .pdf extension and save it to a temporary location. The file upload step is not important to this post, although if you’d like to see the code for it, please refer to the repository accompanying the blog post.
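For orientation, here is a minimal sketch of what "save it to a temporary location" could look like. It is an assumption for illustration and may differ from the code in the repository: UploadStore and SaveToTempFile are hypothetical names. In a Razor page, the upload typically arrives as an IFormFile, whose OpenReadStream() and FileName members would supply the two arguments.

```csharp
using System;
using System.IO;

// Hypothetical helper: copies an uploaded stream into a uniquely named
// temporary file and returns the path of that file.
public static class UploadStore
{
    public static string SaveToTempFile(Stream upload, string fileName)
    {
        if (upload == null) throw new ArgumentNullException(nameof(upload));

        // Accept only files with a .pdf extension (case-insensitive).
        var extension = Path.GetExtension(fileName ?? string.Empty);
        if (!string.Equals(extension, ".pdf", StringComparison.OrdinalIgnoreCase))
        {
            throw new ArgumentException("Only .pdf files are accepted.", nameof(fileName));
        }

        // Use a random name so the client-supplied name never becomes part of the path.
        var path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");
        using (var target = File.Create(path))
        {
            upload.CopyTo(target);
        }

        return path;
    }
}
```

The returned path plays the role of filePath in the code that follows. Checking the extension is only a convenience filter; it does not prove the content is a valid PDF.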
Opening a PDF and Saving the Form Field Values
The following code will show all that is required to add to a Razor page to open a PDF and display the form fields and values found:
// Used to show the form field values on the page.
[BindProperty(SupportsGet = true)]
public IList<FormFieldValue> FormFieldValues { get; } = new List<FormFieldValue>();

// Used to pass the form field values as URL parameters.
[BindProperty(SupportsGet = true)]
public string FormFieldsJson { get; set; } = null;

…

// Open the PDF and retrieve the form field values.
var document = new PSPDFKit.Document(new FileDataProvider(filePath));
var fieldValuesJson = document.GetFormProvider().GetFormFieldValuesJson();

// Refresh the page with the form field data shown.
return RedirectToPage(new {FormFieldsJson = fieldValuesJson.ToString()});
Above, we see the uploaded document (filePath) opened and queried for its form values. As the name of the method (GetFormFieldValuesJson) suggests, the return value is a JObject, which is very useful because we can pass it directly to the next step.
By calling RedirectToPage without a page name, we are asking the browser to load the same page again. The route values in that call also set FormFieldsJson, which passes the JSON data as a parameter of the URL.
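RedirectToPage takes care of encoding the route values for us, but it can help to see what ends up in the URL. The following sketch mimics that step with plain string handling; QueryStringSketch and its methods are hypothetical names, and the sample field name and value are made up.

```csharp
using System;

// Hypothetical illustration of how a JSON payload travels as a URL parameter.
public static class QueryStringSketch
{
    // Builds a URL for the Read page carrying the JSON as a query parameter.
    public static string BuildReadUrl(string formFieldsJson)
    {
        // Percent-encode braces, quotes, colons, and so on so the value is URL-safe.
        return "/Read?FormFieldsJson=" + Uri.EscapeDataString(formFieldsJson);
    }

    // Recovers the original JSON text from the encoded parameter value.
    public static string DecodeParameter(string encodedValue)
    {
        return Uri.UnescapeDataString(encodedValue);
    }
}
```

For example, BuildReadUrl("{\"Name\":\"Jane\"}") returns /Read?FormFieldsJson=%7B%22Name%22%3A%22Jane%22%7D. Because the whole payload travels in the URL, and servers and browsers limit URL length, this approach is best suited to documents with a modest number of form fields.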
Parsing the Newly Loaded JSON Data
In the previous step, we saw the form field values passed as parameter values in a JSON format. On the reload of the page, we can extract these JSON values and display them on the page with the following:
public void OnGet()
{
    if (FormFieldsJson == null) return;

    var formFieldsJson = JObject.Parse(FormFieldsJson);
    foreach (var (key, value) in formFieldsJson)
    {
        FormFieldValues.Add(new FormFieldValue {Name = key, Value = value.ToString()});
    }
}
The OnGet method, found in Read.cshtml.cs, will be called for every load of the page. When localhost:5000/Read is requested without parameters, FormFieldsJson is null. Otherwise, the data bound to the FormFieldsJson property will be whatever is passed in the URL parameters. Therefore, the localhost:5000/Read?FormFieldsJson=my_fields_are_here request binds my_fields_are_here to the FormFieldsJson property.
Another way of binding variables to the URL call was seen in the previous code block. By calling RedirectToPage with route values of new {FormFieldsJson = fieldValuesJson.ToString()}, the contents of fieldValuesJson.ToString() will be assigned to FormFieldsJson upon the next load.
Now FormFieldsJson is no longer null, so the OnGet method gets past the null check and proceeds to parse the JSON into our FormFieldValues objects.
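The parsing step can also be tried outside of a web project. The sketch below is a dependency-free stand-in that uses System.Text.Json instead of the JObject type from the code above, and it returns plain key/value pairs rather than FormFieldValue objects. FormFieldJson and ToPairs are hypothetical names, and the sample field names in the usage note are made up.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: flattens a JSON object of form field values
// into a list of name/value string pairs.
public static class FormFieldJson
{
    public static List<KeyValuePair<string, string>> ToPairs(string json)
    {
        var pairs = new List<KeyValuePair<string, string>>();

        // Mirrors the null check in OnGet: nothing to parse, nothing to show.
        if (string.IsNullOrEmpty(json)) return pairs;

        using (JsonDocument document = JsonDocument.Parse(json))
        {
            // EnumerateObject throws if the root element is not a JSON object.
            foreach (JsonProperty property in document.RootElement.EnumerateObject())
            {
                // Strings are unwrapped; other kinds keep their raw JSON text.
                string value = property.Value.ValueKind == JsonValueKind.String
                    ? property.Value.GetString()
                    : property.Value.GetRawText();

                pairs.Add(new KeyValuePair<string, string>(property.Name, value));
            }
        }

        return pairs;
    }
}
```

Calling FormFieldJson.ToPairs("{\"Name\":\"Jane\",\"Age\":42}") yields two pairs, Name/Jane and Age/42, in document order.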
Displaying the Form Field Values on the Loaded Page
We have populated FormFieldValues, so all we need now is to generate the HTML to represent the form fields in Read.cshtml:
<table class="table">
    <thead>
        <tr>
            <th>
                @Html.DisplayNameFor(model => model.FormFieldValues[0].Name)
            </th>
            <th>
                @Html.DisplayNameFor(model => model.FormFieldValues[0].Value)
            </th>
        </tr>
    </thead>
    <tbody>
        @foreach (var item in Model.FormFieldValues) {
            <tr>
                <td>
                    @Html.DisplayTextFor(modelItem => item.Name)
                </td>
                <td>
                    @Html.DisplayTextFor(modelItem => item.Value)
                </td>
            </tr>
        }
    </tbody>
</table>
The slightly funky syntax, called Razor syntax, allows us to dynamically generate our HTML based on data bound in the Read.cshtml.cs model. The “magic bind” comes from the first code block, where FormFieldValues is annotated with the BindProperty attribute:
[BindProperty(SupportsGet = true)]
public IList<FormFieldValue> FormFieldValues { get; } = new List<FormFieldValue>();
We take the values populated in FormFieldValues and create a table whose headings are generated from the property names of the class. Then we iterate over the list to add one row for each form field found in the document.
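The FormFieldValue class itself is not shown in this post. A minimal sketch of it, assuming only the two string properties used in the code above, could look like the following. The Display attributes are an optional addition for illustration: DisplayNameFor uses the attribute's Name when one is present and falls back to the property name otherwise, and the heading texts here are made up.

```csharp
using System.ComponentModel.DataAnnotations;

// Assumed shape of the model behind each table row; the class in the
// accompanying repository may differ.
public class FormFieldValue
{
    // Heading text picked up by Html.DisplayNameFor for the first column.
    [Display(Name = "Field name")]
    public string Name { get; set; }

    // Heading text picked up by Html.DisplayNameFor for the second column.
    [Display(Name = "Field value")]
    public string Value { get; set; }
}
```

Without the attributes, the table headings would simply read Name and Value.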
To find out more about Razor syntax and how to dynamically generate HTML, please follow the guides in the Microsoft documentation.
Running the Application
The instructions above have outlined areas of interest relating to PDF operations. If you’d like to see and run the full source, please clone the complete example from our PSPDFKit-Labs repository.
From within the repository directory that you cloned, you can run the following command:
dotnet run
The command launches the web server locally and serves the web application at localhost:5000. If your default browser doesn’t open automatically, just head to localhost:5000 in a browser on your machine.
Try uploading a PDF that contains form fields.
Conclusion
This post set out to introduce you to a simple ASP.NET web application and show how we can operate on PDFs with the PSPDFKit .NET Library. If you explore the ASP.NET application repository, you’ll find another page with an example of how to fill out form fields.
We could take the application further and implement more features like document redaction (where we can irrecoverably remove data from a document), or document editing (where we can add, remove, and move pages, as well as merge one or more documents together).
To find out more about form filling with the PSPDFKit .NET Library or the many other features supported by the library, please download the free trial.
When Nick started tinkering with guitar effects pedals, he didn’t realize it’d take him all the way to a career in software. He has worked on products that communicate with space, blast Metallica to packed stadiums, and enable millions to use documents through Nutrient, but in his personal life, he enjoys the simplicity of running in the mountains.