PDF UPLOAD METADATA EXTRACTOR (sample SharePoint 2013 & 2010 project) on Codeplex

When you upload MS Office documents to SharePoint document libraries their document titles are used in SharePoint to set the default Title column of list item of the uploaded document.

This does not work for PDF files, but it’s easy to reproduce the functionality.

I have created a simple VS2012 SharePoint project. It’s based on the C# (“iTextSharp”) version of the community version of iTextPDF (http://itextpdf.com) that can be downloaded here: http://sourceforge.net/projects/itextsharp/files/itextsharp/

You can download source code and solution packages (“binaries”) from Codeplex:

http://sppdfmetadataextract.codeplex.com/

The project is published under LGPL license because iTextSharp v4.1.6 requires that. – The latest version of iTextSharp (5.3.4) is published under AGPL. Codeplex does not provide AGPL licencing. So I had to use the last version of iTextSharp published under LGPL.

 

Description:

1. On (Web-) feature activation an feature event receiver iterates through each document library in the web that is not hidden.

2. For each of them the feature event receiver registers a list item event receiver that fires on “ItemAdded” events.

3. Furthermore an list item receiver is installed for the web to fire on “ListAdded” events to register the list item event receiver mentioned before on newly created lists.

4. During upload of files to document libraries the list item event receiver look for files ending with “.pdf” (case insensitive).

5. If there is such an file it opens the file using iTextSharp library and reads its “Title” information.

6. This information is set for the default “Title” column of the SharePoint list item.

7. The change is commited by “SystemUpdate” on the SPListItem object.

8. If an error occures inside the event handler there is no action. The user will never see an error in the module. If it is not possible to extract the title of the PDF document the module will not set the title column of the list item.

 

Usage:

To use the feature just deploy the SharePoint Solution Package (WSP-file) to your SharePoint farm. It’s not a “sandboxed solution”! After that you need to activate the feature in each web where you need it. If you need to activate it on each new web you could use “feature stapling” to activate it by default. If you need this please write me an comment.

Demo in SharePoint 2010:

1. Create a Word document with a title and save it as PDF:

SNAGHTMLb6b29ba

 

SNAGHTMLb6bd515

2. Check the document title by using Adobe Reader or Adobe Acrobat or any other PDF reader

SNAGHTMLb6da159

3. First try to upload the DOCX and it’s PDF into a document library without the new feature activated on the web:

image

As you can see: The “Title” of the DOCX is used for the Title column of the SharePoint list item. For the PDF file the Title column is empty.

4. Now activate the feature:

image

5. After that delete the files uploaded before in the document library. Than upload both files again:

image

Now both “Title” columns are set!

6. My last test is to create a new Asset libary in the web. Than I upload both files and check the PDF’s properties:

image

The Title column is set as expected!!

Demo in SharePoint 2013:

I’ve added a second project just for SP2013. Here is a single screenshot…

image

3 thoughts on “PDF UPLOAD METADATA EXTRACTOR (sample SharePoint 2013 & 2010 project) on Codeplex”

  1. As a non-programmer the details of this are too technical for me, but will it get around the same problem in SharePoint Online, which also doesn’t seem to show the titles of PDFs I upload to it?

  2. This worked flawlessly for me (Thx!).
    Now I need to expand this capability to extract several embedded properties/attributes from our PDF’s and populate several columns in the destination document library.
    Is this project extensible like that?
    Has anyone done this before?

Leave a Reply

Your email address will not be published. Required fields are marked *