PDF UPLOAD METADATA EXTRACTOR (sample SharePoint 2013 & 2010 project) on Codeplex

When you upload Microsoft Office documents to a SharePoint document library, SharePoint uses each document's title to set the default Title column of the uploaded document's list item.

This does not work for PDF files, but it’s easy to reproduce the functionality.

I have created a simple VS2012 SharePoint project. It is based on iTextSharp, the C# version of the community edition of the iText PDF library (http://itextpdf.com), which can be downloaded here: http://sourceforge.net/projects/itextsharp/files/itextsharp/.

You can download the source code and the solution packages (“binaries”) from Codeplex:

http://sppdfmetadataextract.codeplex.com/

The project is published under the LGPL because iTextSharp v4.1.6, which it uses, requires that. The latest version of iTextSharp (5.3.4) is published under the AGPL, and Codeplex does not offer AGPL as a license option, so I had to use the last version of iTextSharp that was published under the LGPL.

Description:

1. On activation of the (web-scoped) feature, a feature event receiver iterates through every document library in the web that is not hidden.

2. For each of them, the feature event receiver registers a list item event receiver that fires on “ItemAdded” events.

3. Furthermore, a list event receiver is registered on the web. It fires on “ListAdded” events and registers the list item event receiver mentioned above on newly created lists.

4. When a file is uploaded to a document library, the list item event receiver checks whether its name ends with “.pdf” (case-insensitive).

5. If it does, the receiver opens the file with the iTextSharp library and reads its “Title” document property.

6. That value is written to the default “Title” column of the SharePoint list item.

7. The change is committed by calling “SystemUpdate” on the SPListItem object.

8. If an error occurs inside the event handler, nothing happens: the user never sees an error from the module. If the title of the PDF document cannot be extracted, the module simply leaves the Title column of the list item unset.
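The flow in steps 1 to 8 can be modeled in a few lines. The following is a minimal Python sketch, not the project's C# code and not the SharePoint object model: `DocLibrary`, `ListItem`, `RECEIVER`, `activate_feature`, `on_list_added` and `on_item_added` are hypothetical stand-ins, and `read_title` is a placeholder for the iTextSharp call that the real receiver uses to read the PDF's Title.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for SharePoint objects (NOT the SharePoint API).
RECEIVER = "PdfTitleItemAdded"  # hypothetical receiver name


@dataclass
class DocLibrary:
    title: str
    hidden: bool = False
    receivers: List[str] = field(default_factory=list)


@dataclass
class ListItem:
    file_name: str
    content: bytes
    fields: Dict[str, str] = field(default_factory=dict)
    system_updates: int = 0

    def system_update(self) -> None:
        # Placeholder for SPListItem.SystemUpdate() in the real project.
        self.system_updates += 1


def register(lib: DocLibrary) -> None:
    # Attach the "ItemAdded" receiver once per library (step 2).
    if RECEIVER not in lib.receivers:
        lib.receivers.append(RECEIVER)


def activate_feature(libraries: List[DocLibrary]) -> None:
    # Step 1: only document libraries that are not hidden.
    for lib in libraries:
        if not lib.hidden:
            register(lib)


def on_list_added(lib: DocLibrary) -> None:
    # Step 3: newly created lists get the item receiver as well.
    register(lib)


def on_item_added(item: ListItem,
                  read_title: Callable[[bytes], Optional[str]]) -> None:
    # Step 4: case-insensitive ".pdf" check; other files are ignored.
    if not item.file_name.lower().endswith(".pdf"):
        return
    try:
        title = read_title(item.content)  # step 5 (iTextSharp in the real project)
        if title:
            item.fields["Title"] = title  # step 6
            item.system_update()          # step 7
    except Exception:
        # Step 8: swallow the error; the Title column stays untouched.
        pass
```

Passing the title reader in as a function keeps the error policy of step 8 in one place: a reader that fails or returns nothing leaves the Title column as it was.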

Usage:

To use the feature, deploy the SharePoint Solution Package (WSP file) to your SharePoint farm; it is a farm solution, not a “sandboxed solution”. Then activate the feature in each web where you need it. If you want it active in every new web, you could use “feature stapling” to activate it by default. If you need this, please leave me a comment.

Demo in SharePoint 2010:

1. Create a Word document with a title and save it as a PDF:

2. Check the document title using Adobe Reader, Adobe Acrobat, or any other PDF reader.

3. First, upload the DOCX and its PDF to a document library while the new feature is not yet activated on the web:

As you can see, the “Title” of the DOCX is used for the Title column of the SharePoint list item, while the Title column of the PDF file stays empty.

4. Now activate the feature:

5. Next, delete the previously uploaded files from the document library. Then upload both files again:

Now both “Title” columns are set!

6. As a last test, I create a new Asset library in the web, then upload both files and check the PDF’s properties:

The Title column is set as expected!!

Demo in SharePoint 2013:

I’ve added a second project just for SP2013. Here is a single screenshot…

3 thoughts on “PDF UPLOAD METADATA EXTRACTOR (sample SharePoint 2013 & 2010 project) on Codeplex”

Bruce Officer on February 5, 2015 at 17:59 said:

As a non-programmer the details of this are too technical for me, but will it get around the same problem in SharePoint Online, which also doesn’t seem to show the titles of PDFs I upload to it?

- ikarstein on February 28, 2015 at 00:05 said:
  
  This is correct! SharePoint Online will not show the titles. You could solve this by creating a remote event receiver.
  
Barry Prentiss on September 24, 2015 at 22:44 said:

This worked flawlessly for me (Thx!).
Now I need to expand this capability to extract several embedded properties/attributes from our PDF’s and populate several columns in the destination document library.
Is this project extensible like that?
Has anyone done this before?


Ingo Karstein's Blog @ kenaro

kenaflow, SharePoint, Workflows, Microsoft Certified Master: SharePoint, PowerShell, Enterprise Web Development
