PDF Sanitizer

Upload, remove hyperlinks, and download sanitized PDF files with Team 5's free PDF Sanitizer.

Meet the Team

Jorge L. Santos

Computer Information Systems Major

PDF Sanitization & Site

santosmendezj1@udayton.edu

Jacob Babal

Computer Information Systems Major

Backend & Frontend

babalj1@udayton.edu

Keivan Jiang

Computer Information Systems Major

Frontend

jiangk3@udayton.edu

Jessica Rees

Computer Information Systems Major

Frontend

reesj3@udayton.edu

Andrew Manory

Computer Information Systems Major

Containerization

manorya1@udayton.edu

Why a PDF Sanitizer?

We work with PDF documents all the time. Maybe it's homework a professor gave you, a cool pamphlet for a concert, or a study you need to read for graduate research! Whatever the reason, the PDF document most likely contains hyperlinks. If you downloaded the document from an untrusted source, one of those hyperlinks might be malicious! Accidentally clicking on it may cause you to download malware, get phished, or even suffer a CSRF attack! Our solution is to deactivate the hyperlinks in a PDF so they can no longer be clicked, while the link text itself stays in the document.
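To make the idea concrete, here is a minimal sketch of one way a link can be deactivated without touching the page layout. It is not the team's implementation: it assumes link targets are stored as plain `/URI (...)` literal strings in uncompressed PDF objects, and the name `blank_link_targets` is hypothetical. It overwrites each target with spaces of the same length, so every byte offset in the file stays where it was.

```python
import re

# Assumption: link targets appear as "/URI (literal string)" in uncompressed
# objects. Hex strings, nested parentheses, and compressed object streams
# are NOT handled by this sketch.
URI_PATTERN = re.compile(rb"(/URI\s*\()((?:\\.|[^\\()])*)(\))", re.DOTALL)


def blank_link_targets(pdf_bytes: bytes) -> bytes:
    """Overwrite every matched link target with same-length padding."""

    def _blank(match: re.Match) -> bytes:
        # Keep "/URI (" and ")" and replace only the target itself, so the
        # file length and all cross-reference offsets are unchanged.
        return match.group(1) + b" " * len(match.group(2)) + match.group(3)

    return URI_PATTERN.sub(_blank, pdf_bytes)
```

The link annotation itself is left in place; only its destination is blanked. A production sanitizer would more likely use a PDF library to parse the file and drop the link annotations outright.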

How To Use

Step 1: Browse for the PDF file or simply drag it onto the program.

Step 2: Click the upload button.

Step 3: Download the ZIP file containing the sanitized PDF and the log.

Step 4: Enjoy your sanitized PDF file!
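For the curious, the download in Step 3 could be assembled roughly as follows. This is a sketch using only Python's standard library, not the project's actual code; the function name `bundle_results` and the entry name `sanitizer_log.txt` are assumptions made for illustration.

```python
import io
import zipfile


def bundle_results(pdf_name: str, pdf_bytes: bytes, log_text: str) -> bytes:
    """Pack a sanitized PDF and its log into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The sanitized PDF keeps whatever name the caller passes in.
        archive.writestr(pdf_name, pdf_bytes)
        # Hypothetical log file name; the text is stored as UTF-8.
        archive.writestr("sanitizer_log.txt", log_text)
    return buffer.getvalue()
```

Building the archive in memory means the server can send it back in the response without writing a temporary file to disk.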

Technology

React TS
Axios
Python PDF ToolKit
Docker

Demo Video

The team would like to thank the following people for their contributions to the project:

Todd Irlbeck

Concepcion Garcia Jr

Nicholas Stiffler

Erika Terrado

Sprint Blog

Over the past few weeks, the final portion of the PDF Sanitizer has been finished. A user is able to upload a PDF, have it sanitized, and download the sanitized version. All that's left is the finishing touches.
April 17th | Jessica Rees
Over the past 3 weeks, we have had a great deal of success with the project. We've gone from a large number of disconnected parts to one fully connected project. This means that you can go to the website and upload a PDF, and all the hyperlinks within it are detected. We still need to work on sending the output to the user: a log file detailing all the detected hyperlinks is currently generated, but not sent anywhere. Additionally, we need to find a way to remove the hyperlinks from the PDF file without completely destroying the PDF's format.
March 27th | Andrew Manory & Jessica Rees
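As an illustration of the detection and logging described in this entry, here is a simplified sketch that scans raw PDF bytes for link targets and formats a log. It is not the project's code: it assumes targets are stored as `/URI (...)` literal strings in uncompressed objects, and the names `find_link_targets` and `build_log` as well as the log layout are hypothetical.

```python
import re

# Assumption: targets are "/URI (literal string)" entries in uncompressed
# objects; hex strings, nested parentheses, and compressed object streams
# would need a real PDF parser.
URI_PATTERN = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_link_targets(pdf_bytes: bytes) -> list[str]:
    """Return every link target found, in file order."""
    return [m.group(1).decode("latin-1") for m in URI_PATTERN.finditer(pdf_bytes)]


def build_log(filename: str, targets: list[str]) -> str:
    """Format the detected targets as a plain-text log (hypothetical layout)."""
    lines = [f"File: {filename}", f"Hyperlinks detected: {len(targets)}"]
    lines += [f"{i}. {target}" for i, target in enumerate(targets, 1)]
    return "\n".join(lines) + "\n"
```

Returning the log as a string rather than writing it to disk keeps the two steps separate, so the same text can later be saved, bundled, or sent to the user.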
This week, we finally got our frontend to communicate with the backend. We can successfully upload a PDF file through our web app and receive it in a local folder in our repository. The app also limits uploads to .pdf files. We are working on developing drag and drop for the frontend. For now, our backend just receives the file. Our next step is to get our sanitizer functionality to do something with the files received.
March 6th | Keivan Jiang
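The upload restriction mentioned in this entry could be enforced on the backend along these lines. This is only a sketch: the entry states that uploads are limited to .pdf files, while the additional check of the `%PDF-` file header and the name `is_acceptable_upload` are assumptions added here for illustration.

```python
from pathlib import Path

# Per the PDF specification, a PDF file starts with this header marker.
PDF_MAGIC = b"%PDF-"


def is_acceptable_upload(filename: str, content: bytes) -> bool:
    """Accept a file only if it is named *.pdf and starts with the PDF header."""
    has_pdf_extension = Path(filename).suffix.lower() == ".pdf"
    # Assumption: a strict check. Some viewers tolerate leading bytes
    # before the header; this sketch rejects such files.
    has_pdf_header = content.startswith(PDF_MAGIC)
    return has_pdf_extension and has_pdf_header
```

Checking the content as well as the extension means a renamed executable is rejected even though its name ends in .pdf.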
We’re working on connecting our frontend and backend. This last week, we were finally able to parse a PDF and find all the data within. We are also able to manually remove a hyperlink element from text. Our next step is to automate the process of searching for hyperlinks and removing them. We’re also working on uploading files from the frontend so that they can be received by the backend. We’ve stumbled a bit, struggling to find the best solution to this issue. With some more research and testing, we should get it working soon.
February 27th | Keivan Jiang