Downloading “undownloadable” web PDFs with Fiddler
A B · 7 min read · Jul 13, 2018
I was once teaching a course in backend software engineering. I didn’t own
the course material; my duties included presenting the slide deck provided
by the course coordinator, answering any outstanding questions from the
class, being on time, having lunch, and promptly getting lost at 5:30 pm. At
the end of the course, naturally, the students asked me to share the slide
deck so they could go over it on their own. That’s when the issue revealed
itself: the course slides had been provided to me via a secure document
sharing platform, let’s call it PDFLord [I won’t mention the actual name for
the sake of… reasons], which imposed downloading and printing restrictions
on all the course PDFs. So, unfortunately, the students had to leave the class
empty-handed.
However, something didn’t seem right to me: if you can see the document on
your screen, its source must be hiding somewhere in the files your browser
downloaded or cached, so the download restriction is, in a sense, artificial.
In this article I will show you a method for overcoming these restrictions
that I discovered in the two days following the course. The tutorial assumes
a macOS (High Sierra) environment, the Chrome browser, and the PDFLord
platform, but similar steps should work on other operating systems and other
document sharing platforms.
To begin with, here are the reasons PDFLord was the bane of my existence:
1. As mentioned before, the PDFs had downloading and printing
restrictions (as indicated by the grayed-out icons in the top right corner).
2. The PDFs were copy-protected, meaning I could not select any text (as
indicated by the “Protected File” pop-up on mouse click).
3. The PDFs were unsearchable, meaning I had to memorize the page
numbers of every section of the course I wanted to navigate to quickly.
4. There was no fullscreen or presentation button.
My first instinct was to examine the page’s source files. I will skip the part
where I was randomly clicking through every possible directory and folder
looking for the right files, and instead go straight to the ones relevant to this
tutorial. Press Command+Option+I to open Chrome DevTools (Command+Shift+C
opens it in element-inspection mode), then open the Sources tab.
As you can see, there is a pdflord.com directory with a plugins folder under
assets. If you scroll down, you will find a folder called pdfjs, which contains
two files: pdf.js and viewer.js. It turns out that PDFLord uses PDF.js,
Mozilla’s open-source JavaScript library for parsing and rendering PDFs,
which you can find at https://mozilla.github.io/pdf.js/
Let’s dig through the viewer.js file a bit more. After some inspection we find a
method that sounds like it deals with page rendering:
function webViewerPageRendered(evt)
Let’s add a breakpoint on line 2141 inside this method, right after the
pageView variable is declared, and reload the page. Our goal is to find out
what the object this variable points to represents.
After clicking through a bunch of object members… voilà! We finally
stumble upon what we have been looking for: an integer array that very likely
represents the pixel data of the image of page 1 of the PDF.
Surely, now we could just write a script that goes over every page in the
PDF, extracts the image data arrays, converts them to JPEGs, and ends up
with a sequence of page images. To be honest, I wasn’t quite satisfied with
this: I would still not be able to select any text or search through the
images. I was looking for a better way.
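For completeness, the image-only route dismissed above would mostly come down to turning each page's pixel array into an image file. Here is a minimal Node.js sketch of that step. It assumes the array is canvas-style RGBA data (4 bytes per pixel, row by row), which the article only guesses at, and it swaps JPEG for the much simpler binary PPM format so that no image library is needed; the function name rgbaToPpm is our own.

```javascript
// Minimal sketch: convert an assumed RGBA pixel array (4 bytes per pixel,
// row-major, like canvas ImageData) into a binary PPM (P6) image.
// PPM is used instead of JPEG only because it needs no encoder library.
function rgbaToPpm(rgba, width, height) {
  if (rgba.length !== width * height * 4) {
    throw new Error('expected width * height * 4 bytes of RGBA data');
  }
  // PPM header: magic number, dimensions, max channel value.
  const header = Buffer.from(`P6\n${width} ${height}\n255\n`, 'ascii');
  const body = Buffer.alloc(width * height * 3);
  for (let src = 0, dst = 0; src < rgba.length; src += 4, dst += 3) {
    // Copy R, G, B and drop the alpha channel (PPM has no transparency).
    body[dst] = rgba[src];
    body[dst + 1] = rgba[src + 1];
    body[dst + 2] = rgba[src + 2];
  }
  return Buffer.concat([header, body]);
}

// Usage: a 2x1 image with one red and one blue pixel.
const pixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const ppm = rgbaToPpm(pixels, 2, 1);
// To save it: require('fs').writeFileSync('page-1.ppm', ppm);
```

Any converter that understands PPM can then produce a JPEG or PNG, but the text would still not be selectable, which is exactly the drawback described above.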
If we examine the viewer.js file a bit more, we find another interesting
function. In particular, it contains this very intriguing line, which looks like
it deals with restricting downloads:
if (PDFViewerApplication &&
PDFViewerApplication.appConfig.allowdownload) {
We also find a sequence of code that binds events to button click listeners.
It’s amusing how the “print” and “download” bindings are very sloppily
commented out, most likely because print and download logic is handled in a
different part of the code.
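Why would rebinding a button work at all? In this viewer the toolbar buttons don't call download code directly; they dispatch named events (such as 'zoomin' or 'download') on an event bus, and separate handlers listen for those names. The sketch below is a hypothetical, minimal event bus written for illustration, not PDF.js's actual EventBus source. It shows the pattern: changing which name a click dispatches changes what the button does.

```javascript
// Hypothetical stand-in for the viewer's event bus, for illustration only.
class EventBus {
  constructor() {
    this.listeners = new Map(); // event name -> array of handlers
  }
  on(name, handler) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(handler);
  }
  dispatch(name, ...args) {
    // Call every handler registered for this name; unknown names are ignored.
    for (const handler of this.listeners.get(name) || []) handler(...args);
  }
}

const log = [];
const eventBus = new EventBus();
// The application decides what each named event does...
eventBus.on('zoomin', () => log.push('zoom in'));
eventBus.on('download', () => log.push('download'));

// ...while the toolbar only decides which name a click dispatches.
// Simulate a zoom-in click handler rebound to dispatch 'download'.
const onZoomInClick = () => eventBus.dispatch('download');
onZoomInClick();
console.log(log); // [ 'download' ]
```

This decoupling is why the plan below only has to touch a click binding and a permission check, not the download logic itself.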
At this point our action plan is clear:
1. Rebind one of the buttons to serve as a download button (simply
uncommenting the download event listener didn’t work, and I didn’t dig too
much into why).
2. Change the download permissions logic so that it doesn’t require
allowdownload.
3. ???
4. Proceed to download the PDF.
To modify JavaScript files returned by a web page, we need a
man-in-the-middle proxy server. For this purpose we will use Fiddler, a free
web debugging proxy by Telerik: https://www.telerik.com/fiddler. Fiddler
was originally developed as a Windows application and only recently got
ported to Mac, where it runs on Mono, an open-source implementation of the
.NET Framework. You can follow this tutorial
https://www.telerik.com/blogs/introducing-fiddler-for-os-x-beta-1 to install
Mono and Fiddler. The only difference is that the 64-bit version of Fiddler
doesn’t work on OS X, so you need to start Fiddler in 32-bit mode with this
command to avoid errors:
mono --arch=32 Fiddler.exe
Most websites nowadays use HTTPS, so we need to configure Fiddler to
capture and decrypt HTTPS traffic. Open Tools->Options->HTTPS and check
the Decrypt HTTPS Traffic checkbox.
Since Fiddler acts as a proxy, browser traffic gets redirected through it. To
protect users from man-in-the-middle attacks, browsers refuse HTTPS
connections whose certificates are not issued by a trusted authority, and the
certificates Fiddler generates on the fly are signed by its own root certificate,
which your system doesn’t trust yet. To get around this, click
Actions->Export Root Certificate To Desktop. Next, open Keychain Access
(the macOS app that manages certificates) and drag-n-drop the generated
certificate from your desktop into the Keychain window. The certificate will
appear as DO_NOT_TRUST_FiddlerRoot. Double-click it, and in the new
window select Always Trust.
The final step is to redirect the traffic from Chrome to Fiddler. Chrome on
macOS uses the system proxy settings, so open System
Preferences->Network->Advanced->Proxies. Check Web Proxy and Secure
Web Proxy, and for both set the host to 127.0.0.1 and the port to 8888.
Click OK, then Apply.
You should now start seeing the traffic from your browser in the main
Fiddler window. If you don’t see anything, try using an Incognito Window.
Now the fun part: hacking the JavaScript files and serving them in place of
the originals. Download (or copy and paste) the viewer.js file, open it in your
favorite editor, and replace line 10279 with:
items.zoomIn.addEventListener('click', function() {
  //eventBus.dispatch('zoomin');
  eventBus.dispatch('download');
});
In short, we are binding the download event to the zoom-in button. Next,
remove the `&& PDFViewerApplication.appConfig.allowdownload` condition
from lines 1475 and 5067 (and anywhere else it appears in the file), so that
the check reads:
if (PDFViewerApplication) {
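If you would rather not hunt down the line numbers by hand (they will differ between viewer.js builds anyway), both edits can be expressed as text substitutions. The Node.js sketch below is hypothetical: patchViewerJs is our own name, and the regular expressions assume the code looks like the snippets quoted above, so diff the output against the original before serving it.

```javascript
// Hypothetical helper: applies the two manual edits as regex substitutions.
// Assumes viewer.js contains code shaped like the snippets quoted above.
function patchViewerJs(source) {
  // 1. Drop the allowdownload requirement wherever it is AND-ed with the
  //    PDFViewerApplication check (whitespace and newlines allowed between).
  const withoutGate = source.replace(
    /PDFViewerApplication\s*&&\s*PDFViewerApplication\.appConfig\.allowdownload/g,
    'PDFViewerApplication'
  );
  // 2. In the zoom-in click listener, dispatch 'download' instead of 'zoomin'.
  //    The lazy [\s\S]*? stops at the first dispatch('zoomin') after the binding.
  return withoutGate.replace(
    /(items\.zoomIn\.addEventListener\(\s*['"]click['"][\s\S]*?eventBus\.dispatch\(\s*)['"]zoomin['"]/,
    "$1'download'"
  );
}

// Usage (file names are placeholders):
// const fs = require('fs');
// fs.writeFileSync('viewer.patched.js',
//   patchViewerJs(fs.readFileSync('viewer.js', 'utf8')));
```

Patching by pattern rather than by line number keeps the edit valid if the site ships a slightly different build, at the cost of silently doing nothing when a pattern no longer matches, which is why checking the diff matters.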
Our substitute viewer.js file is ready for deployment. Find and select the
viewer.js resource in Fiddler (you may want to stop capturing traffic by
disabling File->Capture Traffic, so that the session list stops refreshing).
[Screenshot: actual name of the website replaced with pdflord.]
Then, in the panel on the right, select AutoResponder->Add Rule. In the
bottom drop-down menu choose Find File, select your substitute viewer.js
file, and click Save. Make sure both Enable rules and Unmatched requests
passthrough are checked.
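Conceptually, the AutoResponder sits between the browser and the server: each request URL is checked against the rules in order, a match is answered from the local file, and with Unmatched requests passthrough enabled everything else goes to the real server. The sketch below models only that decision and is not Fiddler's code. The match conventions follow Fiddler's documented rule syntax as we understand it (a plain string is a case-insensitive "URL contains" test, an EXACT: prefix demands an exact match, and with passthrough off unmatched requests get a 404); pickResponse, the rule pattern, and the local path are hypothetical.

```javascript
// Illustrative model of AutoResponder-style rule matching, not Fiddler's code.
function pickResponse(url, rules, passthroughUnmatched = true) {
  for (const rule of rules) {
    const pattern = rule.match;
    const hit = pattern.startsWith('EXACT:')
      ? url === pattern.slice('EXACT:'.length)               // exact match
      : url.toLowerCase().includes(pattern.toLowerCase());   // "contains", any case
    // First matching rule wins: answer from the local file.
    if (hit) return { action: 'serveLocalFile', file: rule.file };
  }
  // No rule matched: forward to the real server, or fail if passthrough is off.
  return passthroughUnmatched ? { action: 'forwardToServer' } : { action: 'respond404' };
}

// Hypothetical rule: the pattern and the local path are placeholders.
const rules = [{ match: 'pdfjs/viewer.js', file: '/Users/me/Desktop/viewer.js' }];
const url = 'https://pdflord.com/assets/plugins/pdfjs/viewer.js?v=2';
console.log(pickResponse(url, rules).action); // serveLocalFile
```

A substring rule like this also catches cache-busting query strings such as ?v=2, while every other request (pdf.js, the PDF itself) passes through untouched.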
Aaaaaand… drum roll… we are done! We are ready to download our PDF.
Re-enable Capture Traffic in Fiddler, then open your Chrome window with the
PDF viewer. With DevTools open, right-click the reload button and choose
Empty Cache and Hard Reload.
Emptying the cache is necessary so that Chrome doesn’t reuse its cached copy
of the original viewer.js and instead downloads it again from the web. That
request gets intercepted by Fiddler, which responds with our custom file.
Now, whenever you click on the Zoom In button (“+”), your PDF will get
downloaded. Great success!
Final thoughts and lessons learned:
Once data reaches a user’s computer, there is absolutely no way to
guarantee control over what happens to it.
Basing your business model on the premise that the data you share is fully
secure and protected is a terrible idea.
Hope y’all who got this far had as much fun with this tutorial as I did when
fiddling with this challenge.
Disclaimer: use at your own risk. Make sure you are not breaching any
contracts with your document providers. There is a very obvious potential
harm to the business models of the secure document sharing companies.