Pull the plots, figures, and diagrams out of a PDF and save each one as its own PNG image.
Point it at a datasheet or a research paper, and you get a folder of cropped images: every graph, every figure, every embedded picture, as a separate file. It works on a regular computer, no graphics card needed.
It runs two ways:
- Command line — type one command, get a folder of images.
- Web page — run one command, open your browser, drag in a PDF, click a button.
This guide assumes you have never used a terminal before. Follow it line by line.
- A computer running Linux (this tool does not run on Windows or macOS).
- Python 3.9 or newer. Most Linux systems already have it.
Everything else gets installed automatically in the steps below.
Press Ctrl + Alt + T. A dark window opens where you type commands. That window is "the terminal." Every command in this guide is typed there, and you press Enter after each one.
Type this and press Enter:
python3 --version
If you see something like Python 3.10.12, you are good. The number just needs to be 3.9 or higher.
If instead you see command not found, install Python by running:
sudo apt update
sudo apt install -y python3 python3-venv python3-pip
It will ask for your password. Type it and press Enter. The characters will not appear as you type; that is normal.
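You can also run the same check from inside Python, which is handy in setup scripts. Here is a minimal sketch. `python_is_new_enough` is a hypothetical name, and 3.9 is the minimum stated above. It uses `format()` instead of f-strings so that an older Python can still run it and print a clear message.

```python
import sys

# Minimum version this guide asks for: Python 3.9.
REQUIRED = (3, 9)


def python_is_new_enough(version=None, required=REQUIRED):
    """Return True if (major, minor) is at least the required version."""
    if version is None:
        version = sys.version_info[:2]  # the Python running this script
    return tuple(version[:2]) >= tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_new_enough():
        print("Python {}: OK".format(found))
    else:
        print("Python {} is too old; install 3.9 or newer".format(found))
```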
"From source" means you download the code and set it up yourself. Five steps.
If you already have the project folder on your computer, skip the download and go straight to the cd command below.
Otherwise, download it with git:
git clone https://github.com/aula-id/pdf-nano.git
If git is not installed, run sudo apt install -y git first, then run the clone command again.
cd pdf-nano
cd means "change directory": it moves you into the folder. If your folder has a different name, use that name instead of pdf-nano.
This creates a clean, isolated space so pdfnano's parts do not interfere with the rest of your system. It is called a "virtual environment"; this guide also calls it the workspace.
python3 -m venv .venv
This makes a hidden folder named .venv. You only do this once.
source .venv/bin/activate
After this, your terminal line starts with (.venv). That tag means the workspace is active.
Important: every time you open a new terminal to use pdfnano, you must run this source command again first. If pdfnano ever says "command not found," this is almost always the reason — you forgot to turn the workspace on.
pip install -e .
This downloads the libraries pdfnano needs and sets up the pdfnano command. It takes a minute or two the first time. When it finishes, check that it worked:
pdfnano --version
You should see a version line such as pdfnano 0.1.0. If you do, pdfnano is installed.
pdfnano extract yourfile.pdf
Replace yourfile.pdf with the name of your PDF. If the PDF is in a different folder, write the full path, for example:
pdfnano extract /home/you/Downloads/datasheet.pdf
By default the images are saved into a new folder called graphs, created inside the folder you ran the command from. Open that folder in your file browser to see the results.
To save the images somewhere else, add --out:
pdfnano extract yourfile.pdf --out my_images
This saves into a folder named my_images instead.
If you only want pages 3 and 4:
pdfnano extract yourfile.pdf --pages 3,4
A range works too. For pages 3 through 6:
pdfnano extract yourfile.pdf --pages 3-6
Out of the box, pdfnano pulls everything visual it can find on each page:
- Plots — graphs with axes (current vs. voltage, that kind of thing).
- Figures — diagrams and drawings made of lines and shapes.
- Images — pictures embedded in the PDF.
You do not need to turn anything on. It is all on by default.
If you only want the axis graphs and nothing else:
pdfnano extract yourfile.pdf --no-figures --no-images
- --no-figures skips diagrams and drawings.
- --no-images skips embedded pictures.
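If you have many PDFs, a short Python script can run the command once per file. This is a minimal sketch and is not part of pdfnano. `build_extract_cmd` and `extract_folder` are hypothetical helper names. The script uses only the flags shown in this guide (--out, --pages, --no-figures, --no-images) and assumes the workspace is active so that the pdfnano command can be found. Naming each output folder `<pdf name>_graphs` is a choice made for this sketch, not a pdfnano default.

```python
import subprocess
from pathlib import Path


def build_extract_cmd(pdf, out=None, pages=None, figures=True, images=True):
    """Build a pdfnano extract command from the flags described in this guide."""
    cmd = ["pdfnano", "extract", str(pdf)]
    if out is not None:
        cmd += ["--out", str(out)]
    if pages is not None:
        cmd += ["--pages", pages]  # text such as "3,4" or "3-6"
    if not figures:
        cmd.append("--no-figures")  # skip diagrams and drawings
    if not images:
        cmd.append("--no-images")  # skip embedded pictures
    return cmd


def extract_folder(folder, **options):
    """Run pdfnano on every PDF in `folder`, one output folder per PDF."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = Path(folder) / (pdf.stem + "_graphs")
        # check=True stops the loop if pdfnano reports an error.
        subprocess.run(build_extract_cmd(pdf, out=out, **options), check=True)
```

For example, `extract_folder("datasheets", figures=False, images=False)` would keep only the axis graphs from every PDF in a `datasheets` folder.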
For graphs that have a printed code under them (common on datasheets, like IT16804), the file is named after that code: IT16804.png.
pdfnano figures out the code pattern from the PDF on its own — you do not have to tell it anything. If a graph has no code, the file is named by its position on the page instead, like p3_r1c2.png (page 3, row 1, column 2). Figures and pictures are named like p3_fig1.png and p3_img1.png.
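Because the names follow fixed patterns, a script can read them back, for example to sort an output folder by page. This is a sketch that assumes the names match the patterns listed above exactly. `describe` and `summarize` are hypothetical helpers, not part of pdfnano. Any PNG that matches neither position pattern is treated as a plot named after its printed code. That includes any other PNG in the folder, such as the --debug images, whose names this guide does not give.

```python
import re
from pathlib import Path

# Position-based names described above:
#   p3_r1c2.png -> page 3, row 1, column 2 (a plot without a printed code)
#   p3_fig1.png -> page 3, figure 1
#   p3_img1.png -> page 3, embedded image 1
_POSITION = re.compile(r"^p(\d+)_r(\d+)c(\d+)$")
_NUMBERED = re.compile(r"^p(\d+)_(fig|img)(\d+)$")


def describe(filename):
    """Turn an output filename into a small dict describing what it is."""
    stem = Path(filename).stem
    m = _POSITION.match(stem)
    if m:
        page, row, col = map(int, m.groups())
        return {"kind": "plot", "page": page, "row": row, "col": col}
    m = _NUMBERED.match(stem)
    if m:
        kind = "figure" if m.group(2) == "fig" else "image"
        return {"kind": kind, "page": int(m.group(1)), "index": int(m.group(3))}
    # Anything else is assumed to be a plot named after its printed code.
    return {"kind": "plot", "code": stem}


def summarize(folder="graphs"):
    """Describe every PNG in an output folder, keyed by filename."""
    return {p.name: describe(p.name) for p in sorted(Path(folder).glob("*.png"))}
```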
If you would rather always use position-based names and ignore codes:
pdfnano extract yourfile.pdf --figid-pattern ''
If typing commands is not your thing, use the built-in web page.
pdfnano web
You will see:
pdfnano web GUI running at http://127.0.0.1:8000/
Press Ctrl+C to stop.
Open a web browser and go to http://127.0.0.1:8000 (you can copy and paste that). You will see a simple page where you:
- Choose your PDF file.
- Optionally type which pages you want.
- Click Extract.
The extracted images show up right on the page, and you can click any of them to download it.
When you are done, go back to the terminal and press Ctrl + C to stop the web page.
If port 8000 is already used by something else, pick another number:
pdfnano web --port 9000
Then open http://127.0.0.1:9000 instead.
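If you are not sure which port is free, you can ask the operating system before you start the web page. This minimal sketch tries to bind each port on 127.0.0.1 and reports the first one that works. The helper names are hypothetical and not part of pdfnano. The result is only a snapshot, because another program could take the port between this check and the moment pdfnano web starts.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # typically "Address already in use"
            return False
        return True


def first_free_port(start=8000, tries=20, host="127.0.0.1"):
    """Scan upward from `start` and return the first free port."""
    stop = min(start + tries, 65536)  # ports only go up to 65535
    for port in range(start, stop):
        if port_is_free(port, host):
            return port
    raise RuntimeError("no free port in range {}-{}".format(start, stop - 1))


if __name__ == "__main__":
    print("Try: pdfnano web --port {}".format(first_free_port()))
```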
Some PDFs are locked so you cannot copy or extract their contents (you can still open and read them, but tools get blocked). pdfnano can unlock a working copy on the fly:
pdfnano extract locked.pdf --decrypt
For this to work you need a small helper program called qpdf. Install it once:
sudo apt install -y qpdf
pdfnano does not change your original file. It makes a temporary unlocked copy, uses it, and deletes it afterward.
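To unlock a copy yourself, or to see the idea behind --decrypt, here is a minimal sketch built on qpdf's own --decrypt option. It is not pdfnano's code, and `unlocked_copy` is a hypothetical helper. It assumes the PDF only restricts copying and extracting, so it opens without a password. A PDF that needs a password just to open would also need qpdf's --password option, which this sketch does not pass. Like pdfnano, it works on a copy in a temporary folder, leaves the original alone, and deletes the copy afterward.

```python
import contextlib
import os
import shutil
import subprocess
import tempfile


def qpdf_decrypt_cmd(src, dst):
    """qpdf command that writes an unencrypted copy of src to dst."""
    return ["qpdf", "--decrypt", src, dst]


@contextlib.contextmanager
def unlocked_copy(src):
    """Yield the path of a temporary unlocked copy of src; delete it on exit."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf is not installed; run: sudo apt install -y qpdf")
    tmpdir = tempfile.mkdtemp(prefix="unlocked_")
    dst = os.path.join(tmpdir, os.path.basename(src))
    try:
        result = subprocess.run(qpdf_decrypt_cmd(src, dst))
        # qpdf exits with 0 on success and 3 when it succeeded with warnings.
        if result.returncode not in (0, 3):
            raise RuntimeError("qpdf failed with exit code {}".format(result.returncode))
        yield dst
    finally:
        # Only the temporary folder is removed; src is never modified.
        shutil.rmtree(tmpdir, ignore_errors=True)
```

Usage: inside `with unlocked_copy("locked.pdf") as path:`, point any tool at `path`. The copy disappears when the block ends.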
If you write Python, you can call pdfnano directly:
from pdfnano import extract_graphs

results = extract_graphs("datasheet.pdf", pages=[3, 4], out="figs")
for r in results:
    print(r["path"], r["width"], r["height"])
extract_graphs returns a list with one entry per image. Each entry is a dictionary holding the file path, page number, size, and a few other details.
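The command line takes --pages as text (3,4 or 3-6), but extract_graphs takes a list. A hypothetical helper like the one below converts the text form into a list. It is a sketch and is not part of pdfnano. It assumes extract_graphs numbers pages the same way the command line does, with the first page as 1. This guide does not say whether the command line itself accepts mixed specs such as 1,3-5.

```python
def parse_pages(spec):
    """Turn a page spec such as "3,4", "3-6" or "1,3-5" into a sorted list of pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas like "3,,4"
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError("backwards range: {}".format(part))
            pages.update(range(first, last + 1))  # the range includes both ends
        else:
            pages.add(int(part))
    return sorted(pages)
```

For example: `extract_graphs("datasheet.pdf", pages=parse_pages("3-6"), out="figs")`.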
pdfnano: command not found
Your workspace is off. Run source .venv/bin/activate from inside the project folder, then try again.
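You can also check for the usual cause from Python. A small script can report whether a virtual environment is active and whether the pdfnano command can be found. Here is a minimal sketch, where `diagnose` is a hypothetical name. Save it as, say, check_env.py in the project folder and run python3 check_env.py in the same terminal where pdfnano failed. It relies on the standard Python rule that inside a virtual environment, sys.prefix differs from sys.base_prefix.

```python
import shutil
import sys


def in_virtualenv():
    """True when this Python runs from a virtual environment such as .venv."""
    return sys.prefix != sys.base_prefix


def diagnose():
    """Report the two things that usually explain 'pdfnano: command not found'."""
    return {
        "virtualenv_active": in_virtualenv(),
        # shutil.which searches the same PATH the terminal uses.
        "pdfnano_on_path": shutil.which("pdfnano") is not None,
    }


if __name__ == "__main__":
    for check, ok in diagnose().items():
        print("{}: {}".format(check, "yes" if ok else "no"))
```

If virtualenv_active says no, run source .venv/bin/activate and try again.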
Nothing found.
pdfnano did not detect anything on the pages you asked for. Double-check the page numbers, or add --debug to save a marked-up image showing what it looked at:
pdfnano extract yourfile.pdf --debug
The debug images land in your output folder. Red boxes are what it picked; thin blue boxes are everything it considered.
The wrong things got extracted, or too many.
Turn off what you do not want with --no-figures and --no-images, or narrow down with --pages.
To see every option that pdfnano extract accepts:
pdfnano extract --help
When you are completely finished, you can turn the workspace off with:
deactivate
The (.venv) tag disappears. Next time, just cd back into the folder and run source .venv/bin/activate again.
A simpler one-command install (pip install pdfnano) is planned. This section will be filled in once it is published.