Support font subsetting to reduce size of pdf #103

@Yang-Xijie

Describe the bug

I want to add Chinese and Japanese text to a PDF. The characters (は哈) render correctly, but the resulting output.pdf is far too large (14 MB).

I read the examples document and found chapter 8.6.2, Composite fonts. I only want to embed the characters that are actually used: that is, extract the glyph for each character from the font and package only those glyphs in the PDF file. How can I achieve this with borb? Is there a configuration option for it?
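
If borb has no built-in option for this, one possible workaround is to subset the font file before loading it. The sketch below is an assumption, not a borb feature: it relies on the third-party fontTools package (not mentioned elsewhere in this issue), and `unique_characters` and `subset_font` are hypothetical helper names. It collects the distinct characters the document uses and writes a copy of the font that keeps only their glyphs.

```python
from pathlib import Path


def unique_characters(*texts: str) -> str:
    """Return the distinct characters across all texts, sorted by code point."""
    return "".join(sorted(set("".join(texts))))


def subset_font(src: Path, dst: Path, text: str) -> None:
    """Write a copy of the font at src that keeps only the glyphs needed for text."""
    # Assumption: fontTools is installed (pip install fonttools).
    # It is imported lazily so that unique_characters works without it.
    from fontTools import subset
    from fontTools.ttLib import TTFont

    font = TTFont(str(src))
    subsetter = subset.Subsetter(subset.Options())
    subsetter.populate(text=text)  # select glyphs by the characters they render
    subsetter.subset(font)         # drop every glyph that was not selected
    font.save(str(dst))


# Hypothetical usage with the paths from the reproduction script:
#   chars = unique_characters("はははは哈哈")
#   subset_font(Path("font/Microsoft Yahei.ttf"), Path("font/subset.ttf"), chars)
```

The subset file could then be passed to `TrueTypeFont.true_type_font_from_file` in place of the full 21 MB font. Whether this brings the PDF under 1 MB depends on how borb embeds the font, which is untested here.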

To Reproduce

Steps to reproduce the behaviour:

Download Microsoft Yahei.ttf at https://github.com/dolbydu/font/blob/master/unicode/Microsoft%20Yahei.ttf

from borb.pdf.document.document import Document
from borb.pdf.page.page import Page
from borb.pdf.canvas.layout.page_layout.multi_column_layout import SingleColumnLayout
from borb.pdf.canvas.layout.page_layout.page_layout import PageLayout
from borb.pdf.canvas.layout.text.paragraph import Paragraph
from borb.pdf.pdf import PDF
from borb.pdf.canvas.font.simple_font.true_type_font import TrueTypeFont
import time

from pathlib import Path

def print_current_time():
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))

if __name__ == "__main__":

    print_current_time()

    # Load the TrueType font that covers the Chinese and Japanese glyphs.
    font_path = Path(__file__).parent / "font" / "Microsoft Yahei.ttf"
    custom_font = TrueTypeFont.true_type_font_from_file(font_path)

    print_current_time()

    # Build a one-page document containing a single paragraph in that font.
    doc = Document()
    page = Page()
    doc.append_page(page)
    layout = SingleColumnLayout(page)
    layout.add(Paragraph("はははは哈哈", font=custom_font))

    print_current_time()

    # Write the document to pdf/<timestamp>.pdf (the pdf directory must already exist).
    timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime())
    pdf_name = timestamp + ".pdf"
    pdf_path = Path(__file__).parent / "pdf" / pdf_name
    with open(pdf_path, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

    print_current_time()

Output:

2022-05-27 21:19:11
2022-05-27 21:19:26
2022-05-27 21:19:27
2022-05-27 21:20:02
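
The four timestamps mark the boundaries between loading the font, building the document, and writing the PDF. A short sketch that turns them into per-stage durations follows; the stage labels are illustrative names, not borb terminology.

```python
from datetime import datetime

# Timestamps copied from the run above, one per print_current_time() call.
stamps = [
    "2022-05-27 21:19:11",
    "2022-05-27 21:19:26",
    "2022-05-27 21:19:27",
    "2022-05-27 21:20:02",
]
times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]

# Each stage runs between two consecutive timestamps.
stages = ["load font", "build document", "write PDF"]
elapsed = {
    stage: int((end - start).total_seconds())
    for stage, start, end in zip(stages, times, times[1:])
}

for stage, seconds in elapsed.items():
    print(f"{stage}: {seconds} s")
# load font: 15 s
# build document: 1 s
# write PDF: 35 s
```

Nearly all of the 51 seconds is spent reading the font and writing the PDF, which is consistent with the whole font being processed and embedded.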

Directory layout:

[ 288]  .
├── [  97]  README.md
├── [ 128]  font
│   ├── [ 21M]  Microsoft Yahei.ttf
│   └── [ 74M]  PingFang.ttc
├── [1.3K]  main.py
└── [  96]  pdf
    └── [ 14M]  2022_05_27_20_49_11.pdf

Expected behaviour

The PDF file should be smaller than 1 MB.

Desktop (please complete the following information):

  • OS: macOS 12.3
  • borb version 2.0.26
  • Python 3.9.5
