Conversation

@henryiii (Contributor) commented Nov 24, 2025

I made a quick benchmark with ChatGPT's help:

tasks/benchmark_version.py:
import timeit
from packaging.version import Version

TEST_VERSIONS = [
    "1.0.0",
    "2.7",
    "1.2.3rc1",
    "0.9.0.dev4",
    "10.5.1.post2",
    "1!2.3.4",
    "1.0+abc.1",
    "2025.11.24",
    "3.4.5-preview.8",
    "v1.0.0",
] * 10_000


def bench():
    for v in TEST_VERSIONS:
        Version(v)


if __name__ == "__main__":
    t = timeit.timeit("bench()", globals=globals(), number=5)
    print(f"Time: {t:.4f} seconds")
    print(f"Per version: {1_000_000 * t / len(TEST_VERSIONS) / 5:.9f} µs")

Then I ran Python 3.15's sampling profiler:

$ sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
Time: 1.3528 seconds
Per version: 2.705616084 µs
Captured 13646 samples in 1.36 seconds
Sample rate: 10000.01 samples/sec
Error rate: 20.57%
Profile Stats:
       nsamples   sample%  tottime (ms)    cumul%   cumtime (s)  filename:lineno(function)
        1/10703       0.0         0.100      99.4         1.070  _sync_coordinator.py:193(_execute_script)
        0/10703       0.0         0.000      99.4         1.070  _sync_coordinator.py:234(main)
        0/10703       0.0         0.000      99.4         1.070  _sync_coordinator.py:251(<module>)
        0/10703       0.0         0.000      99.4         1.070  <frozen runpy>:88(_run_code)
        0/10703       0.0         0.000      99.4         1.070  <frozen runpy>:198(_run_module_as_main)
        0/10661       0.0         0.000      99.0         1.066  <timeit-src>:6(inner)
        0/10661       0.0         0.000      99.0         1.066  timeit.py:183(Timer.timeit)
        0/10661       0.0         0.000      99.0         1.066  timeit.py:240(timeit)
        0/10661       0.0         0.000      99.0         1.066  benchmark_version.py:25(<module>)
      670/10660       6.2        67.000      99.0         1.066  benchmark_version.py:21(bench)
        82/9990       0.8         8.200      92.7         0.999  __init__:0(__init__)
      2613/2623      24.3       261.300      24.4         0.262  version.py:201(Version.__init__)
       951/2106       8.8        95.100      19.6         0.211  version.py:218(Version.__init__)
      1660/1813      15.4       166.000      16.8         0.181  version.py:208(Version.__init__)
      1068/1151       9.9       106.800      10.7         0.115  version.py:206(Version.__init__)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  0.818 direct/cumulative ratio, 58.4% direct samples: version.py:(Version.__init__)
  0.063 direct/cumulative ratio, 6.2% direct samples: benchmark_version.py:(bench)
  0.008 direct/cumulative ratio, 0.8% direct samples: __init__:(__init__)

Functions with Highest Call Frequency (Indirect Calls):
  10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(main)
  10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(<module>)
  10703 indirect calls, 99.4% total stack presence: <frozen runpy>:(_run_code)

Functions with Highest Call Magnification (Cumulative/Direct):
  10703.0x call magnification, 10702 indirect calls from 1 direct: _sync_coordinator.py:(_execute_script)
  121.8x call magnification, 9908 indirect calls from 82 direct: __init__:(__init__)
  15.9x call magnification, 9990 indirect calls from 670 direct: benchmark_version.py:(bench)

[flamegraph image: python-performance-flamegraph]

Looking at the results, I was surprised that re wasn't dominating as heavily as I expected. That also explains why experimenting with the regex to add Python 3.11 atomic-group features produced no measurable change. Looking at the slow functions, I noticed a line building lists and tuples unnecessarily, so I replaced it with a roughly 20x faster construct; that cut the time this line (218 above / 217 below) took nearly in half, for around a 10% overall improvement.
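The exact diff isn't reproduced here, but the flavor of the change can be illustrated with a micro-benchmark (hypothetical, not the actual patch): building a release tuple with map() avoids per-element generator-frame overhead compared with a generator expression inside tuple().

```python
import timeit

release_str = "10.5.1"  # illustrative release segment

# Hypothetical "before": generator expression inside tuple()
gen_time = timeit.timeit(
    'tuple(int(i) for i in s.split("."))',
    globals={"s": release_str},
    number=100_000,
)

# Hypothetical "after": map() skips the generator frame per element
map_time = timeit.timeit(
    'tuple(map(int, s.split(".")))',
    globals={"s": release_str},
    number=100_000,
)

print(f"generator: {gen_time:.4f}s  map: {map_time:.4f}s")
```

Both forms produce the same tuple; only the construction cost differs, which is why this kind of change is invisible in behavior but shows up in a sampling profile.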

$ sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
Time: 1.2303 seconds
Per version: 2.460599834 µs
Captured 12417 samples in 1.24 seconds
Sample rate: 10000.01 samples/sec
Error rate: 22.82%
Profile Stats:
       nsamples   sample%  tottime (ms)    cumul%  cumtime (ms)  filename:lineno(function)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:193(_execute_script)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:234(main)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:251(<module>)
         0/9481       0.0         0.000      99.7       948.100  <frozen runpy>:88(_run_code)
         0/9481       0.0         0.000      99.7       948.100  <frozen runpy>:198(_run_module_as_main)
         0/9449       0.0         0.000      99.3       944.900  timeit.py:240(timeit)
         0/9449       0.0         0.000      99.3       944.900  benchmark_version.py:25(<module>)
         0/9448       0.0         0.000      99.3       944.800  <timeit-src>:6(inner)
         0/9448       0.0         0.000      99.3       944.800  timeit.py:183(Timer.timeit)
       623/9446       6.5        62.300      99.3       944.600  benchmark_version.py:21(bench)
        76/8823       0.8         7.600      92.7       882.300  __init__:0(__init__)
      2133/2133      22.4       213.300      22.4       213.300  version.py:200(Version.__init__)
      1848/1889      19.4       184.800      19.9       188.900  version.py:207(Version.__init__)
       932/1237       9.8        93.200      13.0       123.700  version.py:217(Version.__init__)
      1070/1099      11.2       107.000      11.6       109.900  version.py:205(Version.__init__)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  0.941 direct/cumulative ratio, 62.9% direct samples: version.py:(Version.__init__)
  0.066 direct/cumulative ratio, 6.5% direct samples: benchmark_version.py:(bench)
  0.009 direct/cumulative ratio, 0.8% direct samples: __init__:(__init__)

Functions with Highest Call Frequency (Indirect Calls):
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(_execute_script)
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(main)
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(<module>)

Functions with Highest Call Magnification (Cumulative/Direct):
  116.1x call magnification, 8747 indirect calls from 76 direct: __init__:(__init__)
  15.2x call magnification, 8823 indirect calls from 623 direct: benchmark_version.py:(bench)

Inspired by the caching in #986 and #985.
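For context, the caching approach referenced in those PRs can be sketched roughly like this (a hedged illustration with a toy parser, not the actual packaging code): memoize the parse so repeated version strings are parsed only once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_version(text: str) -> tuple:
    """Toy stand-in for Version parsing: split into integer components.

    Illustrative only -- the real parser handles epochs, pre/post/dev
    releases, and local segments via a regex.
    """
    return tuple(int(part) for part in text.split("."))


# Repeated strings hit the cache instead of being re-parsed:
first = parse_version("1.2.3")
second = parse_version("1.2.3")
assert first is second  # same cached object returned
```

In a resolver workload like pip's, the same version strings recur constantly, which is why memoization pays off there.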

@henryiii changed the title from "fix: make Version a little faster" to "perf: make Version a little faster" on Nov 24, 2025
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
@henryiii force-pushed the henryiii/fix/fasterversion branch from d393893 to 8337faf on November 25, 2025 02:23
@notatallshaw (Member) commented:

I think the impact of improving Version performance will depend heavily on the versions tested against.

I ran this through the pip benchmark I've recently been using to find hotspots, to see a real-world use. Here are the results (there's some slight randomness in pip, which is why the number of calls to Version.__init__ isn't identical):

Before this PR:

[cProfile diagram]

After this PR:

[cProfile diagram]

We can see that the time spent in _cmpkey during __init__ is reduced by ~40%.

But I do wonder: why calculate _cmpkey eagerly? Once I have a moment, I'll see whether computing it lazily saves any time for pip.
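Lazy computation of a comparison key could look roughly like this (a sketch under the assumption that the key is only needed when comparing; LazyVersion and _key are illustrative names, not packaging internals):

```python
from functools import cached_property


class LazyVersion:
    """Illustrative only: defers building the comparison key until needed."""

    def __init__(self, text: str):
        # __init__ stays cheap: parse the parts, but do no key-building work.
        self._parts = tuple(int(p) for p in text.split("."))

    @cached_property
    def _key(self) -> tuple:
        # Computed once, on first comparison, then cached on the instance.
        return self._parts

    def __lt__(self, other: "LazyVersion") -> bool:
        return self._key < other._key


assert LazyVersion("1.2") < LazyVersion("1.10")
```

The trade-off is that a workload which constructs many Version objects and compares all of them pays the same total cost, just shifted out of __init__; the win only appears when many versions are parsed but never compared.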

@notatallshaw (Member) commented Nov 25, 2025

Worth noting: the above diagrams were generated with cProfile, whose instrumentation overhead can skew how long different parts of the code appear to take relative to one another.

But the relative improvement in _cmpkey seems big enough to be real, and it matches your benchmarking.

@henryiii (Contributor, Author) commented:

Yes, that looks similar to the statistical profile I took. The whole function (_cmpkey) spends time on some other things as well, so it ends up a little less than 50% faster.

I got the regex rework working so I'll make a draft PR for that in a minute.

@henryiii henryiii merged commit 28294b6 into pypa:main Nov 25, 2025
72 of 74 checks passed
@henryiii henryiii deleted the henryiii/fix/fasterversion branch November 25, 2025 02:58