Conversation

@henryiii (Contributor) commented Nov 24, 2025

I made a quick benchmark with ChatGPT's help:

tasks/benchmark_version.py:
import timeit
from packaging.version import Version

TEST_VERSIONS = [
    "1.0.0",
    "2.7",
    "1.2.3rc1",
    "0.9.0.dev4",
    "10.5.1.post2",
    "1!2.3.4",
    "1.0+abc.1",
    "2025.11.24",
    "3.4.5-preview.8",
    "v1.0.0",
] * 10_000


def bench():
    for v in TEST_VERSIONS:
        Version(v)


if __name__ == "__main__":
    t = timeit.timeit("bench()", globals=globals(), number=5)
    print(f"Time: {t:.4f} seconds")
    print(f"Per version: {1_000_000 * t / len(TEST_VERSIONS) / 5:.9f} µs")

Then I ran Python 3.15's sampling profiler:

$ sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
Time: 1.3528 seconds
Per version: 2.705616084 µs
Captured 13646 samples in 1.36 seconds
Sample rate: 10000.01 samples/sec
Error rate: 20.57%
Profile Stats:
       nsamples   sample%  tottime (ms)    cumul%   cumtime (s)  filename:lineno(function)
        1/10703       0.0         0.100      99.4         1.070  _sync_coordinator.py:193(_execute_script)
        0/10703       0.0         0.000      99.4         1.070  _sync_coordinator.py:234(main)
        0/10703       0.0         0.000      99.4         1.070  _sync_coordinator.py:251(<module>)
        0/10703       0.0         0.000      99.4         1.070  <frozen runpy>:88(_run_code)
        0/10703       0.0         0.000      99.4         1.070  <frozen runpy>:198(_run_module_as_main)
        0/10661       0.0         0.000      99.0         1.066  <timeit-src>:6(inner)
        0/10661       0.0         0.000      99.0         1.066  timeit.py:183(Timer.timeit)
        0/10661       0.0         0.000      99.0         1.066  timeit.py:240(timeit)
        0/10661       0.0         0.000      99.0         1.066  benchmark_version.py:25(<module>)
      670/10660       6.2        67.000      99.0         1.066  benchmark_version.py:21(bench)
        82/9990       0.8         8.200      92.7         0.999  __init__:0(__init__)
      2613/2623      24.3       261.300      24.4         0.262  version.py:201(Version.__init__)
       951/2106       8.8        95.100      19.6         0.211  version.py:218(Version.__init__)
      1660/1813      15.4       166.000      16.8         0.181  version.py:208(Version.__init__)
      1068/1151       9.9       106.800      10.7         0.115  version.py:206(Version.__init__)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  0.818 direct/cumulative ratio, 58.4% direct samples: version.py:(Version.__init__)
  0.063 direct/cumulative ratio, 6.2% direct samples: benchmark_version.py:(bench)
  0.008 direct/cumulative ratio, 0.8% direct samples: __init__:(__init__)

Functions with Highest Call Frequency (Indirect Calls):
  10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(main)
  10703 indirect calls, 99.4% total stack presence: _sync_coordinator.py:(<module>)
  10703 indirect calls, 99.4% total stack presence: <frozen runpy>:(_run_code)

Functions with Highest Call Magnification (Cumulative/Direct):
  10703.0x call magnification, 10702 indirect calls from 1 direct: _sync_coordinator.py:(_execute_script)
  121.8x call magnification, 9908 indirect calls from 82 direct: __init__:(__init__)
  15.9x call magnification, 9990 indirect calls from 670 direct: benchmark_version.py:(bench)

[flamegraph image: python-performance-flamegraph]

Looking at the results, I was surprised that re wasn't dominating as heavily as I expected. That also explains why experimenting with the regex to add Python 3.11 atomic-group features produced no measurable change. Looking at the slow functions, I noticed a line building lists and tuples unnecessarily, so I replaced it with a roughly 20x faster construct; that cut the time this line (218 above / 217 below) took nearly in half, for around a 10% overall improvement.
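The exact diff isn't reproduced here, but the flavor of the change can be illustrated with a micro-benchmark (hypothetical, not the actual patch): building a release tuple with map() avoids per-element generator-frame overhead compared with a generator expression inside tuple().

```python
import timeit

release_str = "10.5.1"  # illustrative release segment

# Hypothetical "before": generator expression inside tuple()
gen_time = timeit.timeit(
    'tuple(int(i) for i in s.split("."))',
    globals={"s": release_str},
    number=100_000,
)

# Hypothetical "after": map() skips the generator frame per element
map_time = timeit.timeit(
    'tuple(map(int, s.split(".")))',
    globals={"s": release_str},
    number=100_000,
)

print(f"generator: {gen_time:.4f}s  map: {map_time:.4f}s")
```

Both forms produce the same tuple; only the construction cost differs, which is why this kind of change is invisible in behavior but shows up in a sampling profile.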

$ sudo -E uv run --python 3.15 python -m profiling.sampling tasks/benchmark_version.py
Time: 1.2303 seconds
Per version: 2.460599834 µs
Captured 12417 samples in 1.24 seconds
Sample rate: 10000.01 samples/sec
Error rate: 22.82%
Profile Stats:
       nsamples   sample%  tottime (ms)    cumul%  cumtime (ms)  filename:lineno(function)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:193(_execute_script)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:234(main)
         0/9481       0.0         0.000      99.7       948.100  _sync_coordinator.py:251(<module>)
         0/9481       0.0         0.000      99.7       948.100  <frozen runpy>:88(_run_code)
         0/9481       0.0         0.000      99.7       948.100  <frozen runpy>:198(_run_module_as_main)
         0/9449       0.0         0.000      99.3       944.900  timeit.py:240(timeit)
         0/9449       0.0         0.000      99.3       944.900  benchmark_version.py:25(<module>)
         0/9448       0.0         0.000      99.3       944.800  <timeit-src>:6(inner)
         0/9448       0.0         0.000      99.3       944.800  timeit.py:183(Timer.timeit)
       623/9446       6.5        62.300      99.3       944.600  benchmark_version.py:21(bench)
        76/8823       0.8         7.600      92.7       882.300  __init__:0(__init__)
      2133/2133      22.4       213.300      22.4       213.300  version.py:200(Version.__init__)
      1848/1889      19.4       184.800      19.9       188.900  version.py:207(Version.__init__)
       932/1237       9.8        93.200      13.0       123.700  version.py:217(Version.__init__)
      1070/1099      11.2       107.000      11.6       109.900  version.py:205(Version.__init__)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  0.941 direct/cumulative ratio, 62.9% direct samples: version.py:(Version.__init__)
  0.066 direct/cumulative ratio, 6.5% direct samples: benchmark_version.py:(bench)
  0.009 direct/cumulative ratio, 0.8% direct samples: __init__:(__init__)

Functions with Highest Call Frequency (Indirect Calls):
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(_execute_script)
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(main)
  9481 indirect calls, 99.7% total stack presence: _sync_coordinator.py:(<module>)

Functions with Highest Call Magnification (Cumulative/Direct):
  116.1x call magnification, 8747 indirect calls from 76 direct: __init__:(__init__)
  15.2x call magnification, 8823 indirect calls from 623 direct: benchmark_version.py:(bench)

Inspired by the caching in #986 and #985.
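For context, the caching approach referenced in those PRs can be sketched roughly like this (a hedged illustration with a toy parser, not the actual packaging code): memoize the parse so repeated version strings are parsed only once.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_version(text: str) -> tuple:
    """Toy stand-in for Version parsing: split into integer components.

    Illustrative only -- the real parser handles epochs, pre/post/dev
    releases, and local segments via a regex.
    """
    return tuple(int(part) for part in text.split("."))


# Repeated strings hit the cache instead of being re-parsed:
first = parse_version("1.2.3")
second = parse_version("1.2.3")
assert first is second  # same cached object returned
```

In a resolver workload like pip's, the same version strings recur constantly, which is why memoization pays off there.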

@henryiii changed the title from "fix: make Version a little faster" to "perf: make Version a little faster" on Nov 24, 2025
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
@henryiii force-pushed the henryiii/fix/fasterversion branch from d393893 to 8337faf on November 25, 2025 02:23
@notatallshaw (Member) commented:

I think the impact of improving Version performance will depend heavily on the versions tested against.

I ran this through the pip benchmark I've recently been using to find hotspots, to see a real-world use. Here are the results (there's some slight randomness in pip, which is why the number of calls to Version.__init__ isn't identical):

Before this PR:

[cProfile diagram]

After this PR:

[cProfile diagram]

We can see that the time spent in _cmpkey during __init__ is reduced by ~40%.

But I do wonder: why calculate _cmpkey eagerly? Once I have a moment, I'll see whether computing it lazily saves any time for pip.
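Lazy computation of a comparison key could look roughly like this (a sketch under the assumption that the key is only needed when comparing; LazyVersion and _key are illustrative names, not packaging internals):

```python
from functools import cached_property


class LazyVersion:
    """Illustrative only: defers building the comparison key until needed."""

    def __init__(self, text: str):
        # __init__ stays cheap: parse the parts, but do no key-building work.
        self._parts = tuple(int(p) for p in text.split("."))

    @cached_property
    def _key(self) -> tuple:
        # Computed once, on first comparison, then cached on the instance.
        return self._parts

    def __lt__(self, other: "LazyVersion") -> bool:
        return self._key < other._key


assert LazyVersion("1.2") < LazyVersion("1.10")
```

The trade-off is that a workload which constructs many Version objects and compares all of them pays the same total cost, just shifted out of __init__; the win only appears when many versions are parsed but never compared.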

@notatallshaw (Member) commented Nov 25, 2025

Worth noting: the above diagrams were generated with cProfile, whose instrumentation overhead can skew how long different parts of the code appear to take relative to one another.

But the relative improvement in _cmpkey seems big enough to be real, and it matches your benchmarking.

@henryiii (Contributor, Author) commented:

Yes, that looks similar to the statistical profile I took. The whole function (_cmpkey) spends time on some other things as well, so it ends up a little less than 50% faster.

I got the regex rework working so I'll make a draft PR for that in a minute.

@henryiii henryiii merged commit 28294b6 into pypa:main Nov 25, 2025
72 of 74 checks passed
@henryiii henryiii deleted the henryiii/fix/fasterversion branch November 25, 2025 02:58