Skip to content

Rewrite ElementPath as a Cython .pxi backed by a libxml2 walker#501

Open
mahomaho wants to merge 4 commits into
lxml:masterfrom
mahomaho:cython-elementpath
Open

Rewrite ElementPath as a Cython .pxi backed by a libxml2 walker#501
mahomaho wants to merge 4 commits into
lxml:masterfrom
mahomaho:cython-elementpath

Conversation

@mahomaho

@mahomaho mahomaho commented Apr 29, 2026

Copy link
Copy Markdown

Replaces the pure-Python lxml._elementpath module with two new
Cython files compiled into etree:

  • iterfind.pxi -- a self-contained C-level engine: a path
    tokenizer/compiler producing an array of step structs, a lazy
    depth-first cursor walker over libxml2 nodes (xmlNode*), and an
    _IterFindResult cdef class that yields one xmlNode* per call.
    Operates entirely in libxml2 land: no _Element wrapping inside the
    walker, no regex, no Python-level string juggling.

  • _elementpath.pxi -- a thin Python-facing wrapper exposing
    find / findall / iterfind / findtext as cdef entry
    points that take an _Element, run the C engine, and call
    _elementFactory only on results actually returned to Python.

The compiler handles {uri}name Clark notation, prefix:name
resolution against the namespaces dict (including default-namespace
binding via the None / '' key), and the path-syntax rules that the old
Python wrapper enforced separately (reject leading /, treat
trailing / as implicit /*). Predicates supported: [@attr],
[@attr="value"], [@attr!="value"], [tag], [tag="value"],
[tag!="value"], [N], [last()], [last()-N],
[. = "value"], [. != "value"] -- the same set the Python
implementation supported.

The cursor walker is lazy: a path like item/name walks until the
first match and stops. find() short-circuits, findall()
materialises only as it iterates. This was the single biggest fix --
an earlier eager-BFS prototype regressed find() 75x because it
materialised every descendant before returning the first.

The cursor struct is two pointers (scope, current); the three logical
states (fresh / in-progress / exhausted) are encoded by current
alone using scope as the fresh sentinel. The steps and cursors
arrays share a single allocation. Predicates that compare strings go
through xmlGetNoNsProp / xmlNodeGetContent directly with no
Python copies.

Performance

Measured against released lxml 6.1.0 (PYTHONPATH switching) on a 200x20
doc, 5000 iters per case, Python 3.14:

  case                                     old us  new us  speedup
  -----------------------------------------------------------------
  find("item")                              2.27    0.39   5.82x
  find(".//child")                          2.40    0.41   5.85x
  find("item/name")                         4.08    0.48   8.50x
  findtext("item/name")                     4.68    0.55   8.51x
  findall("item/name")                    296.27   71.30   4.16x
  findall("item/child")                   834.94  425.98   1.96x
  findall("item[@id]")                     74.79   30.40   2.46x
  findall("item[@type=\"odd\"]")           84.60   25.13   3.37x
  findall("item[1]")                     1553.77   86.12  18.04x
  findall("item[last()]")                1596.55    2.69 593.51x
  findall("item[name]")                   288.86   17.38  16.62x
  findall(".//child[@key=\"k5\"]")       1449.26  376.61   3.85x
  miss: findall("nope")                     3.60    1.43   2.52x

  findall(".//child")                     512.74  458.81   1.12x
  findall(".//*")                        1546.46  779.29   1.98x

The biggest wins (predicates) come from skipping the per-call Python
re-parse the old code did via the LRU cache hit; find() wins from
short-circuit + no factory calls on intermediate matches; descendants
are flatter because the bottleneck is libxml2 traversal itself.

Validation

  • test_elementpath: 28/28 OK on Python 3.11 and 3.14
  • benchmark/verify_descendant.py: byte-identical findall() output
    vs released lxml across 8 descendant-style paths
  • The xpath_tokenizer doctest in selftest.py (and a few tokenizer
    unit tests in test_elementpath.py) tested the implementation of
    the regex-based parser; they're skipped/removed since the new
    parser has no Python tokenizer to test.

Files

  • New: src/lxml/iterfind.pxi (~1100 lines), src/lxml/_elementpath.pxi
  • Removed: src/lxml/_elementpath.py
  • etree.pyx: drop the from lxml import _elementpath import,
    include the two new pxi files, route the four call sites to the
    new cdef entry points
  • setupinfo.py: drop lxml._elementpath from COMPILED_MODULES
  • selftest.py, test_elementpath.py: remove tests of the
    removed regex tokenizer
  • benchmark/: bench_compare, bench_elementpath, bench_find_scaling,
    bench_find_vs_iter, verify_descendant -- the benchmarks/cross-checks
    used to validate the rewrite

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

Replaces the pure-Python ``lxml._elementpath`` module with two new
Cython files compiled into ``etree``:

* ``iterfind.pxi`` -- a self-contained C-level engine: a path
  tokenizer/compiler producing an array of step structs, a lazy
  depth-first cursor walker over libxml2 nodes (xmlNode*), and an
  ``_IterFindResult`` cdef class that yields one xmlNode* per call.
  Operates entirely in libxml2 land: no _Element wrapping inside the
  walker, no regex, no Python-level string juggling.

* ``_elementpath.pxi`` -- a thin Python-facing wrapper exposing
  ``find`` / ``findall`` / ``iterfind`` / ``findtext`` as cdef entry
  points that take an _Element, run the C engine, and call
  ``_elementFactory`` only on results actually returned to Python.

The compiler handles ``{uri}name`` Clark notation, ``prefix:name``
resolution against the namespaces dict (including default-namespace
binding via the None / '' key), and the path-syntax rules that the old
Python wrapper enforced separately (reject leading ``/``, treat
trailing ``/`` as implicit ``/*``). Predicates supported: ``[@attr]``,
``[@attr="value"]``, ``[@attr!="value"]``, ``[tag]``, ``[tag="value"]``,
``[tag!="value"]``, ``[N]``, ``[last()]``, ``[last()-N]``,
``[. = "value"]``, ``[. != "value"]`` -- the same set the Python
implementation supported.

The cursor walker is *lazy*: a path like ``item/name`` walks until the
first match and stops. ``find()`` short-circuits, ``findall()``
materialises only as it iterates. This was the single biggest fix --
an earlier eager-BFS prototype regressed ``find()`` 75x because it
materialised every descendant before returning the first.

The cursor struct is two pointers (scope, current); the three logical
states (fresh / in-progress / exhausted) are encoded by ``current``
alone using ``scope`` as the fresh sentinel. The steps and cursors
arrays share a single allocation. Predicates that compare strings go
through ``xmlGetNoNsProp`` / ``xmlNodeGetContent`` directly with no
Python copies.

## Performance

Measured against released lxml 6.1.0 (PYTHONPATH switching) on a 200x20
doc, 5000 iters per case, Python 3.14:

  case                                     old us  new us  speedup
  -----------------------------------------------------------------
  find("item")                              2.27    0.39   5.82x
  find(".//child")                          2.40    0.41   5.85x
  find("item/name")                         4.08    0.48   8.50x
  findtext("item/name")                     4.68    0.55   8.51x
  findall("item/name")                    296.27   71.30   4.16x
  findall("item/child")                   834.94  425.98   1.96x
  findall("item[@id]")                     74.79   30.40   2.46x
  findall("item[@type=\"odd\"]")           84.60   25.13   3.37x
  findall("item[1]")                     1553.77   86.12  18.04x
  findall("item[last()]")                1596.55    2.69 593.51x
  findall("item[name]")                   288.86   17.38  16.62x
  findall(".//child[@key=\"k5\"]")       1449.26  376.61   3.85x
  miss: findall("nope")                     3.60    1.43   2.52x

  findall(".//child")                     512.74  458.81   1.12x
  findall(".//*")                        1546.46  779.29   1.98x

The biggest wins (predicates) come from skipping the per-call Python
re-parse the old code did via the LRU cache hit; ``find()`` wins from
short-circuit + no factory calls on intermediate matches; descendants
are flatter because the bottleneck is libxml2 traversal itself.

## Validation

* test_elementpath: 28/28 OK on Python 3.11 and 3.14
* benchmark/verify_descendant.py: byte-identical findall() output
  vs released lxml across 8 descendant-style paths
* The xpath_tokenizer doctest in selftest.py (and a few tokenizer
  unit tests in test_elementpath.py) tested the *implementation* of
  the regex-based parser; they're skipped/removed since the new
  parser has no Python tokenizer to test.

## Files

* New: ``src/lxml/iterfind.pxi`` (~1100 lines), ``src/lxml/_elementpath.pxi``
* Removed: ``src/lxml/_elementpath.py``
* ``etree.pyx``: drop the ``from lxml import _elementpath`` import,
  ``include`` the two new pxi files, route the four call sites to the
  new cdef entry points
* ``setupinfo.py``: drop ``lxml._elementpath`` from COMPILED_MODULES
* ``selftest.py``, ``test_elementpath.py``: remove tests of the
  removed regex tokenizer
* ``benchmark/``: bench_compare, bench_elementpath, bench_find_scaling,
  bench_find_vs_iter, verify_descendant -- the benchmarks/cross-checks
  used to validate the rewrite

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@scoder scoder left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks – that's a long chunk of code.

Did you compile the _elementpath.py module when running your benchmarks? It certainly makes a difference whether you're timing the pure Python module or the Cython compiled one.

A certain advantage of the current implementation is that it is shared with CPython's implementation (just faster), so changes on either side are easy to incorporate in the other. This is lost with a new implementation. I'm not generally opposed to a faster implementation (I always advocate .find*() over .xpath() because it's faster most of the time), but this loss is part of the decision.

To reduce the amount of added code and keeping it more maintainable, I suggest implementing the tokeniser and parser part in (compiled) Python, and only implementing the tree walker in C-ish code. Parsing shouldn't normally be performance critical and the result can be cached, as before.

lxml.etree already has a few tree walker implementations (like the ElementDepthFirstIterator) that can probably be reused and extended here. In any case, the implementation should reuse as much of the existing infrastructure as possible. The _MultiTagMatcher is definitely worth using, for example, since it pre-caches the tag name pointers.

Your new tree iterator does not currently hold a reference to Python _Element objects. It needs to keep the relevant parts of the current tree alive across iteration steps, though, and for that, it needs to instantiate at least one element and keep a reference to it. Look at the existing tree iterators to see how it's done.

I'm also suspicious about the lazy tree walking. We need to make sure that the tree walker doesn't get diverted (more than reasonable) when users remove parts of the current tree between iteration steps. It's certainly debatable where we draw the line, but it would be good to keep the bar low.

Comment thread src/lxml/tests/selftest.py
Comment thread src/lxml/tests/test_elementpath.py Outdated
Comment on lines 28 to 29
@unittest.skip("internal _elementpath._cache API removed; replaced by C-level pxi")
def test_cache(self):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test can be deleted.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I will do so

Comment thread src/lxml/tests/test_elementpath.py Outdated
Comment on lines +32 to +34
def _assert_tokens(self, tokens, path, namespaces=None):
if namespaces is None:
namespaces = self._empty_namespaces
self.assertEqual(tokens, list(self._elementpath.xpath_tokenizer(path, namespaces)))
# The Python xpath_tokenizer is gone; this helper is unused by skipped tests.
self.skipTest("internal xpath_tokenizer API removed")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Method to be removed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in next commit

Comment thread src/lxml/iterfind.pxi

return NULL

# ---- C-level path tokenizer / compiler ----

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this necessarily needs to be done in C. A Python/Cython implementation should be fine and would probably be easier to maintain.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I started to look into this were due to some performance issues I have in a tool using lxml. This change alone saved me 25% of startup time, not from the big wins in the comparison but from the small gain cases, and that is the reason why I sent this in. You might be right that this does not need to be done c-style, my guess is that the big win here is to leave the old regex based solution and going into a solution that does not have to allocate a number of objects for each search.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your observation that the repeated element instantiation is a costly part of the traversal is spot-on. It's probably the single most costly part of it. I've worked on that for the index predicates in #502

(Don't take that as rejecting your PR, it's just really low hanging fruit that can be done without replacing the whole implementation.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we lost the caching with your change, right? Meaning, it now parses the path each time it is evaluated? I'd rather go back to an LRU cache (mapping path strings to prepared evaluators) and keep the parser simple. For most applications, the parser performance really doesn't matter as long as each path is only parsed once (within reasonable cache size boundaries).

@scoder

scoder commented Apr 30, 2026

Copy link
Copy Markdown
Member
  findall("item[1]")                     1553.77   86.12  18.04x
  findall("item[last()]")                1596.55    2.69 593.51x
  findall("item[name]")                   288.86   17.38  16.62x

These are probably worth investigating separately. There might be easy wins in the current implementation (which CPython could possibly also benefit from).

etree.ElementTree(elem).iterfind, "/tag")


class ElementTreeElementPathTestCase(EtreeElementPathTestCase):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test class is for compatibility comparisons and uses xml.etree.ElementTree and its ElementPath implementation. It should stay in as a canary to notice functional divergence between both implementations.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mahomaho

mahomaho commented May 3, 2026

Copy link
Copy Markdown
Author

I am currently looking at the problem you pinpointed that a user may modify the lxml tree model after the iterator is created and before the iterator search has finished, leading to an access of freed memory. I hope that I will push a fix for this before the end of this week(the solution will be to save a reference to the last search result instead of the xml doc). I actually hope that this change will make the find even more efficient than my previous push

Compiled paths become a doubly-linked list of _IfSearcher instances,
one per step. The chain is self-driving: ``_first(scope)`` and
``_next(cursor)`` walk forward at this level (descending into
``self.next._first`` on a match, cascading via ``self.prev._next``
on exhaustion) and return the matched leaf xmlNode* directly --
or NULL when the chain is exhausted. Callers never see the chain
mechanics; they see only ``head._first(root)`` to start and
``leaf._next(prev_match)`` to resume.

Searcher state is minimal: ``_depth`` for the STEP_DESCENDANTS
walker (replacing a tree-top pointer with a hop counter) and
``_scope_c`` captured by ``_first`` for the cascade key invariant
"my scope IS prev's cursor, because prev's last match was the
scope it handed us". No per-searcher _Element refs, no per-yield
proxy allocation on intermediate steps -- only the leaf wraps.
The four hot methods (compile, _first, _next, _scan) are
@cython.final so per-yield calls go through direct dispatch.

Path tokenizer / compiler is inlined into ``_IfSearcher.compile``;
namespace prefix and default-namespace resolution happen there
directly. Predicate state uses a discriminated union
(_IfPredVariant + _IfPredType) over three variants (attr, node,
pos). Tag and attribute namespace constraints share a three-state
``c_href`` encoding: NULL = any (matches {*} and bare *), non-NULL
with len==0 = no-namespace sentinel (matches {} and bare names
when no default ns applies), non-NULL with len>0 = exact URI.
Path bytes and namespace str values are read straight from the
caller-pinned strings via PyUnicode_AsUTF8AndSize -- no copies.

The Python-facing layer in _elementpath.pxi is a thin shell:
- ``_elementpath_find`` calls ``head._first`` once.
- ``_elementpath_findall`` calls ``head._first`` then loops
  ``leaf._next``; the function-local ``namespaces`` parameter
  pins prefix/URI strs for the call's lifetime.
- ``_ElementPathIterator`` inherits _IfSearcher (so the iterator
  IS the head), caches the leaf returned by compile() in
  ``_searcher``, and dispatches first call -> self._first(root),
  subsequent calls -> self._searcher._next(prev_match). On
  exhaustion ``_searcher`` is nulled to enforce the
  StopIteration-is-sticky iterator contract. The shallow
  ``_namespaces`` dict copy and the latest yielded
  ``_current_element`` pin lifetimes (str caches and xmlDoc).

Restore ElementTreeElementPathTestCase as a canary that runs the
suite against stdlib's xml.etree.ElementPath, so behavioral
divergence between lxml and CPython surfaces in tests. Restore
selftest.py to ISO-8859-1 byte-for-byte against master apart
from the two structural removals (private import + dead
xpath_tokenizer doctest function).

Benchmarks vs pip lxml 6.1 (5000 iters, smaller is faster):
- find / findtext family: 3-7x faster.
- findall("item[1]") / findall("item[name]"): ~14-16x.
- findall("item[last()]"): 500x+ (Python ref re-counts siblings
  per candidate; the Cython walker counts forward once).
- raw walks (.//child, .//*): at parity to slightly faster.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mahomaho

mahomaho commented May 3, 2026

Copy link
Copy Markdown
Author

I pushed a slightly modified implementation. Main difference is the handling of tree modifications while iter searcher is alive and not finished. Previously I used the doc reference to keep tree alive, now I use current search result.

The new implementation has improved speed with 10-20%

@scoder scoder left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked through the details of the implementation, simply because it brings in way too much duplication of existing functionality. I do appreciate the goal of making the path evaluation fast, but any code duplication also means additional maintenance overhead in the longer term.

Comment thread src/lxml/iterfind.pxi
Comment on lines +918 to +924
# Populate the table from the user's namespaces dict. The
# prefix/URI byte pointers point into each str's internal
# UTF-8 cache (PyUnicode_AsUTF8AndSize), so the str objects
# -- and therefore the dict that holds them -- must outlive
# the searcher chain. Iterators keep a copy on themselves;
# find/findall keep the dict alive via the function-local
# parameter.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user provided data structure should be copied before relying on its content to not change. This problem is already solved in several other spots in lxml (tag matching, XPath extensions, XSLT parameters, …), but this one can probably be eliminated by not replicating the tag matching code.

Comment thread src/lxml/iterfind.pxi

return NULL

# ---- C-level path tokenizer / compiler ----

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, we lost the caching with your change, right? Meaning, it now parses the path each time it is evaluated? I'd rather go back to an LRU cache (mapping path strings to prepared evaluators) and keep the parser simple. For most applications, the parser performance really doesn't matter as long as each path is only parsed once (within reasonable cache size boundaries).

Comment thread src/lxml/_elementpath.pxi

@cython.final
@cython.internal
cdef class _ElementPathIterator(_IfSearcher):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try if you can base this class on _ElementMatchIterator. At least the _MultiTagMatcher should be usable. It pre-caches the tag name pointers to replace slow string comparisons with fast pointer comparisons. While it might not be perfectly designed for an incremental path search, it's probably still acceptable since its comparisons are really fast, possibly even fast enough to keep all tag names of the path in one matcher and adding a second level filter afterwards that checks if the match it found is what we expect at the current path level. Path expressions usually only contain very few different tag names with little risk of ambiguity.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear what you say and I appreciate your feedback. I will first focus om performance. I can admit that I didn't expect any feedback at all when I pushed the PR. What I find interesting is the potential for performance improvements. I am actually working on another PR where I merge the libxml2 and python node/element object into one to improve performance. While working on that one I found area around find and since it was quite standalone I decided to try and create a standalone PR for that one.
My main target is an open source python package for Autosar configuration stored in xml. I am working on performance optimizations and next step would be to either go c/cython or do something around xml handling. I kind of like lxml and I have integrated many features from there so I decided to see what could be done and that is where I am right now.

I hope that I will get some time to investigate what you say and try out how it would impact the results. It is a simple thing to try out when you have help from AI, things that you used to think about trying out but never happended because there never were time enough can now be tried out in an afternoon.

@scoder

scoder commented May 4, 2026

Copy link
Copy Markdown
Member

BTW, I understand that this implementation is geared more towards performance than integration and maintainability. That is not a criticism of the implementation as such, but prevents me (the maintainer) from accepting it into mainline lxml in its current state.

In order to keep its current performance characteristics (and to keep them under your own control), one thing that you can do is create your own PyPI package that reimplements ElementPath on top of lxml for your own purposes. lxml has a C-API that allows getting at the underlying libxml2 data structure, which should be enough to get your implementation going. I consider the functionality sufficiently large, self-contained and independent to merit its own separate package.

That said, I'd be happy to integrate as much of the relevant improvements into mainline lxml as I can afford from a maintenance perspective, so further work on this PR is appreciated.

@mahomaho

Copy link
Copy Markdown
Author

Hi again,

I have now had time to analyze your feedback, and I first must say that I appreciate your feedback and I totally understand your point about maintainability. Now to the details: I have looked at your suggestions about reuse and I do not see a straight path here and therefore I reach out to you for some feedback before I start to rewrite things.

I have analyzed the _ElementMatchIterator and it has major similarities with my iterator class. But there is a problem to resuse it if I am correct:
The class has a "walker" stateless function pointer to walk the model. It also has a filter test function to test if current node fulfills the request.

The problem is the stateless walker function. When doing a "xpath" search then the walker function depends on where you are in the search. Lets say that you have an xpath like "./a//b" then the walker shall first walk the shildren and search for a, then it shall walk all decendants and search for b. My solution for that is the linked list of searcher objects, one list element per level with different walker functions.

What I can do is to merge these classes. I would then delete my _ElementPathIterator and update the _ElementMatchIterator to something in the _ElementPathIterator direction. Second step would be to delete merge my searcher class with the _MultiTagMatcher. Finally I would update all points where these functions are used.

Is this what you are after? This will doulbe the size of the PR but I do not mind doing the job.

@scoder

scoder commented Jun 5, 2026

Copy link
Copy Markdown
Member

The _ElementMatchIterator itself is intentionally limited. It's the base class for lxml's simple iterators that iterate children, siblings, ancestors. There is also the ElementDepthFirstIterator for deep tree traversal, which comes closer to what path traversal would need. I'd like a path based iterator to integrate with these classes, or at least with their design if something new is needed, with the intention to keep the overall complexity low (and knowing that path traversal is more complex than what is there).

you have an xpath like "./a//b" then the walker shall first walk the shildren and search for a, then it shall walk all decendants and search for b. My solution for that is the linked list of searcher objects, one list element per level with different walker functions.

Yes, something like that is needed. It's also somewhat how the current Python generator based implementation works, it creates one generator per path construct/level and then chains them. The main issue there is that it processes Python Element objects. In order to implement a step-by-step path iterator, we need a series of "next node" traversal functions together with selector state. We already have most of the required iterators, but they always start from Python Element instances, which, again, we try to avoid here. The optimisation goal should be to push the selection into the libxml2 tree level and only instantiate tree nodes that match the whole path, not just intermediate parts of it.

Taking the "list of searcher objects" literally, what do you think about simply running down a Python list of Python searcher objects to process the path step by step? They can expose a cdef method that takes an xmlNode* and returns the next xmlNode*, which can then be passed into the method of the subsequent searcher object. A NULL return value makes it back up to the previous searcher. That keeps the traversal functionality and its selector configuration together (although the traversal could probably be extracted into a simple C function to make it reusable by itself – some already exist, some are part of iterator classes). And this seems a much simpler design than storing everything in low-level C structures. Certainly easier to extend.

@scoder

scoder commented Jun 5, 2026

Copy link
Copy Markdown
Member

only instantiate tree nodes that match the whole path, not just intermediate parts of it.

This is becoming even more interesting with the newly added locking in lxml 7.0, both to support Free-threading Python and to make thread concurrency generally more robust. The most visible side-effect is that Element object instantiation is somewhat slower, which hits the current ElementPath implementation quite hard, plus 10-30%, due to the sheer number of one-time visited elements. If the whole path traversal was done at the C node level, the whole tree could be locked once for reading and then traversed in one go, as is currently the case for XPath expressions (evaluated by libxml2 entirely).

@scoder

scoder commented Jun 8, 2026

Copy link
Copy Markdown
Member

Taking the "list of searcher objects" literally

Please see #511

I borrowed from your ideas but didn't reuse your code, I hope you don't mind. I really appreciate that you brought up the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants