Skip to content

Infer item type of comprehensions #7687

Open
omkar-334 wants to merge 9 commits into
cython:masterfrom
omkar-334:infer-type
Open

Infer item type of comprehensions #7687
omkar-334 wants to merge 9 commits into
cython:masterfrom
omkar-334:infer-type

Conversation

@omkar-334

@omkar-334 omkar-334 commented May 13, 2026

Copy link
Copy Markdown
Contributor

This PR fixes the issue #7684 where cython.typeof([(a, a+1) for a in x]) reports a bare list object even when the items have a known shape. and Adds tests for this in tests/run/type_inference_comprehensions.pyx.

ComprehensionNode.infer_type was a one-line stub that returned self.type. Now it looks at the body expression (self.append.expr for list/set, key_expr/value_expr for dict), infers each item's type, and calls specialize_here on the container. Falls back to the bare container whenever any item is unresolvable, so no path can regress.

After the change:

[(a, a + 1) for a in x]      # list[tuple object] object   (was: list object)
[42 for _ in range(3)]        # list[long] object
{str(i): 42 for i in range(2)}  # dict[str object,long] object

@omkar-334

Copy link
Copy Markdown
Contributor Author

Testing script:

#!/usr/bin/env bash
# Demo: comprehension item-type inference. Run from the repo root.

T=$(mktemp -d)

cat > $T/issue_example.pyx <<'EOF'
# cython: language_level=3, infer_types=True
from cython cimport typeof

def issue_example():
    cdef list[int] x = [1, 2, 3, 4]
    result = [(a, a + 1) for a in x]
    return result, typeof(result)
EOF

PYTHONPATH=$(pwd) /tmp/cy313venv/bin/python \
    -m Cython.Build.Cythonize -i $T/issue_example.pyx >/dev/null 2>&1

/tmp/cy313venv/bin/python - <<PYEOF
import sys
sys.path.insert(0, "$T")
from issue_example import issue_example
result, type_str = issue_example()
print(result)
print(type_str)
PYEOF

Output
Screenshot 2026-05-13 at 2 05 58 PM

@omkar-334

omkar-334 commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

Do we need to add more tests for this? I removed a few from my original commit because i thought they were not needed.

@omkar-334

omkar-334 commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

Also, to achieve the best case (getting list[tuple[int, int]]) -do I need to change the infer_type method of TupleNode, ListNode, SetNode and DictNode?

FOr ListNode - Cython/Compiler/ExprNodes.py:9251: - There is already a TODO: Infer non-object list arrays.

You need to change .infer_type() for the comprehension nodes in ExprNodes.py to infer the type of the appended items.

I think you meant the same thing with this right

@scoder

scoder commented May 13, 2026

Copy link
Copy Markdown
Contributor

do I need to change the infer_type method of TupleNode, ListNode, SetNode and DictNode?

That's already on its way in #7686

Comment thread Cython/Compiler/ExprNodes.py Outdated
Comment on lines +9488 to +9489
# DictComprehensionAppendNode is defined later in this module, so
# dispatch by attribute shape to avoid a forward reference.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forward references inside of the same module are just fine (inside of methods), but here, I think we should rather use recursion and ask the append node to infer its type.

@omkar-334 omkar-334 May 13, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in DictComprehensionAppendNode and ComprehensionAppendNode I can add the infer_type(env) method and then ComprehensionNode.infer_type can just collect these right?

@omkar-334

omkar-334 commented May 13, 2026

Copy link
Copy Markdown
Contributor Author

so with these changes im getting this
image
The last case - it ends at tuple, even though the tuple contains further items inside it
There seems to be some issue with PyTupleObject and CTupleType

@omkar-334

Copy link
Copy Markdown
Contributor Author

For [(a, a + 1) for a in xs] with xs: cdef list[int], typeof printslist[(int, long)] object. But the list actually stores Python tuples at runtime, not C tuples (comprehensions always box items). The displayed type is misleading.

  r = [(a, a + 1) for a in xs]
  print(typeof(r))      # list[(int, long)] object   <- claims CTuple items
  print(type(r[0]))     # <class 'tuple'>            <- actual: Python tuple

@scoder

scoder commented May 13, 2026

Copy link
Copy Markdown
Contributor

There seems to be some issue with PyTupleObject and CTupleType

Tuple items are not type-inferred yet, pretty much for that reason.

@scoder

scoder commented May 13, 2026

Copy link
Copy Markdown
Contributor

the list actually stores Python tuples at runtime, not C tuples

It's more concrete to infer the underlying C types rather than the Python types. A ctuple seems as good as it gets for that. Note that the item types only become relevant when requesting the items (and potentially unpacking them). They don't need to represent the actual storage used inside of the list/set/dict.

It'll be a tradeoff in the end, though. If we automatically unpack the tuple values into ctuples that are then passed back into some Python API as Python tuples, we end up wasting time and space converting back and forth for no good reason. But if the user code then unpacks the items into C variables and processes them as C values further on, we can have big wins. Overall, I would expect the latter case to be more relevant, so let's keep the ctuples until we find a problem with it.

@omkar-334

Copy link
Copy Markdown
Contributor Author

@scoder Agreed, keeping the ctuples. The implementation already does that. [(a, a+1) for a in x] infers list[(int, long)]... what do you think about the current implementation?

@omkar-334

Copy link
Copy Markdown
Contributor Author

@scoder there were 10 tests failing , so i added a fix related to sliced/nested objects,.... and now 5 are failing..

@omkar-334

Copy link
Copy Markdown
Contributor Author

@scoder any suggestions?

# Conflicts:
#	Cython/Compiler/ExprNodes.py
@da-woods

da-woods commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

and now 5 are failing..

Looking at this in hindsight, I think most of the failures were problems we were having with the master branch at the time. So hopefully it's better now.... We shall see.

@omkar-334

Copy link
Copy Markdown
Contributor Author

thanks @da-woods , looks like all tests have passed

Comment on lines +9485 to +9496
# Bail early for container types that cannot be parameterised.
if not self.type.supports_container_type:
return self.type
# Resolve names in the comprehension's own scope. NameNode.infer_type
# caches its entry, so using the outer env here would steal the inner
# loop variable's binding (regression: list_comp_in_closure_T598).
body_env = self.expr_scope or env
# Propagate the iterator's item type into the loop target before
# inspecting the body. Without this, `[a for a in xs]` where
# xs: list[int] sees `a` as py_object because ControlFlowAnalysis
# has not yet typed the comprehension's loop variable.
self._propagate_loop_target_type(env)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's difficult to spot the code between the long comments.

Suggested change
# Bail early for container types that cannot be parameterised.
if not self.type.supports_container_type:
return self.type
# Resolve names in the comprehension's own scope. NameNode.infer_type
# caches its entry, so using the outer env here would steal the inner
# loop variable's binding (regression: list_comp_in_closure_T598).
body_env = self.expr_scope or env
# Propagate the iterator's item type into the loop target before
# inspecting the body. Without this, `[a for a in xs]` where
# xs: list[int] sees `a` as py_object because ControlFlowAnalysis
# has not yet typed the comprehension's loop variable.
self._propagate_loop_target_type(env)
if not self.type.supports_container_type:
return self.type
body_env = self.expr_scope or env

I've removed the target type assignment. See my other comment.

Comment on lines +9497 to +9505
item_types = []
for item_type in self.append.infer_item_types(body_env):
if (item_type is None
or item_type is py_object_type
or item_type.is_error
or item_type.is_unspecified
or item_type.is_fused):
return self.type
item_types.append(item_type)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think unspecified types should ever originate from a type inference method, so we shouldn't handle them specially here. I also don't think the error type needs to be handled specially. It probably won't hurt to specialise the comprehension type with it.

Why are fused types an issue here?

If possible, I'd just do the following:

Suggested change
item_types = []
for item_type in self.append.infer_item_types(body_env):
if (item_type is None
or item_type is py_object_type
or item_type.is_error
or item_type.is_unspecified
or item_type.is_fused):
return self.type
item_types.append(item_type)
item_types = self.append.infer_item_types(body_env)

Comment on lines +9512 to +9513
if (target is None or not getattr(target, 'is_name', False)
or target.entry is None):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a node, there should be an .is_name attribute.

Suggested change
if (target is None or not getattr(target, 'is_name', False)
or target.entry is None):
if target is None or not target.is_name or target.entry is None:

Comment on lines +9520 to +9524
item_type = sequence_type.infer_iterator_type()
if (item_type is None
or item_type is py_object_type
or item_type.is_unspecified):
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
item_type = sequence_type.infer_iterator_type()
if (item_type is None
or item_type is py_object_type
or item_type.is_unspecified):
return
item_type = sequence_type.infer_iterator_type()
if item_type is None or item_type is py_object_type:
return

Comment on lines +9492 to +9496
# Propagate the iterator's item type into the loop target before
# inspecting the body. Without this, `[a for a in xs]` where
# xs: list[int] sees `a` as py_object because ControlFlowAnalysis
# has not yet typed the comprehension's loop variable.
self._propagate_loop_target_type(env)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. Looking at this now, I think this is the wrong place to assign a type because it drops a side-effect into .infer_type(). That type analysis step shouldn't modify the tree, it should really only analyse it. I think this part needs to go. I've removed the method call from the reordering above. We can do it in .analyse_types() later, maybe even rather in the for-loop node than here.

Although … why is this an issue at all? The item values that we collect in the comprehension are necessarily Python objects. Does it matter whether they are typed as anything but py_object_type ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants