Infer item type of comprehensions #7687
Conversation
|
Testing script: #!/usr/bin/env bash
# Demo: comprehension item-type inference. Run from the repo root.
T=$(mktemp -d)
cat > $T/issue_example.pyx <<'EOF'
# cython: language_level=3, infer_types=True
from cython cimport typeof
def issue_example():
cdef list[int] x = [1, 2, 3, 4]
result = [(a, a + 1) for a in x]
return result, typeof(result)
EOF
PYTHONPATH=$(pwd) /tmp/cy313venv/bin/python \
-m Cython.Build.Cythonize -i $T/issue_example.pyx >/dev/null 2>&1
/tmp/cy313venv/bin/python - <<PYEOF
import sys
sys.path.insert(0, "$T")
from issue_example import issue_example
result, type_str = issue_example()
print(result)
print(type_str)
PYEOF
|
|
Do we need to add more tests for this? I removed a few from my original commit because i thought they were not needed. |
|
Also, to achieve the best case (getting FOr
I think you meant the same thing with this right |
That's already on its way in #7686 |
| # DictComprehensionAppendNode is defined later in this module, so | ||
| # dispatch by attribute shape to avoid a forward reference. |
There was a problem hiding this comment.
Forward references inside of the same module are just fine (inside of methods), but here, I think we should rather use recursion and ask the append node to infer its type.
There was a problem hiding this comment.
So in DictComprehensionAppendNode and ComprehensionAppendNode I can add the infer_type(env) method and then ComprehensionNode.infer_type can just collect these right?
|
For r = [(a, a + 1) for a in xs]
print(typeof(r)) # list[(int, long)] object <- claims CTuple items
print(type(r[0])) # <class 'tuple'> <- actual: Python tuple |
Tuple items are not type-inferred yet, pretty much for that reason. |
It's more concrete to infer the underlying C types rather than the Python types. A ctuple seems as good as it gets for that. Note that the item types only become relevant when requesting the items (and potentially unpacking them). They don't need to represent the actual storage used inside of the list/set/dict. It'll be a tradeoff in the end, though. If we automatically unpack the tuple values into ctuples that are then passed back into some Python API as Python tuples, we end up wasting time and space converting back and forth for no good reason. But if the user code then unpacks the items into C variables and processes them as C values further on, we can have big wins. Overall, I would expect the latter case to be more relevant, so let's keep the ctuples until we find a problem with it. |
|
@scoder Agreed, keeping the ctuples. The implementation already does that. |
|
@scoder there were 10 tests failing , so i added a fix related to sliced/nested objects,.... and now 5 are failing.. |
|
@scoder any suggestions? |
# Conflicts: # Cython/Compiler/ExprNodes.py
Looking at this in hindsight, I think most of the failures were problems we were having with the master branch at the time. So hopefully it's better now.... We shall see. |
|
thanks @da-woods , looks like all tests have passed |
| # Bail early for container types that cannot be parameterised. | ||
| if not self.type.supports_container_type: | ||
| return self.type | ||
| # Resolve names in the comprehension's own scope. NameNode.infer_type | ||
| # caches its entry, so using the outer env here would steal the inner | ||
| # loop variable's binding (regression: list_comp_in_closure_T598). | ||
| body_env = self.expr_scope or env | ||
| # Propagate the iterator's item type into the loop target before | ||
| # inspecting the body. Without this, `[a for a in xs]` where | ||
| # xs: list[int] sees `a` as py_object because ControlFlowAnalysis | ||
| # has not yet typed the comprehension's loop variable. | ||
| self._propagate_loop_target_type(env) |
There was a problem hiding this comment.
It's difficult to spot the code between the long comments.
| # Bail early for container types that cannot be parameterised. | |
| if not self.type.supports_container_type: | |
| return self.type | |
| # Resolve names in the comprehension's own scope. NameNode.infer_type | |
| # caches its entry, so using the outer env here would steal the inner | |
| # loop variable's binding (regression: list_comp_in_closure_T598). | |
| body_env = self.expr_scope or env | |
| # Propagate the iterator's item type into the loop target before | |
| # inspecting the body. Without this, `[a for a in xs]` where | |
| # xs: list[int] sees `a` as py_object because ControlFlowAnalysis | |
| # has not yet typed the comprehension's loop variable. | |
| self._propagate_loop_target_type(env) | |
| if not self.type.supports_container_type: | |
| return self.type | |
| body_env = self.expr_scope or env |
I've removed the target type assignment. See my other comment.
| item_types = [] | ||
| for item_type in self.append.infer_item_types(body_env): | ||
| if (item_type is None | ||
| or item_type is py_object_type | ||
| or item_type.is_error | ||
| or item_type.is_unspecified | ||
| or item_type.is_fused): | ||
| return self.type | ||
| item_types.append(item_type) |
There was a problem hiding this comment.
I don't think unspecified types should ever originate from a type inference method, so we shouldn't handle them specially here. I also don't think the error type needs to be handled specially. It probably won't hurt to specialise the comprehension type with it.
Why are fused types an issue here?
If possible, I'd just do the following:
| item_types = [] | |
| for item_type in self.append.infer_item_types(body_env): | |
| if (item_type is None | |
| or item_type is py_object_type | |
| or item_type.is_error | |
| or item_type.is_unspecified | |
| or item_type.is_fused): | |
| return self.type | |
| item_types.append(item_type) | |
| item_types = self.append.infer_item_types(body_env) |
| if (target is None or not getattr(target, 'is_name', False) | ||
| or target.entry is None): |
There was a problem hiding this comment.
It's a node, there should be an .is_name attribute.
| if (target is None or not getattr(target, 'is_name', False) | |
| or target.entry is None): | |
| if target is None or not target.is_name or target.entry is None: |
| item_type = sequence_type.infer_iterator_type() | ||
| if (item_type is None | ||
| or item_type is py_object_type | ||
| or item_type.is_unspecified): | ||
| return |
There was a problem hiding this comment.
| item_type = sequence_type.infer_iterator_type() | |
| if (item_type is None | |
| or item_type is py_object_type | |
| or item_type.is_unspecified): | |
| return | |
| item_type = sequence_type.infer_iterator_type() | |
| if item_type is None or item_type is py_object_type: | |
| return |
| # Propagate the iterator's item type into the loop target before | ||
| # inspecting the body. Without this, `[a for a in xs]` where | ||
| # xs: list[int] sees `a` as py_object because ControlFlowAnalysis | ||
| # has not yet typed the comprehension's loop variable. | ||
| self._propagate_loop_target_type(env) |
There was a problem hiding this comment.
Hmmm. Looking at this now, I think this is the wrong place to assign a type because it drops a side-effect into .infer_type(). That type analysis step shouldn't modify the tree, it should really only analyse it. I think this part needs to go. I've removed the method call from the reordering above. We can do it in .analyse_types() later, maybe even rather in the for-loop node than here.
Although … why is this an issue at all? The item values that we collect in the comprehension are necessarily Python objects. Does it matter whether they are typed as anything but py_object_type ?
This PR fixes the issue #7684 where
cython.typeof([(a, a+1) for a in x])reports a barelist objecteven when the items have a known shape. and Adds tests for this intests/run/type_inference_comprehensions.pyx.ComprehensionNode.infer_typewas a one-line stub that returnedself.type. Now it looks at the body expression (self.append.exprfor list/set,key_expr/value_exprfor dict), infers each item's type, and callsspecialize_hereon the container. Falls back to the bare container whenever any item is unresolvable, so no path can regress.After the change: