Welcome, dear bear friends, to the possible unveiling of @beartype's new built-in tensor typing API. It doesn't exist yet – but it could. And daydreaming is half the battle. This is all I learned from my childhood.
Currently, @beartype just farms out all responsibility for typing tensors to third-party packages like @patrick-kidger's wondrous jaxtyping. That's fine, of course. That works. Sure. No problem-o. We're all BFFLs here.
But third-party dependencies get awkward after a while. Because it's not simply jaxtyping, is it? It never ends at a single dependency, does it? It's a brutal gauntlet of third-party packages like pandera and deal and the twenty-thousand pound QA gorilla Pydantic and the list just goes on and on. All of these packages purport to work well with one another, but so infrequently do. There's always friction at the interface between two packages – let alone n packages whose intersection is usually the empty set. Which brings us to...
Pydantic: The Twenty-Thousand Pound QA Gorilla
LangGraph + PydanticAI now dominate the LLM space. We're probably all aware of that by now. LangChain? Old hat already. AutoGen? New hat but the hat is painful. CrewAI? New hat mostly only for rapid prototyping. Which means that Pydantic effectively dominates the Python space. And I ask myself:
"How did this wondrous magic come to be? How did Pydantic rise above its competitors to so thoroughly dominate the power law distribution for Python packages?"
There are many answers – but the simplest is just that Pydantic literally does everything. Pydantic users don't have to seek outside Pydantic. Want JSON? Pydantic's got that already. Schema inference? Pydantic. Type casting? Pydantic. Data ingestion? Pydantic. And so on ad nauseam.
Pydantic's a monolithic mecha-kaiju with batteries included. Pydantic is the Systemd of the Python world. It does everything you think it does and everything else you didn't think it could possibly do. How did this feature request become an advertisement for Pydantic!?!? 😮💨
@beartype will never be monolithic in the way that Pydantic is monolithic. @beartype generally prefers the UNIX philosophy of: "Do one thing and do that thing well." But UNIX philosophy is a gradated spectrum of possibility. There's no practical justification for @beartype to literally just do one thing and only one thing.
@beartype can do many things and still be @beartype. Which leads us to...
beartype.hint: A New Subpackage for a New Millennium
Gods! What a lame one-liner! Let's never use that slogan in anything users will see. 😂
beartype.hint will be a new public @beartype API. It's gonna be great! beartype.hint will define PEP-compliant and mypy-friendly type hint factories that are generically usable by any runtime type-checker – Pydantic, typeguard, or otherwise.
beartype.hint type hint factories will include the following (with a usage sketch just after this list):
- `beartype.hint.Tensor[...]`: a type hint factory for type-checking tensors defined by arbitrary third-party packages – including tensors defined by PyTorch, JAX, NumPy, SciPy, and so on. `beartype.hint.Tensor[...]` type hints are subscripted by tensor types, dtypes, ndims, and/or shapes. The syntax is the same old familiar Pythonic `typing` syntax – only extended to tensors:
  - `beartype.hint.Tensor[numpy.ndarray, int, typing.Literal[3]]`: a three-dimensional NumPy array of integers (of any size).
  - `beartype.hint.Tensor[torch.Tensor, torch.float, tuple[typing.Literal[2560], typing.Literal[1440]]]`: a two-dimensional PyTorch tensor of floats with the exact shape `2560 x 1440`.
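For instance, here's how those hints might look in an end user app. This is a hypothetical usage sketch: nothing under `beartype.hint` exists yet, the subscription syntax just mirrors the examples above, and the function names are made up for illustration.

```python
# Hypothetical usage of the proposed beartype.hint.Tensor[...] factory. This
# API does not exist yet; only the subscription syntax proposed above is shown.
from typing import Literal

import numpy
import torch
from beartype import beartype
from beartype.hint import Tensor  # <-- proposed API, not yet real

@beartype
def normalize_volume(
    # Any three-dimensional NumPy array of integers (of any size).
    volume: Tensor[numpy.ndarray, int, Literal[3]],
) -> Tensor[numpy.ndarray, float, Literal[3]]:
    return volume / volume.max()

@beartype
def render_frame(
    # A two-dimensional PyTorch tensor of floats of the exact shape 2560 x 1440.
    frame: Tensor[torch.Tensor, torch.float, tuple[Literal[2560], Literal[1440]]],
) -> None:
    ...
```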
The exact signature of beartype.hint.Tensor is a bit arduous to spec out, because Python doesn't even have the concept of a "type hint factory signature". Still, it looks something like a series of overloads subscripted by increasingly many child type hints (a rough implementation sketch follows the list below):
- The unsubscripted `beartype.hint.Tensor` attribute, matching any possible tensor from any third-party package. No idea if this is usable, but could be fun to support.
- `beartype.hint.Tensor[{tensor_type}]`, matching any tensor of the single third-party type `{tensor_type}`.
- `beartype.hint.Tensor[{tensor_type}, {tensor_dtype}]`, matching any tensor of the single third-party type `{tensor_type}` whose dtype is a subtype of `{tensor_dtype}`.
- `beartype.hint.Tensor[{tensor_type}, typing.Literal[{tensor_ndim}]]`, matching any tensor of the single third-party type `{tensor_type}` whose number of dimensions is exactly `{tensor_ndim}`.
- `beartype.hint.Tensor[{tensor_type}, tuple[typing.Literal[{tensor_dimension_1_size}], ..., typing.Literal[{tensor_dimension_N_size}]]]`, matching any tensor of the single third-party type `{tensor_type}` whose shape (i.e., the sizes of the `N` dimensions of this tensor) is exactly `{tensor_dimension_1_size}` through `{tensor_dimension_N_size}`.
- All possible permutations and combinations of the above. The only constraint is that the first child hint is always `{tensor_type}`.
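How might that factory actually work? Here's a minimal implementation sketch, assuming the factory's `__class_getitem__()` reduces each subscription to a `typing.Annotated[...]` hint built from existing `beartype.vale.Is` validators. The class and helper names are illustrative only, and the dtype check is simplified to equality rather than real subtype testing:

```python
# A minimal sketch of a Tensor type hint factory, reducing subscriptions to
# typing.Annotated[...] hints conjoining beartype.vale.Is validators. This is
# NOT the real beartype.hint.Tensor; it only illustrates the dispatch rules.
from typing import Annotated, Literal, get_args, get_origin

from beartype.vale import Is

def _is_dtype(dtype):
    # Validator matching tensors whose "dtype" attribute equals this dtype.
    return Is[lambda tensor: tensor.dtype == dtype]

def _is_ndim(ndim):
    # Validator matching tensors with exactly "ndim" dimensions.
    return Is[lambda tensor: tensor.ndim == ndim]

def _is_shape(shape):
    # Validator matching tensors whose shape is exactly "shape".
    return Is[lambda tensor: tuple(tensor.shape) == shape]

class Tensor:
    def __class_getitem__(cls, args):
        # Normalize single-argument subscriptions into a one-tuple.
        if not isinstance(args, tuple):
            args = (args,)

        # The first child hint is always the third-party tensor type.
        tensor_type, *constraints = args
        validators = []

        for constraint in constraints:
            if get_origin(constraint) is Literal:
                # typing.Literal[N] constrains the number of dimensions.
                validators.append(_is_ndim(get_args(constraint)[0]))
            elif get_origin(constraint) is tuple:
                # tuple[typing.Literal[...], ...] constrains the exact shape.
                shape = tuple(get_args(dim)[0] for dim in get_args(constraint))
                validators.append(_is_shape(shape))
            else:
                # Anything else constrains the dtype.
                validators.append(_is_dtype(constraint))

        # With no constraints, reduce to just the bare tensor type.
        if not validators:
            return tensor_type

        # Conjoin all validators into a single Annotated[...] hint.
        validator = validators[0]
        for extra in validators[1:]:
            validator &= extra
        return Annotated[tensor_type, validator]
```

Under this sketch, `Tensor[numpy.ndarray, int, typing.Literal[3]]` reduces to `Annotated[numpy.ndarray, Is[...] & Is[...]]`: the sort of PEP 593 `Annotated` hint that @beartype already enforces today and that other tools (mypy included) gracefully degrade to plain `numpy.ndarray`.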
Is implementing PEP-compliant and mypy-friendly type hint factories that are generically usable by any runtime type-checker even feasible, though? It's trivial. In fact, it's so trivial I already specced out a working solution over at #522. No problem-o. Even as I said that, though, my face was sweating. 😰 🥵
Since beartype.hint.Tensor[...] type hints are the most succinct description of tensors, beartype.hint.Tensor[...] type hints are what most users are likely to use as actual type hints in end user apps. Under the hood, though, @beartype will reduce beartype.hint.Tensor[...] type hints to equivalent...
@beartype Tensor Validators: A New Victor Emerges from the Rubble of the @beartype API
This feature request sure got long fast, didn't it? We're exhausted – and so are you. So, let's just finish up by exhibiting a few new beartype.vale validators unique to typing tensors. Users will be welcome to use these longer-winded public validators, even though nobody wants to (a quick usage sketch follows the list):
- `typing.Annotated[{tensor_type}, IsTensorDtype[{tensor_dtype}]]`, semantically equivalent to the more compact form `beartype.hint.Tensor[{tensor_type}, {tensor_dtype}]` outlined above.
- `typing.Annotated[{tensor_type}, IsTensorNdim[{tensor_ndim}]]`, semantically equivalent to the more compact form `beartype.hint.Tensor[{tensor_type}, typing.Literal[{tensor_ndim}]]` outlined above.
- `typing.Annotated[{tensor_type}, IsTensorShape[{tensor_shape}]]`, semantically equivalent to the honestly far more verbose form `beartype.hint.Tensor[{tensor_type}, tuple[typing.Literal[{tensor_dimension_1_size}], ..., typing.Literal[{tensor_dimension_N_size}]]]` outlined above. Python makes PEP-compliant type hints use `typing.Literal` for literally (...get it?) every magic number in a type hint. Interestingly, this ensures that the @beartype tensor validator approach beats out the `beartype.hint.Tensor[...]` approach in terms of readability. Whatevah!
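In practice, the longhand spelling might look something like this hypothetical sketch. The `IsTensorDtype` and `IsTensorShape` names are the ones proposed above and don't exist yet; everything else (`typing.Annotated`, `@beartype`) works today:

```python
# Hypothetical usage of the proposed beartype.vale tensor validators. The
# IsTensorDtype and IsTensorShape imports below are proposals, not yet real.
from typing import Annotated

import torch
from beartype import beartype
from beartype.vale import IsTensorDtype, IsTensorShape  # <-- proposed, not yet real

# Longhand validator form...
Frame2D = Annotated[
    torch.Tensor, IsTensorDtype[torch.float], IsTensorShape[(2560, 1440)]]

# ...semantically equivalent to the compact factory form:
#     beartype.hint.Tensor[
#         torch.Tensor, torch.float, tuple[typing.Literal[2560], typing.Literal[1440]]]

@beartype
def render_frame(frame: Frame2D) -> None:
    ...
```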
In Conclusion, I Have a Tired Face
That's it. That's @beartype tensor type hints. A similar approach can be extended to Pandas and Polars dataframes by defining a new beartype.hint.DataFrame[...] type hint factory backed by corresponding new beartype.vale validators.
Is any of this magic feasible? Absolutely. In fact, not only is this magic feasible, but this magic is trivially feasible. I should have done it years ago. So why didn't I?
Laziness. Actually, it's even worse than laziness. It's foolish ideology. I foolishly believed a bit too zealously in the UNIX philosophy. I wanted to help co-create a rich and diverse ecosystem of small little Python packages that each worked together to form a much larger and even more complete holonomy of vibrant software holons, all harmoniously working in concert for the good of all.
In other words, I was dumb. I should have just done what Pydantic did, which was to do everything and do everything well. Why farm essential type-checking work out to sibling third-party packages when @beartype could just do all of that work itself, right? Sure, it's more work for me – but it's a lot less work for you, the user. And you, the user, are most of what matters here.
beartype.hint.Tensor[...]: because users matter more than @leycec's sanity. 🥲