
[Feature Request] Native Tensor Typing API #543

@leycec

Welcome, dear bear friends, to the possible unveiling of @beartype's new built-in tensor typing API. It doesn't exist yet – but it could. And daydreaming is half the battle. This is all I learned from my childhood.

Currently, @beartype just farms out all responsibility for typing tensors to third-party packages like @patrick-kidger's wondrous jaxtyping. That's fine, of course. That works. Sure. No problem-o. We're all BFFLs here.

But third-party dependencies get awkward after a while. Because it's not simply jaxtyping, is it? It never ends at a single dependency, does it? It's a brutal gauntlet of third-party packages like pandera and deal and the twenty-thousand pound QA gorilla Pydantic and the list just goes on and on. All of these packages purport to work well with one another, but so infrequently do. There's always friction at the interface between two packages – let alone n packages whose intersection is usually the empty set. Which brings us to...

Pydantic: The Twenty-Thousand Pound QA Gorilla

LangGraph + PydanticAI now dominate the LLM space. We're probably all aware of that by now. LangChain? Old hat already. AutoGen? New hat but the hat is painful. CrewAI? New hat mostly only for rapid prototyping. Which means that Pydantic effectively dominates the Python space. And I ask myself:

"How did this wondrous magic come to be? How did Pydantic rise above its competitors to so thoroughly dominate the power law distribution for Python packages?"

There are many answers – but the simplest is just that Pydantic literally does everything. Pydantic users don't have to seek outside Pydantic. Want JSON? Pydantic's got that already. Schema inference? Pydantic. Type casting? Pydantic. Data ingestion? Pydantic. And so on ad nauseam.

Pydantic's a monolithic mecha-kaiju with batteries included. Pydantic is the Systemd of the Python world. It does everything you think it does and everything else you didn't think it could possibly do. How did this feature request become an advertisement for Pydantic!?!? 😮‍💨

@beartype will never be monolithic in the way that Pydantic is monolithic. @beartype generally prefers the UNIX philosophy of: "Do one thing and do that thing well." But UNIX philosophy is a gradated spectrum of possibility. There's no practical justification for @beartype to literally just do one thing and only one thing.

@beartype can do many things and still be @beartype. Which leads us to...

beartype.hint: A New Subpackage for a New Millennium

Gods! What a lame one-liner! Let's never use that slogan in anything users will see. 😂

beartype.hint will be a new public @beartype API. It's gonna be great! beartype.hint will define PEP-compliant and mypy-friendly type hint factories that are generically usable by any runtime type-checker – Pydantic, typeguard, or otherwise.

beartype.hint type factories will include:

  • beartype.hint.Tensor[...], a type hint factory for type-checking tensors defined by arbitrary third-party packages – including tensors defined by PyTorch, JAX, NumPy, SciPy, and so on. beartype.hint.Tensor[...] type hints are subscripted by tensor types, dtypes, ndims, and/or shapes. The syntax is the same old Pythonic typing syntax we're all familiar with – only extended to tensors:
    • beartype.hint.Tensor[numpy.ndarray, int, typing.Literal[3]], a three-dimensional NumPy array of integers (of any size).
    • beartype.hint.Tensor[torch.Tensor, torch.float, tuple[typing.Literal[2560], typing.Literal[1440]]], a two-dimensional PyTorch tensor of floats (of any size) with the exact shape 2560 x 1440.

The exact signature of beartype.hint.Tensor is a bit arduous to spec out, because Python doesn't even have the concept of a "type hint factory signature". Still, it looks something like a series of overloads subscripted by increasingly many child type hints:

  1. The unsubscripted beartype.hint.Tensor attribute, matching any possible tensor from any third-party package. No idea if this is usable, but could be fun to support.
  2. beartype.hint.Tensor[{tensor_type}], matching any tensor of the single third-party type {tensor_type}.
  3. beartype.hint.Tensor[{tensor_type}, {tensor_dtype}], matching any tensor of the single third-party type {tensor_type} whose dtype is a subtype of {tensor_dtype}.
  4. beartype.hint.Tensor[{tensor_type}, typing.Literal[{tensor_ndim}]], matching any tensor of the single third-party type {tensor_type} whose number of dimensions is exactly {tensor_ndim}.
  5. beartype.hint.Tensor[{tensor_type}, tuple[typing.Literal[{tensor_dimension_1_size}], ..., typing.Literal[{tensor_dimension_N_size}]]], matching any tensor of the single third-party type {tensor_type} whose shape (i.e., size of the N dimensions of this tensor) is exactly {tensor_dimension_1_size} through {tensor_dimension_N_size}.
  6. All possible permutations and combinations of the above. The only constraint is that the first child hint is always {tensor_type}.
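To make overloads 2 through 6 a bit more concrete, here's a rough stdlib-only sketch of how a subscriptable factory could sort its child hints into a dtype, an ndim, and a shape, then reduce everything to a plain typing.Annotated[...] hint. Everything named below (Tensor, TensorSpec, _literal_int, FakeArray) is hypothetical and invented for illustration. None of it is an existing @beartype API, and the unsubscripted form (overload 1) is not handled at all.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Literal, Optional, get_args, get_origin


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical metadata record; not part of any released beartype API."""
    dtype: Optional[Any] = None
    ndim: Optional[int] = None
    shape: Optional[tuple] = None


def _literal_int(hint):
    """Return the integer inside typing.Literal[<int>], else None."""
    if get_origin(hint) is Literal:
        args = get_args(hint)
        if len(args) == 1 and isinstance(args[0], int):
            return args[0]
    return None


class Tensor:
    """Hypothetical stand-in for the proposed beartype.hint.Tensor factory."""

    def __class_getitem__(cls, params):
        # Tensor[X] passes X itself; Tensor[X, Y] passes the tuple (X, Y).
        if not isinstance(params, tuple):
            params = (params,)

        # The first child hint is always the tensor type.
        tensor_type, *child_hints = params
        dtype = ndim = shape = None

        # Remaining child hints may appear in any order.
        for hint in child_hints:
            literal = _literal_int(hint)
            if literal is not None:
                # typing.Literal[N] means "exactly N dimensions".
                ndim = literal
            elif get_origin(hint) is tuple:
                # tuple[Literal[a], ..., Literal[z]] means "exactly this shape".
                sizes = tuple(_literal_int(arg) for arg in get_args(hint))
                if None in sizes:
                    raise TypeError(f"Shape {hint!r} must contain only Literal[int].")
                shape = sizes
            else:
                # Anything else is assumed to be a dtype.
                dtype = hint

        # Reduce to a standard PEP 593 hint that any checker can introspect.
        return Annotated[tensor_type, TensorSpec(dtype, ndim, shape)]


class FakeArray:
    """Placeholder tensor type so the sketch runs without NumPy or PyTorch."""


hint_3d = Tensor[FakeArray, int, Literal[3]]
hint_hd = Tensor[FakeArray, float, tuple[Literal[2560], Literal[1440]]]
```

Because the result is an ordinary typing.Annotated[...] object, typing.get_args(hint_3d) hands back the tensor type followed by the TensorSpec, which is roughly the "reduce to validators" step described further down. A real implementation would also need the static-typing side (mypy friendliness), which this sketch ignores entirely.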

Is implementing PEP-compliant and mypy-friendly type hint factories that are generically usable by any runtime type-checker even feasible, though? It's trivial. In fact, it's so trivial I already specced out a working solution over at #522. No problem-o. Even as I said that, though, my face was sweating. 😰 🥵

Since beartype.hint.Tensor[...] type hints are the most succinct description of tensors, beartype.hint.Tensor[...] type hints are what most users are likely to use as actual type hints in end user apps. Under the hood, though, @beartype will reduce beartype.hint.Tensor[...] type hints to equivalent...

@beartype Tensor Validators: A New Victor Emerges from the Rubble of the @beartype API

This feature request sure got long fast, didn't it? We're exhausted – and so are you. So, let's just finish up by exhibiting a few new beartype.vale validators unique to typing tensors. Users will be welcome to use these longer-winded public validators, even though nobody wants to:

  • typing.Annotated[{tensor_type}, IsTensorDtype[{tensor_dtype}]], semantically equivalent to the more compact form beartype.hint.Tensor[{tensor_type}, {tensor_dtype}] outlined above.
  • typing.Annotated[{tensor_type}, IsTensorNdim[{tensor_ndim}]], semantically equivalent to the more compact form beartype.hint.Tensor[{tensor_type}, typing.Literal[{tensor_ndim}]] outlined above.
  • typing.Annotated[{tensor_type}, IsTensorShape[{tensor_shape}]], semantically equivalent to the honestly far more verbose form beartype.hint.Tensor[{tensor_type}, tuple[typing.Literal[{tensor_dimension_1_size}], ..., typing.Literal[{tensor_dimension_N_size}]]] outlined above. Python makes PEP-compliant type hints use typing.Literal for literally (...get it?) every magic number in a type hint. Interestingly, this ensures that the @beartype tensor validator approach beats out the beartype.hint.Tensor[...] approach in terms of readability. Whatevah!
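For a feel of how the three validators above might behave, here's a minimal stdlib-only sketch. The validator names come from this proposal, but their implementations below, the _Validator base class, the conforms() helper, and the FakeTensor class are all assumptions made up for illustration. They are not how @beartype implements (or would implement) anything. The dtype check is also simplified to plain equality rather than the "subtype of" matching described earlier.

```python
from typing import Annotated, get_args, get_origin


class _Validator:
    """Hypothetical base class: Name[param] builds a validator holding param."""

    def __init__(self, param):
        self.param = param

    def __class_getitem__(cls, param):
        return cls(param)

    def is_valid(self, obj) -> bool:
        raise NotImplementedError


class IsTensorDtype(_Validator):
    def is_valid(self, obj) -> bool:
        # Simplification: exact equality instead of dtype subtyping.
        return obj.dtype == self.param


class IsTensorNdim(_Validator):
    def is_valid(self, obj) -> bool:
        return len(obj.shape) == self.param


class IsTensorShape(_Validator):
    def is_valid(self, obj) -> bool:
        # IsTensorShape[5] passes a bare int; IsTensorShape[2, 3] passes a tuple.
        expected = self.param if isinstance(self.param, tuple) else (self.param,)
        return tuple(obj.shape) == expected


def conforms(obj, hint) -> bool:
    """Toy checker: isinstance() test plus every validator in the metadata."""
    if get_origin(hint) is not Annotated:
        return isinstance(obj, hint)
    tensor_type, *metadata = get_args(hint)
    return isinstance(obj, tensor_type) and all(
        item.is_valid(obj) for item in metadata if isinstance(item, _Validator)
    )


class FakeTensor:
    """Duck-typed placeholder exposing only .shape and .dtype."""

    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = dtype


Screen = Annotated[FakeTensor, IsTensorDtype["float32"], IsTensorShape[2560, 1440]]
Cube = Annotated[FakeTensor, IsTensorNdim[3]]

screen_ok = conforms(FakeTensor((2560, 1440), "float32"), Screen)
screen_bad = conforms(FakeTensor((1440, 2560), "float32"), Screen)
cube_ok = conforms(FakeTensor((4, 4, 4), "int64"), Cube)
```

Note that IsTensorShape[2560, 1440] takes bare integers, with no typing.Literal in sight, which is exactly the readability win the last bullet grumbles about.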

In Conclusion, I Have a Tired Face

That's it. That's @beartype tensor type hints. A similar approach can be extended to Pandas and Polars dataframes by defining a new beartype.hint.DataFrame[...] type hint factory backed by corresponding new beartype.vale validators.

Is any of this magic feasible? Absolutely. In fact, not only is this magic feasible, but this magic is trivially feasible. I should have done it years ago. So why didn't I?

Laziness. Actually, it's even worse than laziness. It's foolish ideology. I foolishly believed a bit too zealously in the UNIX philosophy. I wanted to help co-create a rich and diverse ecosystem of small little Python packages that each worked together to form a much larger and even more complete holonomy of vibrant software holons, all harmoniously working in concert for the good of all.

In other words, I was dumb. I should have just done what Pydantic did, which was to do everything and do everything well. Why farm essential type-checking work out to sibling third-party packages when @beartype could just do all of that work itself, right? Sure, it's more work for me – but it's a lot less work for you, the user. And you, the user, are most of what matters here.

beartype.hint.Tensor[...]: because users matter more than @leycec's sanity. 🥲
