- University of Toronto
- Toronto, ON, Canada
- https://clarivy.github.io/
- https://scholar.google.com/citations?user=GZ5aDcUAAAAJ
Stars
Productive, portable, and performant GPU programming in Python.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
A generative speech model for daily dialogue.
[ICCV 2025] Official code for AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
[NeurIPS 2025 Spotlight] Official repository for “Puppeteer: Rig and Animate Your 3D Models”
(ICCV 2025) DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
[SIGGRAPH 2025] One Model to Rig Them All: Diverse Skeleton Rigging with UniRig
[CVPR 2025] Official repository for “MagicArticulate: Make Your 3D Models Articulation-Ready”
[NeurIPS'2024] Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly
We write your reusable computer vision tools. 💜