Stars
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
openvla / openvla
Forked from TRI-ML/prismatic-vlms
OpenVLA: An open-source vision-language-action model for robotic manipulation.
[CoRL 2024] HumanPlus: Humanoid Shadowing and Imitation from Humans
Code for BAKU: An Efficient Transformer for Multi-Task Policy Learning
We write your reusable computer vision tools. 💜
Beyond Language Models: Byte Models are Digital World Simulators
Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi et al.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer https://arxiv.org/abs/2404.05695
4D Panoptic Scene Graph Generation (NeurIPS'23 Spotlight)
[IROS '24 Oral] DarkGS: Building 3DGS in the dark with a torch.
Build private and secure AI products that run in your cloud.
DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dat…
Zero-Shot Speech Editing and Text-to-Speech in the Wild
AI wearables. Put it on, speak, transcribe, automatically
Open-source AI for voice control, rivaling Alexa and Siri
VisionOS App + Python Library to stream hand tracking data from Vision Pro, video/audio stream to Vision Pro.
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Foundational model for human-like, expressive TTS
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)