Robust Speech Recognition via Large-Scale Weak Supervision
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
A generative speech model for daily dialogue.
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Code and documentation to train Stanford's Alpaca models, and generate the data.
Python sample codes and textbook for robotics algorithms.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
🔥 MaxKB is a powerful, easy-to-use open-source platform for building enterprise-grade agents.
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
A multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
A fundamental end-to-end speech recognition toolkit with open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, and more.
Open Source framework for voice and multimodal conversational AI
Backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.
🤖 wukong-robot is a simple, flexible, and elegant Chinese voice dialogue robot / smart speaker project. It supports multi-turn ChatGPT conversations, and may be the first open-source smart speaker project to support brain-computer interaction.
Multilingual Voice Understanding Model
WebRTC and ORTC implementation for Python using asyncio
Have a natural, spoken conversation with AI!
PyTorch code and models for V-JEPA self-supervised learning from video.
Robotics Toolbox for Python
A Python-based Xiaozhi AI for users who want the full Xiaozhi experience without owning specialized hardware.
[IROS 2025 Award Finalist] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Re-implementation of pi0 vision-language-action (VLA) model from Physical Intelligence
[CoRL 2024] Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
🔥 SpatialVLA: a spatial-enhanced vision-language-action model that is trained on 1.1 Million real robot episodes. Accepted at RSS 2025.
Educational Python library for manipulator motion planning