Lists (29)
3DGS
Agent
AIGC
Animation
Calibration
Concept
DIBR
DigitalHuman
Fusion
GPT
ImageTask2D
Library
LLM
LocoManip
MeshProcess
MM-Interaction
Motion
NERF
ObjectGeneration
Reconstruction
Render
Robot
SceneGen
Survey
Tools
VideoGen
VideoInterpolation
VLA
WorldModel
Starred repositories
Foundation Models and Data for Human-Human and Human-AI interactions.
[ICCV 2025] MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
This is an official PyTorch implementation of "Gesture2Vec: Clustering Gestures using Representation Learning Methods for Co-speech Gesture Generation" (IROS 2022).
SnapMoGen: Human Motion Generation from Expressive Texts [NeurIPS 2025]
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
Skills for Real Engineers. Straight from my .claude directory.
TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control
This repository contains the data pre-processing and visualization scripts used in the GENEA Challenge 2022 and 2023. Check the repository's README.md file for instructions on how to use the scripts yourself.
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
🎙️ [Large Model] A 0.1B Omni model trained from scratch, capable of listening, speaking, and seeing!
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
A curated list of full-duplex spoken dialogue models & benchmarks
Towards Self-Evolving Proactive AI with Perpetual Memory
SALMONN family: A suite of advanced multi-modal LLMs
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
A high-throughput and memory-efficient inference and serving engine for LLMs
A production-grade, multi-modal voice gateway providing real-time audio-to-audio interaction, read-aloud TTS, transcription, and model introspection. Built on vLLM-Omni architecture with Qwen3 models.
Run Qwen3 Omni - A multimodal AI assistant demo
A catgirl who watches, reads, listens, and plays alongside you, powered by human-like memory and an embodied emotional engine. 🐱❤️ An AI catgirl who proactively comes to play with you.
This is a simple demonstration of more advanced, agentic patterns built on top of the Realtime API.
An agent-managed museum exhibit, built in Rust with Gajae-Code / LazyCodex — developed and maintained with no human intervention.
End-to-end realtime stack for connecting humans and AI
A real-time conversation application built on Alibaba Cloud's TTS, LLM, and STT models
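🟢🌍 Latest for 2026: a highly detailed, fast, and private Hysteria2 one-click installation script that unlocks GPT and Netflix by default; 🛡️ includes a VPN security check guide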
A framework for efficient model inference with omni-modality models
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams