Stars
Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.
RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.
🛜 ESPectre 👻 - Motion detection system based on Wi-Fi spectre analysis (CSI), with Home Assistant integration.
Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)
Sharp Monocular View Synthesis in Less Than a Second
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using autoregressive diffusion.
[AAAI'26 Oral] DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
Native and Compact Structured Latents for 3D Generation
[3DV 2026] "SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass"
Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
The official repository of "Astra: General Interactive World Model with Autoregressive Denoising"
DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services
Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent
A Next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Open-source vehicle model recognition for mobile devices (Mobile Platform Vehicle Identification Model)
[arXiv 2025] MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone