Ming-Hsuan Wu minghsuanwu

Writing code is like building with Lego. Choose the bricks you need and put them together. Sometimes you do it by yourself; sometimes you work with other people.

12 followers · 50 following

Taipei

Stars

FunAudioLLM / Fun-ASR

Fun-ASR is an end-to-end speech recognition large model launched by Tongyi Lab.

Python 533 32 Updated Dec 24, 2025

Action-State-Labs / android-action-kernel

Python 1,218 157 Updated Dec 21, 2025

realsee-developer / RealSee3D

RealSee3D: A multi-view RGB-D dataset combining real-world captures and procedurally generated scenes, with extensible annotations for diverse 3D vision research.

Python 211 8 Updated Dec 18, 2025

francescopace / espectre

🛜 ESPectre 👻 - Motion detection system based on Wi-Fi spectre analysis (CSI), with Home Assistant integration.

C 4,000 279 Updated Dec 24, 2025

HumanMLLM / R1-Omni

Python 989 69 Updated Mar 24, 2025

IMNearth / CoAT

Official implementation for "Android in the Zoo: Chain-of-Action-Thought for GUI Agents" (Findings of EMNLP 2024)

Python 96 5 Updated Oct 14, 2024

apple / ml-sharp

Sharp Monocular View Synthesis in Less Than a Second

Python 5,079 323 Updated Dec 19, 2025

Wakals / CoVT

Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"

Python 229 12 Updated Dec 9, 2025

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 16,354 3,243 Updated Dec 24, 2025

spacecontrol3d / spacecontrol

Python 39 2 Updated Dec 23, 2025

ali-vilab / Wan-Move

[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Python 464 18 Updated Dec 19, 2025

zai-org / RealVideo

A real-time streaming conversational video system that transforms text interactions into continuous, high-fidelity video responses using autoregressive diffusion.

Python 253 36 Updated Dec 15, 2025

Psi-Robot / DexGraspVLA

[AAAI'26 Oral] DexGraspVLA: A Vision-Language-Action Framework Towards General Dexterous Grasping

Python 453 33 Updated Aug 10, 2025

Tencent-Hunyuan / HY-WorldPlay

HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency

Python 739 45 Updated Dec 24, 2025

microsoft / TRELLIS.2

Native and Compact Structured Latents for 3D Generation

Python 2,333 162 Updated Dec 23, 2025

Mengmouxu / SceneGen

[3DV 2026] "SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass"

Jupyter Notebook 233 13 Updated Dec 15, 2025

Visionary-Laboratory / visionary

Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform

Python 352 15 Updated Dec 15, 2025

EternalEvan / Astra

The official repository of "Astra : General Interactive World Model with Autoregressive Denoising"

Python 172 3 Updated Dec 24, 2025

deedy5 / ddgs

DDGS | Dux Distributed Global Search. A metasearch library that aggregates results from diverse web search services

Python 2,033 196 Updated Dec 19, 2025

Alibaba-NLP / OmniSearch

Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent

Python 402 29 Updated Apr 22, 2025

LOG1997 / log-lottery

🎈🎈🎈🎈 Annual party lottery program: a dynamic 3D sphere lottery app built with threejs + vue3.

Vue 2,071 437 Updated Dec 24, 2025

THUDM / MobileRL

Python 43 3 Updated Dec 23, 2025

DayuanJiang / next-ai-draw-io

A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visual…

TypeScript 14,988 1,543 Updated Dec 25, 2025