Starred repositories
Open-source release accompanying Gao et al. 2025
GPTAlgoPro / GLM-ASR
Forked from zai-org/GLM-ASR
GLM-ASR-Nano: A robust, open-source speech recognition model with 1.5B parameters
An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone
TempoMaster (Juanwan Yayun) is a music education app designed for iOS that perfectly combines electronic piano playing with fun games. Whether for music enlightenment, learning musical notation, or…
Make Large Multimodal Models excel in object detection, ICCV 2025
A Hybrid Bandit Model with Visual Priors for Creative Ranking in Display Advertising
An open-source kit for agent development that integrates the capabilities of Volcengine.
HunyuanVideo-1.5: A leading lightweight video generation model
A unified, comprehensive and efficient recommendation library
The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…
MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.
Ollama client for iOS, Android, macOS, and Windows that simplifies experimenting with LLMs.
📝 A simple and elegant Markdown editor, available for Linux, macOS and Windows.
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning" by Zhiheng Xi et al.
Cocos simplifies game creation and distribution with Cocos Creator, a free, open-source, cross-platform game engine. Empowering millions of developers to create high-performance, engaging 2D/3D gam…
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
Yelp Simulator for WWW'25 AgentSociety Challenge
An index of recommendation algorithms that are based on Graph Neural Networks. (TORS)
An index for papers on large language model agents for recommendation and search.
GPTAlgoPro / OmniVinci
Forked from NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.