Comprehensive open-source library of AI research and engineering skills for any AI model. Package the skills and your Claude Code/Codex/Gemini agent will be an AI research agent with full horsepowe…

TeX 2,273 189 Updated Feb 3, 2026

ZhuLinsen / daily_stock_analysis

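LLM-powered intelligent analyzer for A-share, H-share, and US stocks: multi-source market data + real-time news + Gemini decision dashboard + multi-channel push notifications. Zero cost, completely free to run, and runs on a schedule.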

Python 9,258 9,683 Updated Feb 3, 2026

microsoft / VibeVoice

Open-Source Frontier Voice AI

Python 22,873 2,497 Updated Feb 3, 2026

SalesforceAIResearch / DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Python 658 46 Updated Nov 10, 2025

ASLP-lab / VoiceSculptor

An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators.

Python 207 11 Updated Jan 20, 2026

zai-org / CogVideo

Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

Python 12,395 1,251 Updated Nov 4, 2025

Soul-AILab / SoulX-FlashTalk

SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.

Python 458 40 Updated Jan 30, 2026

Lightricks / LTX-2

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

Python 3,481 456 Updated Jan 29, 2026

facebookresearch / sam-audio

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 3,266 275 Updated Jan 5, 2026

facebookresearch / perception_models

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,135 141 Updated Jan 22, 2026

xid32 / NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into …

Python 311 30 Updated Nov 26, 2025

baaivision / EVA

EVA Series: Visual Representation Fantasies from BAAI

Python 2,642 189 Updated Aug 1, 2024

encord-team / ebind

A 5-way embedding model for text, audio, image, video, and 3D point clouds.

Python 12 3 Updated Nov 13, 2025

encord-team / E-MM1

A dataset of 100M connections between 5 different modalities.

58 5 Updated Nov 14, 2025

jose-solorzano / audio-denoiser

Uses machine learning to denoise audio containing speech

Python 49 3 Updated Jun 22, 2024

ekazakos / grove

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)

Python 28 1 Updated Jan 18, 2026

HKUST-C4G / AnyTalker

AnyTalker: Scaling Multi-person Talking Video Generation with Interactivity Refinement

Python 276 40 Updated Dec 5, 2025

NVlabs / describe-anything

[ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning

Python 1,447 87 Updated Jun 26, 2025

facebookresearch / grounded-video-description

Video Grounding and Captioning

Python 332 73 Updated Oct 12, 2021

wzk1015 / Awesome-Vision-to-Music-Generation

[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.

118 3 Updated Aug 9, 2025

Xiaohao-Liu / Awesome-Vison2Audio

A curated list of Vision (video/image) to Audio Generation

96 4 Updated Nov 22, 2025

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 3,128 403 Updated Dec 11, 2025

FireRedTeam / FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,750 159 Updated Jan 29, 2026

wsntxxn / UniFlow-Audio

Python 68 4 Updated Dec 30, 2025

zeyuxie29 / AudioTime

Python 37 Updated Jul 4, 2024

motion-generation

sign-language-recognition-system

fengjiasun

Starred repositories

anthropics / skills

QVerisAI / QVerisBot

Robbyant / lingbot-world

Robbyant / lingbot-vla

declare-lab / TangoFlux

Orchestra-Research / AI-research-SKILLs