Stars
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
High-Quality Text-to-Video Generation with Alpha Channel
Convolutional Neural Networks to predict the aesthetic and technical quality of images.
[CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monito…
"RAG-Anything: All-in-One RAG Framework"
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Deezer source separation library including pretrained models.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A pipeline that reads lips and generates speech for the content read, i.e., lip-to-speech synthesis.
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning
Unlimited-length talking video generation that supports image-to-video and video-to-video generation
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
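🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Weibo, etc.) with smart filtering, automatic push notifications, and conversational AI analysis (dig into the news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/Feishu/DingTalk/Telegram/email/ntfy, 30-second web deployment, 1-minute mobile notifications, no programming required. Supports Docker deployment ⭐ …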
MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
A Unified Framework for Expressive Speech Synthesis with Voice Cloning
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
A research prototype of a human-centered web agent