This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'

Python 2,372 202 Updated Oct 20, 2025

High-Quality Text-to-Video Generation with Alpha Channel

Python 271 20 Updated Oct 1, 2025

Convolutional Neural Networks to predict the aesthetic and technical quality of images.

Python 2,199 457 Updated Jul 12, 2024

A tool for creating comics with AI, supporting script writing, storyboard design, and character style control.

TypeScript 695 108 Updated Sep 16, 2025

[CVPR 2025 Highlight] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Python 1,465 82 Updated Jul 29, 2025

[ICCV 2025, Oral] TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models

Python 784 39 Updated Aug 8, 2025

Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monito…

Python 28,366 1,575 Updated Nov 5, 2025

"RAG-Anything: All-in-One RAG Framework"

Python 9,981 1,181 Updated Oct 20, 2025

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

Python 4,385 392 Updated Nov 5, 2025

Deezer source separation library including pretrained models.

Python 27,706 3,045 Updated Apr 2, 2025

VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

Python 2,021 214 Updated Oct 9, 2025

A pipeline that reads lips and generates speech for the read content, i.e., lip-to-speech synthesis.

Python 93 22 Updated Jul 23, 2025

HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning

Python 798 92 Updated Oct 19, 2025

Unlimited-length talking video generation that supports image-to-video and video-to-video generation

Python 2,951 457 Updated Aug 25, 2025
Python 4,540 362 Updated Jun 12, 2025

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python 1,195 85 Updated Sep 22, 2025

An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System

Python 14,799 1,671 Updated Oct 30, 2025

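🎯 Say goodbye to information overload: AI helps you make sense of trending news, with simple public-opinion monitoring and analysis. Multi-platform trending-topic aggregation plus an MCP-based AI analysis tool. Monitors 35 platforms (Douyin, Zhihu, Bilibili, Weibo, etc.), with smart filtering, automatic push, and AI conversational analysis (dig into news in natural language with 13 tools, including trend tracking, sentiment analysis, and similarity search). Supports push via WeCom/Feishu/DingTalk/Telegram/email/ntfy, 30-second web deployment, 1-minute mobile notifications, no programming required. Supports Docker deployment ⭐ Let…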

Python 4,946 3,624 Updated Oct 31, 2025

Open CapCut API.

Python 1,195 267 Updated Nov 5, 2025

Sing to Midi 🎶

Python 7 1 Updated Jun 10, 2025

SOME: Singing-Oriented MIDI Extractor.

Python 617 53 Updated Jan 18, 2025

MOSS-TTSD is a spoken dialogue generation model that enables expressive dialogue speech synthesis in both Chinese and English, supporting zero-shot multi-speaker voice cloning, and long-form speech…

Python 1,008 87 Updated Nov 4, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 5,922 323 Updated Sep 30, 2025

A Unified Framework for Expressive Speech Synthesis with Voice Cloning

Python 380 32 Updated Aug 18, 2025

Frontier Open-Source Text-to-Speech

9,851 1,249 Updated Sep 5, 2025

Spark-TTS Inference Code

Python 10,679 1,139 Updated Apr 9, 2025

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

Python 4,641 779 Updated Mar 19, 2025

A research prototype of a human-centered web agent

Python 7,904 818 Updated Nov 3, 2025