SCUT
Guangzhou
https://scholar.google.com/citations?user=dW7AgfgAAAAJ&hl=zh-CN
Stars
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think
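微舆: a multi-agent public opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the full picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependence on any framework.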
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
Production First and Production Ready End-to-End Speech Recognition Toolkit
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.
A continuously updated handbook that helps readers track the latest Text-to-SQL techniques in the literature and provides practical guidance for researchers and practitioners.
Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…
Utilizes ONNX Runtime for audio denoising.
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
kaldi-asr/kaldi is the official location of the Kaldi project.
Robust Speech Recognition via Large-Scale Weak Supervision
OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation
Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Reference PyTorch implementation and models for DINOv3
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
Multilingual Document Layout Parsing in a Single Vision-Language Model
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte…
One Tiny RAG-Powered LLM Framework: Knowledge-Enhanced Generative AI Demo
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Renderer for the harmony response format to be used with gpt-oss
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.