
Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think

Python 598 39 Updated Nov 5, 2025

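微舆: a multi-agent public-opinion analysis assistant that anyone can use. It breaks through information cocoons, restores the true picture of public opinion, predicts future trends, and supports decision-making. Implemented from scratch, with no dependency on any framework.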

Python 14,547 2,504 Updated Nov 5, 2025

Native Multimodal Models are World Learners

Python 1,130 39 Updated Nov 5, 2025

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,372 103 Updated Oct 31, 2025

Contexts Optical Compression

Python 19,590 1,373 Updated Oct 25, 2025

[EMNLP 2025] ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents

Python 590 47 Updated Jun 12, 2025

Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"

Python 1,479 40 Updated Oct 15, 2025

Production First and Production Ready End-to-End Speech Recognition Toolkit

Python 4,887 1,166 Updated Nov 3, 2025

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL.

Python 483 45 Updated Sep 8, 2025

A continuously updated handbook that helps readers track the latest Text-to-SQL techniques in the literature and provides practical guidance for researchers and practitioners.

Python 1,111 69 Updated Nov 1, 2025

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, source separation, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Andr…

C++ 8,711 963 Updated Nov 5, 2025

Utilizes ONNX Runtime for audio denoising.

Python 89 13 Updated Oct 9, 2025

End-to-End Speech Processing Toolkit

Python 9,563 2,343 Updated Nov 5, 2025

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

C++ 26,646 4,090 Updated Jun 19, 2025

kaldi-asr/kaldi is the official location of the Kaldi project.

Shell 15,205 5,368 Updated Sep 22, 2025

Robust Speech Recognition via Large-Scale Weak Supervision

Python 90,393 11,321 Updated Sep 8, 2025

MCP for xiaohongshu.com

Go 6,688 981 Updated Nov 5, 2025

OneCAT: Decoder-Only Auto-Regressive Model for Unified Understanding and Generation

Python 229 5 Updated Sep 22, 2025

Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana) state-of-the-art image generation and editing model. Explore AI generated visuals created with…

JavaScript 7,308 734 Updated Sep 8, 2025

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,194 276 Updated Nov 5, 2025

Reference PyTorch implementation and models for DINOv3

Jupyter Notebook 8,118 547 Updated Nov 3, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,531 1,988 Updated Nov 3, 2025

Multilingual Document Layout Parsing in a Single Vision-Language Model

Python 5,583 562 Updated Oct 31, 2025

OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex layout handling, complicated table parsing and cross-page conte…

Python 2,358 146 Updated Aug 4, 2025

One Tiny RAG-Powered LLM Framework: Knowledge-Enhanced Generative AI Demo

Python 33 3 Updated Jul 7, 2025

The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.

Dockerfile 262 6 Updated Sep 26, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,105 1,901 Updated Nov 1, 2025

Renderer for the harmony response format to be used with gpt-oss

Rust 3,973 224 Updated Nov 5, 2025

Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.

Python 5,922 323 Updated Sep 30, 2025

Splices the vision head of SmolVLM2 onto the Qwen3-0.6B model and fine-tunes the combined model.

Python 423 44 Updated Sep 8, 2025