
SoTA open-source TTS

Python 16,456 2,248 Updated Dec 15, 2025

The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…

Python 2,106 147 Updated Dec 19, 2025

Fun-ASR is a large end-to-end speech recognition model launched by Tongyi Lab.

Python 443 25 Updated Dec 19, 2025

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 14,035 1,457 Updated Dec 19, 2025

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,052 139 Updated Dec 18, 2025

An Open Phone Agent Model & Framework. Unlocking the AI Phone for Everyone

Python 18,058 2,821 Updated Dec 19, 2025

MiMo-Embodied

Python 319 11 Updated Nov 21, 2025

Humanity's Last Exam

Python 1,276 80 Updated Oct 7, 2025

Central repository for biomolecular foundation models with shared trainers and pipeline components

Python 582 74 Updated Dec 20, 2025
Python 57 2 Updated Aug 8, 2025

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning

Python 2,103 632 Updated Dec 2, 2025
8 Updated Nov 19, 2025

Long-form streaming TTS system for multi-speaker dialogue generation

Python 1,293 118 Updated Oct 26, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,675 153 Updated Sep 22, 2025

CLI interfaces & config objects, from types

Python 905 38 Updated Dec 18, 2025

EverMemOS is an open-source, enterprise-grade intelligent memory system. Our mission is to build AI memory that never forgets, so that every conversation builds on previous understanding.

Python 1,325 123 Updated Dec 15, 2025

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,488 213 Updated Dec 16, 2025

OpenOCR: A general OCR system with accuracy and efficiency. Supports 24 Scene Text Recognition methods trained from scratch on large-scale real datasets; the latest methods will continue to be added.

Python 834 71 Updated Dec 4, 2025

A lightweight LMM-based Document Parsing Model

Python 6,379 441 Updated Dec 8, 2025

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python 5,701 602 Updated Dec 14, 2025

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Python 28,589 3,511 Updated Dec 5, 2025

A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

Python 21,326 2,896 Updated Dec 17, 2025

Face recognition training project (PyTorch)

Python 481 85 Updated Nov 12, 2021

OpenAI CLIP text encoders for multiple languages!

Jupyter Notebook 823 69 Updated May 15, 2023

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.

Python 5,095 387 Updated Apr 21, 2025

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.

Python 5,081 744 Updated Dec 17, 2025
Python 8,673 518 Updated Oct 9, 2024
Python 316 29 Updated Dec 17, 2024

Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.

TypeScript 4,465 237 Updated Dec 16, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 7,063 522 Updated May 5, 2025