- Ho Chi Minh City, Vietnam
Stars
pix2tex: Using a ViT to convert images of equations into LaTeX code.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
The easiest way to make yourself the hero of video memes.
DiffFace: Diffusion-based Face Swapping with Facial Guidance
[CVPR 2025] Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
State-of-the-art 2D and 3D Face Analysis Project
Multilingual Document Layout Parsing in a Single Vision-Language Model
UniSpeech - Large Scale Self-Supervised Learning for Speech
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Speaker detection using a lip-movement-based RNN detector
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
LLM agents built for control. Designed for real-world use. Deployed in minutes.
Vietnamese speech recognition using the QuartzNet model (NVIDIA) + Flask demo
Convert phonemes to graphemes for Vietnamese
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
An AI agent development platform with all-in-one visual tools, simplifying agent creation, debugging, and deployment like never before. Coze your way to AI Agent creation.
Official inference framework for 1-bit LLMs
Rapid Product Development with n8n, published by Packt