- Ho Chi Minh City, Vietnam
Stars
Robust Speech Recognition via Large-Scale Weak Supervision
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
All of the n8n workflows I could find (also from the site itself)
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: …
We have made you a wrapper you can't refuse
State-of-the-art 2D and 3D Face Analysis Project
Official inference framework for 1-bit LLMs
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Intelligent automation and multi-agent orchestration for Claude Code
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
LLM agents built for control. Designed for real-world use. Deployed in minutes.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Wan: Open and Advanced Large-Scale Video Generative Models
TensorFlow-based neural network library
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Text-audio foundation model from Boson AI
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
Translate Darknet to TensorFlow. Load trained weights, retrain/fine-tune using TensorFlow, export constant graph def to mobile devices
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a Keras CNN model and OpenCV.
Multilingual Document Layout Parsing in a Single Vision-Language Model
Python framework for creating, editing, and invoking Noisy Intermediate-Scale Quantum (NISQ) circuits.