- Ho Chi Minh City, Vietnam
Stars
Robust Speech Recognition via Large-Scale Weak Supervision
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The simplest, fastest repository for training/finetuning medium-sized GPTs.
All of the n8n workflows I could find (also from the site itself)
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: …
We have made you a wrapper you can't refuse
State-of-the-art 2D and 3D Face Analysis Project
FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Official inference framework for 1-bit LLMs
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Intelligent automation and multi-agent orchestration for Claude Code
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
pix2tex: Using a ViT to convert images of equations into LaTeX code.
LLM agents built for control. Designed for real-world use. Deployed in minutes.
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Wan: Open and Advanced Large-Scale Video Generative Models
This repository contains code examples for Stanford's course: TensorFlow for Deep Learning Research.
TensorFlow-based neural network library
VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Text-audio foundation model from Boson AI
Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
Translate Darknet to TensorFlow. Load trained weights, retrain/fine-tune using TensorFlow, export constant graph def to mobile devices
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Real-time face detection and emotion/gender classification using fer2013/imdb datasets with a Keras CNN model and OpenCV.
Multilingual Document Layout Parsing in a Single Vision-Language Model