Starred repositories
UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition
Math OCR model that outputs LaTeX and markdown
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A latent text-to-image diffusion model
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
A multi-voice TTS system trained with an emphasis on quality
High-Resolution Image Synthesis with Latent Diffusion Models
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models (ICCV 2021 Oral)
Official PyTorch implementation of StyleGAN3
Code for "Transformers Solve Limited Receptive Field for Monocular Depth Prediction"
Unsupervised Training Data Generation of Handwritten Formulas using Generative Adversarial Networks with Self-Attention
An arbitrary face-swapping framework on images and videos with one single trained model!
functorch is JAX-like composable function transforms for PyTorch.
katie-lim / LaTeX-OCR
Forked from lukas-blecher/LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Fast, differentiable sorting and ranking in PyTorch
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
Math formula recognition (Images to LaTeX strings)
Taming Transformers for High-Resolution Image Synthesis
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
A Deep Learning based project for creating line art portraits.
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, try Sync Labs.