Stars
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A huge dataset for Document Visual Question Answering
Official Implementation of Paella https://arxiv.org/abs/2211.07292v2
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Teaching materials for the applied machine learning course at Cornell Tech (online edition)
Pure Python implementation of product quantization for nearest neighbor search
Official code for Cross-Covariance Image Transformer (XCiT)
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
PyTorch code for training Vision Transformers with the self-supervised learning method DINO
A PyTorch library and evaluation platform for end-to-end compression research
This technique modifies image data so that any model trained on it will bear an identifiable mark.
This repository reproduces the results of the paper: "Fixing the train-test resolution discrepancy" https://arxiv.org/abs/1906.06423
MATLAB/MEX implementation of Aggregated Selective Match Kernels for Image Retrieval (published in ICCV 2013)
Code for "MultiGrain: a unified image embedding for classes and instances"
Open source implementation of "Spreading Vectors for Similarity Search"
Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking
Code for my PhD thesis. Library of quantization-based methods for fast similarity search in high dimensions. Presented at ECCV 18.
How should we evaluate supervised hashing?
Implements an efficient softmax approximation as described in the paper "Efficient softmax approximation for GPUs" (http://arxiv.org/abs/1609.04309)
Low-shot learning with large-scale diffusion
A library for Multilingual Unsupervised or Supervised word Embeddings
InferSent sentence embeddings
A Python tool for evaluating the quality of sentence embeddings.
A flexible tool for creating, organizing, and sharing visualizations of live, rich data. Supports Torch and NumPy. https://visdom.dev
Library for fast text representation and classification.