- Radboud University
- Netherlands
- https://orcid.org/0000-0002-7129-6799
Stars
Dynamic cluster-based data sampling for efficient and long-tail-aware vision-language model pre-training.
A codebase and a curated list of awesome deep long-tailed learning (TPAMI 2023).
Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]
Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and search embeddings and metadata.
Provides a baseline of what NCCL initialization should look like when using PyTorch distributed.
Automatically create Faiss k-NN indices with optimal similarity search parameters.
A playbook for systematically maximizing the performance of deep learning models.
CLIPF: Contrastive Language-Image Pre-training with Word Frequency Masking — a frequency-based text masking strategy for efficient VLM pre-training (WACV 2026)
Official code for the paper "Does CLIP's Generalization Performance Mainly Stem from High Train-Test Similarity?" (ICLR 2024)
[EE499] Semantic Deduplication for Data-Efficient Learning
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing d…
Centered Masking for Language-Image Pre-training
Textual Concept Expansion with Commonsense Knowledge to Improve Dual-Stream Image-Text Matching
Boost LaTeX typesetting efficiency with preview, compile, autocomplete, colorize, and more.
Collection of AWESOME vision-language models for vision tasks
This list of writing prompts covers a range of topics and tasks, including brainstorming research ideas, improving language and style, conducting literature reviews, and developing research plans.
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
VISSL is FAIR's library of extensible, modular and scalable components for SOTA Self-Supervised Learning with images.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Code release for SLIP: Self-supervision meets Language-Image Pre-training
COYO-700M: Large-scale Image-Text Pair Dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
An open source implementation of CLIP.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).