- NVIDIA
- Marseille, France
- @simon_jegou
- in/simon-jegou
Stars
The evaluation framework for training-free sparse attention in LLMs
[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!
The #1 open-source SWE-bench Verified implementation
Reference implementation of the Jupyter Notebook format
An extremely fast Python package and project manager, written in Rust.
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, con…
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
Awesome LLM compression research papers and tools.
A framework for few-shot evaluation of language models.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"
Generative Representational Instruction Tuning
Create and modify Word documents with Python
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
A guidance language for controlling large language models.
PyTorch code and models for the DINOv2 self-supervised learning method.
Library for Digital Pathology Image Processing