View SimJeg's full-sized avatar
The evaluation framework for training-free sparse attention in LLMs

Python 100 6 Updated Jun 19, 2025

[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!

Python 123 15 Updated Sep 25, 2025

The #1 open-source SWE-bench Verified implementation

Python 828 150 Updated Jun 9, 2025

Reference implementation of the Jupyter Notebook format

Python 303 158 Updated Oct 6, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 69,525 2,096 Updated Oct 9, 2025

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 764 52 Updated Mar 6, 2025

The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.

Python 1,416 381 Updated Oct 10, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 494 34 Updated Feb 10, 2025

NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, con…

Python 2,747 268 Updated Oct 9, 2025

LLM KV cache compression made easy

Python 648 66 Updated Oct 9, 2025

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Python 147 15 Updated Aug 9, 2024

The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.

Jupyter Notebook 89 5 Updated Jul 17, 2025

A collection of LogitsProcessors to customize and enhance LLM behavior for specific tasks.

Python 359 24 Updated Jul 8, 2025

📰 Must-read papers on KV Cache Compression (constantly updating 🤗).

554 13 Updated Sep 30, 2025

Awesome LLM compression research papers and tools.

1,676 108 Updated Jul 2, 2025

Python 20 2 Updated Apr 17, 2025

♟️ Vectorized RL game environments in JAX

Python 531 37 Updated Mar 6, 2025

A framework for few-shot evaluation of language models.

Python 10,301 2,772 Updated Oct 9, 2025

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 1,319 115 Updated Oct 9, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 59,886 7,346 Updated Oct 9, 2025

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"

Jupyter Notebook 947 94 Updated Jun 22, 2024

Generative Representational Instruction Tuning

Jupyter Notebook 673 50 Updated Jun 25, 2025

Universal markup converter

Haskell 39,536 3,649 Updated Oct 6, 2025

Create and modify Word documents with Python

Python 5,241 1,240 Updated Jun 17, 2025

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

64,667 7,191 Updated Jun 4, 2025

Python 28 3 Updated Oct 3, 2023

A guidance language for controlling large language models.

Jupyter Notebook 20,821 1,118 Updated Oct 8, 2025

Structured Outputs

Python 12,673 640 Updated Oct 8, 2025

PyTorch code and models for the DINOv2 self-supervised learning method.

Jupyter Notebook 11,675 1,098 Updated Aug 17, 2025

Library for Digital Pathology Image Processing

Python 425 63 Updated Oct 7, 2025