Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,208 365 Updated Aug 14, 2025

Memray is a memory profiler for Python

Python 14,700 432 Updated Dec 15, 2025

The nnsight package enables interpreting and manipulating the internals of deep learned models.

Jupyter Notebook 743 64 Updated Dec 23, 2025

Tooling for exact and MinHash deduplication of large-scale text datasets

Rust 46 4 Updated Dec 20, 2025

[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"

Python 447 23 Updated Oct 16, 2024

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 9,333 392 Updated Nov 24, 2025

uops.info Code Analyzer

Python 307 23 Updated Jan 14, 2024

High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets

Python 223 22 Updated Dec 6, 2025

LSH index for approximate set containment search

Go 61 12 Updated Jun 27, 2022

All-pair set similarity search on millions of sets in Python and on a laptop

Python 604 42 Updated Oct 11, 2022

Python extension for MurmurHash (MurmurHash3), a set of fast and robust hash functions.

C 355 73 Updated Dec 1, 2025

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Python 2,841 314 Updated Dec 19, 2025

Fast Python Bloom Filter using Mmap

C 133 26 Updated Sep 14, 2025

A specification that python filesystems should adhere to.

Python 1,263 424 Updated Dec 17, 2025

Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.

C++ 78 17 Updated Dec 16, 2025

Python 88 11 Updated Dec 7, 2025

A simple n-gram language model.

Python 12 2 Updated Sep 11, 2018

Python 214 10 Updated Oct 27, 2025

KenLM: Faster and Smaller Language Model Queries

C++ 2,708 533 Updated Mar 30, 2025

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,905 368 Updated Dec 7, 2024

Computing with Python functions.

Python 4,303 443 Updated Dec 15, 2025

Ship correct and fast LLM kernels to PyTorch

Python 127 15 Updated Dec 18, 2025

Python logging made (stupidly) simple

Python 23,335 759 Updated Dec 20, 2025

Parallel S3 and local filesystem execution tool.

Go 3,809 325 Updated Jun 13, 2025

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,400 85 Updated Apr 21, 2025

Streaming WARC/ARC library for fast web archive IO

Python 441 66 Updated Dec 10, 2024

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 403 23 Updated Sep 15, 2025

cuDF - GPU DataFrame Library

C++ 9,396 995 Updated Dec 24, 2025

GitHub Pages template based upon HTML and Markdown for personal, portfolio-based websites.

SCSS 16,123 4,644 Updated Dec 21, 2025