jakep-allenai
  • Seattle, WA
  • 11:08 (UTC -07:00)


A fast AWS S3 browser, with inspiration from s5cmd

Rust 10 3 Updated Mar 12, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 69,968 8,545 Updated Apr 12, 2026

Rapid fuzzy string matching in Python using various string metrics

Python 3,836 151 Updated Apr 7, 2026

Access a database of word frequencies, in various natural languages.

Python 1,646 110 Updated Jan 4, 2025

🚀 Efficient implementations for emerging model architectures

Python 4,852 490 Updated Apr 12, 2026

utilities for batched llm calls with retries

Python 49 2 Updated Apr 8, 2026

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts stru…

HTML 643 55 Updated Apr 12, 2026

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 522 60 Updated Jul 20, 2025

High-performance In-browser LLM Inference Engine

TypeScript 17,755 1,237 Updated Apr 8, 2026

Nano vLLM

Python 12,826 1,912 Updated Nov 3, 2025

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

Python 7,568 999 Updated Apr 12, 2026

Next-generation Punkt sentence boundary detection with zero dependencies

Python 30 1 Updated Nov 18, 2025

OCR & Document Extraction using vision models

TypeScript 12,200 838 Updated May 20, 2025

OCR Benchmark

TypeScript 631 53 Updated Oct 21, 2025

OLMost every training recipe you need to perform data interventions with the OLMo family of models.

Python 71 19 Updated Apr 3, 2026

qpdf: A content-preserving PDF document transformer

C++ 4,921 367 Updated Apr 12, 2026

A pipeline for performing OCR on historical newspapers

Python 6 2 Updated Jan 22, 2026

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.

Perl 1,240 139 Updated Apr 9, 2026

A computer algebra system written in pure Python

Python 14,561 5,261 Updated Apr 12, 2026

Python 31 2 Updated Jan 17, 2026

Tile primitives for speedy kernels

Cuda 3,312 275 Updated Apr 8, 2026

Toolkit for linearizing PDFs for LLM datasets/training

Python 17,117 1,381 Updated Mar 25, 2026

📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools lik…

TypeScript 23,421 1,136 Updated Apr 12, 2026

Synthetic data curation for post-training and structured data extraction

Python 1,663 136 Updated Mar 28, 2026

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Rust 78,974 7,800 Updated Apr 12, 2026

Parallel S3 and local filesystem execution tool.

Go 3,997 333 Updated Jun 13, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,691 5,301 Updated Apr 12, 2026

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 9,918 1,559 Updated Apr 10, 2026

Streaming replication for SQLite.

Go 13,438 355 Updated Apr 8, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,999 318 Updated Apr 12, 2026