jakep-allenai
  • Seattle, WA

A fast AWS S3 browser, with inspiration from s5cmd

Rust 10 3 Updated Mar 12, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 70,229 8,598 Updated Apr 12, 2026

Rapid fuzzy string matching in Python using various string metrics

Python 3,850 151 Updated Apr 13, 2026

Access a database of word frequencies, in various natural languages.

Python 1,648 110 Updated Jan 4, 2025

🚀 Efficient implementations for emerging model architectures

Python 4,900 502 Updated Apr 17, 2026

Utilities for batched LLM calls with retries

Python 50 2 Updated Apr 8, 2026

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the Kreuzberg team. Kreuzberg is a fast, polyglot document intelligence engine with a Rust core. It extracts stru…

HTML 654 55 Updated Apr 17, 2026

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 523 60 Updated Jul 20, 2025

High-performance In-browser LLM Inference Engine

TypeScript 17,783 1,240 Updated Apr 15, 2026

Nano vLLM

Python 12,966 1,945 Updated Apr 13, 2026

An open-source framework for detecting, redacting, masking, and anonymizing sensitive data (PII) across text, images, and structured data. Supports NLP, pattern matching, and customizable pipelines.

Python 7,645 1,003 Updated Apr 17, 2026

Next-generation Punkt sentence boundary detection with zero dependencies

Python 30 1 Updated Nov 18, 2025

OCR & Document Extraction using vision models

TypeScript 12,211 839 Updated May 20, 2025

OCR Benchmark

TypeScript 631 54 Updated Oct 21, 2025

OLMost every training recipe you need to perform data interventions with the OLMo family of models.

Python 71 19 Updated Apr 3, 2026

qpdf: A content-preserving PDF document transformer

C++ 4,938 368 Updated Apr 14, 2026

A pipeline for performing OCR on historical newspapers

Python 6 2 Updated Jan 22, 2026

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.

Perl 1,242 139 Updated Apr 16, 2026

A computer algebra system written in pure Python

Python 14,574 5,264 Updated Apr 14, 2026

Python 31 2 Updated Jan 17, 2026

Tile primitives for speedy kernels

Cuda 3,322 278 Updated Apr 8, 2026

Toolkit for linearizing PDFs for LLM datasets/training

Python 17,140 1,383 Updated Mar 25, 2026

📦 Repomix is a powerful tool that packs your entire repository into a single, AI-friendly file. Perfect for when you need to feed your codebase to Large Language Models (LLMs) or other AI tools lik…

TypeScript 23,624 1,155 Updated Apr 17, 2026

Synthetic data curation for post-training and structured data extraction

Python 1,663 136 Updated Mar 28, 2026

Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.

Rust 79,295 7,879 Updated Apr 17, 2026

Parallel S3 and local filesystem execution tool.

Go 4,007 335 Updated Jun 13, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,993 5,425 Updated Apr 17, 2026

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

Python 9,936 1,559 Updated Apr 17, 2026

Streaming replication for SQLite.

Go 13,460 354 Updated Apr 16, 2026

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 4,015 320 Updated Apr 17, 2026