Starred repositories
Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.
SQL-like query language and CLI for Qdrant vector search engine
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
Anbeeld / beellama.cpp
Forked from spiritbuun/buun-llama-cpp
DFlash & TurboQuant in llama.cpp with up to 3x faster generation and 7.5x more KV cache in the same VRAM
Advanced prompt injection defense system for AI agents. Multi-language detection, severity scoring, and security auditing.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
A fast type checker and language server for Python
A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.
ML algorithms implemented and derived from first-principles in Jupyter Notebooks and NumPy
Modal-style sandbox API on top of Hugging Face Jobs
DFlash: Block Diffusion for Flash Speculative Decoding
Codebase for LLM Textual Hallucination Benchmark
Development repository for the Triton language and compiler
TokenSpeed is a speed-of-light LLM inference engine.
MedSafetyBench: Evaluating and Improving the Medical Safety of LLMs, NeurIPS 2024
Hardware implementation of transformers running microgpt at 50k+ tkps
Domain-specific GLiNER fine-tuning pipeline for sports NER. GLiNER is a bidirectional transformer encoder (DeBERTa-v3 based) for zero-shot named entity recognition — I'm fine-tuning it on a custom …
A framework for few-shot evaluation of language models.
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
Restore heading hierarchy in markdown documents using a fine-tuned 0.6B parameter LLM.
KV cache compression via block-diagonal rotation. Beats TurboQuant: better PPL (6.91 vs 7.07), 28% faster decode, 5.3x faster prefill, 44x fewer params. Drop-in llama.cpp integration.