Starred repositories
NVR with realtime local object detection for IP cameras
🏡 Open source home automation that puts local control and privacy first.
Boosting RAG on model and system performance with context reuse
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
An API conversion tool for popular external reinforcement learning environments
A curated list of recent progress and resources on Reinforcement Learning for AI Agents.
Tool for data extraction and interacting with Lean programmatically.
Snapshot is an encoder-decoder transformer that learns to compress context into fixed memories, for more efficient long-context inference.
Using Unified Memory on Jetson
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
A flipped classroom series on understanding LLMs for non-CS/AI students
A framework for fine-tuning retrieval-augmented generation (RAG) systems.
A gallery that showcases on-device ML/GenAI use cases and allows people to try and use models locally.
Implementation of the sparse attention pattern proposed by the DeepSeek team in their "Native Sparse Attention" paper
TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight)
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
Real-time webcam demo with SmolVLM and llama.cpp server
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Supercharge Your LLM with the Fastest KV Cache Layer