- Nuremberg
- www.philschmid.de
- @_philschmid
Starred repositories
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal domains, for both inference and training.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Stay ahead of AI trends with automated Reddit insights! 🚀 This tool scans AI-related Reddit communities in English & Chinese, using the official Reddit API and DeepSeek R1 via OpenRouter to analyze posts, …
The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data…
SGLang is a fast serving framework for large language models and vision language models.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
FastAPI framework, high performance, easy to learn, fast to code, ready for production
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
An extremely fast Python linter and code formatter, written in Rust.
Environments for LLM Reinforcement Learning
A developer toolkit to implement Serverless best practices and increase developer velocity.
Empowering everyone to build reliable and efficient software.
Training and inference on AWS Trainium and Inferentia chips.
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
A high-throughput and memory-efficient inference and serving engine for LLMs
Get your documents ready for gen AI
Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with co…
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
Extremely fast Query Engine for DataFrames, written in Rust
Development repository for the Triton language and compiler
LLRT (Low Latency Runtime) is an experimental, lightweight JavaScript runtime designed to address the growing demand for fast and efficient Serverless applications.
A python module to repair invalid JSON from LLMs