- Bavarian Oberland, Germany
- https://schweter.ml
A vector index built on TurboQuant, written in Rust with Python bindings
Repository of scripts for training and classification of German political texts
Tools for merging pretrained large language models.
26M-parameter function-calling model that runs on incredibly small devices
Experiments and examples using the Marin framework
Open-source framework for the research and development of foundation models.
Copy Fail 2: Electric Boogaloo
TurboQuant: Near-optimal KV cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration
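To illustrate what "3-bit keys, 2-bit values" means in practice, here is a minimal sketch of uniform low-bit quantization in Python. This is not TurboQuant's actual algorithm (which uses near-optimal quantizers and Triton kernels); the function names and the per-tensor min/max scaling are assumptions made for the sketch:

```python
import numpy as np

def quantize_uniform(x, bits):
    # Map values onto 2**bits evenly spaced levels spanning the
    # tensor's observed range, storing only small integer codes.
    levels = 2 ** bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((x - lo) / scale).astype(np.uint8)  # integer codes
    return q, lo, scale

def dequantize(q, lo, scale):
    # Reconstruct approximate floats from the codes.
    return q.astype(np.float32) * scale + lo

rng = np.random.default_rng(0)
keys = rng.standard_normal((4, 8)).astype(np.float32)
q, lo, scale = quantize_uniform(keys, bits=3)   # 3-bit codes: 0..7
keys_hat = dequantize(q, lo, scale)
err = float(np.abs(keys - keys_hat).max())      # bounded by scale / 2
```

At 3 bits each code fits in 0..7, and the worst-case reconstruction error of uniform quantization is half the step size; real KV-cache schemes add per-channel scaling and fused kernels on top of this idea.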
Open-source interface for the iCUE LINK Hub and other Corsair AIOs and hubs on Linux. Manage RGB lighting, fan speeds, and system metrics, as well as keyboards, mice, and headsets, via a web dashboard.
Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"
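A minimal sketch of the rectified softmax the Softpick title refers to, assuming the relu(e^x − 1) numerator and |e^x − 1| normalizer; the eps term and the example scores are my own assumptions, and the paper's numerically stable formulation is omitted:

```python
import numpy as np

def softpick(x, eps=1e-8):
    # Rectified softmax: scores at or below zero get exactly zero
    # weight, so a head can attend "nowhere" instead of being forced
    # to dump probability mass on a sink token as softmax must.
    shifted = np.exp(x) - 1.0
    num = np.maximum(shifted, 0.0)        # relu(e^x - 1)
    den = np.abs(shifted).sum() + eps     # sum of |e^x - 1|
    return num / den

scores = np.array([2.0, 0.0, -1.0])
weights = softpick(scores)                # zero weight at indices 1 and 2
```

Unlike softmax, the weights need not sum to one, which is exactly what lets attention "switch off" without a sink.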
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL 2023)
Toxic Data Augmentation via LLM-Guided Directional Adversarial Generation
🤗 ml-intern: an open-source ML engineer that reads papers, trains models, and ships them
Official code for the paper "Detoxification for LLM: From Dataset Itself", built on transformers.
[ACL 2026 Findings] E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition
Official PyTorch implementation of Sessa: Selective State Space Attention for long-context sequence modeling.
[ACL 2026] How Tokenization Limits Phonological Knowledge Representation in Language Models and How to Improve Them
Lucebox: an LLM inference server built for speed on specific consumer hardware.
BenGER - Research-grade benchmarking for LLMs in the German legal domain
🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman