Théophile Lafargue

20 · Founder · Paris

I build AI infrastructure for constrained environments — the edges where most systems stop working.

Available for contract work — Apple Silicon LLM inference: KV-cache optimization, quantization, speculative decoding, throughput tuning. → contact@aeternitatis.eu

Student-Entrepreneur, Pépite PEIPS, Université Paris-Saclay (SNEE)
ORCID: 0009-0001-5727-2475

Selected proof

First GGUF quants of MiniMax-M2.5 & M2.7 (229B MoE) — used and recommended by the llama.cpp / r/LocalLLaMA community
llama.cpp PRs merged and cited as prior work by core maintainers
Native MLX speculative decoding port — 3.41× faster inference on Apple Silicon

Open-source contributions

PR	What it does	Status
llama.cpp #20075	Fix state corruption in speculative decoding on hybrid SSM/MoE models. +45% inference speed on Apple Silicon Metal.	Merged · cited as prior work in #20428 + #20649
llama.cpp #20649	Diagnostic + flake8 fix for Mistral Small 4 (119B MoE).	Merged · cited alongside ggerganov + ngxson
unslothai/unsloth #4901	Fix RoPE offset cast crashing Gemma 4 inference on Apple Silicon.	Merged by danielhanchen (creator)
StepFun Cookbook #14	Local deployment architecture for Step-3.5-Flash on Apple Silicon.	Merged

Projects

Project	Description
mlx-dflash	Native MLX port of DFlash speculative decoding. 3.41× faster inference on Apple Silicon — Qwen3-8B bf16, M3 Max 128GB, 1024 tokens. Acceptance 8.75/16. Single `mx.eval()` per step, intra-GPU `verify_ids`.
LACE	Semantic compression under LoRa/SMS physical constraints. Cognitive Emergence Law: N/K < C·d_cog, C_emp=0.391≈1/e. K=16 optimal deployment parameter (p=0.0034). Preprint: HAL hal-05596229 · Zenodo 10.5281/zenodo.19664121
mythos-distillation	Behavioral distillation into Gemma 4 26B MoE via LoRA (r=64, 30 layers). 551 pairs, val loss 1.398, 7/7 out-of-distribution questions generalized without system prompt. 80 t/s on M3 Max.
patent-low-bandwidth-ai	Reference gateway for stateful LLM dialogue over 2G / SMS. Companion implementation to patent FR2511116 (architecture).
Phantom	On-device behavioral AI OS. Two-Tower (LSTM 256d + action embeddings), full RLHF loop via local Qwen 122B as reward model. MLX, zero cloud.
VoxTape	Local voice dictation for macOS. MLX Whisper on Metal GPU: 8.3 s of audio transcribed in 0.4 s (about 20× real-time). Open-source alternative to SuperWhisper.
benchmark-422-qec	11 LLMs (cloud + local M3 Max) on the [[4,2,2]] CritPt QEC problem. 0/11 correct. Failure patterns documented.
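
For readers unfamiliar with the "Acceptance 8.75/16" figure in the mlx-dflash row, here is a minimal sketch of greedy verification in speculative decoding. It assumes the figure means the average number of drafted tokens accepted per 16-token draft block. The function names (`accepted_prefix_len`, `commit_tokens`) are hypothetical and are not taken from the mlx-dflash code; the sketch uses plain Python lists rather than MLX arrays and does not model how DFlash produces its drafts.

```python
from typing import List, Sequence


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count how many leading draft tokens match the target model's greedy picks."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n


def commit_tokens(draft: Sequence[int], target: Sequence[int]) -> List[int]:
    """Tokens committed in one speculative step.

    `target` must hold len(draft) + 1 tokens: the target model's greedy choice
    at every draft position, plus one extra position after the last draft token.
    The accepted draft prefix is kept, then one target token is appended
    (a correction on mismatch, or a bonus token if the whole draft matched).
    """
    if len(target) != len(draft) + 1:
        raise ValueError("target must contain len(draft) + 1 tokens")
    n = accepted_prefix_len(draft, target)
    return list(draft[:n]) + [target[n]]


def mean_accepted(steps: Sequence[int]) -> float:
    """Average accepted draft tokens per step (illustrative metric only)."""
    return sum(steps) / len(steps)
```

Under this reading, each step commits the accepted prefix plus one token from the target model, so a higher mean acceptance means fewer target-model passes per generated token. The measured speedup also depends on draft cost and hardware, which this sketch does not model.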

Models — HuggingFace (ox-ox)

Repo	Description	Downloads
MiniMax-M2.7-GGUF	First GGUF quants of MiniMax-M2.7 (229B MoE). Q3_K_L + Q8_0. PPL 8.44 · 28.52 t/s.	522
MiniMax-M2.5-GGUF	First GGUF quants of MiniMax-M2.5 (229B MoE). PPL 8.79 · 28.7 t/s. Recommended by llama.cpp community.	437
mythos-character-distillation	551 behavioral pairs for Mythos-style LoRA distillation.	66
lace-semantic-compression	198 operational tasks (defense / medical / industrial), VQ codebook, LACE v2 dataset.	40

Patent

FR2511116: Hybrid State-Preserving Gateway for LLM Inference over Low-Bandwidth Protocols (2G / SMS / LoRa / satellite)
Filed: Sep 27, 2025 · INPI · 11 claims · examination in progress

Stack

Inference: llama.cpp · MLX · Metal · GGUF · speculative decoding
AI/ML: LoRA · Transformers · VQ-VAE · RAG (ChromaDB) · Whisper · RLHF
Protocols: LoRa / 2G / SMS / satellite · Flask gateway
Languages: Python · C++
Infra: Tailscale · bare-metal homelab · M3 Max 128GB

LinkedIn · Substack · HuggingFace · ORCID
