- Freelancer
- Germany, remote
- UTC +02:00
- https://huggingface.co/kyr0
Stars
Grafting script and vLLM container inference runtime Makefile for kyr0/Ornith-35B-FP8-E4M3-MTP
JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Causal Parallel Tree Drafting
Fully uncensored, capability-enhanced abliteration of Qwen3.6-27B. NVFP4 + z-lab DFlash speculative decoding (n=12) on the unified ghcr.io/aeon-7/aeon-vllm-ultimate:latest container, tuned for long…
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains over SOTA where every other model falls apart. **You'll come …
Building Foundation Models for Human Behavior Simulation
The easiest and fastest way to create production-ready Kubernetes clusters on Hetzner Cloud
Give a query, get a dataroom. Pi + self-hosted Qwen3.6 research harness on a single L4.
Build Real-Time Knowledge Graphs for AI Agents
We propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomposition.
An open source, self-hosted implementation of the Tailscale control server
SkillOpt with local AI is a text-space optimizer that trains reusable natural-language skills for frozen LLM agents through trajectory-driven edits, validation-gated updates, and deployable best_sk…
A Datacenter Scale Distributed Inference Serving Framework
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
Run 70B+ LLMs on Apple Silicon by using SSD as extended memory — intelligent layer streaming and caching for Mac
Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read
Reverse engineering Gemini's SynthID detection
Efficient Universal Perception Encoder: a single on-device vision encoder with versatile representations that match or exceed specialized experts across multiple task domains.
The best-benchmarked open-source AI memory system. And it's free.
HeadAudio: An audio node/processor for real-time audio-driven viseme detection and lip-sync in browsers.
Talking Head (3D): A JavaScript class for real-time lip-sync using full-body 3D avatars.
MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX.
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar