Stars
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
llllvvuu / mlx_parallm
Forked from willccbb/mlx_parallm
Fast parallel LLM inference for MLX
huggingface chat-ui integration with mlx-lm server
Fast parallel LLM inference for MLX
On-the-fly Goliath-style transformer franken-merges with a tiny memory footprint
On-device AI across mobile, embedded and edge for PyTorch
This is the Personality Core for GLaDOS, the first steps towards a real-life implementation of the AI from the Portal series by Valve.
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
A fast inference library for running LLMs locally on modern consumer-class GPUs
Zero-Shot Speech Editing and Text-to-Speech in the Wild