MAX is a high-performance inference server that provides an OpenAI-compatible endpoint for large language models (LLMs) and it's a fundamental component of the Modular Platform.
This directory includes the source for our Python-based inference server, Python-based model pipelines (graphs), Python-based neural-net operators (high-level graph ops), Mojo-based kernel functions (low-level graph ops for GPUs and CPUs), and more.
With just a few commands, you can use MAX to create a local endpoint serving a large language model (LLM) of your choice, using our CLI tool or Docker container. Try it now with our quickstart guide.
Thanks for your interest in contributing to MAX!
We welcome contributions to this repo on the
main
branch. Please first read our Contributor
Guide.
If you want to report issues or request features, please create a GitHub issue here—also see our guide to submitting good bug reports.
If you'd like to chat with the team and other community members, please send a message to our Discord channel and our forum board.