

NexaSDK - Run any AI model on any backend

NexaSDK is an easy-to-use developer toolkit for running any AI model locally — across NPUs, GPUs, and CPUs — powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level. It’s what lets NexaSDK achieve Day-0 support for new model architectures (LLMs, multimodal, audio, vision). NexaML supports 3 model formats: GGUF, MLX, and Nexa AI's own .nexa format.

⚙️ Differentiation

| Feature | NexaSDK | Ollama | llama.cpp | LM Studio |
| --- | --- | --- | --- | --- |
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Android SDK support | ✅ NPU/GPU/CPU support | ⚠️ | ⚠️ | ❌ |
| Support any model in GGUF, MLX, NEXA format | ✅ | ❌ | ⚠️ Low-level control | ⚠️ |
| Full multimodality support | ✅ Image, Audio, Text | ⚠️ | ⚠️ | ⚠️ |
| Cross-platform support | ✅ Desktop, Mobile, Automotive, IoT | ⚠️ | ⚠️ | ⚠️ |
| One line of code to run | ✅ | ✅ | ⚠️ | ✅ |
| OpenAI-compatible API + Function calling | ✅ | ✅ | ⚠️ | ✅ |

Legend: ✅ Supported   |   ⚠️ Partial or limited support   |   ❌ Not supported


Quick Start

Step 1: Download Nexa CLI with one click

macOS & Windows

Download and run the installer for your platform from the latest GitHub release.

Linux

For x86_64:

curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_x86_64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh

For arm64:

curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh
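
After installation, verify that the `nexa` binary is on your PATH:

```bash
nexa -h   # should list all available CLI commands
```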

Uninstall

sudo rm -r /opt/nexa_sdk
sudo rm /usr/local/bin/nexa
# if you want to remove data as well
# rm -r $HOME/.cache/nexa.ai

Step 2: Run models with one line of code

You can run any compatible GGUF, MLX, or .nexa model from 🤗 Hugging Face with the `nexa infer <full repo name>` command.

GGUF models

Tip

GGUF runs on macOS, Linux, and Windows on CPU/GPU. Note that certain GGUF models are supported only by NexaSDK (e.g., Qwen3-VL-4B and 8B).

📝 Run and chat with LLMs, e.g. Qwen3:

nexa infer ggml-org/Qwen3-1.7B-GGUF

🖼️ Run and chat with Multimodal models, e.g. Qwen3-VL-4B:

nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

MLX models

Tip

MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably. We recommend starting with models from our curated NexaAI Collection for best results. For example:

📝 Run and chat with LLMs, e.g. Qwen3:

nexa infer NexaAI/Qwen3-4B-4bit-MLX

🖼️ Run and chat with Multimodal models, e.g. Gemma3n:

nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX

Qualcomm NPU models

Tip

You need to download the arm64 build with Qualcomm NPU support and make sure your laptop has a Snapdragon® X Elite chip.

Quick Start (Windows arm64, Snapdragon X Elite)

  1. Log in and get an access token (required for Pro models)

    • Create an account at sdk.nexa.ai
    • Go to Deployment → Create Token
    • Run this once in your terminal (replace the placeholder with your token):
      nexa config set license '<your_token_here>'
  2. Run and chat with our multimodal model, OmniNeural-4B, or other models on NPU:

nexa infer NexaAI/OmniNeural-4B
nexa infer NexaAI/Granite-4-Micro-NPU
nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU

CLI Reference

| Essential command | What it does |
| --- | --- |
| `nexa -h` | Show all CLI commands |
| `nexa pull <repo>` | Interactively download and cache a model |
| `nexa infer <repo>` | Run local inference |
| `nexa list` | Show all cached models with sizes |
| `nexa remove <repo>` / `nexa clean` | Delete one / all cached models |
| `nexa serve --host 127.0.0.1:8080` | Launch an OpenAI-compatible REST server |
| `nexa run <repo>` | Chat with a model via an existing server |

👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!

See CLI Reference for full commands.
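
Because `nexa serve` speaks the OpenAI REST protocol, any OpenAI-style client can talk to a local model. A minimal sketch, assuming the standard `/v1/chat/completions` route and that the server accepts the Hugging Face repo ID as the model name:

```bash
# Terminal 1: launch the OpenAI-compatible server
nexa serve --host 127.0.0.1:8080

# Terminal 2: send a standard chat completions request
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ggml-org/Qwen3-1.7B-GGUF",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```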

Import model from local filesystem

# hf download <model> --local-dir /path/to/modeldir
nexa pull <model> --model-hub localfs --local-path /path/to/modeldir
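
For example, to fetch a model with the Hugging Face CLI and then register the local copy instead of re-downloading it (the repo ID and directory here are illustrative, and the `hf` CLI from `huggingface_hub` is assumed to be installed):

```bash
# download the model files to a local directory
hf download ggml-org/Qwen3-1.7B-GGUF --local-dir ./Qwen3-1.7B-GGUF

# register the local copy with Nexa
nexa pull ggml-org/Qwen3-1.7B-GGUF --model-hub localfs --local-path ./Qwen3-1.7B-GGUF
```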

🎯 You Decide What Model We Support Next

Nexa Wishlist — Request and vote for the models you want to run on-device.

Drop a Hugging Face repo ID, pick your preferred backend (GGUF, MLX, or Nexa format for Qualcomm + Apple NPUs), and watch the community's top requests go live in NexaSDK.

👉 Vote now at sdk.nexa.ai/wishlist

Acknowledgements

We would like to thank the open-source projects that NexaSDK builds on.

Join Builder Bounty Program

Earn up to 1,500 USD for building with NexaSDK.


Learn more in our Participant Details.
