Whisper + Qwen3 on MLX

Local AI stack for Apple Silicon: speech-to-text with Whisper and a tool-calling LLM with Qwen3.

Requirements

Apple Silicon Mac (M1/M2/M3/M4)
Nix with flakes enabled
For the 32B model (`large`): 24GB+ unified memory recommended
For the 14B model (`medium`): 16GB unified memory is sufficient

Setup

# Enter the dev shell (provides Python, ffmpeg, etc.)
nix develop

# Install Python dependencies
./install

Whisper (Speech-to-Text)

# Transcribe audio to text
./run input.mp3 output
# Creates output.txt

Uses mlx-community/whisper-large-v3-mlx for high-quality transcription.
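
The internals of `./run` are not shown in this README. As a rough sketch of the same step in Python, assuming the `mlx-whisper` package is installed and provides `mlx_whisper.transcribe`, the helper below writes the transcript to `<output>.txt`. The name `transcribe_to_file` and the injectable `backend` argument are illustrative and not part of this repo.

```python
from pathlib import Path
from typing import Callable, Optional

# Model named in this README; downloaded from the Hugging Face Hub on first use.
WHISPER_MODEL = "mlx-community/whisper-large-v3-mlx"


def transcribe_to_file(
    audio_path: str,
    output_stem: str,
    backend: Optional[Callable[[str], dict]] = None,
) -> Path:
    """Transcribe `audio_path` and write the text to `<output_stem>.txt`."""
    if backend is None:
        # Imported lazily so this module still loads on machines without MLX.
        import mlx_whisper

        def backend(path: str) -> dict:
            return mlx_whisper.transcribe(path, path_or_hf_repo=WHISPER_MODEL)

    result = backend(audio_path)  # expected to be a dict with a "text" key
    out_path = Path(f"{output_stem}.txt")
    out_path.write_text(result["text"].strip() + "\n", encoding="utf-8")
    return out_path
```

Calling `transcribe_to_file("input.mp3", "output")` would mirror `./run input.mp3 output`, under the assumptions above.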

Qwen3 LLM (Tool-Calling)

Interactive Chat

# Use large model (32B, requires ~18GB)
./run-llm

# Use medium model (14B, requires ~10GB)
./run-llm medium

# Use small model (7B, requires ~5GB)
./run-llm small

Programmatic Usage

from llm import ToolCallingAgent, Tool

# Define a tool
def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 22°C"

weather_tool = Tool(
    name="get_weather",
    description="Get current weather for a city",
    parameters={
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name"}
        },
        "required": ["city"],
    },
    function=get_weather,
)

# Create agent with tools
agent = ToolCallingAgent(
    tools=[weather_tool],
    model_size="large",  # or "medium", "small"
)

# Run query (agent will call tools automatically)
response = agent.run("What's the weather in Tokyo?")
print(response)

Available Models

Size	Model	Memory	Best For
`large`	Qwen3-32B-4bit	~18GB	Best reasoning & tool use
`medium`	Qwen2.5-14B-Instruct-4bit	~10GB	Good balance
`small`	Qwen2.5-7B-Instruct-4bit	~5GB	Resource-constrained
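
To choose a size from the table programmatically, one option is to pick the largest model whose approximate footprint fits in the memory you can spare. This is only a sketch: `pick_model_size` and the 4GB `headroom_gb` default are assumptions for illustration, not behavior of `./run-llm`.

```python
from typing import Optional

# (size, model, approximate memory in GB), largest first, from the table above.
MODELS = [
    ("large", "Qwen3-32B-4bit", 18),
    ("medium", "Qwen2.5-14B-Instruct-4bit", 10),
    ("small", "Qwen2.5-7B-Instruct-4bit", 5),
]


def pick_model_size(memory_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Return the largest size that fits in `memory_gb`, or None if none fits.

    `headroom_gb` is an assumed safety margin for the OS and other apps.
    """
    for size, _model, needed_gb in MODELS:
        if needed_gb + headroom_gb <= memory_gb:
            return size
    return None


print(pick_model_size(24))  # large
print(pick_model_size(16))  # medium
```

With the assumed 4GB margin, this lines up with the Requirements section: 24GB selects `large` and 16GB selects `medium`.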

Why Qwen3?

Qwen3 (released April 2025) is among the most capable open-source LLMs for tool calling:

Hybrid reasoning (thinking/non-thinking modes)
Up to 128K-token context window (32K native on Qwen3-32B, extended via YaRN)
Excellent function calling via Hermes-style XML format
Runs efficiently on Apple Silicon via MLX
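
In the Hermes-style format, the model wraps each call in `<tool_call>` tags containing a JSON object with `name` and `arguments`. The sketch below shows one way to extract those calls from raw model output. It is not necessarily how `llm.py` does it, and `parse_tool_calls` is a hypothetical helper name.

```python
import json
import re
from typing import Any

# Matches the body of each <tool_call>...</tool_call> block, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Return every well-formed tool call found in `text`, in order."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append(
                {"name": payload["name"], "arguments": payload.get("arguments", {})}
            )
    return calls


output = (
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Tokyo"}}\n</tool_call>'
)
print(parse_tool_calls(output))
```

Each parsed call can then be dispatched to the matching tool function, for example `get_weather(**call["arguments"])`.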

Qwen Daemon (Unified API Server)

Keep the model loaded and serve multiple agent profiles via HTTP:

# Start the daemon (loads model on first request)
./run-daemon

# In another terminal, run smoke tests
./run-ping
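
If a script needs to wait for the daemon before sending requests, it can poll `/health` until it answers. The sketch below uses only the standard library and checks the HTTP status alone, since the shape of the `/health` response body is not documented here. The port is taken from the curl example in this README, and `wait_until_healthy` is a hypothetical helper name.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # daemon not up yet, or it returned an error status


def wait_until_healthy(
    url: str = "http://127.0.0.1:5997/health",
    attempts: int = 30,
    delay_s: float = 1.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` up to `attempts` times, pausing `delay_s` between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

The `probe` and `sleep` parameters exist so the loop can be exercised without a running daemon.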

Google Sync (Gmail + Calendar)

The daemon syncs Gmail and Calendar for all authenticated accounts:

On startup: syncs the last year of data immediately
Every 5 minutes: runs an incremental sync
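
The schedule above can be pictured as a time window per sync run. The sketch below assumes each incremental run fetches everything since the previous run. The actual daemon may instead rely on Gmail history IDs or Calendar sync tokens, and the function names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

INITIAL_LOOKBACK = timedelta(days=365)  # "last year of data" on startup
SYNC_INTERVAL = timedelta(minutes=5)    # incremental sync cadence


def sync_window(now: datetime, last_sync: Optional[datetime]) -> Tuple[datetime, datetime]:
    """Return the (start, end) range a sync run should cover."""
    if last_sync is None:
        # First run after authentication: backfill one year.
        return now - INITIAL_LOOKBACK, now
    # Later runs: only what changed since the previous run (assumed).
    return last_sync, now


def next_sync_at(last_sync: datetime) -> datetime:
    """Return when the next incremental sync is due."""
    return last_sync + SYNC_INTERVAL


now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
print(sync_window(now, None))
print(next_sync_at(now))
```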

Quick setup:

# 1. Store OAuth client secret in passveil (one-time)
passveil set google/qwen-sync-oauth < ~/Downloads/client_secrets.json

# 2. Authenticate each account
python -m daemon.sync.auth --account work
python -m daemon.sync.auth --account personal

# 3. Start daemon - sync happens automatically
./run-daemon

See docs/AUTH.md for detailed setup instructions including:

Creating OAuth credentials in Google Cloud Console
Supporting personal Gmail accounts (not just Workspace)
Troubleshooting token expiration

Data storage: ~/.qwen/data/{account}/emails/ and ~/.qwen/data/{account}/calendar/
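
For scripts that read the synced data, the per-account directories can be resolved as follows. This is a small sketch based only on the layout stated above. `account_data_dirs` is a hypothetical helper, and the file format inside each directory is not covered here.

```python
from pathlib import Path
from typing import Dict, Optional


def account_data_dirs(account: str, root: Optional[Path] = None) -> Dict[str, Path]:
    """Return the emails/ and calendar/ directories for one synced account."""
    # Default root follows the layout in this README: ~/.qwen/data/{account}/
    base = (root or Path.home() / ".qwen" / "data") / account
    return {"emails": base / "emails", "calendar": base / "calendar"}


dirs = account_data_dirs("work")
print(dirs["emails"])
print(dirs["calendar"])
```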

Endpoints

Endpoint	Method	Description
`/health`	GET	Health check and model status
`/v1/chat`	POST	Chat completion with profile/tools
`/v1/invoke-tool`	POST	Direct tool execution
`/v1/profiles`	GET	List available agent profiles
`/v1/tools`	GET	List available tools

Example: Chat with Profile

curl -X POST http://127.0.0.1:5997/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "message": "What is 2 + 2?",
    "profile": "general",
    "model_size": "large"
  }'

Available Profiles

Profile	Tools	Use Case
`general`	None	Simple chat without tools
`mirror`	Linear/Slack	Query team knowledge base
`code_runner`	Browser/Web	Find & run code in playgrounds
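
Conceptually, a profile pairs a system prompt with the set of tools the model may call. The sketch below illustrates that idea with the three profiles from the table. The field names, the placeholder prompts, and `get_profile` are assumptions for illustration and may not match the daemon's actual `AgentProfile` class.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentProfile:
    name: str
    system_prompt: str                 # placeholder text, not the real prompts
    tool_groups: Tuple[str, ...] = ()  # empty tuple means plain chat


PROFILES = {
    "general": AgentProfile("general", "You are a helpful assistant."),
    "mirror": AgentProfile(
        "mirror", "Answer questions about team activity.", ("linear", "slack")
    ),
    "code_runner": AgentProfile(
        "code_runner", "Find and run code in online playgrounds.", ("browser", "web")
    ),
}


def get_profile(name: str) -> AgentProfile:
    """Look up a profile, failing with a readable error for unknown names."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(
            f"Unknown profile {name!r}; expected one of {sorted(PROFILES)}"
        ) from None


print(get_profile("mirror").tool_groups)
```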

Python Client Example

import httpx

client = httpx.Client(base_url="http://127.0.0.1:5997")

# Chat with mirror profile
response = client.post("/v1/chat", json={
    "message": "What issues are in progress?",
    "profile": "mirror",
    "model_size": "large",
})
print(response.json()["content"])

Architecture

                    ┌─────────────────────────────────────┐
                    │          Qwen Daemon                │
                    │     (FastAPI + Singleton Model)     │
                    ├─────────────────────────────────────┤
                    │  /v1/chat    /v1/invoke-tool        │
                    │  /v1/profiles  /v1/tools  /health   │
                    └──────────────────┬──────────────────┘
                                       │
        ┌──────────────────────────────┼──────────────────────────────┐
        │                              │                              │
┌───────▼────────┐          ┌─────────▼─────────┐          ┌─────────▼─────────┐
│  AgentProfile  │          │   ToolRegistry    │          │   QwenModel       │
│  (prompts)     │          │   (executors)     │          │   (singleton)     │
└───────┬────────┘          └─────────┬─────────┘          └─────────┬─────────┘
        │                             │                              │
        │     ┌───────────────────────┴───────────────────────┐      │
        │     │                                               │      │
        │  ┌──▼────────────┐                        ┌─────────▼───┐  │
        │  │ Mirror Tools  │                        │Browser Tools│  │
        │  │ (Linear/Slack)│                        │ (Playwright)│  │
        │  └───────────────┘                        └─────────────┘  │
        │                                                            │
        └────────────────────────────┬───────────────────────────────┘
                                     │
                              ┌──────▼──────┐
                              │     MLX     │
                              │  Framework  │
                              └──────┬──────┘
                                     │
                              ┌──────▼──────┐
                              │   Apple     │
                              │  Silicon    │
                              └─────────────┘
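
The "Singleton Model" box means one model instance is shared by all requests and, per the daemon section, is loaded on first use. A minimal sketch of that pattern with a thread-safe lazy loader follows. `LazySingleton` is a hypothetical name, and the real `QwenModel` may be structured differently.

```python
import threading
from typing import Any, Callable


class LazySingleton:
    """Load a value once, on first access, even under concurrent requests."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._value: Any = None
        self._loaded = False

    def get(self) -> Any:
        if not self._loaded:  # fast path once the value is loaded
            with self._lock:
                if not self._loaded:  # re-check: another thread may have loaded it
                    self._value = self._loader()
                    self._loaded = True
        return self._value


# Stand-in loader; a real daemon would load the MLX model weights here.
load_count = []
model = LazySingleton(lambda: load_count.append(1) or "fake-model")

threads = [threading.Thread(target=model.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(model.get(), len(load_count))
```

Keeping the loader behind a lock avoids loading a multi-gigabyte model twice when two requests arrive at the same time.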
