Getting Started with Ax AI Providers and Models
This guide helps beginners get productive with Ax quickly: pick a provider, choose a model, and send a request. You’ll also learn how to define model presets and use common options.
1. Install and set up
npm i @ax-llm/ax
Set your API keys as environment variables:
- OPENAI_APIKEY
- ANTHROPIC_APIKEY
- GOOGLE_APIKEY (or Google Vertex setup)
2. Create an AI instance
Use the ai() factory with a provider name and your API key.
import { ai, AxAIGoogleGeminiModel } from "@ax-llm/ax";
const llm = ai({
name: "google-gemini",
apiKey: process.env.GOOGLE_APIKEY!,
config: {
model: AxAIGoogleGeminiModel.Gemini20Flash,
},
});
Supported providers include: openai, anthropic, google-gemini, mistral,
groq, cohere, together, deepseek, ollama, huggingface, openrouter,
azure-openai, reka, x-grok.
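Switching providers only changes the name and key. A minimal sketch (assuming OPENAI_APIKEY is set) that targets OpenAI with the same factory and chat call:
import { ai } from "@ax-llm/ax";
// Same factory, different provider. If config.model is omitted, the provider's
// default model is used; pass config.model to pin one, as in the Gemini example above.
const openai = ai({
  name: "openai",
  apiKey: process.env.OPENAI_APIKEY!,
});
await openai.chat({
  chatPrompt: [{ role: "user", content: "Hello!" }],
});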
3. Choose models using presets (recommended)
Define a models list with user-friendly keys. Each item describes a preset and
can include provider-specific settings. When you pass a preset key as the model,
Ax maps it to the right backend model and merges the preset config.
import { ai, AxAIGoogleGeminiModel } from "@ax-llm/ax";
const gemini = ai({
name: "google-gemini",
apiKey: process.env.GOOGLE_APIKEY!,
config: { model: "simple" },
models: [
{
key: "tiny",
model: AxAIGoogleGeminiModel.Gemini20FlashLite,
description: "Fast + cheap",
// Provider config merged automatically
config: { maxTokens: 1024, temperature: 0.3 },
},
{
key: "simple",
model: AxAIGoogleGeminiModel.Gemini20Flash,
description: "Balanced general-purpose",
config: { temperature: 0.6 },
},
],
});
// Use a preset by key
await gemini.chat({
model: "tiny",
chatPrompt: [{ role: "user", content: "Summarize this:" }],
});
What gets merged when you pick a key:
- Model mapping: the preset model replaces the key
- Tuning: maxTokens, temperature, topP, topK, penalties, stopSequences, n, stream
- Provider extras (Gemini): config.thinking.thinkingTokenBudget is mapped to Ax’s levels automatically; includeThoughts maps to showThoughts
You can still override per-request:
await gemini.chat(
{ model: "simple", chatPrompt: [{ role: "user", content: "Hi" }] },
{ stream: false, thinkingTokenBudget: "medium" },
);
4. Send your first chat
const res = await gemini.chat({
  chatPrompt: [
    { role: "system", content: "You are concise." },
    { role: "user", content: "Write a haiku about the ocean." },
  ],
});
console.log(res.results[0]?.content);
5. Common options
- stream (boolean): enable server-sent events; true by default if supported
- thinkingTokenBudget (Gemini/Claude-like): 'minimal' | 'low' | 'medium' | 'high' | 'highest' | 'none'
- showThoughts (if model supports): include thoughts in output
- functionCallMode: 'auto' | 'native' | 'prompt'
- debug, logger, tracer, rateLimiter, timeout
Example with overrides:
await gemini.chat(
{ chatPrompt: [{ role: "user", content: "Plan a weekend trip" }] },
{ stream: false, thinkingTokenBudget: "high", showThoughts: true },
);
6. Extended Thinking
Extended thinking allows models to reason internally before responding, improving
quality on complex tasks. Ax provides a unified thinkingTokenBudget interface
that works across providers (Anthropic, Google Gemini) while handling
provider-specific details automatically.
Usage
Pass thinkingTokenBudget and optionally showThoughts when making requests:
import { ai, AxAIAnthropicModel } from "@ax-llm/ax";
// Anthropic
const claude = ai({
name: "anthropic",
apiKey: process.env.ANTHROPIC_APIKEY!,
config: { model: AxAIAnthropicModel.Claude46Opus },
});
const res = await claude.chat(
{ chatPrompt: [{ role: "user", content: "Solve this step by step..." }] },
{ thinkingTokenBudget: "medium", showThoughts: true },
);
console.log(res.results[0]?.thought); // The model's internal reasoning
console.log(res.results[0]?.content); // The final answer
import { ai, AxAIGoogleGeminiModel } from "@ax-llm/ax";
// Google Gemini
const gemini = ai({
name: "google-gemini",
apiKey: process.env.GOOGLE_APIKEY!,
config: { model: AxAIGoogleGeminiModel.Gemini25Pro },
});
const res = await gemini.chat(
{ chatPrompt: [{ role: "user", content: "Analyze this complex problem..." }] },
{ thinkingTokenBudget: "high", showThoughts: true },
);
Budget levels
The string levels map to provider-specific token budgets:
| Level | Anthropic (tokens) | Gemini (tokens) |
|---|---|---|
| 'none' | disabled | minimal (Gemini 3+ can’t fully disable) |
| 'minimal' | 1,024 | 200 |
| 'low' | 5,000 | 800 |
| 'medium' | 10,000 | 5,000 |
| 'high' | 20,000 | 10,000 |
| 'highest' | 32,000 | 24,500 |
Anthropic model-specific behavior
Ax automatically selects the right wire format based on the Anthropic model:
- Opus 4.6: Uses adaptive thinking (type: 'adaptive') + effort levels. No explicit token budget is sent; the model decides how much to think. The thinkingTokenBudget level controls the effort parameter instead.
- Opus 4.5: Uses explicit budget (budget_tokens) + effort levels. Effort is capped at 'high' (the 'max' effort level is not supported).
- Other thinking models (Claude 3.7 Sonnet, Claude 4 Sonnet, etc.): Uses budget tokens only, no effort parameter.
Effort levels
For Opus 4.5+ models, Ax automatically maps your thinkingTokenBudget level to
an Anthropic effort level (low / medium / high / max). You don’t need
to set effort manually. The default mapping is:
| Budget level | Effort |
|---|---|
| 'minimal' | low |
| 'low' | low |
| 'medium' | medium |
| 'high' | high |
| 'highest' | max |
You can customize this via the effortLevelMapping config (see below).
Customization
Override the default token budgets or effort mapping in your provider config:
const claude = ai({
name: "anthropic",
apiKey: process.env.ANTHROPIC_APIKEY!,
config: {
model: AxAIAnthropicModel.Claude46Opus,
thinkingTokenBudgetLevels: {
minimal: 2048,
low: 8000,
medium: 16000,
high: 25000,
highest: 40000,
},
effortLevelMapping: {
minimal: "low",
low: "medium",
medium: "high",
high: "high",
highest: "max",
},
},
});
Constraints
When thinking is enabled on Anthropic, the API restricts certain parameters:
- temperature is ignored (cannot be set)
- topK is ignored (cannot be set)
- topP is only sent if its value is >= 0.95
These restrictions are handled automatically — Ax omits the restricted parameters from the request when thinking is active.
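As a concrete sketch (reusing the Anthropic setup from above): a temperature set in the provider config is simply left out of the request whenever thinking is enabled.
import { ai, AxAIAnthropicModel } from "@ax-llm/ax";
const claudeTuned = ai({
  name: "anthropic",
  apiKey: process.env.ANTHROPIC_APIKEY!,
  // temperature applies to normal requests, but is omitted from the wire
  // request whenever extended thinking is enabled (see constraints above)
  config: { model: AxAIAnthropicModel.Claude46Opus, temperature: 0.7 },
});
await claudeTuned.chat(
  { chatPrompt: [{ role: "user", content: "Think this through step by step." }] },
  { thinkingTokenBudget: "medium" }, // thinking on: temperature and topK are dropped automatically
);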
7. Embeddings (if supported)
const { embeddings } = await gemini.embed({
  texts: ['hello', 'world'],
  embedModel: 'text-embedding-005',
})
8. Context Caching
Context caching reduces costs and latency by caching large prompt prefixes
(system prompts, function definitions, examples) for reuse across multiple
requests. This is especially valuable for multi-turn agentic flows.
Enabling Context Caching
Pass the contextCache option to forward() to enable caching:
import { ai, ax, AxMemory } from "@ax-llm/ax";
const llm = ai({
  name: "google-gemini",
  apiKey: process.env.GOOGLE_APIKEY!,
});
const codeReviewer = ax(
  `code:string, language:string -> review:string, suggestions:string[]`,
  { description: "You are an expert code reviewer..." } // Large system prompt
);
const mem = new AxMemory();
// Enable context caching (code and language come from your application)
const result = await codeReviewer.forward(llm, { code, language }, {
  mem,
  sessionId: "code-review-session",
  contextCache: {
    ttlSeconds: 3600, // Cache TTL (1 hour)
  },
});
How It Works
Google Gemini (Explicit Caching):
- Creates a separate cache resource with an ID
- Cache persists across requests using the same sessionId + content hash (see the sketch after these lists)
- Automatic TTL refresh when cache is near expiration
- Provides up to 90% cost reduction on cached tokens
- Minimum 2048 tokens required for caching
Anthropic (Implicit Caching):
- Uses cache_control markers in the request
- System prompts are automatically cached
- Function definitions and results are marked for caching
- No explicit cache management needed
- Provides up to 90% cost reduction on cached tokens
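A sketch of the Gemini reuse behavior, continuing the codeReviewer example above (codeA and codeB stand in for your own inputs): both calls share a sessionId, so the second call reuses the cache created by the first instead of resending the large prefix.
// First call creates the cache resource (the prefix must be >= 2048 tokens).
const first = await codeReviewer.forward(llm, { code: codeA, language: "typescript" }, {
  mem,
  sessionId: "code-review-session",
  contextCache: { ttlSeconds: 3600 },
});
// Same sessionId and unchanged prefix: the cached prefix is reused, not resent.
const second = await codeReviewer.forward(llm, { code: codeB, language: "typescript" }, {
  mem,
  sessionId: "code-review-session",
  contextCache: { ttlSeconds: 3600 },
});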
Configuration Options
type AxContextCacheOptions = {
  // Explicit cache name (bypasses auto-creation)
  name?: string;
  // TTL in seconds (default: 3600)
  ttlSeconds?: number;
  // Minimum tokens to create cache (default: 2048)
  minTokens?: number;
  // Window before expiration to trigger refresh (default: 300)
  refreshWindowSeconds?: number;
  // External registry for serverless environments
  registry?: AxContextCacheRegistry;
  // Controls where the cache breakpoint is set in the prompt prefix
  // Prefix order: System → Functions → Examples → User Input
  // - 'after-examples' (default): Cache includes system + functions + examples
  // - 'after-functions': Cache system + functions only (use when examples are dynamic)
  // - 'system': Cache only system prompt (use when functions are dynamic)
  cacheBreakpoint?: 'system' | 'after-functions' | 'after-examples';
};
Dynamic Examples (Excluding from Cache)
When examples are dynamic (e.g., retrieved per-request from a vector database),
use cacheBreakpoint: 'after-functions' to exclude them from caching:
const result = await gen.forward(llm, input, {
  contextCache: {
    ttlSeconds: 3600,
    cacheBreakpoint: 'after-functions', // Cache system + functions, but not examples
  },
});
Similarly, if both examples and functions are dynamic, use cacheBreakpoint: 'system'
to cache only the system prompt.
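For example, the same call with only the breakpoint changed:
const result = await gen.forward(llm, input, {
  contextCache: {
    ttlSeconds: 3600,
    cacheBreakpoint: 'system', // Cache only the system prompt; dynamic functions and examples stay uncached
  },
});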
Multi-Turn Function Calling with Caching
When using functions/tools, caching is automatically applied:
import { ai, ax, type AxFunction } from "@ax-llm/ax";
const tools: AxFunction[] = [
  {
    name: "calculate",
    description: "Evaluate a math expression",
    parameters: { type: "object", properties: { expression: { type: "string" } } },
    func: ({ expression }) => eval(expression),
  },
];
const agent = ax("question:string -> answer:string", {
description: "You are a helpful assistant...",
functions: tools,
});
const llm = ai({ name: "google-gemini", apiKey: process.env.GOOGLE_APIKEY! });
// Tools and function results are automatically cached
const result = await agent.forward(llm, { question: "What is 2^10?" }, {
  contextCache: { ttlSeconds: 3600 },
});
External Cache Registry (Serverless)
For serverless environments where in-memory state is lost, use an external registry:
// Redis-backed registry example
const registry: AxContextCacheRegistry = {
  get: async (key) => {
    const data = await redis.get(`cache:${key}`);
    return data ? JSON.parse(data) : undefined;
  },
  set: async (key, entry) => {
    await redis.set(`cache:${key}`, JSON.stringify(entry), "EX", 3600);
  },
};
const result = await gen.forward(llm, input, {
sessionId: "my-session",
contextCache: {
ttlSeconds: 3600,
registry,
},
});
Supported Models
Gemini (Explicit Caching):
- Gemini 3 Flash/Pro
- Gemini 2.5 Pro/Flash/Flash-Lite
- Gemini 2.0 Flash/Flash-Lite
Anthropic (Implicit Caching):
- All Claude models support implicit caching
9. Tips
- Prefer presets: they give you friendly names and consistent tuning across your app
- Start with fast/cheap models for iteration; switch keys later without code changes
- Use stream: false in tests for simpler assertions (see the sketch below)
- In the browser, set corsProxy if needed
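For the testing tip, a minimal sketch using Node's built-in assert: with streaming disabled, chat resolves to a single response object that is easy to assert on.
import assert from "node:assert";
const res = await gemini.chat(
  { chatPrompt: [{ role: "user", content: "Reply with the single word OK." }] },
  { stream: false }, // one complete response, no stream handling in the test
);
assert.ok(res.results[0]?.content);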
For more examples, see the examples directory and provider-specific docs.
AWS Bedrock Provider
The @ax-llm/ax-ai-aws-bedrock package provides production-ready AWS Bedrock integration supporting Claude, GPT OSS, and Titan Embed models.
Installation
npm install @ax-llm/ax @ax-llm/ax-ai-aws-bedrock
Quick Start
import { AxAIBedrock, AxAIBedrockModel } from "@ax-llm/ax-ai-aws-bedrock";
import { ax } from "@ax-llm/ax";
const ai = new AxAIBedrock({
region: "us-east-2",
config: { model: AxAIBedrockModel.ClaudeSonnet4 },
});
const generator = ax("question:string -> answer:string");
const result = await generator.forward(ai, {
question: "What is AWS Bedrock?",
});
console.log(result.answer);
Configuration
const ai = new AxAIBedrock({
region: "us-east-2", // Primary AWS region
fallbackRegions: ["us-west-2", "us-east-1"], // Fallback regions for Claude
gptRegion: "us-west-2", // Primary region for GPT models
gptFallbackRegions: ["us-east-1"], // Fallback regions for GPT
config: {
model: AxAIBedrockModel.ClaudeSonnet4,
maxTokens: 4096,
temperature: 0.7,
topP: 0.9,
},
});
Supported Models
Claude Models:
- AxAIBedrockModel.ClaudeSonnet4 - Claude Sonnet 4
- AxAIBedrockModel.ClaudeOpus4 - Claude Opus 4
- AxAIBedrockModel.Claude35Sonnet - Claude 3.5 Sonnet
- AxAIBedrockModel.Claude35Haiku - Claude 3.5 Haiku
- AxAIBedrockModel.Claude3Opus - Claude 3 Opus
GPT Models:
- AxAIBedrockModel.Gpt41106 - GPT-4 1106 Preview
- AxAIBedrockModel.Gpt4Mini - GPT-4o Mini
Embedding Models:
- AxAIBedrockEmbedModel.TitanEmbedV2 - Titan Embed V2
Regional Failover
The provider automatically handles regional failover for high availability. If the primary region fails, it retries with fallback regions.
AWS Authentication
Uses AWS SDK’s default credential chain:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
- AWS credentials file (~/.aws/credentials)
- IAM roles (EC2/Lambda)
Vercel AI SDK Integration
The @ax-llm/ax-ai-sdk-provider package provides seamless integration with the Vercel AI SDK v5.
Installation
npm install @ax-llm/ax @ax-llm/ax-ai-sdk-provider ai
Basic Usage
import { ai } from "@ax-llm/ax";
import { AxAIProvider } from "@ax-llm/ax-ai-sdk-provider";
import { generateText, streamText } from "ai";
// Create Ax AI instance
const axAI = ai({
name: "openai",
apiKey: process.env.OPENAI_APIKEY!,
});
// Create AI SDK v5 compatible provider
const model = new AxAIProvider(axAI);
// Use with AI SDK functions
const result = await generateText({
  model,
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(result.text);
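The streamText import above works the same way; a sketch (AI SDK v5 streaming API) that prints tokens as they arrive:
const stream = streamText({
  model,
  messages: [{ role: "user", content: "Write one sentence about TypeScript." }],
});
// textStream is an async iterable of text deltas
for await (const textPart of stream.textStream) {
  process.stdout.write(textPart);
}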
Streaming with React Server Components
import { ai } from "@ax-llm/ax";
import { AxAIProvider } from "@ax-llm/ax-ai-sdk-provider";
import { streamUI } from "ai/rsc";
const axAI = ai({
name: "openai",
apiKey: process.env.OPENAI_APIKEY!,
});
const model = new AxAIProvider(axAI);
const result = await streamUI({
  model,
  messages: [{ role: "user", content: "Tell me a story" }],
  text: ({ content }) => <p>{content}</p>,
});
Agent Provider
Use Ax agents with the AI SDK:
import { ai, agent } from "@ax-llm/ax";
import { AxAgentProvider } from "@ax-llm/ax-ai-sdk-provider";
const llm = ai({ name: "openai", apiKey: process.env.OPENAI_APIKEY! });
const myAgent = agent("userInput:string -> response:string", {
name: "helper",
description: "A helpful assistant",
ai: llm,
});
const agentProvider = new AxAgentProvider({
  agent: myAgent,
  updateState: (msgs) => {
    /* handle state updates */
  },
  generate: (result) => <div>{result.response}</div>,
});
Features
- AI SDK v5 LanguageModelV2 compatible
- Full tool/function calling support
- Streaming with lifecycle events
- Multi-modal inputs (text, images, files)
- Full TypeScript support
Ax Tools Package
The @ax-llm/ax-tools package provides additional tools for Ax including MCP (Model Context Protocol) support and a JavaScript runtime.
Installation
npm install @ax-llm/ax @ax-llm/ax-tools
MCP Stdio Transport
Connect to MCP servers via stdio:
import { AxMCPClient } from "@ax-llm/ax";
import { axCreateMCPStdioTransport } from "@ax-llm/ax-tools";
// Create transport for an MCP server
const transport = axCreateMCPStdioTransport({
command: "npx",
args: ["-y", "@anthropic/mcp-server-filesystem"],
env: { HOME: process.env.HOME },
});
// Use with AxMCPClient
const client = new AxMCPClient(transport);
await client.init();
const tools = await client.getTools();
console.log("Available tools:", tools.map((t) => t.name));
AxJSRuntime
A sandboxed JavaScript runtime that can be used as a function tool.
AxJSRuntime is the runtime implementation and is designed
to work across Node.js/Bun-style backends, Deno, and browsers.
import { ai, ax } from "@ax-llm/ax";
import {
  AxJSRuntime,
  AxJSRuntimePermission,
} from "@ax-llm/ax";
// Create runtime with specific permissions
const runtime = new AxJSRuntime({
  permissions: [
    AxJSRuntimePermission.NETWORK,
    AxJSRuntimePermission.TIMING,
  ],
});
// Use as a function tool
const llm = ai({ name: "openai", apiKey: process.env.OPENAI_APIKEY! });
const codeRunner = ax("task:string -> result:string", {
  functions: [runtime.toFunction()],
});
const result = await codeRunner.forward(llm, {
task: "Calculate the factorial of 10",
});
Permissions
Control what the runtime can access:
| Permission | Description |
|---|---|
| NETWORK | Network APIs (fetch, WebSocket, etc.) |
| STORAGE | Client storage APIs (indexedDB, caches) |
| CODE_LOADING | Dynamic script loading (importScripts) |
| COMMUNICATION | Cross-context messaging (BroadcastChannel) |
| TIMING | High-resolution timing (performance) |
| WORKERS | Sub-worker spawning (Worker, SharedWorker) |
import { AxJSRuntimePermission } from "@ax-llm/ax";
const runtime = new AxJSRuntime({
  permissions: [
    AxJSRuntimePermission.NETWORK,
    AxJSRuntimePermission.STORAGE,
  ],
});