A versatile Python CLI for interacting with various Large Language Models (LLMs) from multiple inference providers. Pure stdlib, Python 3.9+, zero pip installs required.
The original bash script (`ai.sh`) is also included for minimal environments.
- **Multi-Provider Support**: Chat with models from:
  - Google Gemini
  - OpenRouter (access to multiple model providers)
  - Groq
  - Together AI
  - Cerebras AI
  - Novita AI
  - Ollama (local or cloud)
- **Dynamic Model Selection**: Fetches and lists available models from the chosen provider, with optional filtering.
- **Conversation History**: Remembers the last N messages (configurable). Use `/history` to recall conversation logs.
- **System Prompt**: Define a system-level instruction for the AI (configurable).
- **Streaming Responses**: AI responses are streamed token by token for a real-time feel.
- **Color-Coded Output**: Differentiates between user input, AI responses, thinking, tools, errors, and info messages.
- **Markdown Rendering**: Bold, italic, inline code, and fenced code blocks rendered in the terminal.
- **LaTeX Rendering**: Greek letters, superscripts, subscripts, and fractions rendered as Unicode (e.g., `α²`, `π/2`).
- **Vision Support**: Attach images for vision-capable models:
  - `/upload <path>`: Attach an image
  - `/image`: Show attached image info
  - `/clearimage`: Remove the attached image
- **Live Model Switching**: Switch models mid-conversation with `/model` (preserves history).
- **Tool/Function Calling**: When enabled, the model can invoke local tools:
  - `get_time`: Current local date & time
  - `calculator`: Safe arithmetic evaluation
  - `fetch_url`: Fetch & clean a web page (≤ 8,000 chars)
  - `wikipedia`: Search Wikipedia & return article text
  - Gemini also gets Google Search grounding when tools are on.
- **Thinking/Reasoning Output**: Toggle display of reasoning tokens from supported models with `/togglethinking`.
- **Multi-line Input**:
  - End any line with `\` to continue on the next line
  - Use `/paste [text]` for bulk paste mode (end with `---`)
- **Session Management**:
  - `/save <name>`: Save session to `~/.chat_sessions/<name>.json`
  - `/load <name>`: Load a saved session
  - `/clear`: Delete all saved sessions
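As an illustration of the LaTeX-to-Unicode rendering described above, a minimal substitution table might look like the following. This is a hypothetical sketch; the names and the mapping set in `ai.py` are likely different and much larger:

```python
# Hypothetical sketch: map a few common LaTeX fragments to Unicode.
# The real table in ai.py is assumed to be larger; names are illustrative.
LATEX_TO_UNICODE = {
    r"\alpha": "α", r"\beta": "β", r"\pi": "π",
    r"\leq": "≤", r"\geq": "≥",
    "^2": "²", "^3": "³", "_1": "₁", "_2": "₂",
}

def render_latex(text: str) -> str:
    """Replace known LaTeX fragments with their Unicode equivalents."""
    for src, repl in LATEX_TO_UNICODE.items():
        text = text.replace(src, repl)
    return text
```

A real renderer would also handle `\frac{a}{b}` and brace-delimited scripts; plain string replacement is just the simplest starting point.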
- Python 3.9 or newer
- No pip installs required (uses only the Python standard library)
Clone the repository:

```bash
git clone https://github.com/olumolu/consoleAI.git
cd consoleAI
```
Configure API keys:

> [!IMPORTANT]
> You MUST add your API keys. Open `ai.py` and locate the `API_KEYS` dictionary:

```python
API_KEYS: dict[str, str] = {
    "gemini": "",      # https://aistudio.google.com/app/apikey
    "openrouter": "",  # https://openrouter.ai/keys
    "groq": "",        # https://console.groq.com/keys
    "together": "",    # https://api.together.ai/settings/api-keys
    "cerebras": "",    # https://cloud.cerebras.ai/
    "novita": "",      # https://novita.ai/
    "ollama": "",      # Leave blank for local Ollama
}
```
Alternatively, set environment variables:

```bash
export GEMINI_API_KEY="your-key-here"
export OPENROUTER_API_KEY="your-key-here"
# etc.
```
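One way these two sources can be combined is to let environment variables fill in empty dictionary entries. The sketch below is an assumption about how this could work, not `ai.py`'s actual code; `resolve_api_keys` is a hypothetical helper:

```python
import os

def resolve_api_keys(defaults: dict[str, str]) -> dict[str, str]:
    """Fill empty entries from PROVIDER_API_KEY environment variables.

    Hypothetical sketch: an explicit key in the dictionary wins;
    otherwise the matching environment variable (if set) is used.
    """
    resolved = dict(defaults)
    for provider, key in resolved.items():
        if not key:
            resolved[provider] = os.environ.get(f"{provider.upper()}_API_KEY", "")
    return resolved
```

With this pattern, `GEMINI_API_KEY` set in the shell would be picked up without editing the file.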
Run it:

```bash
python ai.py <provider> [filter]...
```

Examples:

```bash
python ai.py gemini
python ai.py groq llama
python ai.py openrouter claude
python ai.py together
python ai.py cerebras
python ai.py novita
python ai.py ollama
```

Use the optional `[filter]...` argument to narrow down model selection:

```bash
python ai.py openrouter 32b   # Shows only models with "32b" in the name
python ai.py gemini pro       # Shows only models with "pro" in the name
```

| Command | Description |
|---|---|
| `/history` | Show conversation history |
| `/model` | Switch to a different model mid-chat |
| `/save <name>` | Save session to `~/.chat_sessions/<name>.json` |
| `/load <name>` | Load a saved session |
| `/clear` | Delete all saved sessions |
| `/upload <path>` | Attach an image to your next message |
| `/image` | Show currently attached image info |
| `/clearimage` | Remove the attached image |
| `/paste [text]` | Multi-line paste mode (end with `---`) |
| `/togglethinking` | Toggle reasoning/thinking output display |
| `/toggletools` | Toggle tool calling on/off |
| `/help` | Show available commands |
| `quit` / `exit` | End the session |
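`/save` and `/load` come down to writing and reading JSON under `~/.chat_sessions/`. A minimal sketch, assuming the history is a list of role/content dictionaries (function names here are illustrative, not `ai.py`'s actual API):

```python
import json
from pathlib import Path

# Assumed default location, matching the /save description above.
SESSION_DIR = Path.home() / ".chat_sessions"

def save_session(name: str, history: list, base: Path = SESSION_DIR) -> Path:
    """Sketch of /save: write the message history as <name>.json."""
    base.mkdir(parents=True, exist_ok=True)
    path = base / f"{name}.json"
    path.write_text(json.dumps(history, indent=2), encoding="utf-8")
    return path

def load_session(name: str, base: Path = SESSION_DIR) -> list:
    """Sketch of /load: read a saved history back from disk."""
    return json.loads((base / f"{name}.json").read_text(encoding="utf-8"))
```

Storing plain JSON keeps sessions human-readable and portable between machines.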
- End any line with `\` to continue on the next line
- Use `/paste` for bulk paste mode (end with `---` on its own line)
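The backslash-continuation rule can be sketched as a small input loop. This is an illustration only; `read_multiline` is a hypothetical helper, not the function used in `ai.py`:

```python
def read_multiline(read_line) -> str:
    """Collect input until a line does NOT end with a backslash.

    `read_line` is any callable returning the next line (e.g. input).
    Sketch under assumed semantics: the trailing backslash is stripped
    and continued lines are joined with newlines.
    """
    parts = []
    while True:
        line = read_line()
        if line.endswith("\\"):
            parts.append(line[:-1])  # drop the continuation marker
            continue
        parts.append(line)
        return "\n".join(parts)
```

In the real CLI, `read_line` would be `input` with a continuation prompt.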
When enabled, the model can invoke local tools:
| Tool | Description |
|---|---|
| `get_time` | Get the current local date & time |
| `calculator` | Evaluate a mathematical expression safely |
| `fetch_url` | Fetch & clean a web page by URL |
| `wikipedia` | Search Wikipedia and return article text |
Note: Gemini also gets Google Search grounding when tools are enabled.
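A safe arithmetic evaluator like the `calculator` tool can be built on Python's `ast` module, whitelisting node types and operators so arbitrary code can never execute. This is a minimal sketch of the technique, not the exact implementation in `ai.py`:

```python
import ast
import operator

# Whitelist of permitted AST operator types; anything else is rejected.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg,
}

def safe_eval(expr: str):
    """Evaluate an arithmetic expression without running arbitrary code."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return _eval(ast.parse(expr, mode="eval"))
```

Because function calls, attribute access, and names are never matched by the whitelist, inputs like `__import__('os')` raise `ValueError` instead of executing.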
You can customize settings at the top of `ai.py`:

```python
MAX_HISTORY_MESSAGES = 20    # Messages to keep in context
MAX_MESSAGE_LENGTH = 50_000  # Max chars per message
DEFAULT_TEMPERATURE = 0.7
DEFAULT_MAX_TOKENS = 3000
DEFAULT_TOP_P = 0.9
SYSTEM_PROMPT = "You are a helpful assistant running in a command-line interface."
ENABLE_THINKING_OUTPUT = True
MAX_TOOL_ITERATIONS = 10
```

Runs on:
- macOS
- Linux
- Android (via Termux)
- Any system with Python 3.9+
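The `MAX_HISTORY_MESSAGES` setting above implies trimming older messages while keeping the system prompt in context. One plausible sketch of such trimming (hypothetical; `ai.py`'s actual logic may handle the system prompt differently):

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep only the most recent messages, preserving a leading system prompt.

    Assumes messages are dicts with a "role" key, OpenAI-style.
    """
    if not messages or messages[0].get("role") != "system":
        return messages[-max_messages:]
    system, rest = messages[0], messages[1:]
    return [system] + rest[-(max_messages - 1):]
```

Trimming from the front keeps the context window bounded while recent turns stay intact.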
The original bash-based version is still included for minimal environments.
- Multi-provider support (Gemini, OpenRouter, Groq, Together, Cerebras, Novita, Ollama)
- Dynamic model selection with filtering
- Conversation history + `/history` command
- Streaming responses
- Color-coded output
- Tool calling (Gemini)
- Session management (`/save`, `/load`, `/clear`)
- Image support (`/upload`, `/image`, `/clearimage`)
- Minimal dependencies: only `bash`, `curl`, `bc`, and `jq`
- `bash`
- `curl`
- `bc`
- `jq`
```bash
chmod +x ai.sh
./ai.sh gemini
./ai.sh groq llama
./ai.sh openrouter claude
./ai.sh together
./ai.sh cerebras
./ai.sh novita
./ai.sh ollama
```

| Scenario | Recommendation |
|---|---|
| Minimal environment (no Python) | `ai.sh` |
| Need Markdown/LaTeX rendering | `ai.py` |
| Need multi-line paste mode | `ai.py` |
| Need thinking/reasoning display | `ai.py` |
| Need live model switching | `ai.py` |
| Quick setup on servers | `ai.sh` |
| Android/Termux with limited storage | `ai.sh` |
Edit the API key section at the top of `ai.sh`:

```bash
GEMINI_API_KEY=""
OPENROUTER_API_KEY=""
GROQ_API_KEY=""
TOGETHER_API_KEY=""
CEREBRAS_API_KEY=""
NOVITA_API_KEY=""
OLLAMA_API_KEY=""
```

> [!NOTE]
> Out of scope: Image generation is not supported, as this is a terminal-based chat application.