| Date | Update |
|---|---|
| 27 May 2026 | v0.2.10 Released: Security, Edge & DX - secret scanning (gitleaks), SBOM (CycloneDX), pip-audit CI, sandboxed CodeExecutor (SubprocessSandbox + DockerSandbox), OAuth2/OIDC + RBAC + audit log, Docker + Helm, AWS Lambda (Mangum), Cloudflare Worker edge proxy, VSCode extension, Jupyter magics, live dashboard. See changelog |
| 23 May 2026 | v0.2.9 Released: Observability & Reliability - structured JSON logs + secret redaction, OTel samplers + canonical span spec, Prometheus histograms, SLO tracking, circuit breakers, bulkheads, jittered retries, chaos harness, fuzz suite, effgen loadtest CLI, Alertmanager rules. See changelog |
| 21 May 2026 | v0.2.8 Released: First-class multimodal input - image, audio, and video across 6 providers (Gemini, OpenAI, Groq, Anthropic, Together, HF). New multimodal preset, MultimodalDescribeTool, unified Message content schema, 5 cookbook walkthroughs. See changelog |
| 20 May 2026 | v0.2.7 Released: 31 prompt templates across 7 domains (research, coding, data/SQL, legal, medical, creative, business) with golden eval harness, interactive playground, and auto-generated gallery. See changelog |
| 19 May 2026 | v0.2.6 Released: 14 new tools - OCR, AudioTranscribe, ImageInfo, ImageCaption, PDF, DOCX, Excel, Weather, Geocode, Maps, EmailSMTP, EmailIMAP, SlackWebhook, DiscordWebhook. New presets: media, notify. 58+ built-in tools total. See changelog |
| 18 May 2026 | v0.2.5 Released: 13 new free tools - PubMed, ArXiv, SemanticScholar, RSS, News, YouTubeTranscript, YouTubeMetadata, Reddit, HackerNews, Translate, LanguageDetect, QRGenerate, QRRead. 44+ built-in tools total. See changelog |
| 14 May 2026 | v0.2.4 Released: ModelRouter with CostBased/LatencyBased/FirstAvailable policies, transparent provider failover, cross-process SQLite rate-limit coordination, persistent cost tracker + effgen cost dashboard CLI. See changelog |
| 4 May 2026 | v0.2.3 Released: 5 new cloud backends (Groq, Together AI, Fireworks, Replicate, HuggingFace Inference), 9 providers total. Unified ProviderRegistry, effgen doctor auth check, backend parity matrix. See changelog |
| 28 Apr 2026 | v0.2.2 Released: Gemini 3.x/2.5/2.0 registry, thinking_budget, Google Search grounding, Files API, Gemini native tools (GoogleSearch, UrlContext, CodeExecution). Anthropic Claude 4.7 registry, extended thinking, prompt caching (cache_control), streaming polish, experimental native tools. See changelog |
| 25 Apr 2026 | v0.2.1 Released: Cerebras backend (4 free-tier models, streaming, native tool-calling, rate-limit coordinator, cost tracking) + OpenAI gpt-5/gpt-5.4-nano/o-series with reasoning_effort, prompt caching, structured outputs v2, and OpenAI native tools (web_search, code_interpreter, file_search). See changelog |
| 9 Apr 2026 | v0.2.0 Released: Major release - native tool calling, guardrails, multi-agent orchestration, RAG pipeline, 31 tools, eval framework, production API server, MLX Apple Silicon support, Python & TypeScript SDKs. See changelog |
| 8 Apr 2026 | MLX & Apple Silicon support merged (PR #4): Native Metal GPU acceleration via MLX & MLX-VLM backends, hardware detection, 5 Gradio GUI examples. pip install effgen[mlx] |
| 25 Mar 2026 | v0.1.3 Released: Verification hardening - smarter loop detection, "skip the tool" prompting, model-aware token counting, sub-agent depth limits, circuit breaker persistence. See changelog |
| 12 Mar 2026 | v0.1.2 Released: Test-driven hardening - 10 example agents, 19 bug fixes, cross-model compatibility matrix (11 models, 73% pass rate). See changelog |
| 6 Mar 2026 | v0.1.1 Released: Stabilization - fixed license/metadata consistency, improved error handling, added 6 examples, expanded test suite. See changelog |
| 1 Mar 2026 | v0.1.0 Released: Major feature release - 14 built-in tools, agent presets, plugin system, real streaming, memory integration, ACP/MCP protocols, CI/CD, and comprehensive test suite. See changelog |
| 3 Feb 2026 | v0.0.2 Released: vLLM backend fixes with automatic chat template support, GPU memory control, improved OOM error handling, and multi-model family compatibility |
| 2 Feb 2026 | Preprint available: EffGen: Enabling Small Language Models as Capable Autonomous Agents |
| 31 Jan 2026 | Initial release of effGen framework (v0.0.1) |
effGen transforms Small Language Models into powerful AI agents. While most frameworks require massive LLMs, effGen is optimized from the ground up for efficient, smaller models, delivering fast, capable agents without the compute overhead.
from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, PythonREPL
# Load a small but mighty model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")
# Create agent with tools
config = AgentConfig(
    name="math_agent",
    model=model,
    tools=[Calculator(), PythonREPL()]
)
agent = Agent(config=config)
# Run computation
result = agent.run("What is 24344 * 334?")
print(f"Answer: {result.output}")

Requires Python 3.10 or newer. Tested on Python 3.10, 3.11, 3.12, 3.13, 3.14.
pip install effgen
pip install effgen[mlx] # Text models on Apple Silicon
pip install effgen[mlx-vlm] # Vision-Language models on Apple Silicon
pip install effgen[vllm]
pip install effgen[all] # installs vLLM + RAG + vector-DB + search + cloud-secrets + monitoring + …

flash-attn is not in [all] on purpose: its own setup.py imports torch before pip's isolated build environment has torch installed (a well-known upstream bug), so bundling it would break pip install effgen[all] for everyone. Install it in two steps instead:
pip install effgen[all] # step 1: gets torch + the rest
pip install flash-attn --no-build-isolation # step 2: reuses the torch from step 1

See docs/installation.md for the full guide.
git clone https://github.com/ctrl-gaurav/effGen.git
cd effGen
# Quick install
./install.sh
# Full install (includes vLLM + dev tools)
./install.sh --full
# Manual install
pip install -e .

# Run a task
effgen run "What is the capital of France?"
# Interactive chat
effgen chat
# Start API server
effgen serve --port 8000
# List available presets
effgen presets
# Check infrastructure health
effgen health
# Interactive wizard
effgen

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator
# Load model
model = load_model("Qwen/Qwen2.5-1.5B-Instruct", quantization="4bit")
# Configure agent
config = AgentConfig(
    name="calculator_agent",
    model=model,
    tools=[Calculator()],
    system_prompt="You are a helpful math assistant."
)
# Create and run
agent = Agent(config=config)
result = agent.run("Calculate 15% tip on $85.50")
print(result.output)

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator
# Load MLX model: native Metal GPU, unified memory, no CPU-GPU transfer
model = load_model("LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit", engine="mlx")
config = AgentConfig(
    name="mlx_agent",
    model=model,
    tools=[Calculator()],
)
agent = Agent(config=config)
result = agent.run("What is sqrt(144) + 2^10?")
print(result.output)
Observability & Reliability: production-ready telemetry in v0.2.9
effGen v0.2.9 ships the full observability and reliability stack. All telemetry is async/non-blocking: a failed export never fails inference.
Structured JSON logging with secret redaction. Every log line is a JSON object: {ts, level, module, event, attributes, trace_id, span_id}. The built-in Redactor strips OpenAI, Anthropic, Cerebras, Google, HF, Groq, Bearer, Slack, and Discord webhook patterns at the encoder, so no secret ever appears in a log file.
from effgen.observability import get_logger
log = get_logger(__name__)
log.event("model.call.started", provider="cerebras", model="llama3.1-8b", cached_tokens=0)
# → {"ts": "2026-05-23T...", "level": "INFO", "event": "model.call.started", ...}

Prometheus histograms + SLO tracking. effgen_model_call_latency_seconds, effgen_tool_call_latency_seconds, effgen_agent_iteration_latency_seconds, and effgen_tokens_total now expose histogram buckets at /metrics. SLOTracker maintains a rolling-window error budget and exposes burn_rate() at /slo.
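To make the burn-rate idea concrete, here is a minimal sketch of a rolling-window tracker. It is not effGen's SLOTracker: the class name RollingSLO, its constructor parameters, and the injectable clock are assumptions made for illustration. It uses the common definition of burn rate as the observed error rate divided by the error rate the SLO allows.

```python
import time
from collections import deque


class RollingSLO:
    """Hypothetical rolling-window SLO tracker (illustrative, not effGen's API)."""

    def __init__(self, target=0.99, window_seconds=3600.0, clock=time.monotonic):
        self.target = target                  # e.g. 0.99 means 1% of calls may fail
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable so tests can control time
        self.events = deque()                 # (timestamp, ok) pairs, oldest first

    def record(self, ok):
        self.events.append((self.clock(), ok))

    def _trim(self):
        # Drop events that have fallen out of the rolling window.
        cutoff = self.clock() - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()

    def burn_rate(self):
        # 1.0 means the budget is being spent exactly as fast as the SLO allows;
        # values above 1.0 mean the budget will run out early.
        self._trim()
        if not self.events:
            return 0.0
        errors = sum(1 for _, ok in self.events if not ok)
        error_rate = errors / len(self.events)
        return error_rate / (1.0 - self.target)
```

With a 99% target, 2 failures in 100 calls gives a burn rate of about 2, meaning the error budget is being consumed twice as fast as it should be.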
Configurable OTel samplers + canonical span spec. Choose AlwaysOn, AlwaysOff, TraceIdRatio(p), or RateLimited(per_second) in config. effgen/observability/spans.py is the single source of truth for every span attribute name, so string literals are no longer scattered across adapters.
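As a rough illustration of how a trace-ID ratio sampler can stay consistent across services, the sketch below derives the decision from the trace ID alone, so every service sees the same verdict for the same trace. The function name should_sample is hypothetical, and this is a sketch of the general technique rather than effGen's sampler.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # only the low 64 bits of the trace ID are used


def should_sample(trace_id: int, ratio: float) -> bool:
    """Return True if a trace with this ID falls inside the sampled fraction."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # Map the ratio onto the 64-bit ID space and compare against the ID itself.
    bound = round(ratio * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound
```

Because no random state is involved, a trace sampled at the edge is also sampled downstream, provided all services use the same ratio.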
Reliability primitives. Four layers now protect every adapter call:
| Primitive | Class | What it does |
|---|---|---|
| Timeouts | ReliabilityConfig | model_call=60s, tool_call=30s, http=20s; explicit on every httpx client |
| Retries | @retryable(Retry(...)) | Jittered exponential backoff for 5xx / 429 / network errors; emits OTel events |
| Circuit breaker | CircuitBreaker | CLOSED → OPEN → HALF_OPEN per provider; isolates misbehaving backends |
| Bulkhead | Bulkhead | Per-provider concurrency + queue limit; prevents provider starvation |
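The CLOSED → OPEN → HALF_OPEN cycle in the table can be sketched as a small state machine. This is an illustrative stand-in, not effGen's CircuitBreaker: the class name SimpleCircuitBreaker, the failure_threshold and reset_timeout parameters, and the method names are all assumptions.

```python
import time


class SimpleCircuitBreaker:
    """Hypothetical per-provider breaker (illustrative, not effGen's API)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        # After the cool-down, let a single probe through (HALF_OPEN).
        if self.state == "OPEN" and self.clock() - self.opened_at >= self.reset_timeout:
            self.state = "HALF_OPEN"
        return self.state != "OPEN"

    def record_success(self):
        # Any success closes the circuit and clears the failure count.
        self.state = "CLOSED"
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed probe, or too many failures in a row, opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A caller would check allow() before each provider call and report the outcome with record_success() or record_failure(); keeping one breaker per provider is what isolates a misbehaving backend from the rest.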
Deterministic chaos harness. Inject NetworkTimeout, Http5xx, Http429, SlowResponse, PartialResponse, or MalformedJSON faults with Chaos(seed). Four canonical scenarios (fallback on 5xx, Retry-After honoured, timeout fires cleanly, AllProvidersFailed) all pass deterministically across 10 seeds.
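Determinism in a chaos harness usually comes from driving every fault decision from a seeded random generator. The sketch below shows that idea under stated assumptions: SeededChaos, run_schedule, and the two exception classes are hypothetical stand-ins defined here, not effGen's Chaos API.

```python
import random


class NetworkTimeout(Exception):
    """Stand-in fault type for this sketch."""


class Http5xx(Exception):
    """Stand-in fault type for this sketch."""


class SeededChaos:
    """Hypothetical fault injector whose schedule depends only on the seed."""

    def __init__(self, seed, fault_rate=0.5, faults=(NetworkTimeout, Http5xx)):
        self._rng = random.Random(seed)   # private RNG, independent of global state
        self.fault_rate = fault_rate
        self.faults = faults

    def call(self, fn, *args, **kwargs):
        # Decide from the seeded RNG whether this call fails, and how.
        if self._rng.random() < self.fault_rate:
            raise self._rng.choice(self.faults)("injected fault")
        return fn(*args, **kwargs)


def run_schedule(seed, n=20):
    """Run n calls through the injector and record what happened to each."""
    chaos = SeededChaos(seed)
    outcomes = []
    for _ in range(n):
        try:
            chaos.call(lambda: "ok")
            outcomes.append("ok")
        except (NetworkTimeout, Http5xx) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes
```

Two runs with the same seed produce the same sequence of faults, which is what lets a failing scenario be replayed exactly.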
Fuzz suite. Hypothesis runs 500 examples against all 66 BaseTool subclasses, random ContentPart message sequences, and the router's provider-availability logic. No unhandled exceptions, no secret leaks.
Load-testing CLI + Alertmanager rules.
# Run a 30-second load test (JSON report prints to stdout by default)
effgen loadtest --concurrency 10 --duration 30 --scenario fixed
# Or write the report to a file with --output
effgen loadtest --concurrency 10 --duration 30 --output report.json
# Integrate with Alertmanager
cp docs/observability/alert_rules.yaml /etc/prometheus/rules/effgen.yaml

See docs/observability/overview.md for full setup, docs/observability/metrics.md for all metric definitions, and docs/observability/alerting.md for Alertmanager integration.
First-class multimodal in v0.2.8: image, audio & video across 6 providers
effGen v0.2.8 makes multimodal input a first-class citizen. Send images, audio clips, and short video to any vision-capable provider through a unified Message schema; the adapter handles the translation, not your code.
Image input: Gemini, OpenAI gpt-4o, Groq, Anthropic (code-only), Together, HF. Automatic resize/MIME validation via image_pre.py. Raises CapabilityNotSupportedError cleanly when the provider doesn't support vision.
Audio input: Gemini native inline audio, OpenAI Whisper transcription + gpt-4o audio, HF Inference ASR. Auto-downsamples to 16 kHz mono; chunks files over provider max duration. Anthropic raises CapabilityNotSupportedError.
Video input: Gemini native video for providers that accept raw video; frame-sampling fallback (ffmpeg) for all others. Raises MissingSystemDependency with install hints when ffmpeg is absent.
Unified message schema: TextPart, ImagePart, AudioPart, VideoPart form a typed ContentPart union. Message.content is always a List[ContentPart]; the backwards-compatible string constructor still works.
multimodal preset: create_agent("multimodal", model) wires Gemini Flash-Lite (primary) + OpenAI gpt-4o-mini (fallback) with ImageInfo, ImageCaption, OCR, AudioTranscribe, MultimodalDescribeTool, and the full tool suite.
5 cookbook walkthroughs: image Q&A, audio transcribe + reason, video summarize, OCR + LLM structured extraction, chart reading from an image. All in docs/cookbook/.
from effgen import image_from, audio_from, video_from
from effgen.core.messages import Message, Role
from effgen.presets import create_agent
from effgen import load_model
model = load_model("gemini-2.0-flash", provider="gemini")
agent = create_agent("multimodal", model)
# Image question
img = image_from("https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/240px-PNG_transparency_demonstration_1.png")
msg = Message(role=Role.USER, content=[img, "What is in this image?"])
result = agent.run_message(msg)
print(result.output)
# Audio transcription
aud = audio_from("/tmp/clip.mp3")
msg = Message(role=Role.USER, content=[aud, "Transcribe and summarize."])
result = agent.run_message(msg)

# Multimodal preset
effgen run --preset multimodal "Describe this image" --image /tmp/photo.jpg
# Check capability
python -c "from effgen.models.capabilities import Capability; print(Capability.vision)"

See docs/multimodal/overview.md for the full architecture and docs/cookbook/README.md for the cookbook index.
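The unified message schema described in this section, where content is always a list of typed parts but a plain string is still accepted, can be sketched with dataclasses. The classes below reuse the names TextPart, ImagePart, and Message for readability only; their fields and the normalization in __post_init__ are assumptions, not effGen's actual definitions.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ImagePart:
    url: str  # hypothetical field; a real part may carry bytes or a MIME type


ContentPart = Union[TextPart, ImagePart]


@dataclass
class Message:
    role: str
    content: Union[str, List[Union[str, ContentPart]]]

    def __post_init__(self):
        # Accept a bare string, or a list mixing strings and typed parts,
        # and normalize everything to a list of typed parts.
        raw = [self.content] if isinstance(self.content, str) else list(self.content)
        self.content = [TextPart(p) if isinstance(p, str) else p for p in raw]
```

Normalizing once at construction means downstream adapters only ever handle the list form, which is one way to keep a string constructor backwards compatible.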
31 prompt templates in v0.2.7: Prompt Library, Eval Harness & Interactive Playground
effGen v0.2.7 adds a curated, domain-organized Prompt Library with 31 reusable templates across 7 domains, paired with a golden evaluation harness and an interactive playground CLI. See the full gallery.
Research: literature review (zero-shot + CoT), paper summary, citation extraction, methodology critique.
Coding: code review, bug diagnosis, refactoring plan, test generation, docstring fill.
Data / SQL: NL-to-SQL with warnings, SQL explain, SQL optimize, data profile, ETL plan.
Legal: contract summary, clause classify, research brief. All templates include a mandatory legal disclaimer.
Medical: symptom triage, drug interaction, medical literature synthesis. All templates include a mandatory medical disclaimer.
Creative: story continuation (zero-shot + few-shot), poetry forms, character bio, world building.
Business: meeting summary, email draft (formal/casual), OKR generation, SWOT analysis, elevator pitch.
# Discover and browse
effgen prompts list
effgen prompts list --domain research
effgen prompts list --format markdown
# Inspect and evaluate
effgen prompts show research.literature_review.v1.cot
effgen prompts eval
effgen prompts eval --domain coding --live --model llama3.1-8b
# Interactive playground
effgen prompts playground

from effgen.prompts.library import registry
p = registry.get("data.sql_from_nl.v1")
sql_prompt = p.template(
    schema_ddl="CREATE TABLE orders (id INT, customer TEXT, total FLOAT, created_at DATE)",
    question="Total revenue per customer this month",
    dialect="postgresql",
)

See docs/prompts/gallery.md for the full template catalog and docs/prompts/library.md for the framework overview.
14 new tools in v0.2.6: OCR, Audio, Images, Documents, Geo/Weather & Communications
effGen v0.2.6 adds 14 new built-in tools across document, media, and communication categories, bringing the total to 58+. Two new presets (media, notify) are also introduced.
- OCR: OCRTool (Tesseract local + OCR.space fallback; OCRBackendUnavailable raised with install instructions).

      from effgen.tools.builtin.ocr import OCRTool
      result = OCRTool().execute({"operation": "extract", "image_path": "/tmp/scan.png"})
      print(result["data"]["text"])

- Audio Transcription: AudioTranscribeTool (faster-whisper local; HF Inference fallback; GPU auto-detected).

      from effgen.tools.builtin.audio_transcribe import AudioTranscribeTool
      result = AudioTranscribeTool().execute({"operation": "transcribe", "audio_path": "/tmp/clip.mp3"})

- Image Analysis: ImageInfoTool (Pillow metadata, zero network) + ImageCaptionTool (vision-capable model router).
- Document Parsing: PDFTool (pypdf + pdfplumber), DOCXTool (python-docx), ExcelTool (openpyxl + pandas). All added to research and general presets.

      from effgen.tools.builtin.pdf import PDFTool
      result = PDFTool().execute({"operation": "text", "path": "/tmp/paper.pdf"})

- Geo / Weather: WeatherTool (Open-Meteo, free, no auth), GeocodeTool (Nominatim/OSM, 1 req/s), MapsTool (staticmap PNG renderer).

      from effgen.tools.builtin.geocode import GeocodeTool
      result = GeocodeTool().execute({"operation": "geocode", "address": "San Francisco, CA"})

- Email & Webhooks: EmailSMTPTool, EmailIMAPTool, SlackWebhookTool, DiscordWebhookTool. All in the new notify preset. Webhook URLs are redacted in logs.

      from effgen.tools.builtin.slack_webhook import SlackWebhookTool
      result = SlackWebhookTool().execute({"operation": "post", "text": "Deploy complete!"})

See the full tool gallery for quickstart snippets for all 58+ tools.
13 new free tools in v0.2.5: Research, News, YouTube, Social, Translation & QR
effGen v0.2.5 adds 13 free, no-auth-required tools, bringing the built-in tool count above 44. All tools integrate with the research and general presets.
- Academic Research: PubMedTool (NCBI, 3 ops, built-in rate limiting), ArXivTool (Atom feed + PDF download), SemanticScholarTool (search + citations + references).

      from effgen.tools.builtin.arxiv import ArXivTool
      tool = ArXivTool()
      result = tool.execute({"operation": "search", "query": "transformer attention", "max_results": 5})

- News & RSS: RSSFeedTool (any RSS/Atom feed), NewsTool (BBC, Reuters, HN, NPR, etc. + optional NewsAPI.org key).

      from effgen.tools.builtin.news import NewsTool
      result = NewsTool().execute({"operation": "top_headlines", "category": "technology"})

- YouTube: YouTubeTranscriptTool (captions without Google API key), YouTubeMetadataTool (via yt-dlp, public content only).
- Social Media: RedditTool (public JSON, no OAuth), HackerNewsTool (Firebase API, no auth).
- Translation & Language Detection: TranslateTool (LibreTranslate + offline argostranslate fallback), LanguageDetectTool (55+ languages, fully offline).
- QR Codes: QRGenerateTool (generate locally), QRReadTool (decode from image, with OpenCV fallback if zbar is unavailable).
See the full tool gallery for quickstart snippets for all 58+ tools.
Top 5 features from v0.2.4: ModelRouter & Cost Optimizer
- PolicyBasedRouter: composable routing engine with three built-in policies. Pick the cheapest provider within your budget, the fastest under your SLA, or simply the first available, and combine them freely.

      from effgen import PolicyBasedRouter, RoutingContext, CostBasedPolicy, LatencyBasedPolicy
      from effgen.models.capabilities import Capability
      router = PolicyBasedRouter(policies=[LatencyBasedPolicy(), CostBasedPolicy()])
      ctx = RoutingContext(
          prompt_tokens_estimate=500,
          user_budget_usd=0.01,
          latency_budget_ms=3000,
          required_capabilities={Capability.chat},
      )
      decision = router.route(ctx)
      print(decision.chosen) # e.g., ProviderModelPair("cerebras", "llama3.1-8b")
      print(decision.eliminated) # [(pair, reason), ...], fully explainable

- Transparent failover: route_and_execute(ctx, fn) retries on rate-limits / 5xx / timeouts and seamlessly moves to the next-best provider. Each hop fires a RouterEvent to registered subscribers.

      from effgen import load_model

      def call_provider(pair):
          model = load_model(pair.model_id, provider=pair.provider)
          return model.generate("Hello!").text

      router.subscribe(
          lambda event: print(
              f"Failover: {event.from_provider}/{event.from_model} "
              f"→ {event.to_provider}/{event.to_model}"
          )
      )
      result = router.route_and_execute(ctx, call_provider)

- Cross-process SQLite rate-limit coordination: share a single rate-limit budget across multiple workers:

      from effgen import RateLimitCoordinator, SQLiteRateLimitStore
      store = SQLiteRateLimitStore("~/.effgen/rate_limits.sqlite")
      coordinator = RateLimitCoordinator(storage=store) # WAL-mode, BEGIN IMMEDIATE

- Persistent cost tracking + effgen cost CLI: every API call persists to SQLite; query spend instantly:

      effgen cost today # per-provider per-model table
      effgen cost week # rolling 7-day view
      effgen cost by-provider # lifetime totals
      effgen cost set-budget 1.0 # set $1/day cap (BudgetExceededError at 100%)

- Fully explainable decisions + budget guard: RouterDecision records every eliminated provider and why ("rate_limited", "no_key", "cost_exceeds_budget", "latency_exceeds_sla"). Configure a daily spend cap; the router automatically fails over to a free-tier provider when the budget is hit.
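The "explainable decision" idea above can be sketched as a filter that records why each candidate was dropped before picking the cheapest survivor. This is a simplified illustration and not PolicyBasedRouter itself: the Candidate and Decision classes, the route function, and all prices and latencies are hypothetical; only the reason strings are taken from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str
    usd_per_1k_tokens: float   # hypothetical price, for illustration
    p50_latency_ms: float      # hypothetical latency, for illustration
    has_key: bool = True


@dataclass
class Decision:
    chosen: Optional[Candidate]
    eliminated: List[Tuple[Candidate, str]] = field(default_factory=list)


def route(candidates, prompt_tokens, budget_usd, latency_budget_ms):
    """Drop candidates that violate a constraint, then pick the cheapest one left."""
    eliminated, survivors = [], []
    for c in candidates:
        cost = c.usd_per_1k_tokens * prompt_tokens / 1000
        if not c.has_key:
            eliminated.append((c, "no_key"))
        elif cost > budget_usd:
            eliminated.append((c, "cost_exceeds_budget"))
        elif c.p50_latency_ms > latency_budget_ms:
            eliminated.append((c, "latency_exceeds_sla"))
        else:
            survivors.append(c)
    # Cheapest first; latency breaks ties. None if nothing survives.
    chosen = min(
        survivors,
        key=lambda c: (c.usd_per_1k_tokens, c.p50_latency_ms),
        default=None,
    )
    return Decision(chosen, eliminated)
```

Returning the eliminated list alongside the choice is what makes the decision auditable: a caller can log or display exactly which constraint removed each provider.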
Top 5 features from v0.2.3
- 5 new cloud backends: GroqAdapter, TogetherAdapter, FireworksAdapter, ReplicateAdapter, HFInferenceAdapter, each with streaming, native tools, rate-limit coordination, and cost tracking. 9 providers total.

      model = load_model("llama-3.1-8b-instant", provider="groq")
      model = load_model("Qwen/Qwen2.5-72B-Instruct", provider="hf")

- Unified ProviderRegistry: list_providers(), list_models(provider), lookup(model_id) consolidated across all 9 adapters. AmbiguousModelError on bare IDs shared across providers.
- effgen doctor: new CLI command showing which providers have API keys configured.
- Backend parity matrix: canonical agentic task ("(17 × 23) + sqrt(144) = 403") runs identically across all providers; streaming and error surfaces verified uniform. See docs/providers/parity.md.
- HuggingFace Router support: HFInferenceAdapter with 124-model dynamic catalog, refresh_models() + check_drift(), ModelUnavailableError with suggest_alternatives(), and custom Inference Endpoint URL.
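The ambiguity rule for bare model IDs can be illustrated with a small lookup function. Everything here is a stand-in written for this sketch: the CATALOG contents, the provider names, the lookup signature, and the locally defined AmbiguousModelError class are assumptions, not effGen's ProviderRegistry.

```python
class AmbiguousModelError(LookupError):
    """Stand-in for the error raised when a bare ID matches several providers."""


# Hypothetical catalog: provider name -> model IDs it serves.
CATALOG = {
    "provider_a": {"shared-model", "only-a"},
    "provider_b": {"shared-model", "only-b"},
}


def lookup(model_id, provider=None, catalog=CATALOG):
    """Resolve a model ID to a (provider, model_id) pair."""
    if provider is not None:
        # An explicit provider removes any ambiguity.
        if model_id not in catalog.get(provider, set()):
            raise KeyError(f"{provider} has no model {model_id!r}")
        return provider, model_id
    owners = sorted(p for p, models in catalog.items() if model_id in models)
    if not owners:
        raise KeyError(f"unknown model {model_id!r}")
    if len(owners) > 1:
        raise AmbiguousModelError(
            f"{model_id!r} is served by {owners}; pass provider=... to disambiguate"
        )
    return owners[0], model_id
```

Failing loudly on a shared ID, rather than silently picking the first provider, avoids routing a request (and its cost) to a backend the caller did not intend.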
Top 5 features from v0.2.2 (and earlier)
- Gemini 3.x/2.5/2.0 + Gemma families: full model registry with correct context windows, output limits, and feature flags; SDK migrated to google-genai>=1.0.0.
- Gemini thinking_budget: activate Gemini's internal reasoning with GenerationConfig(thinking_budget=8192, include_thoughts=True); thinking trace surfaces in ModelResponse.metadata["thinking"].
- Gemini grounding + Files API: GenerationConfig(grounding=True) injects Google Search; upload_file(path) passes PDFs/images to the model with a 2 GiB guard.
- Gemini native tools: GoogleSearchTool, GeminiUrlContextTool, GeminiCodeExecutionTool activate server-side Gemini capabilities in any Agent. Parallel function calls handled automatically.
- Anthropic Claude 4.7, extended thinking, prompt caching: full Claude 4.x registry; GenerationConfig.thinking for extended reasoning; mark_cached() + AgentConfig.cache_system_prompt / cache_tools for cache_control; cache tokens surfaced in usage.
Top 5 features from v0.2.1
- Cerebras backend: 4 free-tier models (llama3.1-8b, qwen-3-235b-a22b-instruct-2507, gpt-oss-120b, zai-glm-4.7) with streaming, native function-calling, automatic RPM/TPM/RPD/TPD rate-limit coordination, and per-call cost tracking. pip install effgen[cerebras] and set CEREBRAS_API_KEY.

      from effgen import load_model
      model = load_model("llama3.1-8b", provider="cerebras")

- OpenAI gpt-5 / gpt-5.4-nano / o-series reasoning models: full registry coverage with reasoning_effort (minimal/low/medium/high) and max_reasoning_tokens on GenerationConfig. Reasoning payloads are routed only to reasoning-capable models.
- OpenAI prompt caching surfacing: cached_input_tokens exposed on ModelResponse.usage; AgentConfig.stable_system_prompt=True keeps the system prompt anchored at position 0 to maximize OpenAI's automatic ≥1024-token prefix cache hit rate.
- Structured outputs v2: OpenAIAdapter.generate_structured() with strict JSON Schema; to_openai_schema(pydantic_model) inlines $refs and forces additionalProperties: false; refusals raise ModelRefusalError.
- OpenAI native tools: OpenAIWebSearchTool, OpenAICodeInterpreterTool, OpenAIFileSearchTool route through OpenAI's Responses API and compose with effGen's local tools in the same agent. ToolIncompatibleError fires at Agent init when paired with a non-OpenAI model.
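The schema transformation mentioned for structured outputs (inlining $refs and forcing additionalProperties: false) can be sketched on a plain JSON Schema dictionary. The function name inline_refs_strict is hypothetical and this is not effGen's to_openai_schema; it also assumes references point into a top-level $defs section and that the schema is not recursive.

```python
import copy


def inline_refs_strict(schema):
    """Return a copy of schema with $defs references inlined and objects closed.

    Assumes refs look like "#/$defs/Name" and that no definition refers to
    itself; a recursive schema would make this sketch loop forever.
    """
    defs = schema.get("$defs", {})

    def walk(node):
        if isinstance(node, dict):
            if "$ref" in node:
                # Replace the reference with a processed copy of its target.
                name = node["$ref"].split("/")[-1]
                return walk(copy.deepcopy(defs[name]))
            out = {k: walk(v) for k, v in node.items() if k != "$defs"}
            if out.get("type") == "object":
                out["additionalProperties"] = False
            return out
        if isinstance(node, list):
            return [walk(v) for v in node]
        return node

    return walk(schema)
```

A schema generated from a nested Pydantic model typically places the nested model under $defs, so after this pass every object, including the inlined ones, rejects unknown keys.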
Top 5 features from v0.2.0
- Native Tool Calling: Qwen, Llama, Mistral models use built-in function calling instead of text parsing. Set tool_calling_mode="native" or "hybrid". Structured JSON/Pydantic output validation included.
- Guardrails & Safety: PII detection, prompt injection blocking, toxicity filtering, tool permissions. One-liner: get_guardrail_preset("strict").
- Production RAG Pipeline: Ingest PDF/DOCX/HTML/Markdown, semantic+BM25 hybrid search, reranking, inline citations. create_agent("rag", model, knowledge_base="./docs/").
- Production API Server: OpenAI-compatible /v1/chat/completions, request queuing, agent pooling, multi-tenancy, API keys. Drop-in OpenAI replacement with local SLMs.
- Apple Silicon Native: MLX & MLX-VLM backends for M1/M2/M3/M4. Metal GPU acceleration, unified memory. pip install effgen[mlx].
Get started instantly with ready-to-use agent configurations:
from effgen import load_model
from effgen.presets import create_agent
model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")
# One-line agent creation
math_agent = create_agent("math", model) # Calculator + PythonREPL
research_agent = create_agent("research", model) # WebSearch + URLFetch + Wikipedia
coding_agent = create_agent("coding", model) # CodeExecutor + PythonREPL + FileOps + Bash
general_agent = create_agent("general", model) # All tools
rag_agent = create_agent("rag", model, knowledge_base="./docs/") # RAG pipeline
minimal_agent = create_agent("minimal", model) # Direct inference, no tools

# CLI preset support
effgen run --preset math "What is sqrt(144)?"
effgen run --preset research "Tell me about quantum computing"
effGen ships a curated catalog of 31 reusable prompt templates across 7 domains, each with a golden evaluation test and CLI access. Browse the full gallery.
| Domain | Templates | Variants |
|---|---|---|
| Research | 5 | zero-shot, CoT, structured, tool-augmented |
| Coding | 5 | zero-shot, CoT, structured, few-shot, tool-augmented |
| Data / SQL | 5 | zero-shot, CoT, structured, few-shot, tool-augmented |
| Legal | 3 | zero-shot, structured, tool-augmented |
| Medical | 3 | structured, tool-augmented |
| Creative | 5 | zero-shot, CoT, structured, few-shot |
| Business | 5 | zero-shot, CoT, structured, few-shot |
effgen prompts list # browse all 31 templates
effgen prompts show research.paper_summary.v1 # inspect a template
effgen prompts eval # run golden eval (no model needed)
effgen prompts playground # interactive REPL

from effgen.prompts.library import registry
# Get and render a template
p = registry.get("coding.code_review.v1")
prompt = p.template(code="def add(a, b): return a + b", language="python")
# Search templates
cot_prompts = registry.search(variant="cot")
sql_prompts = registry.search(domain="data")

Legal and medical templates enforce a mandatory non-advice disclaimer in every rendered output, verified by unit tests.
# Visual agent & tool development
python examples/basic/chat_gui_mlx.py # MLX Chat: streaming chat with Apple Silicon models (port 7860)
python examples/basic/agent_viz_mlx.py # Agent Visualizer: step-by-step reasoning + code editor (port 7860)
python examples/basic/tool_builder_gui.py # Tool Builder: visually create custom tools (port 7863)
python examples/basic/tool_tester_gui.py # Tool Tester: browse, test, inspect all 58+ tools (port 7864)

python examples/basic/basic_agent_mlx.py # Basic MLX agent with calculator
python examples/basic/chat_gui_mlx.py --autoload # Chat GUI with auto model loading
python examples/basic/agent_viz_mlx.py --autoload # Agent visualizer with auto model loading

python examples/basic/qa_agent.py # Q&A agent (no tools)
python examples/basic/calculator_agent.py # Math with Calculator + PythonREPL
python examples/tools/advanced_multi_tool_agent.py # 5 tools + fallback chains
python examples/tools/file_operations_agent.py # File read/write/search
python examples/tools/coding_agent.py # Code execution + iteration
python examples/advanced/conversational_agent.py # Multi-turn memory
python examples/advanced/advanced_streaming_agent.py # Token streaming with callbacks
python examples/advanced/data_processing_agent.py # JSON & data pipelines
python examples/advanced/multi_agent_pipeline.py # Multi-agent orchestration
python examples/advanced/error_recovery_agent.py # Error handling patterns

python examples/basic/basic_agent.py # Basic agent (Transformers)
python examples/basic/basic_agent_vllm.py # Basic agent (vLLM - 5-10x faster)
python examples/plugins_presets/preset_agents.py # Ready-to-use agent presets
python examples/web_retrieval/streaming_agent.py # Simple streaming
python examples/web_retrieval/memory_agent.py # Simple multi-turn memory
python examples/tools/multi_tool_agent.py # Simple multi-tool
python examples/web_retrieval/weather_agent.py # Weather via Open-Meteo (free)
python examples/plugins_presets/plugin_example.py # Custom tool plugins
python examples/web_retrieval/web_agent.py # Web search agent
python examples/web_retrieval/retrieval_agent.py # RAG-based retrieval

See examples/compatibility_matrix.md for model compatibility across all agents.
More Examples
from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator, WebSearch, PythonREPL
model = load_model("Qwen/Qwen2.5-3B-Instruct")
config = AgentConfig(
    name="research_agent",
    model=model,
    tools=[Calculator(), WebSearch(), PythonREPL()],
    system_prompt="You are a research assistant."
)
agent = Agent(config=config)
result = agent.run("Search for the population of Tokyo and calculate what percentage it is of Japan's total population")

from effgen import Agent, load_model
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator
model = load_model("Qwen/Qwen2.5-3B-Instruct", quantization="4bit")
agent = Agent(config=AgentConfig(
    name="stream_demo", model=model,
    tools=[Calculator()], enable_streaming=True
))
for token in agent.stream("What is 2 + 2?"):
    print(token, end="", flush=True)

agent = Agent(config=AgentConfig(
    name="memory_demo", model=model,
    tools=[], enable_memory=True
))
agent.run("My name is Alice and I'm working on quantum computing.")
result = agent.run("What's my name and what am I working on?")
# → "Your name is Alice and you're working on quantum computing."

from effgen.tools.builtin import Retrieval
retrieval_tool = Retrieval(knowledge_base_path="./docs")
config = AgentConfig(name="qa_agent", model=model, tools=[retrieval_tool])
agent = Agent(config=config)
result = agent.run("What does the documentation say about configuration?")

effGen supports 9 cloud inference providers + 4 local backends, tested across 11+ model families:
| Backend | Platform | Install | Best For |
|---|---|---|---|
| MLX | Apple Silicon (M1/M2/M3/M4) | effgen[mlx] | Native Metal GPU, unified memory, 4/8-bit quantization |
| MLX-VLM | Apple Silicon | effgen[mlx-vlm] | Vision-Language models (Qwen2-VL, LLaVA, Phi-3 Vision, 30+ architectures) |
| vLLM | NVIDIA GPU | effgen[vllm] | High-throughput batch inference |
| Transformers | Any (CPU/GPU) | (bundled) | Universal compatibility, local models |
| OpenAI | Cloud API | (bundled) | gpt-5/gpt-5.4/o-series, reasoning_effort, structured outputs, native tools |
| Anthropic | Cloud API | (bundled) | Claude 4.7/4.x, extended thinking, prompt caching, native tools |
| Google Gemini | Cloud API | (bundled) | Gemini 3.x/2.5/2.0, thinking_budget, grounding, Files API, native tools |
| Cerebras | Cloud API | effgen[cerebras] | 4 free-tier models (llama3.1-8b, qwen-3-235b), ultra-low latency |
| Groq | Cloud API | effgen[groq] | 16 models (llama-3.3-70b, mixtral, qwen3-32b), ultra-fast free-tier inference |
| Together AI | Cloud API | effgen[together] | 163-model catalog (llama, deepseek, qwen, mistral), per-model pricing |
| Fireworks | Cloud API | effgen[fireworks] | 80 chat models (54 tool-capable), serverless + dedicated |
| Replicate | Cloud API | effgen[replicate] | 38 models, async run-poll, SSE streaming, compute-second billing |
| HuggingFace | Cloud API | effgen[hf] | 124-model HF Router catalog, custom Inference Endpoints, free serverless tier |
# See which API keys are configured
effgen doctor

from effgen import load_model, Agent
from effgen.core.agent import AgentConfig
from effgen.tools.builtin import Calculator
# Any of the 9 cloud providers
model = load_model("llama-3.1-8b-instant", provider="groq") # Groq
# model = load_model("meta-llama/Llama-3.3-70B-Instruct-Turbo", provider="together")
# model = load_model("Qwen/Qwen2.5-72B-Instruct", provider="hf")
agent = Agent(config=AgentConfig(name="agent", model=model, tools=[Calculator()]))
result = agent.run("What is (17 * 23) + sqrt(144)?")
print(result.output) # → 403

| Model | Size | Compatibility |
|---|---|---|
| LFM2.5-1.2B-Instruct-MLX-8bit | 1.2B | Apple Silicon optimized, fast agentic |
| Qwen2.5-1.5B-Instruct | 1.5B | 10/10 agents pass |
| Qwen2.5-3B-Instruct | 3B | 10/10 agents pass (recommended default) |
| Phi-4-mini-instruct | 3.8B | 10/10 agents pass |
| Qwen3-1.7B | 1.7B | 9.5/10 |
| Qwen2.5-7B-Instruct | 7B | 9/10 |
| Llama-3.2-3B-Instruct | 3B | 8.5/10 |
Full matrix with 11 models x 10 agents: compatibility_matrix.md
For security policies and vulnerability reporting, see SECURITY.md
effGen v0.2.10 ships production-ready deployment recipes for every major target:
Multi-stage build with a non-root user, read-only filesystem, and /health healthcheck. See docs/deploy/docker.md.
docker build -f deploy/docker/Dockerfile -t effgen:0.2.10 .
docker run -p 8000:8000 --env-file .env effgen:0.2.10
curl http://localhost:8000/health

Full Helm chart with Deployment, Service, Ingress, NetworkPolicy, PDB, and HPA (scales on CPU + effgen_model_call_latency_seconds). See docs/deploy/kubernetes.md.
helm lint deploy/k8s/helm/effgen/
helm install effgen deploy/k8s/helm/effgen/ --set image.tag=0.2.10

Mangum adapter wrapping the FastAPI app. Cold start < 3 s; warm call < 100 ms. SAM template included. See docs/deploy/lambda.md.
cd deploy/aws_lambda
sam build && sam deploy --guided

Thin edge proxy handling CORS, Bearer JWT auth, and KV-backed rate limiting before forwarding to your backend. See docs/deploy/cloudflare.md.
cd deploy/cloudflare
wrangler deploy # staging: wrangler deploy --env staging

Prompt-template completion, inline "Run" code lens on LibraryPrompt definitions, and hover docs, all from the effGen registry. See docs/dx/vscode.md.
cd tools/vscode-effgen
npm ci && npm run compile
# Install: Extensions → ··· → Install from VSIX → vscode-effgen-*.vsix

%load_ext effgen.jupyter
%effgen_chat "What is 17 * 23?"
%%effgen_agent general
Summarise the top HackerNews stories today and rank them by interest.
%effgen_metrics

See docs/dx/jupyter.md.
The API server serves a real-time SPA at /dashboard (no auth required). Panels: span stream (SSE), Prometheus metrics, recent agent runs with token counts and cost, SLO burn rates. See docs/dx/dashboard.md.
EFFGEN_DEV_MODE=1 effgen serve --port 8000
open http://localhost:8000/dashboard

Gitleaks pre-commit hook + CI workflow (secret-scan.yml) catch secrets before they reach the repo. Install the hook once:
pip install pre-commit && pre-commit install

CodeExecutor defaults to SubprocessSandbox (rootless user-namespace, network blocked, isolated /tmp) or DockerSandbox when Docker is available. To opt out (not recommended):
EFFGEN_SANDBOX_BACKEND=off effgen run ... # loud warning emitted

Protect your API server with OAuth2/OIDC (any OIDC provider: Auth0, Keycloak, Cognito):
export EFFGEN_OIDC_ISSUER=https://your-tenant.auth0.com/
export EFFGEN_OIDC_CLIENT_ID=your-client-id
export EFFGEN_OIDC_JWKS_URI=https://your-tenant.auth0.com/.well-known/jwks.json
effgen serve --port 8000

See docs/server/auth.md, docs/server/rbac.md, and docs/server/audit.md.
If you use effGen in your research, please cite our paper:
@software{srivastava2026effgen,
title={effGen: Enabling Small Language Models as Capable Autonomous Agents},
author={Gaurav Srivastava and Aafiya Hussain and Chi Wang and Yingyan Celine Lin and Xuan Wang},
year={2026},
eprint={2602.00887},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.00887},
}

Apache License 2.0. See LICENSE for details.