AssistantHub is a self-hosted RAG (Retrieval-Augmented Generation) data and chatbot platform. It enables you to create AI assistants that can answer questions grounded in your uploaded documents, powered by vector embeddings, hybrid search, and large language models. Upload PDFs, text files, HTML, and more -- AssistantHub automatically extracts content, summarizes, chunks, generates embeddings, and makes it searchable. Your assistants retrieve relevant context at query time and generate accurate, citation-ready responses.
AssistantHub ships as a fully orchestrated Docker Compose stack -- one command brings up the entire platform, including the LLM inference engine, document processing pipeline, vector database, object storage, and a browser-based management dashboard.
v0.16.0 adds CIFS and NFS file-server crawler support alongside the existing web crawler, attached-document chat selection for assistant collections, and the first disabled-by-default server-side tool policy surface for model-directed collection, Verbex, S3, and Tavily web-search tools, including optional dedicated tool-routing endpoints.
- CIFS and NFS crawlers -- Crawl plans can target web sites, CIFS/SMB file shares, or NFS exports through the shared crawler lifecycle.
- Shared crawler architecture -- `CrawlerBase` now supports lazy content retrieval, so web, CIFS, and NFS crawlers share delta, upload, document creation, ingestion, scheduling, and retention behavior.
- Repository settings contract -- CIFS and NFS settings are mapped from View's `DataRepository` fields and exposed through REST, OpenAPI, Postman, the dashboard, and the C#/TypeScript/Python SDKs.
- Attached-document chat -- Public assistant chat clients can list completed documents from the assistant collection and send `attached_document_ids` to constrain RAG retrieval to the selected documents for a turn. Chat clients can also send one-off `local_attachments` from the user's machine for server-side text extraction and prompt context without adding those files to the assistant collection.
- Tool-call policy foundation -- Assistant settings now include administrator-controlled tool policy JSON, effective tool previews, validation endpoints, SDK/Postman/OpenAPI coverage, and disabled-by-default Tavily web-search configuration.
- Dedicated tool-routing endpoint -- Assistant Settings can optionally route model tool-decision turns to a separate completion endpoint that is explicitly marked as tool-capable, while final answers still use the response inference endpoint. Startup migrations and provider scripts add `ToolRoutingInferenceEndpointId`.
- Tool-call trace history -- Non-streaming tool calls are persisted as redacted `AssistantToolCallRecord` rows linked to the trace, request history, and chat history, with admin REST/Postman/OpenAPI/SDK coverage under `/v1.0/assistants/{assistantId}/tool-calls`. Trace retention follows `RequestHistory.RetentionDays`.
- Verbex deployment plumbing -- Docker Compose includes `jchristn77/verbex-server:v0.1.0` and `jchristn77/verbex-dashboard:v0.1.0`, backed by the shared PostgreSQL service.
- Inverted-index APIs -- AssistantHub now has proxied REST routes for indices, index records, index search, and collection search.
- Collection search API -- AssistantHub now marshals RecallDB collection search through `POST /v1.0/collections/{collectionId}/search`.
- Dashboard search surfaces -- Artifacts includes `Collections > Search`, `Indices`, `Indices > Records`, and `Indices > Search` with filters, metadata editing, result details, scoring, and raw JSON inspection.
- Implementation plan -- The remaining whole-product work is tracked in `archive/SEARCH.md`.
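A minimal Python sketch of an attached-document chat request, assuming the non-streaming `POST /v1.0/assistants/{id}/chat` route takes the same OpenAI-style body as the `metadata_filter` example documented for that endpoint and returns a single JSON document. The helper names and document IDs are hypothetical, and `local_attachments` is left out because its wire format isn't described here.

```python
import json
import urllib.request


def build_chat_body(question, attached_document_ids=None, metadata_filter=None):
    """Build an OpenAI-style chat body plus AssistantHub's optional retrieval extensions."""
    body = {"messages": [{"role": "user", "content": question}]}
    if attached_document_ids:
        # Restrict RAG retrieval for this turn to these completed collection documents.
        body["attached_document_ids"] = list(attached_document_ids)
    if metadata_filter:
        body["metadata_filter"] = metadata_filter
    return body


def send_chat(base_url, assistant_id, body, timeout=60):
    """POST the body to the assistant chat route and parse a single JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1.0/assistants/{assistant_id}/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against the local Docker stack this would be called as `send_chat("http://localhost:8800", "<assistant-id>", build_chat_body("Summarize the renewal terms.", ["<document-id>"]))`. Omitting `attached_document_ids` leaves retrieval unconstrained by document.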
- Assistant Analytics dashboard -- New `Assistants > Analytics` page with per-assistant charts for request volume, success/failure, latency percentiles, stage duration, endpoint/model usage, provider timings, token throughput, retrieval fanout, slowest requests, and feedback trend, scoped to retained Assistant History rows.
- Analytics REST API -- Added `GET /v1.0/assistants/{assistantId}/analytics/*` endpoints for overview, time series, stage buckets, endpoint summaries, slowest requests, and feedback analytics.
- Efficient assistant-scoped telemetry queries -- `chat_history_performance_events` now carries `assistant_id`, with startup migrations and provider scripts adding the backfill and indexes for SQLite, PostgreSQL, MySQL, and SQL Server.
- SDK and MCP coverage -- C#, JavaScript/TypeScript, Python, Postman, OpenAPI, and MCP all expose the new assistant analytics read APIs.
- Schema migration -- Existing deployments can add analytics indexes and backfill performance events with the matching assistant analytics provider script.
Implementation planning notes for Assistant Analytics are archived in archive/ASSISTANT_ANALYTICS.md.
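To make the latency-percentile charts concrete, here is a small Python sketch of a nearest-rank percentile over request durations. This is an illustration only: the exact percentile method AssistantHub uses server-side is not documented here, and the function names are hypothetical.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """p50/p95/p99 over a set of retained request durations in milliseconds."""
    return {p: percentile(durations_ms, p) for p in (50, 95, 99)}
```

With only a handful of retained rows, p95 and p99 collapse onto the slowest request, which is why a single outlier can dominate the tail on a quiet assistant.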
- Assistant performance telemetry -- Chat history now stores `TraceId`, `RequestHistoryId`, `PerformanceSchemaVersion`, and serialized `PerformanceJson` with per-stage timings, including safe aggregate tool-call counts and duration metadata for tool-enabled turns. The dashboard slowest-request table surfaces aggregate tool failures, denials, truncation counts, and the slowest tool names for admin diagnosis.
- Provider-agnostic hot-path detail -- Final inference telemetry captures endpoint limiter wait, request-to-headers, headers-to-first-token, first-token-to-last-token, token counts, status, endpoint/model metadata, and provider-native metrics when available.
- Request/history correlation -- Request history stores `TraceId` and `ChatHistoryId`, allowing assistant request detail views to drill into linked chat timing.
- Dashboard drill-down -- History details and request-history details now include expanded performance timing tables for cold-load and hot-load analysis.
- Schema migration -- Existing deployments can add the new telemetry columns and the `chat_history_performance_events` table with the matching `migrations/010_upgrade_to_v0.12.0.*.sql` provider script.
- SDK and API surface -- C#, JavaScript, and Python SDKs include the new history correlation fields and telemetry DTOs.
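The hot-path stages listed above split one inference call into consecutive intervals. A minimal Python sketch of that arithmetic, assuming hypothetical millisecond timestamps captured around a single call; these field names are not AssistantHub's `PerformanceJson` schema.

```python
from dataclasses import dataclass


@dataclass
class InferenceTimestamps:
    # Hypothetical monotonic timestamps in milliseconds, not the real PerformanceJson fields.
    limiter_enter: float      # request queued behind the endpoint limiter
    request_sent: float       # limiter released, HTTP request sent
    headers_received: float   # provider response headers arrived
    first_token: float        # first streamed token arrived
    last_token: float         # final token arrived
    completion_tokens: int


def hot_path_stages(t):
    """Break one inference call into consecutive stage durations plus token throughput."""
    stages = {
        "limiter_wait_ms": t.request_sent - t.limiter_enter,
        "request_to_headers_ms": t.headers_received - t.request_sent,
        "headers_to_first_token_ms": t.first_token - t.headers_received,
        "first_to_last_token_ms": t.last_token - t.first_token,
    }
    generation_seconds = stages["first_to_last_token_ms"] / 1000
    stages["tokens_per_second"] = (
        t.completion_tokens / generation_seconds if generation_seconds > 0 else None
    )
    return stages
```

A long `headers_to_first_token_ms` with normal throughput typically points at model load or prompt processing rather than generation speed, which is the cold-load vs. hot-load distinction the drill-down tables are meant to expose.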
- Specialized RAG utility endpoints -- Assistant Settings now has dedicated dropdowns for retrieval gate, query rewrite, and re-rank inference endpoints.
- Hot-path endpoint honoring -- Chat execution uses those dedicated endpoints for their matching utility calls and falls back to the response inference endpoint when the specialized selector is empty.
- API and SDK support -- `RetrievalGateInferenceEndpointId`, `QueryRewriteInferenceEndpointId`, and `RerankInferenceEndpointId` are available through REST, OpenAPI, Postman, and the C#, JavaScript, and Python SDK models.
- Migration scripts -- Existing deployments can add the new assistant settings columns with the matching `migrations/009_upgrade_to_v0.11.0.*.sql` provider script.
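The fallback rule for specialized utility endpoints can be sketched in a few lines of Python. The three selector names come from the settings above, and `ToolRoutingInferenceEndpointId` follows the same "blank means use the response endpoint" rule; the `ResponseInferenceEndpointId` key and the helper are hypothetical stand-ins for however the response endpoint is actually stored.

```python
from typing import Optional

# Utility call -> settings field that may select a dedicated endpoint for it.
UTILITY_SELECTORS = {
    "retrieval_gate": "RetrievalGateInferenceEndpointId",
    "query_rewrite": "QueryRewriteInferenceEndpointId",
    "rerank": "RerankInferenceEndpointId",
    "tool_routing": "ToolRoutingInferenceEndpointId",
}

# Hypothetical key for the response inference endpoint.
RESPONSE_KEY = "ResponseInferenceEndpointId"


def resolve_endpoint(settings: dict, utility: str) -> Optional[str]:
    """Use the specialized endpoint if one is selected, else fall back to the response endpoint."""
    specialized = settings.get(UTILITY_SELECTORS[utility])
    if specialized:  # None or "" both mean "no dedicated endpoint selected"
        return specialized
    return settings.get(RESPONSE_KEY)
```

This lets a small, fast model handle gate, rewrite, and re-rank calls while the larger response model is reserved for final answers.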
- API Explorer -- Browse the live AssistantHub route surface from `/openapi.json`, execute management APIs directly from the dashboard, inspect responses, and generate reusable cURL or JavaScript snippets.
- Assistant API explorer mode -- Exercise assistant-facing APIs end-to-end from the dashboard, including public metadata, thread creation, chat, compaction, generation, feedback, and distinct labels or tags.
- Request History -- Capture and search request and response metadata across system APIs and assistant traffic with replay into the explorer, retention cleanup, body truncation, and redaction controls.
- Monitoring surfaces in the dashboard -- `API Explorer` and `Request History` are included directly in the product under the `Monitoring` section for day-to-day operator use.
- Migration script -- Existing deployments can add the new request-history table with `migrations/008_upgrade_to_v0.10.0.sql`.
AssistantHub now includes two operator-facing tools in the dashboard:
- `Request History` for searchable HTTP request and response observability
- `API Explorer` for executing system APIs and assistant-facing APIs against the live server
Operational notes:
- Request-history capture is configurable under `RequestHistory` settings in `assistanthub.json`
- Sensitive headers and selected JSON fields are redacted before persistence
- Request and response bodies are size-limited, and binary payloads are summarized rather than stored in full
- The explorer uses the runtime `/openapi.json` route, rather than a stale checked-in spec, as its source of truth
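The redaction and truncation notes above can be sketched in Python as follows. The header and field lists, the size limit, and the placeholder strings are hypothetical defaults for illustration; the real values come from the `RequestHistory` settings.

```python
import json

# Hypothetical defaults; the real lists come from RequestHistory settings in assistanthub.json.
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
SENSITIVE_FIELDS = {"password", "apikey", "secret"}
REDACTED = "[REDACTED]"


def redact_headers(headers):
    """Replace sensitive header values, matching names case-insensitively."""
    return {k: (REDACTED if k.lower() in SENSITIVE_HEADERS else v) for k, v in headers.items()}


def redact_json(value):
    """Recursively replace values of sensitive keys in parsed JSON."""
    if isinstance(value, dict):
        return {
            k: (REDACTED if k.lower() in SENSITIVE_FIELDS else redact_json(v))
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact_json(v) for v in value]
    return value


def capture_body(raw: bytes, content_type: str, max_bytes: int = 64) -> str:
    """Summarize binary payloads, redact JSON bodies, and truncate to max_bytes."""
    if not content_type.startswith(("application/json", "text/")):
        return f"<binary {len(raw)} bytes, {content_type}>"
    text = raw.decode("utf-8", errors="replace")
    if content_type.startswith("application/json"):
        try:
            text = json.dumps(redact_json(json.loads(text)))
        except ValueError:
            pass  # malformed JSON is kept as plain text
    encoded = text.encode("utf-8")
    if len(encoded) > max_bytes:
        text = encoded[:max_bytes].decode("utf-8", errors="ignore") + "...[truncated]"
    return text
```

Redacting before truncating matters: truncating first could cut a JSON body so it no longer parses, letting a sensitive value slip through as plain text.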
- Slack integration per assistant -- Configure Slack connectivity directly in assistant settings with `Enable Slack`, app token, bot token, channel ID, start-of-message indicator, and draft connectivity verification.
- Shared chat execution rail -- Slack requests reuse the same retrieval, compaction, citation, inference, and history flow as AssistantHub chat instead of a separate inference path.
- Thread-aware Slack replies -- Incoming Slack messages map to deterministic AssistantHub threads and replies are posted back to the originating Slack thread.
- Slack verification API and dashboard flow -- Added `POST /v1.0/assistants/{assistantId}/settings/slack/verify` plus dashboard support for testing draft values before saving.
- Chat history origin tracking -- `chat_history.origin` now records the request source, such as `web` or `slack`.
- Migration script: `migrations/007_upgrade_to_v0.9.0.sql`
AssistantHub supports per-assistant Slack connectivity through Assistant Settings.
- Enable Slack on an assistant and provide:
  - `App Token` (`xapp-...`)
  - `Bot Token` (`xoxb-...`)
  - `Channel ID`
  - `Start-of-Message Indicator`
- Use `Verify Connectivity` in the dashboard before saving
- AssistantHub maintains one Socket Mode connection per Slack-enabled assistant
- In configured channels, messages are processed when they start with the configured indicator or mention the bot
- Direct messages to the bot are also supported
- Slack conversations reuse the same non-streaming chat execution rail as AssistantHub chat, including retrieval, citations, compaction, and history persistence
- Slack responses are posted back into the originating Slack thread
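The trigger and thread rules above can be sketched in Python. Slack encodes user mentions as `<@USERID>` in message text; everything else here is an assumption for illustration, including the helper names and the UUIDv5-based thread mapping (AssistantHub's actual deterministic thread-ID scheme is not documented here).

```python
import re
import uuid

# Hypothetical namespace; AssistantHub's real thread-ID scheme is not documented here.
SLACK_THREAD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/assistanthub/slack")


def should_process(text, indicator, bot_user_id, is_direct_message):
    """DMs always qualify; channel messages need the indicator prefix or a bot mention."""
    if is_direct_message:
        return True
    has_indicator = bool(indicator) and text.startswith(indicator)
    return has_indicator or f"<@{bot_user_id}>" in text


def strip_trigger(text, indicator, bot_user_id):
    """Remove the indicator prefix and any bot mention before sending text to chat."""
    if indicator and text.startswith(indicator):
        text = text[len(indicator):]
    return re.sub(rf"<@{re.escape(bot_user_id)}>", "", text).strip()


def thread_id_for(assistant_id, channel_id, thread_ts):
    """Map one Slack thread to the same AssistantHub thread ID every time."""
    return str(uuid.uuid5(SLACK_THREAD_NAMESPACE, f"{assistant_id}:{channel_id}:{thread_ts}"))
```

A deterministic mapping means follow-up replies in the same Slack thread land in the same conversation history, even after the server restarts, without storing a separate lookup table.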
Operational notes:
- Slack tokens are stored in the AssistantHub database in plaintext, so rely on your deployment's at-rest protections
- The Slack app must have Socket Mode enabled and be invited to any private channels it should service
- AssistantHub consumes the `EasySlack` NuGet package at version `1.0.1`
- Metadata filtering for chat completions -- Filter RAG retrieval to return only documents matching specified labels and/or tags. Labels are simple string lists (required/excluded). Tags are key-value conditions supporting the operators `Equals`, `NotEquals`, `Contains`, `StartsWith`, `EndsWith`, `GreaterThan`, `LessThan`, `IsNull`, and `IsNotNull`. Filters can be configured as defaults on an assistant (applied to every conversation) and/or supplied per request via the `metadata_filter` field on the chat completion request body. When both are present, they are merged (required labels/tags unioned, excluded labels/tags unioned).
- Per-request `metadata_filter` on chat completions -- The `POST /v1.0/assistants/{id}/chat` endpoint accepts an optional `metadata_filter` object in the request body. This is an AssistantHub extension to the OpenAI-compatible chat schema. Clients that omit it get standard retrieval, subject only to the assistant's default filters. Example:

  ```json
  {
    "messages": [{"role": "user", "content": "What were the Q4 results?"}],
    "metadata_filter": {
      "required_labels": ["finance", "quarterly-report"],
      "excluded_labels": ["draft"],
      "required_tags": [
        {"key": "department", "condition": "Equals", "value": "accounting"}
      ]
    }
  }
  ```

- Assistant-level default filters -- New `RetrievalLabelFilter` and `RetrievalTagFilter` settings on each assistant. Configure them via the dashboard (Retrieval Filters section) or the API. These defaults are applied to every chat retrieval for that assistant.
- Filter discovery endpoints -- Four new API endpoints to discover available filter values:
  - `GET /v1.0/collections/{collectionId}/labels/distinct` (admin)
  - `GET /v1.0/collections/{collectionId}/tags/distinct` (admin)
  - `GET /v1.0/assistants/{assistantId}/labels/distinct` (public)
  - `GET /v1.0/assistants/{assistantId}/tags/distinct` (public)
- Dashboard -- Retrieval Filters configuration in assistant settings, collapsible metadata filter panel in the chat UI for per-session filtering, and metadata filter display in the history detail view
- Auditing -- The effective merged filter is stored in `ChatHistory.MetadataFilter` and displayed in the History View modal
- Docker image tags updated to v0.7.0
- See CHANGELOG.md for full details
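The merge rule for assistant defaults and per-request filters (required and excluded sets each unioned) can be sketched in Python. The `required_labels`, `excluded_labels`, and `required_tags` keys come from the request example; `excluded_tags` is an assumed key name, and the helpers are hypothetical.

```python
def _union(a, b):
    """Order-preserving union that drops duplicates (works for strings and tag dicts)."""
    out = []
    for item in list(a or []) + list(b or []):
        if item not in out:
            out.append(item)
    return out


def merge_filters(assistant_default, per_request):
    """Union required/excluded labels and tags from the assistant default and the request."""
    assistant_default = assistant_default or {}
    per_request = per_request or {}
    # "excluded_tags" is an assumed key name for excluded tag conditions.
    keys = ("required_labels", "excluded_labels", "required_tags", "excluded_tags")
    merged = {k: _union(assistant_default.get(k), per_request.get(k)) for k in keys}
    return {k: v for k, v in merged.items() if v}
```

Because both sides are unioned, a per-request filter can only narrow retrieval further; it cannot remove a label the assistant default requires.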
- LLM-based re-ranking -- After initial retrieval, an LLM scores each chunk's relevance to the user's query and filters out low-quality results before context injection
- See CHANGELOG.md for full details
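The re-ranking step can be sketched in Python as "score every chunk, drop weak ones, keep the best few". The LLM call is abstracted as a `score_fn(query, chunk)` callable; the threshold, `top_k`, and the function itself are hypothetical and not AssistantHub's configuration names.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    chunks: List[str],
    score_fn: Callable[[str, str], float],
    min_score: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[float, str]]:
    """Score each chunk against the query, drop low scorers, and keep the best top_k."""
    scored = [(score_fn(query, chunk), chunk) for chunk in chunks]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

In practice `score_fn` would prompt a completion endpoint for a relevance score; any deterministic stand-in, such as word overlap, is enough to exercise the filtering logic.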
- Native web crawlers -- Built-in web crawling engine that automatically discovers, retrieves, and ingests website content. Configure a URL, schedule, and ingestion rule, and AssistantHub handles the rest
- Crawl plans and scheduling -- Persistent crawler configurations with automatic recurring execution on configurable intervals (one-time, minutes, hours, days, weeks)
- Delta-based crawling -- Subsequent crawls compare against the previous enumeration to process only new, changed, and deleted content
- Document traceability -- Every crawled document is linked back to its source crawler and operation. Filter the Documents view by crawler to see all ingested content
- On-demand controls -- Start, stop, test connectivity, and preview discovered content from the dashboard or API
- Full dashboard integration -- Crawlers management view, operations viewer with statistics, enumeration browser, and Documents view integration
- 16 new API endpoints -- Complete CRUD, lifecycle control, statistics, and enumeration access for crawl plans and operations
- See CHANGELOG.md for full details
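Delta-based crawling reduces to a diff between two enumerations. A minimal Python sketch, assuming each enumeration maps an object key (URL or file path) to a change fingerprint such as an ETag, content hash, or size plus modification time; which fingerprint AssistantHub actually compares is not specified here.

```python
def diff_enumerations(previous, current):
    """Classify keys as new, changed, or deleted between two crawl enumerations."""
    new = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    changed = sorted(k for k in current if k in previous and current[k] != previous[k])
    return {"new": new, "changed": changed, "deleted": deleted}
```

Only `new` and `changed` entries need to be retrieved and re-ingested, and `deleted` entries drive cleanup of the corresponding documents; unchanged content is skipped entirely.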
- Query rewrite -- LLM-based query rewriting for improved retrieval recall
- Full multi-tenancy -- Row-level tenant isolation, three-tier authorization, auto-provisioning, tenant-scoped routes
- See CHANGELOG.md for full details
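Query rewrite improves recall by searching several phrasings and then merging the hits. A minimal Python sketch of one merge strategy, keeping each chunk's best score across variants; this max-score merge is an assumption for illustration (reciprocal rank fusion is a common alternative), not a description of AssistantHub's internals.

```python
def merge_variant_results(results_per_query):
    """results_per_query: one [(chunk_id, score), ...] list per rewritten query."""
    best = {}
    for results in results_per_query:
        for chunk_id, score in results:
            # Keep the highest score any variant produced for this chunk.
            if chunk_id not in best or score > best[chunk_id]:
                best[chunk_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```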
- Initial release with multi-assistant platform, automated document ingestion, flexible search modes, streaming chat, and browser-based dashboard
- See CHANGELOG.md for full details
- Assistants -- Create and manage multiple AI assistants, each with their own configuration, personality, and knowledge base.
- Documents -- Upload documents (PDF, text, HTML, and more) to build a knowledge base for each assistant. Documents are automatically chunked, embedded, and indexed.
- Crawlers -- Native web, CIFS/SMB, and NFS crawling engine that automatically discovers, retrieves, and ingests repository content on a schedule. Supports delta-based crawling (only new/changed/deleted content is processed), configurable depth, parallelism, throttling, content filtering, web authentication, and CIFS/NFS connectivity validation. Each crawled document is traceable back to its source crawler and operation.
- Ingestion Rules -- Define reusable ingestion configurations that specify target S3 buckets, RecallDB collections, summarization, chunking strategies, and embedding settings. Documents reference an ingestion rule for processing.
- Summarization -- Optionally summarize document content before or after chunking using configurable completion endpoints, improving retrieval quality for long documents.
- Endpoint Management -- Manage, test, and explicitly load or warm embedding and completion (inference) endpoint models on the Partio service directly from the dashboard or API.
- Search -- Leverages Verbex for TF-IDF/text document search and pgvector/RecallDB for vector, full-text, and hybrid retrieval. Configure per-assistant search modes with tunable scoring weights for optimal retrieval from your document corpus.
- Retrieval Gate -- Optional LLM-based retrieval gate that intelligently decides whether each user message requires a new document search or can be answered from existing conversation context, reducing unnecessary retrieval calls.
- Chat -- Public-facing chat endpoint that retrieves relevant context from your documents and generates responses using configurable LLM providers (Ollama, OpenAI, Gemini). Supports real-time SSE streaming, metadata filters, optional `attached_document_ids` that constrain retrieval to selected assistant documents, and one-off `local_attachments` uploaded from the user's machine for the current chat request.
- Conversation Compaction -- Automatic summarization of older messages when the conversation approaches the context window limit, preserving continuity across long conversations.
- Feedback -- Collect thumbs-up/thumbs-down feedback and free-text comments on assistant responses to monitor quality and improve over time.
- Multi-Tenant -- Full row-level tenant isolation with three-tier authorization (Global Admin via API key or the `IsAdmin` flag, Tenant Admin, User). Auto-provisioning of tenant resources, per-tenant S3 bucket isolation (`{tenantId}_` prefix), and tenant-scoped RecallDB mapping.
- Dashboard -- Browser-based management UI for configuring assistants, uploading documents, viewing feedback, managing endpoints, and testing chat.
- Model Context Protocol (MCP) -- Standalone MCP server for the platform management surface with HTTP, TCP, and WebSocket transports, Claude/Cursor install support, default secret redaction for sensitive fields, and binary wrappers for document and bucket-object flows.
- Query rewrite -- Optionally rewrite user queries into multiple semantically varied phrasings before retrieval to broaden recall and capture synonyms, alternate phrasing, and conceptual restatements.
- LLM-based re-ranking -- Re-ranking scores each retrieved chunk for relevance using an LLM, filtering low-quality results before context injection.
- Metadata filtering -- Filter RAG retrieval by document labels (required/excluded string lists) and tags (key-value conditions with conditional operators). Configure default filters per assistant and/or supplement them per conversation via the `metadata_filter` field on chat completion requests; when both are present, they are merged.
- Tool policy controls -- Assistant owners/admins can save and validate disabled-by-default model tool policies, set assistant-level Tavily overrides, mark completion endpoints as explicitly tool-capable, and inspect effective tool availability. Assistant chat can expose enabled server-side tools to explicitly tool-capable OpenAI-compatible or Ollama endpoints, execute requested tools on the server, run bounded DocumentAtom extraction against assistant documents or per-turn local uploads, enforce per-turn tool budgets (including S3 object byte caps), and return structured tool outputs to the model. Streaming chat emits safe tool progress events and final answer chunks for tool-enabled turns; live Docker/provider/browser validation remains tracked in `archive/TOOL_CALLS.md`.
- Source citations -- Optional per-assistant citation metadata that maps model claims to source documents and citation-capable tool evidence with bracket notation, relevance scores, text excerpts, and web URLs when web search contributes evidence. Document linking is configurable via presigned S3 URLs or authenticated download endpoints.
- RAG evaluation -- Built-in evaluation framework for measuring retrieval and response quality. Define ground-truth facts (question/expected-facts pairs) per assistant, run automated evaluation passes with LLM-based judging, and review per-fact results with pass/fail verdicts. Supports custom judge prompts and real-time SSE progress streaming.
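Hybrid search combines vector and full-text relevance with tunable weights. A minimal Python sketch of a weighted combination, assuming both score sets are already normalized to [0, 1] (raw full-text ranks usually need normalizing first). The default weights are hypothetical, not AssistantHub's defaults.

```python
def hybrid_scores(vector_scores, text_scores, vector_weight=0.7, text_weight=0.3):
    """Weighted sum of per-document scores; a document missing from one side scores 0 there."""
    doc_ids = set(vector_scores) | set(text_scores)
    combined = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + text_weight * text_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

Raising `text_weight` favors exact keyword matches such as part numbers or error codes, while raising `vector_weight` favors paraphrases that share no literal terms with the query.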
- Mark the endpoint used for tool routing as tool-capable only when the backend is known to support model tool calls. Use `OpenAIChatCompletions` for OpenAI-compatible chat-completions endpoints or `OllamaChat` for Ollama. Leave `ToolRoutingInferenceEndpointId` blank to use the response endpoint for tool routing, or select a smaller/dedicated router endpoint when the response model should only write final answers.
- Open Assistant Settings. `EnableToolCalls` is disabled by default; when you turn it on, enable only the tool groups the assistant should use: collection search/read/enumeration, DocumentAtom extraction, Verbex search/enumeration, document-backed S3 reads, explicitly opted-in bucket-wide S3 reads, bucket enumeration, and Tavily web search.
- Set per-tool caps and allow-lists in `ToolPolicyJson`, then use the effective tool list, validation route, and admin dry-run diagnostics route to confirm which tools and endpoint capabilities are available.
  - Completion endpoint tool-call capability is configured through AssistantHub fields and persisted on the Partio endpoint using reserved labels/tags.
  - Collection search can optionally use `EnableServerGeneratedQueryVariants` to add deterministic punctuation/quote-normalized variants within `MaxSearchQueriesPerCall`. `MaxDocumentsConsideredPerSearch` and `MaxResultsConsideredPerSearch` bound exhaustive search work, and real tool timeouts fail with `ErrorCode=timeout`.
  - `ReturnFullSearchContent` stays `false` by default, so search returns excerpts and exact text is requested through `collection_read_chunks`. Search metadata includes the searched queries/modes plus `DocumentsConsidered` and `ResultsConsidered` when available.
  - Validation returns stable `ErrorCodes` such as `invalid_tool_policy_json`, `unknown_allowed_tool`, `no_tool_enabled`, and `no_available_tools`; diagnostics check the effective tool-routing endpoint for explicit tool-call capability without executing tools.
  - Tavily can use assistant-level endpoint/API-key overrides or fall back to system-wide `ExternalSearch` settings.
- Test with non-streaming chat first, then validate streaming chat if browser users should see safe tool progress statuses. Streaming chat emits started, heartbeat, completed, failed, and denied tool-status events without raw arguments or outputs; recoverable failures return stable `ErrorCode` values such as `invalid_arguments`, `policy_denial`, `provider_missing`, `provider_http_error`, and `timeout`. Browser clients should mark interrupted streams clearly instead of leaving a pending spinner. Admins can inspect redacted tool-call records under assistant tool-call history and in linked request-history/chat-history details.
- Provider usage metadata is preserved when available. OpenAI-compatible prompt, completion, total, reasoning-token, and tool-definition-token counters are normalized into assistant performance telemetry and exposed through SDK response models. If an assistant explicitly enables `ExposeThinking`, provider-returned thinking/reasoning text can be displayed in chat as separate `thinking` content; it remains disabled by default.
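To illustrate what "deterministic punctuation/quote-normalized variants within `MaxSearchQueriesPerCall`" can mean, here is a Python sketch. The specific normalizations (curly-to-straight quotes, quote removal, punctuation-to-space) and the helper name are assumptions; AssistantHub's exact variant rules are not documented here.

```python
import re

# Map curly single and double quotes to straight ASCII quotes.
QUOTE_MAP = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def query_variants(query: str, max_queries: int) -> list:
    """Original query first, then normalized variants, deduplicated and capped."""
    straight = query.translate(QUOTE_MAP)
    unquoted = straight.replace('"', "").replace("'", "")
    no_punct = re.sub(r"[^\w\s]", " ", straight)
    variants = []
    for candidate in (query, straight, unquoted, no_punct):
        candidate = " ".join(candidate.split())  # collapse runs of whitespace
        if candidate and candidate not in variants:
            variants.append(candidate)
    return variants[:max_queries]
```

Because the variants are derived mechanically, the same query always produces the same searches, which keeps tool-call traces reproducible; a query with no quotes or punctuation yields just itself.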
Dashboard i18n baseline: the current AssistantHub dashboard remains English-only and does not yet include the required i18next runtime. New tool-call UI strings follow the existing dashboard convention, while server-driven tool feedback uses stable status_code values plus safe display labels so a future i18n pass can localize client text without changing persisted or wire-level event semantics.
The fastest way to run AssistantHub and all its dependencies is with Docker Compose. This is the recommended deployment method. The Docker deployment uses PostgreSQL by default for AssistantHub, Less3, Partio, RecallDB, and Verbex metadata.
```bash
cd docker
docker compose up -d
```

Once all services are healthy, open http://localhost:8801 to access the dashboard.
On a fresh startup, assistanthub-server now waits for partio-server to become healthy before it starts. This avoids the transient partio-server:8400 DNS/startup race that could previously abort AssistantHub startup immediately after a factory reset.
For CIFS/NFS crawl plans in the local Docker deployment, remember that localhost from inside assistanthub-server means the container, not the host machine. The default compose file maps host.docker.internal to the Docker host, and AssistantHub normalizes loopback file-server hostnames to that alias when it is available so local shares such as //localhost/Share can be reached from the server container.
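The loopback normalization described above can be sketched in Python for a CIFS share path. The helper names and the exact set of loopback names are assumptions for illustration; only the `//localhost/Share` to `host.docker.internal` behavior comes from the text.

```python
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}
DOCKER_HOST_ALIAS = "host.docker.internal"


def normalize_host(host: str, alias_available: bool = True) -> str:
    """Swap a loopback hostname for the Docker host alias when that alias is mapped."""
    if alias_available and host.strip().lower() in LOOPBACK_HOSTS:
        return DOCKER_HOST_ALIAS
    return host


def normalize_cifs_share(share: str, alias_available: bool = True) -> str:
    """Rewrite the host part of a //host/Share path; other paths pass through unchanged."""
    if not share.startswith("//"):
        return share
    host, sep, rest = share[2:].partition("/")
    return "//" + normalize_host(host, alias_available) + sep + rest
```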
Note: Deploying individual services outside of Docker is also possible, but requires manual configuration and deployment of each dependency (PostgreSQL with pgvector, Ollama, Less3, DocumentAtom, Partio, RecallDB, Verbex). The Docker Compose stack handles all service wiring, health checks, and startup ordering automatically, which is why manual setup documentation is not provided.
The Docker Compose stack orchestrates the following services:
| Service | Port | Description |
|---|---|---|
| assistanthub-server | 8800 | The core AssistantHub REST API server (.NET 10). Handles all business logic: assistant management, document ingestion orchestration, chat with RAG, user authentication, and integration with all downstream services. |
| assistanthub-mcp-server | 8820 / 8821 / 8822 | Standalone Voltaic-based MCP server for AssistantHub. Exposes tenants, users, credentials, assistants, settings, storage, ingestion, endpoints, crawl, eval, history, request history, and runtime configuration over HTTP, TCP, and WebSocket MCP transports. |
| assistanthub-dashboard | 8801 | Browser-based management dashboard (React 19, served by nginx). Provides a full UI for configuring assistants, uploading documents, managing endpoints, viewing feedback/history, and live chat testing. Proxies API requests to the server. |
| ollama | 11434 | Local LLM inference engine. Runs language models (e.g., gemma3:4b) for chat completion, conversation compaction, retrieval gate classification, and title generation. Models are persisted in a Docker volume. |
| less3 | 8000 | S3-compatible object storage server. Stores uploaded document files. AssistantHub uses the S3 API to write, read, and delete document objects during ingestion and cleanup. |
| less3-ui | 8001 | Web-based management UI for Less3. Allows direct browsing and management of S3 buckets and objects. |
| documentatom-server | 8301 | Document processing service. Extracts text content from uploaded files (PDF, DOCX, HTML, text, and more), returning structured cells that represent the document's content. |
| documentatom-dashboard | 8302 | Web-based management UI for DocumentAtom. |
| partio-server | 8321 | Text chunking, embedding, and summarization service. Splits extracted text into chunks using configurable strategies, computes vector embeddings via configurable embedding endpoints, and optionally summarizes content using a completion endpoint. Also manages embedding and completion endpoint configurations. |
| partio-dashboard | 8322 | Web-based management UI for Partio. Allows direct management of embedding and completion endpoints. |
| postgres | 5432 | PostgreSQL with the pgvector extension. Provides separate databases for AssistantHub, Less3, Partio, RecallDB, and Verbex. |
| postgres-init | n/a | One-shot initialization verifier that creates service roles/databases, installs vector for RecallDB, and verifies service-role connectivity before app services start. |
| recalldb-server | 8401 | Vector and full-text search database. Wraps PostgreSQL/pgvector with a REST API for storing, searching, and managing document embeddings. Supports vector search (semantic similarity), full-text search (keyword matching), and hybrid search (weighted combination). |
| recalldb-dashboard | 8402 | Web-based management UI for RecallDB. Allows direct browsing of collections, records, and search testing. |
| verbex-server | 8501 | Inverted-index search server. Stores document text records and supports TF-IDF/text search through AssistantHub proxy APIs. |
| verbex-dashboard | 8502 | Web-based management UI for Verbex. Allows direct browsing of indices, records, and search testing. |
The Docker stack uses a single postgres container with a named postgres-data volume. postgres-init creates separate databases and application roles before AssistantHub, Less3, Partio, RecallDB, and Verbex start.
The compose stack starts PostgreSQL with max_connections=250 so concurrent document ingestion, Verbex indexing, RecallDB embedding writes, object storage, and dashboard/API activity have enough connection headroom during crawler bursts.
| Service | Database | Role |
|---|---|---|
| AssistantHub | assistanthub | assistanthub_app |
| Less3 | less3 | less3_app |
| Partio | partio | partio_app |
| RecallDB | recalldb | recalldb_app |
| Verbex | verbex | verbex_app |
Local-only database defaults are in docker/.env and are mirrored in the mounted JSON config files. Keep those values synchronized if you change database names or credentials.
Troubleshooting:
- If `postgres` is unhealthy, inspect `docker logs assistanthub-postgres` and confirm port `5432` is not already bound by another local database.
- If `postgres-init` fails, inspect `docker logs assistanthub-postgres-init`; failures usually indicate credential drift between `docker/.env` and the service JSON files, a stale volume, or a missing `vector` extension.
- If stale SQLite files exist under `docker/assistanthub/data`, `docker/less3`, or `docker/partio/data`, they are ignored by the current compose file. Run `docker/factory/reset.bat` or `docker/factory/reset.sh` to remove them while resetting the deployment.
Use docker/status.bat or docker/status.sh to view the local container ID, name, creation time, status, and published ports without the image and command columns from the default docker ps -a output.
If you already have Ollama running on your host machine or on another server, you can skip the containerized Ollama and point AssistantHub at your existing instance instead.
1. Comment out the Ollama service in docker/compose.yaml:
Comment out (or remove) the ollama and ollama-init services and the Ollama model volume:
```yaml
services:
  # --- Infrastructure ---
  # ollama:
  #   image: ollama/ollama:0.30.4
  #   container_name: ollama
  #   ports:
  #     - "11434:11434"
  #   environment:
  #     OLLAMA_NUM_PARALLEL: "4"
  #     OLLAMA_MAX_LOADED_MODELS: "4"
  #   volumes:
  #     - ollama-models:/root/.ollama
  #   restart: unless-stopped
```

Also comment out the `ollama-models` volume at the bottom of the file:

```yaml
volumes:
  postgres-data:
  # ollama-models:
```

Then remove `ollama-init` from the `partio-server` service's `depends_on` list.
2. Update docker/assistanthub/assistanthub.json to point to your Ollama instance:
In the Inference section, change the Endpoint from the container hostname to your Ollama instance's address:
```json
"Inference": {
  "Provider": "Ollama",
  "Endpoint": "http://host.docker.internal:11434",
  "ApiKey": "default",
  "DefaultModel": "gemma3:4b"
}
```

- Ollama on the same machine (Docker Desktop): Use `http://host.docker.internal:11434`. The special hostname `host.docker.internal` resolves to your host machine from inside Docker containers. Do not use `localhost` -- inside a container, `localhost` refers to the container itself, not your host machine.
- Ollama on the same machine (Linux without Docker Desktop): Use `http://172.17.0.1:11434` (the default Docker bridge gateway), or run the compose stack with `network_mode: host`. You may also need to set `OLLAMA_HOST=0.0.0.0` in your Ollama configuration so it listens on all interfaces.
- Ollama on another machine: Use that machine's IP or hostname, e.g. `http://192.168.1.50:11434`. Ensure the Ollama port is accessible from the Docker network.
3. Update docker/partio/partio.json to point to your Ollama instance:
In the DefaultEmbeddingEndpoints section, change the Endpoint from the container hostname to match the address you used above:
```json
"DefaultEmbeddingEndpoints": [
  {
    "Model": "all-minilm",
    "Endpoint": "http://host.docker.internal:11434",
    "ApiFormat": "Ollama",
    "ApiKey": null
  }
]
```

4. Update embedding and completion endpoints in the Partio dashboard:
After startup, open the Partio dashboard at http://localhost:8322 and update both the embedding endpoints and completion endpoints to point to your Ollama instance:
- Change the Endpoint URL from `http://ollama:11434` to your instance's address (e.g. `http://host.docker.internal:11434`).
- Change the Health Check URL from a relative path (`/api/tags`) to a fully-qualified URL (e.g. `http://host.docker.internal:11434/api/tags`). Health checks using relative paths will fail with an "invalid request URI" error.
Without these changes, document ingestion (embeddings) and chat completions will fail.
5. Start the stack:
```bash
cd docker
docker compose up -d
```

| Dashboard | URL | Default Credentials |
|---|---|---|
| AssistantHub | http://localhost:8801 | Email: admin@assistanthub, Password: password |
| Less3 | http://localhost:8001 | Admin API Key: less3admin, Access Key: default, Secret Key: default |
| DocumentAtom | http://localhost:8302 | No authentication configured by default |
| Verbex | http://localhost:8502 | Admin API Key: verbexadmin |
| Partio | http://localhost:8322 | Email: admin@partio, Password: password, Admin API Key: partioadmin |
| RecallDB | http://localhost:8402 | Email: admin@recall, Password: password, Admin API Key: recalldbadmin |
Important: Change all default passwords immediately after first login.
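Once the stack is running, it can help to confirm that the Ollama address you configured is reachable and already has the models the sample configuration references (`gemma3:4b` for completions, `all-minilm` for embeddings). The sketch below is a hypothetical helper, not part of AssistantHub. It uses only the Python standard library and Ollama's `GET /api/tags` model-listing route, and it builds the fully-qualified health-check URL that Partio expects. Run it from a machine or container that can resolve and reach the address you chose.

```python
import json
import urllib.parse
import urllib.request


def health_check_url(endpoint: str) -> str:
    """Join /api/tags onto an endpoint base URL to get a fully-qualified URL.

    Partio health checks fail on relative paths such as /api/tags, so the
    full URL is what belongs in the Health Check URL field.
    """
    return urllib.parse.urljoin(endpoint.rstrip("/") + "/", "api/tags")


def normalize_model(name: str) -> str:
    """Ollama treats a model name without a tag as the ':latest' tag."""
    return name if ":" in name else name + ":latest"


def missing_models(tags_payload: dict, required: list[str]) -> list[str]:
    """Return the required models that are absent from an /api/tags response."""
    installed = {normalize_model(m["name"]) for m in tags_payload.get("models", [])}
    return [r for r in required if normalize_model(r) not in installed]


def check_ollama(endpoint: str, required: list[str], timeout: float = 5.0) -> list[str]:
    """Fetch /api/tags from the endpoint and report which required models are missing."""
    with urllib.request.urlopen(health_check_url(endpoint), timeout=timeout) as resp:
        payload = json.load(resp)
    return missing_models(payload, required)
```

For example, `check_ollama("http://host.docker.internal:11434", ["gemma3:4b", "all-minilm"])` returns an empty list when both models are present. A connection error means the address itself is wrong or unreachable, which is the same failure ingestion and chat would hit.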
Verbex powers text/TF-IDF document search in ARTIFACTS > Indices > Search. The dashboard requests Verbex matched terms, per-term score/frequency details, and document term statistics so results can show unique terms, total term occurrences, matched query terms, and score details. The dashboard also includes ARTIFACTS > Indices for index metadata/top terms and ARTIFACTS > Indices > Records for browsing, creating, updating labels/tags/custom metadata, and deleting Verbex records through AssistantHub. Search result detail modals expose copyable IDs and JSON payloads for inspection. RecallDB collection search remains available in ARTIFACTS > Collections > Search with full-text, vector, label/tag, term, date, document, neighbor, and continuation-token controls. Ingestion rules can optionally set VerbexIndexId; leaving it blank uses the tenant default Verbex index.
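This section does not document Verbex's exact scoring formula. To show roughly what per-term scores and frequencies mean, the sketch below computes a textbook TF-IDF score: raw term frequency multiplied by the log-scaled inverse document frequency, summed over the query terms. It illustrates the general technique over a tiny in-memory corpus. It is not Verbex's implementation, and the tokenizer is a simplifying assumption.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tf_idf_scores(query: str, docs: dict[str, str]) -> dict[str, float]:
    """Score each document as the sum of tf * idf over the distinct query terms."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    query_terms = set(tokenize(query))
    scores: dict[str, float] = {}
    for doc_id, tf in counts.items():
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term at least once.
            df = sum(1 for c in counts.values() if term in c)
            if df == 0 or tf[term] == 0:
                continue
            score += tf[term] * math.log(n_docs / df)
        scores[doc_id] = score
    return scores
```

A term that appears in every document gets an IDF of `log(1) = 0` and contributes nothing to the score. This is why rare matched terms dominate the score details, while common words barely move it.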
New document ingestion indexes extracted text into Verbex automatically when `Verbex.EnableIngestion` is enabled. Documents that had already completed ingestion before a deployment was upgraded to v0.14.0 need a one-time admin reindex before they appear in ARTIFACTS > Indices > Search.
Reindex one document:
curl -X POST http://localhost:8800/v1.0/documents/{documentId}/reindex \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d "{}"

Reindex a bounded page of completed documents:
curl -X POST "http://localhost:8800/v1.0/documents/reindex?maxResults=100" \
-H "Authorization: Bearer <admin-token>" \
-H "Content-Type: application/json" \
-d "{\"IncludeAlreadyIndexed\":false}"

Repeat the batch call with the returned `ContinuationToken` until `EndOfResults` is `true`. Set `IncludeAlreadyIndexed` to `true` to repair or replace existing Verbex records. To verify search, upload or reindex a text document, then search the target index from ARTIFACTS > Indices > Search or call `POST /v1.0/indices/{indexId}/search` with a term from the document.
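The batch reindex procedure is a pagination loop. The sketch below drives that loop through an injected `post` callable, so you can exercise it without a running server. The path, `maxResults`, `IncludeAlreadyIndexed`, `ContinuationToken`, and `EndOfResults` come from the text above. Two details are assumptions: that the token is sent back in the JSON request body (check REST_API.md for where the endpoint actually expects it), and the helper names `reindex_all` and `make_http_post`, which are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional

PostFn = Callable[[str, dict], dict]


def reindex_all(post: PostFn, max_results: int = 100,
                include_already_indexed: bool = False,
                max_batches: int = 10_000) -> int:
    """Call the batch reindex endpoint until EndOfResults is true; return the call count."""
    token: Optional[str] = None
    for batch in range(1, max_batches + 1):
        body: dict = {"IncludeAlreadyIndexed": include_already_indexed}
        if token:
            # ASSUMPTION: the continuation token goes back in the request body.
            body["ContinuationToken"] = token
        result = post(f"/v1.0/documents/reindex?maxResults={max_results}", body)
        # Treat a missing flag as the end so a malformed response cannot loop forever.
        if result.get("EndOfResults", True):
            return batch
        token = result.get("ContinuationToken")
        if not token:
            raise RuntimeError("EndOfResults is false but no ContinuationToken was returned")
    raise RuntimeError("gave up after max_batches calls")


def make_http_post(base_url: str, admin_token: str) -> PostFn:
    """Build a post() function that sends authenticated JSON to the AssistantHub API."""
    def post(path: str, body: dict) -> dict:
        req = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Authorization": f"Bearer {admin_token}",
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return post
```

Against a live server, you would call `reindex_all(make_http_post("http://localhost:8800", "<admin-token>"))`. Injecting `post` keeps the loop logic separate from transport, which is what makes it testable with canned responses.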
The server reads configuration from assistanthub.json in the working directory. For Docker deployments, this file is located at docker/assistanthub/assistanthub.json and is mounted into the container.
{
"Webserver": {
"Hostname": "*",
"Port": 8800,
"Ssl": false
},
"Database": {
"Type": "Postgresql",
"Filename": "",
"Hostname": "postgres",
"Port": 5432,
"DatabaseName": "assistanthub",
"Username": "assistanthub_app",
"Password": "assistanthub_password",
"Schema": "public",
"RequireEncryption": false,
"LogQueries": false
},
"S3": {
"Region": "USWest1",
"BucketName": "default",
"AccessKey": "default",
"SecretKey": "default",
"EndpointUrl": "http://less3:8000",
"UseSsl": false,
"BaseUrl": "http://less3:8000"
},
"DocumentAtom": {
"Endpoint": "http://documentatom-server:8000",
"AccessKey": "default"
},
"Chunking": {
"Endpoint": "http://partio-server:8400",
"AccessKey": "partioadmin",
"EndpointId": "default"
},
"Embeddings": {
"Endpoint": "http://partio-server:8400",
"AccessKey": "partioadmin",
"EndpointId": "default"
},
"Inference": {
"Provider": "Ollama",
"Endpoint": "http://ollama:11434",
"ApiKey": "default",
"DefaultModel": "gemma3:4b"
},
"RecallDb": {
"Endpoint": "http://recalldb-server:8600",
"AccessKey": "recalldbadmin",
"SupportsMultiDocumentFilter": true
},
"Verbex": {
"Endpoint": "http://verbex-server:8080",
"AccessKey": "verbexadmin",
"DashboardUrl": "http://localhost:8502",
"DefaultIndexId": "default",
"EnableIngestion": true,
"RequireIngestion": true,
"MaxContentCharacters": 0
},
"ExternalSearch": {
"Enabled": false,
"AllowFallback": true,
"MaxResults": 10,
"TimeoutMs": 30000,
"SafeSearch": true,
"AllowRawContent": false,
"IncludeDomains": [],
"ExcludeDomains": [],
"Providers": [
{
"Name": "default",
"ProviderType": "Tavily",
"Endpoint": "https://api.tavily.com/search",
"ApiKey": "${TAVILY_API_KEY}",
"Enabled": false,
"IsDefault": true,
"TimeoutMs": 30000
}
]
},
"AdminApiKeys": [
"changeme"
],
"DefaultTenant": {
"Id": "default",
"Name": "Default"
},
"ProcessingLog": {
"Directory": "./processing-logs/",
"RetentionDays": 30
},
"ChatHistory": {
"RetentionDays": 7
},
"Crawl": {
"EnumerationDirectory": "./crawl-enumerations/"
},
"Logging": {
"ConsoleLogging": true,
"EnableColors": false,
"FileLogging": true,
"LogDirectory": "./logs/",
"LogFilename": "assistanthub.log",
"IncludeDateInFilename": true,
"MinimumSeverity": 1,
"Servers": []
}
}

| Section | Description |
|---|---|
| `Webserver` | Hostname, port, and SSL toggle for the HTTP listener. |
| `Database` | Database type (`Sqlite`, `Postgresql`, `SqlServer`, `Mysql`) and connection details. |
| `S3` | S3-compatible object storage (Less3) for uploaded documents. |
| `DocumentAtom` | Endpoint and access key for the DocumentAtom document-processing service. |
| `Chunking` | Endpoint, access key, and default endpoint ID for the Partio chunking service. |
| `Embeddings` | Endpoint, access key, and default endpoint ID for the Partio embeddings service. |
| `Inference` | LLM provider (`Ollama`, `OpenAI`, or `Gemini`), endpoint, API key, and default model. |
| `RecallDb` | Endpoint, access key, dashboard URL, and capability flags for the RecallDB vector database service. `SupportsMultiDocumentFilter` defaults to `true`; set it to `false` only for RecallDB deployments that do not accept native `DocumentIds` search filters, which makes AssistantHub loop over single-document searches and log a fallback warning. |
| `Verbex` | Endpoint, access key, dashboard URL, default index ID, and ingestion failure policy for Verbex text search. |
| `ExternalSearch` | Disabled-by-default external web-search providers. Tavily uses `ProviderType: "Tavily"` and can read `ApiKey` from `${TAVILY_API_KEY}` when globally enabled and exposed through assistant tool policy. Both factory and runtime Docker server JSON files include the disabled Tavily provider placeholder. Admins can check redacted readiness counts with `GET /v1.0/configuration/external-search/status`. |
| `AdminApiKeys` | List of API keys that grant global admin access (not tied to any tenant). Users with `IsAdmin=true` also receive global admin privileges. |
| `DefaultTenant` | ID and name for the default tenant, auto-created on first run. |
| `ProcessingLog` | Directory and retention for per-document processing logs (namespaced by tenant). |
| `ChatHistory` | Retention period in days for chat history records (0 = keep indefinitely). Background cleanup runs hourly. |
| `Crawl` | Directory for storing crawl enumeration files (delta snapshots used for change detection between crawl runs). |
| `Logging` | Console/file logging toggles, severity level, log directory, and optional syslog servers. |
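Several values in the sample `assistanthub.json` are development defaults. A small pre-flight check can flag them before you deploy. The sketch below is a hypothetical helper, not part of AssistantHub. The values it looks for all come from the sample configuration and the Ollama notes earlier in this guide: the `changeme` admin key, the `assistanthub_password` database password, a `localhost` inference endpoint, and an enabled `${TAVILY_API_KEY}` placeholder whose environment variable is unset. How AssistantHub itself expands `${...}` placeholders is not described here; the check only verifies that the variable exists.

```python
import json
import os
import re
from typing import Mapping

PLACEHOLDER = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")
LOCALHOST = re.compile(r"//(localhost|127\.0\.0\.1)(?=[:/]|$)")


def config_warnings(cfg: dict, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable warnings for risky values in an assistanthub.json dict."""
    warnings: list[str] = []
    if "changeme" in cfg.get("AdminApiKeys", []):
        warnings.append("AdminApiKeys still contains the default 'changeme' key")
    if cfg.get("Database", {}).get("Password") == "assistanthub_password":
        warnings.append("Database.Password is the sample default")
    if LOCALHOST.search(cfg.get("Inference", {}).get("Endpoint", "")):
        warnings.append("Inference.Endpoint uses localhost, which is the container itself under Docker")
    search = cfg.get("ExternalSearch", {})
    if search.get("Enabled"):
        for provider in search.get("Providers", []):
            match = PLACEHOLDER.match(provider.get("ApiKey") or "")
            # Only flag providers that are switched on and reference an unset variable.
            if provider.get("Enabled") and match and not env.get(match.group(1)):
                warnings.append(
                    f"ExternalSearch provider '{provider.get('Name')}' needs ${match.group(1)} set")
    return warnings


def load_config(path: str = "docker/assistanthub/assistanthub.json") -> dict:
    """Read the Docker deployment's configuration file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Run `config_warnings(load_config())` from the repository root. An empty list means none of these particular defaults remain. It does not cover the dashboard passwords listed earlier, which are changed through each service's UI.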
To completely reset AssistantHub to a clean state, use the factory reset script:
cd docker
docker compose down
cd factory
./reset.sh # Linux/macOS
reset.bat   # Windows

The script will prompt you to type `RESET` to confirm. This destroys all runtime data (PostgreSQL data, uploaded documents, logs, request history, and Verbex runtime data) and restores factory-default configuration files. Downloaded Ollama models are kept by default; pass `--include-models` to remove them as well.
After the reset completes, start the environment again:
cd docker
docker compose up -d

Expected behavior after reset:

- `postgres-init` must complete before database-backed services start.
- `assistanthub-server` will not start until `partio-server` is healthy.
  - This is intentional and prevents AssistantHub from failing early while validating chunking and embeddings connectivity.
  - If startup appears slower than before, wait for Partio to finish its health checks and model initialization.
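Because `assistanthub-server` waits on Partio, scripts that run right after `docker compose up -d` should poll rather than assume the API is ready. The sketch below polls the unauthenticated `GET /` health route on port 8800 (listed in the REST API table below) until it answers with HTTP 200 or a deadline passes. The five-minute timeout and five-second interval are arbitrary assumptions. The clock and sleep functions are injectable only so that the loop can be tested without waiting.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if GET url answers with HTTP 200; connection and HTTP errors count as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False


def wait_until_healthy(check: Callable[[], bool], timeout_s: float = 300.0,
                       interval_s: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() until it returns True, or return False once timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A post-reset script might call `wait_until_healthy(lambda: http_ok("http://localhost:8800/"))` and stop with a message that points at Partio's health checks if the call returns `False`.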
AssistantHub exposes a versioned REST API at /v1.0/. All authenticated endpoints require a bearer token in the Authorization header or as a token query parameter.
For complete endpoint documentation including request/response schemas and examples, see REST_API.md.
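The two ways of presenting a bearer token can be sketched with the standard library as follows. `GET /v1.0/whoami` is used only because it is a simple authenticated route from the table below. The token is a placeholder, and the helpers build `Request` objects without sending them. Preferring the header is general practice, not an AssistantHub requirement: query strings are often recorded in proxy and access logs.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8800"


def header_auth_request(path: str, token: str) -> urllib.request.Request:
    """Send the token in the Authorization header."""
    return urllib.request.Request(BASE_URL + path,
                                  headers={"Authorization": f"Bearer {token}"})


def query_auth_request(path: str, token: str) -> urllib.request.Request:
    """Append the token as a 'token' query parameter, keeping any existing parameters."""
    parts = urllib.parse.urlsplit(BASE_URL + path)
    query = urllib.parse.parse_qsl(parts.query)
    query.append(("token", token))
    url = urllib.parse.urlunsplit(parts._replace(query=urllib.parse.urlencode(query)))
    return urllib.request.Request(url)
```

To send either request, pass it to `urllib.request.urlopen(...)`.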
| Category | Endpoints | Description |
|---|---|---|
| Health | GET /, HEAD / | Server info and health check (unauthenticated) |
| Authentication | POST /v1.0/authenticate | Authenticate with email/password (+ optional TenantId) or bearer token |
| WhoAmI | GET /v1.0/whoami | Return current authentication context (tenant, role, user) |
| Tenants | PUT/GET /v1.0/tenants, GET/PUT/DELETE/HEAD /v1.0/tenants/{id} | Tenant management (global admin only) |
| Users | PUT/GET /v1.0/tenants/{tenantId}/users, GET/PUT/DELETE/HEAD .../users/{id} | Tenant-scoped user management |
| Credentials | PUT/GET /v1.0/tenants/{tenantId}/credentials, GET/PUT/DELETE/HEAD .../credentials/{id} | Tenant-scoped credential management |
| Buckets | PUT/GET /v1.0/buckets, GET/DELETE/HEAD /v1.0/buckets/{name} | S3 bucket management (tenant-scoped by {tenantId}_ prefix) |
| Bucket Objects | GET/PUT/POST/DELETE /v1.0/buckets/{name}/objects | S3 object management with upload, download, metadata, and directory creation (tenant-scoped) |
| Collections | PUT/GET /v1.0/collections, GET/PUT/DELETE/HEAD /v1.0/collections/{id} | RecallDB collection management (admin only) |
| Collection Records | PUT/GET /v1.0/collections/{id}/records, GET/DELETE .../records/{recordId} | Browse and manage records within collections (admin only) |
| Collection Metadata | GET /v1.0/collections/{id}/labels/distinct, GET .../tags/distinct | Discover distinct label values and tag keys in a collection (admin only) |
| Ingestion Rules | PUT/GET /v1.0/ingestion-rules, GET/PUT/DELETE/HEAD /v1.0/ingestion-rules/{id} | Document processing rule management |
| Embedding Endpoints | PUT /v1.0/endpoints/embedding, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test, POST .../load | Partio embedding endpoint management, smoke testing, and model load/warm actions (admin only) |
| Completion Endpoints | PUT /v1.0/endpoints/completion, POST .../enumerate, GET/PUT/DELETE/HEAD .../{id}, GET .../health, POST .../test, POST .../load | Partio completion endpoint management, smoke testing, and model load/warm actions (admin only) |
| Assistants | PUT/GET /v1.0/assistants, GET/PUT/DELETE/HEAD /v1.0/assistants/{id} | Assistant management (owner or admin) |
| Assistant Settings | GET/PUT /v1.0/assistants/{id}/settings, POST .../settings/slack/verify, POST .../settings/tools/validate, GET .../tools | Per-assistant endpoint, prompt, RAG, Slack, and tool policy configuration. Includes draft Slack connectivity verification and effective tool-policy inspection (owner or admin). |
| Assistant Analytics | GET /v1.0/assistants/{id}/analytics/{overview,timeseries,stages,endpoints,slowest,feedback} | Assistant-scoped performance, endpoint, retrieval, slow request, and feedback analytics |
| Crawl Plans | PUT/GET /v1.0/crawlplans, POST /v1.0/crawlplans/connectivity, GET/PUT/DELETE/HEAD /v1.0/crawlplans/{id}, POST .../start, POST .../stop, POST .../connectivity, GET .../enumerate | Crawler management with schedule control, draft/saved connectivity testing, and content preview |
| Crawl Operations | GET /v1.0/crawlplans/{id}/operations, GET .../statistics, GET/DELETE .../operations/{id}, GET .../statistics, GET .../enumeration | Crawl execution history, statistics, and enumeration file access |
| Documents | PUT/GET /v1.0/documents, GET/DELETE/HEAD /v1.0/documents/{id}, GET .../processing-log | Document upload, management, and processing log access |
| Feedback | GET /v1.0/feedback, GET/DELETE /v1.0/feedback/{id} | View and manage user feedback |
| History | GET /v1.0/history, GET/DELETE /v1.0/history/{id} | View and manage chat history with timing metrics |
| Threads | GET /v1.0/threads | List conversation threads |
| Models | GET /v1.0/models, POST /v1.0/models/pull, GET .../pull/status, DELETE /v1.0/models/{modelName} | List, pull, delete, and check pull status for inference models |
| Eval Facts | PUT/GET /v1.0/eval/facts, GET/PUT/DELETE /v1.0/eval/facts/{factId} | Ground-truth fact management for RAG evaluation |
| Eval Runs | POST/GET /v1.0/eval/runs, GET/DELETE /v1.0/eval/runs/{runId}, GET .../results, GET .../stream | Start, list, and stream evaluation runs with LLM-judged results |
| Eval Results | GET /v1.0/eval/results/{resultId} | Retrieve individual evaluation result details |
| Eval Judge Prompt | GET /v1.0/eval/judge-prompt/default | Retrieve the default judge prompt template |
| Configuration | GET/PUT /v1.0/configuration, GET /v1.0/configuration/external-search/status | View/update server configuration and inspect safe external-search readiness counts (admin only) |
| Public Assistant Documents | GET /v1.0/assistants/{id}/documents | Completed documents from the assistant collection that may be selected for attached-document chat (unauthenticated when enabled) |
| Public Chat | POST /v1.0/assistants/{id}/chat | Chat completion with RAG, optional metadata filtering, and optional attached_document_ids (unauthenticated, SSE or JSON) |
| Public Generate | POST /v1.0/assistants/{id}/generate | Lightweight inference without RAG (unauthenticated) |
| Public Compact | POST /v1.0/assistants/{id}/compact | Force conversation compaction (unauthenticated) |
| Public Feedback | POST /v1.0/assistants/{id}/feedback | Submit feedback (unauthenticated) |
| Public Info | GET /v1.0/assistants/{id}/public | Get assistant public info and appearance (unauthenticated) |
| Public Metadata | GET /v1.0/assistants/{id}/labels/distinct, GET .../tags/distinct | Discover available label and tag filter values for an assistant's collection (unauthenticated) |
| Public Threads | POST /v1.0/assistants/{id}/threads | Create a conversation thread (unauthenticated) |
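The public chat endpoint can stream its answer as Server-Sent Events. The JSON shape of each event is defined in REST_API.md and is not repeated here. The sketch below therefore handles only the SSE framing defined by the HTML Living Standard: `data:` lines accumulate, a blank line dispatches the event, lines starting with `:` are comments, and an unterminated event at end of stream is discarded. It yields the raw `data` payload of each event. Decoding that payload, for example with `json.loads`, is left to the caller.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event in a line stream."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the pending event, if it carried any data.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips a single leading space
        if field == "data":
            data_lines.append(value)
        # Other fields (event, id, retry) are ignored in this sketch.
    # Per the spec, data still pending when the stream ends is discarded.
```

With `urllib`, you can feed it the response's lines directly, for example `iter_sse_data(line.decode("utf-8") for line in resp)`.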
AssistantHub also includes a standalone MCP server under src/AssistantHub.McpServer/ for management and operator workflows. It mirrors the main REST control plane as MCP tools over Voltaic transports.
Default transport endpoints:
| Transport | Default endpoint |
|---|---|
| HTTP JSON-RPC | http://127.0.0.1:8820/rpc |
| HTTP events | http://127.0.0.1:8820/events |
| TCP | tcp://127.0.0.1:8821 |
| WebSocket | ws://127.0.0.1:8822/mcp |
Supported tool families include:
- `system/*`, `auth/*`
- `tenant/*`, `user/*`, `credential/*`
- `assistant/*`, `assistant/settings/*`
- `bucket/*`, `bucket/object/*`, `collection/*`, `collection/record/*`
- `document/*`, `ingestionrule/*`
- `embeddingendpoint/*`, `completionendpoint/*`, `model/*`
- `crawlplan/*`, `crawloperation/*`
- `history/*`, `thread/*`, `requesthistory/*`, `assistantanalytics/*`
- `eval/*`
- `configuration/*`
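MCP tools are invoked with JSON-RPC 2.0 messages. The sketch below builds a `tools/list` request and a `tools/call` request for `configuration/get` (one of the tools named in the operational notes), as they might be posted to the default HTTP JSON-RPC endpoint. The `tools/list` and `tools/call` method names and the `name`/`arguments` params come from the Model Context Protocol specification. This section does not say whether this server requires an `initialize` handshake first, how it authenticates callers, or which arguments `configuration/get` accepts. Treat the empty arguments object as an assumption and consult MCP_API.md.

```python
import itertools
import json
import urllib.request
from typing import Optional

RPC_URL = "http://127.0.0.1:8820/rpc"
_ids = itertools.count(1)


def rpc_message(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request with a fresh integer id."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def call_tool(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return rpc_message("tools/call", {"name": name, "arguments": arguments})


def post_rpc(message: dict, url: str = RPC_URL) -> dict:
    """POST one JSON-RPC message and return the decoded response."""
    req = urllib.request.Request(url, data=json.dumps(message).encode("utf-8"),
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

`post_rpc(rpc_message("tools/list"))` is a useful first call. The names it returns settle the exact tool identifiers before you rely on `call_tool(...)`.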
Operational notes:
- `configuration/get`, `assistant/settings/*`, and `credential/*` redact secret-bearing fields by default.
- Document and bucket-object binary transfers use base64 envelopes and enforce `Storage.MaxInlineBinaryBytes`.
- Eval SSE and public assistant chat/generate/compact/feedback/download routes remain REST-only in the current MCP release.
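Because binary content travels inline as base64, each payload grows by about a third: 4 output characters per 3 input bytes, plus padding. The sketch below encodes bytes for an inline transfer and refuses anything over a configured limit before sending. This section does not say whether `Storage.MaxInlineBinaryBytes` applies to the raw or the encoded size, or what the envelope's field names are. The sketch assumes the raw size and returns only the encoded string.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of standard (padded) base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def encode_inline(data: bytes, max_inline_bytes: int) -> str:
    """Base64-encode data for an inline transfer, enforcing a size limit.

    ASSUMPTION: the limit is compared against the raw (decoded) byte count.
    """
    if len(data) > max_inline_bytes:
        raise ValueError(
            f"{len(data)} bytes exceeds the inline limit of {max_inline_bytes} bytes")
    return base64.b64encode(data).decode("ascii")
```

Checking the size on the client side avoids building and sending a large request that the server would reject anyway.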
Quick start:
dotnet build src/AssistantHub.sln
dotnet run --project src/AssistantHub.Server/AssistantHub.Server.csproj
dotnet run --project src/AssistantHub.McpServer/AssistantHub.McpServer.csproj

Install Claude/Cursor snippets from the built output:
cd src/AssistantHub.McpServer/bin/Debug/net10.0
./AssistantHub.McpServer install --dry-run
./AssistantHub.McpServer install

Docker assets are included for the MCP server:

- image build script: `build-mcp.bat`
- Dockerfile: `src/AssistantHub.McpServer/Dockerfile`
- compose config: `docker/assistanthub-mcp/assistanthub-mcp.json`
See MCP_API.md for the full tool catalog and route coverage matrix, and docs/CLAUDE_MCP.md for Claude/Cursor setup guidance.
┌──────────────────┐
│ Dashboard │
│ (React / Vite) │
│ Port 8801 │
└────────┬─────────┘
│
│ HTTP (nginx reverse proxy)
▼
┌──────────────────┐
│ AssistantHub │
│ Server (.NET 10) │
│ Port 8800 │
└──┬────┬────┬──┬──┘
│ │ │ │
┌──────────────┘ │ │ └──────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────────┐ ┌────────────────┐ ┌──────────────────┐
│ DocumentAtom │ │ RecallDB │ │ Less3 │
│ (Doc Processing) │ │(Vector Search) │ │ (S3 Storage) │
│ Port 8301 │ │ Port 8401 │ │ Port 8000 │
└────────┬─────────┘ └────────┬───────┘ └──────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Partio │ │ PostgreSQL │
│ (Chunk/Embed) │ │ (PostgreSQL) │
│ Port 8321 │ │ Port 5432 │
└────────┬─────────┘ └──────────────────┘
│
▼
┌──────────────────┐
│ Ollama │
│ (LLM Inference) │
│ Port 11434 │
└──────────────────┘
┌─────────┐ ┌──────────────┐ ┌──────────────┐
│ User │ │ AssistantHub │ │ Less3 │
│(Browser │──1───►│ Server │──2───►│ (S3 Storage) │
│ or API)│ │ │ └──────────────┘
└─────────┘ └──────┬───────┘
│
3 │
▼
┌──────────────┐
│ DocumentAtom │ Extracts text cells
│ │ from PDF, DOCX, HTML, etc.
└──────┬───────┘
│
4 │ Text cells
▼
┌──────────────┐
│ Partio │ Optionally summarizes cells,
│ │ chunks text, computes embeddings
└──────┬───────┘
│
5 │ Chunks + embeddings
▼
┌──────────────┐
│ RecallDB │ Stores chunks and vectors
│ (PostgreSQL/ │ for retrieval
│ pgvector) │
└──────────────┘
- User uploads a document via the API or dashboard, selecting an ingestion rule.
- The document file is stored in the ingestion rule's S3 bucket via Less3.
- DocumentAtom extracts text content from the document, returning structured cells.
- Partio processes the cells: optionally summarizes (pre- or post-chunking per the rule), splits into chunks using the rule's chunking strategy, and computes vector embeddings via the configured embedding endpoint.
- Chunks and embeddings are stored in the ingestion rule's RecallDB collection. Chunk record IDs are saved on the document for cleanup on deletion.
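Partio's chunking strategies are configured per ingestion rule and are not detailed here. To make the "splits into chunks" step concrete, the sketch below shows one common strategy: fixed-size word windows with overlap, so that text near a boundary appears in two neighboring chunks. It is a generic illustration, not Partio's implementation, and the default window and overlap sizes are arbitrary.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlap trades extra storage and embedding calls for a lower chance that a sentence relevant to a query is split across two chunks, with neither half retrievable on its own.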
┌─────────┐ ┌──────────────┐ ┌──────────────┐
│ User │ │ AssistantHub │ │ RecallDB │
│(Browser │──1───►│ Server │──2───►│(PostgreSQL/ │
│ │ │ │ │ pgvector) │
│ or API)│ │ │◄──3───│ │
└─────────┘ └──────┬───────┘ └──────────────┘
▲ │
│ 4 │ Context + messages
│ ▼
│ ┌──────────────┐
└─────6──────│ Ollama │ Generates response
│ (Inference) │ (streaming or batch)
└──────────────┘
- User sends a message to the chat endpoint with conversation history and optional `attached_document_ids`.
- If attached document IDs are present, the server validates that they are completed documents in the assistant's tenant and collection.
- If RAG is enabled (and the retrieval gate permits), the server embeds the query and searches RecallDB using the assistant's configured search mode (vector, full-text, or hybrid). Attached document IDs narrow the RecallDB search; they do not force whole-document summarization.
- RecallDB returns relevant document chunks ranked by similarity score.
- The server assembles the system prompt with retrieved context and sends the full message list to the configured inference provider (Ollama, OpenAI, or Gemini). If the conversation exceeds the context window, older messages are compacted first.
- The LLM generates a response.
- The response is streamed back to the user token-by-token via SSE (or returned as a complete JSON response). Chat history with timing metrics is persisted.
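The vector-mode retrieval step can be sketched with an in-memory list of chunks. This includes the way `attached_document_ids` narrows the candidate set rather than forcing whole-document summarization. RecallDB performs this search inside PostgreSQL/pgvector. The `Chunk` type, the cosine-similarity ranking, and the `top_k` default below are assumptions for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Chunk:
    document_id: str
    text: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: Sequence[float], chunks: list[Chunk],
             attached_document_ids: Optional[set[str]] = None,
             top_k: int = 5) -> list[tuple[float, Chunk]]:
    """Rank chunks by cosine similarity; None or an empty set means no document filter."""
    candidates = [c for c in chunks
                  if not attached_document_ids or c.document_id in attached_document_ids]
    scored = [(cosine(query_embedding, c.embedding), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Filtering before ranking means that attaching a document changes which chunks are eligible, but not how they are scored. Only the top-scoring chunks from the selected documents reach the prompt.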
- Backend: .NET 10 (C#), WatsonWebserver
- Frontend: React 19, Vite 6, JavaScript
- Database: PostgreSQL by default in Docker; product binaries also support SQLite, SQL Server, and MySQL
- Vector Search: RecallDB backed by PostgreSQL with pgvector
- Text Search: Verbex backed by PostgreSQL
- Document Processing: DocumentAtom (text extraction), Partio (chunking, embedding, summarization)
- Object Storage: Less3 (S3-compatible)
- Inference Providers: Ollama (local), OpenAI (cloud), Gemini (cloud)
- Containerization: Docker, Docker Compose
- Web Server (Dashboard): nginx
Client libraries are available for integrating with the AssistantHub API:
| SDK | Location | Description |
|---|---|---|
| JavaScript/TypeScript | sdk/js/ | Dual ESM/CJS output, native fetch, async generators for SSE streaming |
| Python | sdk/python/ | Pydantic v2 models, httpx client, PEP 561 compliant |
| C# | sdk/csharp/ | .NET 8.0, System.Text.Json, typed exceptions, IAsyncEnumerable streaming |
Each SDK directory contains its own README with installation instructions and usage examples.
- Bug Reports and Feature Requests -- Use the Issues tab to report bugs or request new features.
- Questions and Discussion -- Use the Discussions tab for general questions, ideas, and community feedback.
- Improvements -- We are happy to accept pull requests; please keep them focused and short.
This project is licensed under the MIT License. See LICENSE.md for details.
Copyright (c) 2026 Joel Christner.