AI-Powered New Hire Knowledge Accelerator built on Azure OpenAI + pgvector.
OnboardIQ ingests your company's M365 content such as SharePoint documents, Teams meeting transcripts, and OneNote notebooks, then answers new-hire questions with grounded, source-cited responses.
onboardiq/
|-- app/ # FastAPI backend
| |-- api/routes.py # REST endpoints
| |-- config/ # pydantic-settings
| |-- core/ # DB engine, ORM models
| |-- ingestion/ # chunker, embedder, Graph API client
| |-- migrations/ # Alembic + pgvector indexes/constraints
| `-- query/ # retriever, answer generator
`-- frontend/ # React + Vite + Tailwind UI
`-- src/
|-- components/
| |-- ChatView.tsx
| |-- IngestView.tsx
| `-- Sidebar.tsx
|-- api.ts
`-- App.tsx
| Area | File(s) | Status |
|---|---|---|
| FastAPI app + lifespan | app/main.py |
Complete |
| Settings via pydantic-settings | app/config/settings.py |
Complete |
| Async DB engine manager | app/core/database.py |
Complete |
| ORM model (DocumentChunk) | app/core/orm.py |
Complete |
| Alembic migrations (pgvector + unique chunk constraint) | app/migrations/ |
Complete |
| Text chunker (token-based with tiktoken) | app/ingestion/chunker.py |
Complete |
| Embedding service (retry + upsert support) | app/ingestion/embedder.py |
Complete |
| Graph API client (OAuth2 token flow + demo fallback) | app/ingestion/graph_client.py |
Partial |
| pgvector cosine retriever | app/query/retriever.py |
Complete |
| GPT-4o answer generator (JSON confidence output) | app/query/generator.py |
Complete |
REST API routes (/api/ingest, /api/query, /api/health) |
app/api/routes.py |
Complete |
-
MS Graph content ingestion (
graph_client.py) OAuth2 token retrieval is implemented, but SharePoint file enumeration, content download, and Teams transcript retrieval still fall back to synthetic demo data. -
Background ingestion queue (
routes.py/ingest) Ingestion now runs in-process for local development. For production-scale workloads, move it to ARQ, Celery, or Azure Queue Storage. -
Prompt tuning (
generator.py_SYSTEM_PROMPT) The generator now returns structured JSON confidence, but the prompt can still be improved with few-shot examples for stronger grounding quality.
- Python 3.11+
- uv package manager
- Node.js 18+ and pnpm
- PostgreSQL 15+ with the
pgvectorextension enabled - Azure OpenAI resource with
text-embedding-3-largeandgpt-4odeployments
The easiest way to run PostgreSQL locally is via Docker:
docker run -d \
--name onboardiq-pg \
-e POSTGRES_USER=postgres \
-e POSTGRES_PASSWORD=postgres \
-e POSTGRES_DB=onboardiq \
-p 5432:5432 \
pgvector/pgvector:pg16Run this from the project root:
uv synccp .env.example .env
# Edit .env with your credentialsKey variables:
| Variable | Example |
|---|---|
DATABASE_URL |
postgresql://postgres:postgres@localhost:5432/onboardiq |
LLM_PROVIDER |
azure or openai or gemini |
AZURE_OPENAI_ENDPOINT |
https://your-resource.openai.azure.com/ |
AZURE_OPENAI_API_KEY |
your key (if using azure provider) |
AZURE_OPENAI_EMBEDDING_DEPLOYMENT |
embedding deployment name (default: text-embedding-3-large) |
OPENAI_API_KEY |
your key (if using openai provider) |
GEMINI_API_KEY |
your key (if using gemini provider) |
AZURE_TENANT_ID |
your Azure tenant id (for Graph token flow) |
AZURE_CLIENT_ID |
your Azure app client id |
AZURE_CLIENT_SECRET |
your Azure app client secret |
uv run alembic -c app/alembic.ini upgrade headThis creates the tables, vector support, and the unique constraint used for chunk upserts.
uv run uvicorn app.main:app --reloadAPI: http://localhost:8000
Docs: http://localhost:8000/docs
Health: http://localhost:8000/api/health
cd frontend
pnpm install
pnpm devFrontend: http://localhost:3000
The Vite dev server proxies /api/* requests to http://localhost:8000.
-
Start PostgreSQL and make sure
.envcontains real values, not placeholders. At minimum,DATABASE_URLand the credentials for your selectedLLM_PROVIDERmust be valid. -
Install dependencies:
uv sync- Apply migrations:
uv run alembic -c app/alembic.ini upgrade head- Start the API:
uv run uvicorn app.main:app --reload- Verify the server:
http://localhost:8000/http://localhost:8000/api/healthhttp://localhost:8000/docs
- Test ingestion in Swagger with
POST /api/ingest:
{
"source_type": "sharepoint",
"site_id": "demo-site"
}- Test querying in Swagger with
POST /api/query:
{
"question": "When is the benefits enrollment deadline for new hires?"
}-
If
uv run alembic upgrade headfails withNo 'script_location' key found in configuration, useuv run alembic -c app/alembic.ini upgrade head. This repo storesalembic.iniinsideapp/. -
If
http://localhost:8000/healthreturns404, usehttp://localhost:8000/api/health. The API router is mounted under/api. -
If
POST /api/ingestreturns500, check the terminal runninguvicorn. The most common causes are:DATABASE_URLstill points to placeholder values- Azure/OpenAI credentials are still placeholders
- database migrations were not applied
MS Graph API / demo content
|
v
graph_client.py (token flow + source loading)
|
v
chunker.py (split into token-bounded chunks with overlap)
|
v
embedder.py (embeddings + pgvector upsert)
|
v
PostgreSQL + pgvector (store chunks + vector search index)
User question (POST /api/query)
|
v
embedder.embed_text() (embed question)
|
v
retriever.search() (cosine ANN search)
|
v
generator.generate() (grounded answer + structured confidence)
|
v
QueryResponse (answer, sources, confidence)
- Implement SharePoint file enumeration and content extraction via Microsoft Graph.
- Implement Teams transcript retrieval via Microsoft Graph.
- Move
/ingestto a background task queue such as ARQ. - Add authentication middleware (Azure AD JWT validation) for production use.
- Write integration tests against a local pgvector Docker container.
- Set up Azure Container Apps or AKS deployment with Helm charts.