OnboardIQ

AI-Powered New Hire Knowledge Accelerator built on Azure OpenAI + pgvector.

OnboardIQ ingests your company's Microsoft 365 (M365) content, such as SharePoint documents, Teams meeting transcripts, and OneNote notebooks, and answers new-hire questions with grounded, source-cited responses.

Project Structure

onboardiq/
|-- app/                  # FastAPI backend
|   |-- api/routes.py     # REST endpoints
|   |-- config/           # pydantic-settings
|   |-- core/             # DB engine, ORM models
|   |-- ingestion/        # chunker, embedder, Graph API client
|   |-- migrations/       # Alembic + pgvector indexes/constraints
|   `-- query/            # retriever, answer generator
`-- frontend/             # React + Vite + Tailwind UI
    `-- src/
        |-- components/
        |   |-- ChatView.tsx
        |   |-- IngestView.tsx
        |   `-- Sidebar.tsx
        |-- api.ts
        `-- App.tsx

Backend State

Implemented

Area	File(s)	Status
FastAPI app + lifespan	`app/main.py`	Complete
Settings via pydantic-settings	`app/config/settings.py`	Complete
Async DB engine manager	`app/core/database.py`	Complete
ORM model (DocumentChunk)	`app/core/orm.py`	Complete
Alembic migrations (pgvector + unique chunk constraint)	`app/migrations/`	Complete
Text chunker (token-based with tiktoken)	`app/ingestion/chunker.py`	Complete
Embedding service (retry + upsert support)	`app/ingestion/embedder.py`	Complete
Graph API client (OAuth2 token flow + demo fallback)	`app/ingestion/graph_client.py`	Partial
pgvector cosine retriever	`app/query/retriever.py`	Complete
GPT-4o answer generator (JSON confidence output)	`app/query/generator.py`	Complete
REST API routes (`/api/ingest`, `/api/query`, `/api/health`)	`app/api/routes.py`	Complete

Remaining Gaps

MS Graph content ingestion (graph_client.py): OAuth2 token retrieval is implemented, but SharePoint file enumeration, content download, and Teams transcript retrieval still fall back to synthetic demo data.
Background ingestion queue (routes.py /ingest): Ingestion currently runs in-process, which is fine for local development. For production-scale workloads, move it to ARQ, Celery, or Azure Queue Storage.
Prompt tuning (generator.py _SYSTEM_PROMPT): The generator returns structured JSON confidence, but adding few-shot examples to the prompt could improve grounding quality.
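
To illustrate what consuming the generator's structured JSON confidence output could look like, here is a minimal sketch. The field names `answer` and `confidence` mirror the QueryResponse described in this README, but the exact schema produced by generator.py, the numeric confidence scale, and the fallback behavior are all assumptions made for illustration. `ParsedAnswer` and `parse_model_reply` are hypothetical names, not part of the repo.

```python
import json
from dataclasses import dataclass


@dataclass
class ParsedAnswer:
    answer: str
    confidence: float  # assumed to be a score clamped to [0.0, 1.0]


def parse_model_reply(raw: str) -> ParsedAnswer:
    """Parse a JSON model reply, falling back to zero confidence if malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Hypothetical fallback: keep the raw text but mark it as unverified.
        return ParsedAnswer(answer=raw.strip(), confidence=0.0)
    if not isinstance(data, dict):
        return ParsedAnswer(answer=raw.strip(), confidence=0.0)

    answer = str(data.get("answer", "")).strip()
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        # Non-numeric confidence (for example "high") is treated as unknown.
        confidence = 0.0
    # Clamp so downstream code can rely on the [0.0, 1.0] range.
    return ParsedAnswer(answer=answer, confidence=min(1.0, max(0.0, confidence)))
```

Treating unparseable replies as zero-confidence answers, rather than raising, keeps the query endpoint responsive while still signaling that the reply was not grounded in the expected format.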

Setup

Prerequisites

Python 3.11+
uv package manager
Node.js 18+ and pnpm
PostgreSQL 15+ with the pgvector extension enabled
Azure OpenAI resource with text-embedding-3-large and gpt-4o deployments

The easiest way to run PostgreSQL locally is via Docker:

docker run -d \
  --name onboardiq-pg \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=onboardiq \
  -p 5432:5432 \
  pgvector/pgvector:pg16

1. Install backend dependencies

Run this from the project root:

uv sync

2. Configure environment

cp .env.example .env
# Edit .env with your credentials

Key variables:

Variable	Example
`DATABASE_URL`	`postgresql://postgres:postgres@localhost:5432/onboardiq`
`LLM_PROVIDER`	`azure`, `openai`, or `gemini`
`AZURE_OPENAI_ENDPOINT`	`https://your-resource.openai.azure.com/`
`AZURE_OPENAI_API_KEY`	your key (if using `azure` provider)
`AZURE_OPENAI_EMBEDDING_DEPLOYMENT`	embedding deployment name (default: `text-embedding-3-large`)
`OPENAI_API_KEY`	your key (if using `openai` provider)
`GEMINI_API_KEY`	your key (if using `gemini` provider)
`AZURE_TENANT_ID`	your Azure tenant id (for Graph token flow)
`AZURE_CLIENT_ID`	your Azure app client id
`AZURE_CLIENT_SECRET`	your Azure app client secret
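
As a rough pre-flight check for the variables in the table above, the sketch below reports which required settings are unset for the selected provider. It uses only the standard library and reads the process environment, not the .env file, and it is not how app/config/settings.py works (that module uses pydantic-settings). The mapping of provider to required variables is inferred from the table, and `missing_settings` is a hypothetical helper name.

```python
import os

# Variables each provider needs, inferred from the table above (an assumption).
REQUIRED_BY_PROVIDER = {
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY"],
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
}


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["DATABASE_URL", "LLM_PROVIDER"]
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    # Unknown providers add no extra requirements in this sketch.
    required += REQUIRED_BY_PROVIDER.get(provider, [])
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(dict(os.environ))
    print("Missing settings:", ", ".join(missing) or "none")
```

This only detects empty values. Placeholder strings copied from .env.example would still pass, so the credentials need to be checked by hand as well.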

3. Run database migrations

uv run alembic -c app/alembic.ini upgrade head

This creates the tables, vector support, and the unique constraint used for chunk upserts.
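
To show why a unique constraint matters for chunk upserts, here is a small sketch of the general pattern. It uses the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs without a database server. PostgreSQL supports the same INSERT ... ON CONFLICT ... DO UPDATE form, though with different parameter placeholders. The table layout and the columns `source_id` and `chunk_index` are hypothetical; the actual constraint columns are defined in the repo's migrations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_chunks (
        source_id   TEXT NOT NULL,       -- hypothetical column
        chunk_index INTEGER NOT NULL,    -- hypothetical column
        content     TEXT NOT NULL,
        UNIQUE (source_id, chunk_index)  -- the constraint the upsert targets
    )
    """
)

UPSERT = """
INSERT INTO document_chunks (source_id, chunk_index, content)
VALUES (?, ?, ?)
ON CONFLICT (source_id, chunk_index)
DO UPDATE SET content = excluded.content
"""


def upsert_chunk(source_id: str, chunk_index: int, content: str) -> None:
    """Insert a chunk, or overwrite the existing row with the same key."""
    conn.execute(UPSERT, (source_id, chunk_index, content))


upsert_chunk("handbook.docx", 0, "first version")
upsert_chunk("handbook.docx", 0, "re-ingested version")  # same key: updated in place
rows = conn.execute("SELECT content FROM document_chunks").fetchall()
```

Without the unique constraint, the second call would add a duplicate row, so re-ingesting a document would inflate the index instead of refreshing it.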

4. Start the API server

uv run uvicorn app.main:app --reload

API: http://localhost:8000

Docs: http://localhost:8000/docs

Health: http://localhost:8000/api/health

5. Start the frontend

cd frontend
pnpm install
pnpm dev

Frontend: http://localhost:3000

The Vite dev server proxies /api/* requests to http://localhost:8000.

Quick Test

Start PostgreSQL and make sure .env contains real values, not placeholders. At minimum, DATABASE_URL and the credentials for your selected LLM_PROVIDER must be valid.
Install dependencies:

uv sync

Apply migrations:

uv run alembic -c app/alembic.ini upgrade head

Start the API:

uv run uvicorn app.main:app --reload

Verify the server:

http://localhost:8000/
http://localhost:8000/api/health
http://localhost:8000/docs

Test ingestion in Swagger with POST /api/ingest:

{
  "source_type": "sharepoint",
  "site_id": "demo-site"
}

Test querying in Swagger with POST /api/query:

{
  "question": "When is the benefits enrollment deadline for new hires?"
}
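
The same two requests can also be sent from a script instead of Swagger. The sketch below uses only the standard library and assumes the API is running at http://localhost:8000 and returns JSON bodies. The helper names `build_post`, `send`, and `main` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default uvicorn address


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the local API without sending it."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def main() -> None:
    """Run the ingest and query requests from the Quick Test in order."""
    ingest = build_post(
        "/api/ingest", {"source_type": "sharepoint", "site_id": "demo-site"}
    )
    query = build_post(
        "/api/query",
        {"question": "When is the benefits enrollment deadline for new hires?"},
    )
    print(send(ingest))
    print(send(query))
```

Calling `main()` with the server running sends both requests. A non-2xx response raises `urllib.error.HTTPError`, which is a cue to check the uvicorn terminal.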

Troubleshooting

If uv run alembic upgrade head fails with No 'script_location' key found in configuration, use uv run alembic -c app/alembic.ini upgrade head. This repo stores alembic.ini inside app/.
If http://localhost:8000/health returns 404, use http://localhost:8000/api/health. The API router is mounted under /api.
If POST /api/ingest returns 500, check the terminal running uvicorn. The most common causes are:
- DATABASE_URL still points to placeholder values
- Azure/OpenAI credentials are still placeholders
- database migrations were not applied

Architecture

Ingestion Pipeline

MS Graph API / demo content
        |
        v
graph_client.py         (token flow + source loading)
        |
        v
chunker.py              (split into token-bounded chunks with overlap)
        |
        v
embedder.py             (embeddings + pgvector upsert)
        |
        v
PostgreSQL + pgvector   (store chunks + vector search index)
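
The chunking step in the pipeline above can be sketched as a sliding window over token ids. This is a minimal illustration, not the code in chunker.py: it takes an already-tokenized list (in the app, the ids would come from tiktoken), and the function name `chunk_tokens` and its parameters are assumptions.

```python
def chunk_tokens(tokens: list[int], max_tokens: int, overlap: int) -> list[list[int]]:
    """Split token ids into windows of at most max_tokens, sharing overlap tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap  # how far each window advances
    chunks: list[list[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end; avoid a redundant tail
    return chunks


# Ten tokens, windows of four, one token shared between neighbors.
example = chunk_tokens(list(range(10)), max_tokens=4, overlap=1)
```

The overlap means a sentence that straddles a boundary appears in both neighboring chunks, which tends to help retrieval at the cost of some duplicated storage.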

Query Pipeline

User question (POST /api/query)
        |
        v
embedder.embed_text()   (embed question)
        |
        v
retriever.search()      (cosine ANN search)
        |
        v
generator.generate()    (grounded answer + structured confidence)
        |
        v
QueryResponse           (answer, sources, confidence)
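
The retrieval step ranks stored chunks by cosine similarity to the question embedding. The sketch below shows that ranking as an exact, brute-force search in plain Python. The app itself delegates this to pgvector, which may use an approximate index, so this only illustrates the scoring, not the code in retriever.py. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: list[float], chunks: list[tuple[str, list[float]]], k: int = 3
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Because cosine similarity ignores vector length, two chunks pointing in the same direction score equally regardless of magnitude.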

Next Steps

Implement SharePoint file enumeration and content extraction via Microsoft Graph.
Implement Teams transcript retrieval via Microsoft Graph.
Move /ingest to a background task queue such as ARQ.
Add authentication middleware (Azure AD JWT validation) for production use.
Write integration tests against a local pgvector Docker container.
Set up deployment on Azure Container Apps, or on AKS with Helm charts.
