The open-source enrichment engine for go-to-market teams. A LangGraph agent swarm wired into Bright Data's web infrastructure. Drop in a CSV, get back company intelligence, person contacts, and buying signals, with a citation on every cell.
Commercial enrichment tools cost $0.50 to $2.00 per row, lock data behind contracts, and rarely tell you where a field came from. Generic scrapers are cheap but ship raw HTML — you still have to build the agent, the extraction logic, the confidence scoring, the CRM mapping.
Open Enrich is the missing layer in between. The web is the dataset, Bright Data is the access layer, LangGraph is the orchestrator, and the output is a CSV your SDRs can paste into Salesforce or HubSpot at 9am Monday.
| | Commercial enrichment tools | Generic scrapers | Open Enrich |
|---|---|---|---|
| Cost per row | $0.50 to $2.00 | low, but you build everything | ~$0.01 to $0.05 in API calls |
| Source attribution | rare | none | every cell, with URL + snippet |
| LinkedIn / Crunchbase | yes (contracted) | blocked or unreliable | yes (Bright Data structured scrapers) |
| Person-level emails | yes (gated) | no | yes (SERP pattern discovery + LinkedIn fallback) |
| Self-hosted | no | yes | yes |
| CRM-formatted export | yes | no | yes (Salesforce + HubSpot mappings) |
A single CSV row becomes a stateful LangGraph run. Discovery first, then specialist agents fan out in parallel, then a validator merges results.
Each agent is a typed LangChain tool() graph with structured Zod output. Tools wrap Bright Data's SERP API, Web Unlocker, LinkedIn / Crunchbase scrapers, and Deep Lookup. Failures are isolated per-agent so one bad scrape never kills a row.
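The following is a minimal plain-TypeScript sketch of the discovery, parallel fan-out, and merge shape, built on `Promise.allSettled`. It makes the failure-isolation claim concrete. It is not LangGraph code and not the engine's implementation. `enrichRow`, `AgentFn`, and the spread-merge that stands in for the validator are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: each specialist agent returns a partial record of fields.
type Fields = Record<string, unknown>;
type AgentFn = (row: Fields, context: Fields) => Promise<Fields>;

interface RowResult {
  fields: Fields;
  errors: Record<string, string>; // agent name -> failure message
}

async function enrichRow(
  row: Fields,
  discover: (row: Fields) => Promise<Fields>,
  agents: Record<string, AgentFn>,
): Promise<RowResult> {
  // 1. Discovery runs first and seeds shared context for the specialists.
  const context = await discover(row);

  // 2. Specialists fan out in parallel. allSettled never rejects,
  //    so one failing agent cannot abort the others.
  const names = Object.keys(agents);
  const settled = await Promise.allSettled(names.map((n) => agents[n](row, context)));

  // 3. Validator stand-in: merge successful outputs, record failures per agent.
  const result: RowResult = { fields: {}, errors: {} };
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      Object.assign(result.fields, outcome.value);
    } else {
      result.errors[names[i]] = String(outcome.reason);
    }
  });
  return result;
}
```

Because `Promise.allSettled` always resolves, a rejected scrape shows up as an entry in `errors` and does not throw an exception that drops the row.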
- ICP qualification: employee count, revenue, industry, HQ, company type, funding stage
- Buying signals: recent funding, hiring velocity, leadership changes, job postings, tech changes
- Personalization hooks: company mission, recent news, blog topics, conference activity
- Contact data: work emails (SERP pattern + verification), phone, LinkedIn, GitHub, Twitter, title, seniority, location
- Social / web: LinkedIn URL, Twitter, GitHub org, company blog, careers page
- Custom fields: describe what you want in plain English, and an agent does the research
Confidence scoring is built in. A field corroborated by three independent sources scores 0.95; a single-source heuristic scores 0.50. Every value carries the URL it came from.
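The following is a minimal sketch of corroboration-based scoring. It assumes that two sources count as independent when they come from different hostnames. Only the two anchor values come from the description above: one source scores 0.50 and three sources score 0.95. Everything else is an illustrative assumption, including the 0.80 for two sources, the 0 for none, and the hostname rule. `countIndependentSources` and `confidenceFromSources` are hypothetical helpers.

```typescript
// Hypothetical helper: each distinct hostname (ignoring "www.") counts as one independent source.
function countIndependentSources(urls: string[]): number {
  const hosts = new Set(urls.map((u) => new URL(u).hostname.replace(/^www\./, "")));
  return hosts.size;
}

// 1 source -> 0.50 and 3+ sources -> 0.95 follow the README;
// 0 -> 0 and 2 -> 0.80 are assumptions made for this sketch.
function confidenceFromSources(independentSources: number): number {
  if (independentSources <= 0) return 0;
  if (independentSources === 1) return 0.5;
  if (independentSources === 2) return 0.8;
  return 0.95;
}
```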
Three packages, one engine.
| Package | What it is | Install |
|---|---|---|
| `@brightdata/enrich-core` | The engine: LangGraph, agents, Bright Data tool wrappers, Zod schemas. Used by the other two. | workspace only |
| `@brightdata/enrich` | Headless CLI: `enrich leads.csv --describe "..."`. NDJSON streaming, resumable runs, dry-run cost estimates. | `npm i -g @brightdata/enrich` |
| `open-enrich` (web) | Next.js 16 web app. Upload, configure, watch agents run live, export to CRM. | self-hosted or hosted demo |
You'll need a Bright Data account with two zones provisioned (SERP API and Web Unlocker), plus an OpenRouter (or OpenAI / Anthropic) API key for the LLM layer.
```bash
# 1. Clone and install
git clone https://github.com/brightdata/open-enrich.git
cd open-enrich
pnpm install

# 2. Configure
cp packages/web/.env.example packages/web/.env.local
# fill in:
#   BRIGHT_DATA_API_KEY
#   BRIGHT_DATA_SERP_ZONE
#   BRIGHT_DATA_UNLOCKER_ZONE
#   OPENROUTER_API_KEY

# 3. Run the web app
pnpm dev:web
# open http://localhost:3000
```

```bash
# CLI: describe what you want in plain English
npx @brightdata/enrich leads.csv \
  --describe "company size, recent funding, work email of contact"

# Or pick from presets
npx @brightdata/enrich leads.csv \
  --fields employee_count,industry,funding_stage,person_email

# Dry-run cost estimate before kicking off a 5k-row run
npx @brightdata/enrich leads.csv --describe "..." --dry-run
```

```ts
import { runEnrichment } from "@brightdata/enrich-core";

// csvRows: your parsed CSV rows; the credential values are e.g. loaded from the env vars described below
for await (const event of runEnrichment({
  rows: csvRows,
  fields: ["employee_count", "funding_stage", "person_email"],
  identifierColumn: "email",
  credentials: { brightDataApiKey, serpZone, unlockerZone, openrouterKey },
})) {
  // "result" events carry the enriched row and its fields
  if (event.type === "result") console.log(event.row, event.fields);
}
```

Open Enrich ships an Agent Skill at `skills/enriching-tables`. It teaches Claude Code, Cursor, Codex, Gemini CLI, and other agents how to enrich any table with the `@brightdata/enrich` CLI. It covers onboarding, prerequisites, the dry-run-then-confirm workflow, the full field catalog, and troubleshooting.

```bash
# Install just this skill into your agent
npx skills add brightdata/open-enrich/skills/enriching-tables

# Or every skill in the repo
npx skills add brightdata/open-enrich
```

It's listed on skills.sh. Discovery is automatic from the `skills/<name>/SKILL.md` layout, so no submission step is needed. Once it's installed, ask your agent to "enrich this CSV with company size and funding" and it takes over.
Open Enrich fingerprints the file before it asks you anything. Drop a Salesforce export, a HubSpot contact list, a LinkedIn Sales Navigator dump, or any CRM CSV. It detects the format, picks the best identifier column, and skips fields you already have.
```
Detected: HubSpot Export
Identifier: Company Domain (98% coverage)
Skipping 4 fields already present in your CSV
Enriching 2,847 of 2,900 rows with 9 new fields
Estimated cost: $42.71 | Estimated time: 18m
```
Coverage gauges, format-specific presets, smart column mapping. The boring part of enrichment, handled.
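The following sketch covers the identifier pick and the skip-existing-fields step. It assumes the detector scores candidate columns by their share of non-empty values, which the "98% coverage" figure above suggests. The header patterns, the snake_case normalization, and the helper names `pickIdentifierColumn` and `fieldsToEnrich` are all hypothetical.

```typescript
type CsvRow = Record<string, string | undefined>;

// Assumed header patterns for columns that can identify a company or person.
const IDENTIFIER_PATTERNS = [/domain/i, /website|url/i, /e-?mail/i, /linkedin/i];

interface IdentifierChoice {
  column: string;
  coverage: number; // fraction of rows with a non-empty value
}

function pickIdentifierColumn(rows: CsvRow[]): IdentifierChoice | null {
  if (rows.length === 0) return null;
  let best: IdentifierChoice | null = null;
  for (const column of Object.keys(rows[0])) {
    if (!IDENTIFIER_PATTERNS.some((p) => p.test(column))) continue;
    const filled = rows.filter((r) => (r[column] ?? "").trim() !== "").length;
    const coverage = filled / rows.length;
    if (!best || coverage > best.coverage) best = { column, coverage };
  }
  return best;
}

// "Employee Count" -> "employee_count", so CSV headers can be compared to field names.
const normalize = (header: string) =>
  header.trim().toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_|_$/g, "");

// Drop requested fields that the CSV already has a column for.
function fieldsToEnrich(requested: string[], headers: string[]): string[] {
  const present = new Set(headers.map(normalize));
  return requested.filter((field) => !present.has(field));
}
```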
The trust layer most commercial tools don't ship. Every enriched cell carries:
- the source URL the value came from
- the quote that supports it
- a confidence score (0 to 1) factoring corroboration across sources
- a freshness timestamp (amber after 24h, red after 7d)
Hover a cell, see the receipts. Click a row, get a full evidence panel side by side with the original CSV data. Export a Markdown audit report for sales ops.
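The following sketches the per-cell evidence record and the freshness rule. The interface field names are hypothetical labels for the four items listed above and do not reflect the engine's actual schema. The 24-hour and 7-day thresholds are the ones stated in the list.

```typescript
// Hypothetical shape of one enriched cell; names are illustrative only.
interface CitedCell<T = unknown> {
  value: T;
  sourceUrl: string; // where the value came from
  quote: string; // supporting snippet from that page
  confidence: number; // 0 to 1
  fetchedAt: Date; // drives the freshness badge
}

type Freshness = "fresh" | "amber" | "red";

const HOUR_MS = 60 * 60 * 1000;

// Amber once older than 24h, red once older than 7 days.
function freshness(fetchedAt: Date, now: Date = new Date()): Freshness {
  const ageMs = now.getTime() - fetchedAt.getTime();
  if (ageMs > 7 * 24 * HOUR_MS) return "red";
  if (ageMs > 24 * HOUR_MS) return "amber";
  return "fresh";
}
```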
Post-enrichment is where most tools dump you. Open Enrich segments the run into three queues:
- Fully enriched — every requested field at high confidence. Ship today.
- Has buying signal — funding, hiring spike, leadership change. Prioritize.
- Needs review — partial data or low confidence. Manual triage.
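The following sketch shows one way to assign each row to one of the three queues. The 0.8 "high confidence" threshold is an assumption, as is the rule that a buying signal outranks completeness. `assignQueue` is a hypothetical helper.

```typescript
type Queue = "fully_enriched" | "has_buying_signal" | "needs_review";

interface EnrichedField {
  value: unknown;
  confidence: number; // 0 to 1
}

interface EnrichedRow {
  fields: Record<string, EnrichedField | undefined>;
  signals: string[]; // e.g. "recent_funding", "hiring_spike", "leadership_change"
}

const HIGH_CONFIDENCE = 0.8; // assumed threshold

function assignQueue(row: EnrichedRow, requested: string[]): Queue {
  // Assumed precedence: any buying signal puts the row in the priority queue.
  if (row.signals.length > 0) return "has_buying_signal";
  // Otherwise the row is complete only if every requested field is present and confident.
  const complete = requested.every((name) => {
    const field = row.fields[name];
    return field !== undefined && field.value != null && field.confidence >= HIGH_CONFIDENCE;
  });
  return complete ? "fully_enriched" : "needs_review";
}
```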
Filter, search, segment. Then export with field mappings that drop into Salesforce or HubSpot without column-rename gymnastics. Or copy as Google Sheets / TSV / JSON.
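The following sketches mapping-driven export to TSV. The column names in `CRM_MAPPINGS` are illustrative: Salesforce-style API names and HubSpot-style labels. They are not the mappings Open Enrich actually ships. `toTsv` is a hypothetical helper.

```typescript
type Row = Record<string, string | number | null | undefined>;

// Illustrative mappings only: internal field name -> CRM column name.
const CRM_MAPPINGS: Record<"salesforce" | "hubspot", Record<string, string>> = {
  salesforce: { employee_count: "NumberOfEmployees", industry: "Industry" },
  hubspot: { employee_count: "Number of Employees", industry: "Industry" },
};

function toTsv(rows: Row[], mapping: Record<string, string>): string {
  const keys = Object.keys(mapping);
  // TSV cells cannot contain raw tabs or newlines, so collapse them to a space.
  const clean = (v: Row[string]) => String(v ?? "").replace(/[\t\r\n]+/g, " ");
  const header = keys.map((k) => mapping[k]).join("\t");
  const body = rows.map((r) => keys.map((k) => clean(r[k])).join("\t"));
  return [header, ...body].join("\n");
}
```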
| Layer | Choice | Why |
|---|---|---|
| Data access | Bright Data (SERP API, Web Unlocker, structured scrapers, Deep Lookup) | 150M+ residential IPs, dedicated LinkedIn / Crunchbase scrapers, AI search across 1000+ sources |
| Orchestration | LangGraph 1.2 + Deep Agents 1.9 | stateful agent graphs, parallel fan-out, auto-eviction for large scrapes, subagent task delegation |
| LLM layer | OpenRouter (default), works with OpenAI / Anthropic / any compatible provider | model-agnostic, swap providers via env var |
| Validation | Zod 4 | typed structured output on every agent response |
| Web app | Next.js 16 + React 19 + Tailwind v4 + Framer Motion | App Router, RSC, SSE streaming |
| CLI | commander + tsup | bin distribution, NDJSON output, resumable runs |
| Demo quota | Turso libSQL | edge-friendly per-IP lifetime limits |
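The per-IP lifetime cap (50 rows when `DEMO_MODE=1`, per the configuration below) can be sketched behind a small storage interface. The project uses Turso libSQL for this. This sketch swaps in an in-memory `Map` so it runs anywhere. `QuotaStore`, `MemoryQuotaStore`, and `claimRows` are hypothetical names.

```typescript
interface QuotaStore {
  getUsed(ip: string): Promise<number>;
  addUsed(ip: string, rows: number): Promise<void>;
}

// In-memory stand-in for a database table keyed by IP.
class MemoryQuotaStore implements QuotaStore {
  private used = new Map<string, number>();
  async getUsed(ip: string) {
    return this.used.get(ip) ?? 0;
  }
  async addUsed(ip: string, rows: number) {
    this.used.set(ip, (await this.getUsed(ip)) + rows);
  }
}

const DEMO_ROW_LIMIT = 50; // lifetime rows per IP

// Grants as many of the requested rows as the IP has left, and records the grant.
async function claimRows(store: QuotaStore, ip: string, requested: number): Promise<number> {
  const used = await store.getUsed(ip);
  const granted = Math.max(0, Math.min(requested, DEMO_ROW_LIMIT - used));
  if (granted > 0) await store.addUsed(ip, granted);
  return granted;
}
```

With a shared database, the read and the increment would need to happen atomically (for example, in a single conditional `UPDATE`). Otherwise, concurrent requests from the same IP could both pass the check.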
Bright Data bills per tool call at roughly $0.0015 each, uniform across SERP, Unlocker, and structured scrapers. A typical row touches 5 to 15 tool calls depending on what you ask for. LLM cost is on top, paid to your provider.
| Field set | Tool calls / row | Bright Data cost / row | LLM cost / row (OpenRouter / GPT-4o-mini) |
|---|---|---|---|
| Quick CRM fill (5 fields) | ~4 | $0.006 | ~$0.003 |
| Startup prospecting (14 fields) | ~10 | $0.015 | ~$0.012 |
| Enterprise research (11 fields, deep) | ~15 | $0.023 | ~$0.018 |
| With person enrichment | +6 | +$0.009 | +$0.005 |
Compare to $0.50 to $2.00 per row at commercial enrichment vendors. Run enrich --dry-run for a per-run estimate before you commit.
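The table reduces to simple arithmetic: tool calls per row times ~$0.0015, plus the provider's LLM cost. The following back-of-the-envelope estimator is a sketch only. It is not how `--dry-run` computes its number, and `estimateRunCost` is a hypothetical helper.

```typescript
// ~$0.0015 per Bright Data tool call, uniform across SERP, Unlocker, and scrapers.
const BRIGHT_DATA_COST_PER_CALL = 0.0015;

interface CostEstimate {
  brightData: number; // USD
  llm: number; // USD
  total: number; // USD
}

function estimateRunCost(rows: number, toolCallsPerRow: number, llmCostPerRow: number): CostEstimate {
  const brightData = rows * toolCallsPerRow * BRIGHT_DATA_COST_PER_CALL;
  const llm = rows * llmCostPerRow;
  return { brightData, llm, total: brightData + llm };
}
```

For example, take 5,000 rows of startup prospecting at ~10 calls and ~$0.012 of LLM per row. That comes to about $75 in Bright Data calls plus about $60 in LLM calls, roughly $135 total.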
All config lives in env vars. The CLI also accepts a stored config via enrich login.
```bash
# Required
BRIGHT_DATA_API_KEY=        # https://brightdata.com/cp/setting/users
BRIGHT_DATA_SERP_ZONE=      # zone name from your Bright Data dashboard
BRIGHT_DATA_UNLOCKER_ZONE=  # zone name from your Bright Data dashboard
OPENROUTER_API_KEY=         # https://openrouter.ai/keys

# Optional (web app)
NEXT_PUBLIC_BASE_PATH=      # serve under /open-enrich for subpath deploys
DEMO_MODE=1                 # enable 50-row-per-IP lifetime cap (requires Turso)
TURSO_DATABASE_URL=
TURSO_AUTH_TOKEN=
```

Provider-agnostic LLM: swap `OPENROUTER_API_KEY` for `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` and `createLLM()` picks the right adapter.
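The following sketches key-based provider selection. It assumes the first key found wins, checked in the order OpenRouter, OpenAI, Anthropic. That precedence and the `pickProvider` name are assumptions, and the real `createLLM()` may resolve providers differently.

```typescript
type Provider = "openrouter" | "openai" | "anthropic";

// Assumed precedence: first non-empty key wins.
const PROVIDER_KEYS: Array<[Provider, string]> = [
  ["openrouter", "OPENROUTER_API_KEY"],
  ["openai", "OPENAI_API_KEY"],
  ["anthropic", "ANTHROPIC_API_KEY"],
];

function pickProvider(env: Record<string, string | undefined>): { provider: Provider; apiKey: string } {
  for (const [provider, key] of PROVIDER_KEYS) {
    const apiKey = env[key]?.trim();
    if (apiKey) return { provider, apiKey };
  }
  throw new Error("No LLM key set: expected one of " + PROVIDER_KEYS.map(([, k]) => k).join(", "));
}
```

In Node you would call it as `pickProvider(process.env)`.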
```
open-enrich/
├── packages/
│   ├── core/              @brightdata/enrich-core  ← the engine
│   │   ├── agents/        LangGraph nodes, Deep Agents, prompts
│   │   ├── tools/         Bright Data tool wrappers
│   │   ├── services/      SERP, Web Unlocker, structured scrapers
│   │   ├── schemas/       Zod schemas for structured output
│   │   ├── csv/           parsing, format detection, CRM export
│   │   └── run-enrichment.ts   public API entrypoint
│   │
│   ├── cli/               @brightdata/enrich  ← terminal UX
│   │   └── src/           commander commands, login, NDJSON streaming
│   │
│   └── web/               open-enrich  ← Next.js 16 app
│       ├── app/           App Router pages + API routes (SSE)
│       ├── components/    upload, config, table, modals, panels
│       ├── hooks/         useEnrichment, useCSVParser, useFilters
│       └── lib/           demo-mode, MCP, quality metrics
│
└── docs/
    └── assets/            README assets (logos, demo GIF)
```
Done and shipping today: company enrichment, person enrichment, source attribution, confidence scoring, CSV format detection, CRM export, MCP server integration, cost tracking, demo quota.
Next:
- batch resumable runs for 100k+ row CSVs from the web UI
- a webhook / queue mode for always-on enrichment from a CRM source
- richer signal detection (G2 mentions, podcast appearances, conference rosters)
- agent observability via LangSmith traces in the UI
See TASKS.md for the full historical changelog.
PRs welcome. Three things worth knowing before you open one:
- Run typecheck and build before pushing: `pnpm typecheck && pnpm build`.
- The web app uses Next.js 16, which has breaking API changes from older versions. Read the relevant doc in `node_modules/next/dist/docs/` before touching App Router internals.
- Don't add agents speculatively. New agents should solve a named extraction problem the current set can't handle, with a benchmark to prove it.
Issues are the place to file bugs and propose features.
MIT. Use it, fork it, ship it.