brightdata/open-enrich

Open Enrich

The open-source enrichment engine for go-to-market teams. A LangGraph agent swarm wired into Bright Data's web infrastructure. Drop in a CSV, get back company intelligence, person contacts, and buying signals, with a citation on every cell.


▶ Demo

[Demo recording: Open Enrich in action]

Why this exists

Commercial enrichment tools cost $0.50 to $2.00 per row, lock data behind contracts, and rarely tell you where a field came from. Generic scrapers are cheap but ship raw HTML — you still have to build the agent, the extraction logic, the confidence scoring, the CRM mapping.

Open Enrich is the missing layer in between. The web is the dataset, Bright Data is the access layer, LangGraph is the orchestrator, and the output is a CSV your SDRs can paste into Salesforce or HubSpot at 9am Monday.

|  | Commercial enrichment tools | Generic scrapers | Open Enrich |
| --- | --- | --- | --- |
| Cost per row | $0.50 to $2.00 | low, but you build everything | ~$0.01 to $0.05 in API calls |
| Source attribution | rare | none | every cell, with URL + snippet |
| LinkedIn / Crunchbase | yes (contracted) | blocked or unreliable | yes (Bright Data structured scrapers) |
| Person-level emails | yes (gated) | no | yes (SERP pattern discovery + LinkedIn fallback) |
| Self-hosted | no | yes | yes |
| CRM-formatted export | yes | no | yes (Salesforce + HubSpot mappings) |

How it works

A single CSV row becomes a stateful LangGraph run. Discovery first, then specialist agents fan out in parallel, then a validator merges results.

[Figure: Open Enrich architecture]

Each agent is a typed LangChain tool() graph with structured Zod output. Tools wrap Bright Data's SERP API, Web Unlocker, LinkedIn / Crunchbase scrapers, and Deep Lookup. Failures are isolated per-agent so one bad scrape never kills a row.
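
The discovery, fan-out, and merge flow with per-agent failure isolation can be sketched in plain TypeScript. This is a minimal illustration, not the engine's code: the `Agent` type, `runIsolated`, `enrichRow`, and the stub agents are hypothetical names, and the real system runs this inside LangGraph rather than with bare promises.

```typescript
// Hypothetical shapes; the real engine's types are not shown in this README.
type Fields = Record<string, string | number>;
type Agent = { name: string; run: (domain: string) => Promise<Fields> };
type AgentOutcome =
  | { agent: string; status: "ok"; fields: Fields }
  | { agent: string; status: "failed"; error: string };

// Wrap one agent so a thrown error becomes data instead of rejecting the whole row.
async function runIsolated(agent: Agent, domain: string): Promise<AgentOutcome> {
  try {
    return { agent: agent.name, status: "ok", fields: await agent.run(domain) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { agent: agent.name, status: "failed", error: message };
  }
}

// Discovery, then parallel fan-out, then merge.
async function enrichRow(domain: string, agents: Agent[]) {
  // Stand-in for the discovery step: just normalize the identifier.
  const discovered = domain.trim().toLowerCase();

  // Every agent starts at once; Promise.all never rejects here because
  // runIsolated converts failures into "failed" outcomes.
  const outcomes = await Promise.all(agents.map((a) => runIsolated(a, discovered)));

  // Stand-in for the validator: merge successes, collect failures.
  const fields: Fields = {};
  const failures: string[] = [];
  for (const outcome of outcomes) {
    if (outcome.status === "ok") Object.assign(fields, outcome.fields);
    else failures.push(`${outcome.agent}: ${outcome.error}`);
  }
  return { domain: discovered, fields, failures };
}

// Two stub agents: one succeeds, one simulates a bad scrape.
const stubAgents: Agent[] = [
  { name: "company", run: async () => ({ employee_count: 120 }) },
  { name: "funding", run: async () => { throw new Error("scrape timed out"); } },
];
```

Calling `enrichRow(" Acme.com ", stubAgents)` still resolves with the company agent's fields; the funding failure is reported alongside them instead of aborting the row.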


What you can extract

  • ICP qualification: employee count, revenue, industry, HQ, company type, funding stage
  • Buying signals: recent funding, hiring velocity, leadership changes, job postings, tech changes
  • Personalization hooks: company mission, recent news, blog topics, conference activity
  • Contact data: work emails (SERP pattern + verification), phone, LinkedIn, GitHub, Twitter, title, seniority, location
  • Social / web: LinkedIn URL, Twitter, GitHub org, company blog, careers page
  • Custom fields: describe what you want in plain English, an agent does the research

Confidence scoring is built in. A field corroborated by three independent sources scores 0.95; a single-source heuristic scores 0.50. Every value carries the URL it came from.
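
As a rough illustration of corroboration-based scoring, the sketch below maps the number of agreeing sources to a score. Only the two anchor points (0.50 for one source, 0.95 for three) come from this README. The two-source value, the use of hostnames as a proxy for "independent", and the names `Evidence` and `confidenceFor` are assumptions, and the engine's actual formula may differ.

```typescript
// One piece of evidence for a field value. Hypothetical shape.
type Evidence = { url: string; value: string };

// Extract the host from an absolute URL such as "https://a.com/about".
function hostOf(url: string): string {
  return url.split("/")[2] ?? url;
}

// Assumption: sources count as independent when their hostnames differ
// and they report the same value.
function confidenceFor(value: string, evidence: Evidence[]): number {
  const hosts = new Set<string>();
  for (const e of evidence) {
    if (e.value === value) hosts.add(hostOf(e.url));
  }
  if (hosts.size === 0) return 0;      // nothing supports the value
  if (hosts.size === 1) return 0.5;    // single source (from the README)
  if (hosts.size === 2) return 0.725;  // assumed midpoint, not from the README
  return 0.95;                         // three or more sources (from the README)
}
```

Two pages on the same host count once here, so a value repeated across one company's own site does not inflate its score.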


The monorepo

Three packages, one engine.

Package What it is Install
@brightdata/enrich-core The engine. LangGraph, agents, Bright Data tool wrappers, Zod schemas. Used by the other two. workspace only
@brightdata/enrich Headless CLI. enrich leads.csv --describe "...". NDJSON streaming, resumable runs, dry-run cost estimates. npm i -g @brightdata/enrich
open-enrich web Next.js 16 web app. Upload, configure, watch agents run live, export to CRM. self-hosted or hosted demo

Quick start

Prerequisites

You'll need a Bright Data account with two zones provisioned: SERP API and Web Unlocker. You'll also need an OpenRouter (or OpenAI / Anthropic) API key for the LLM layer.

# 1. Clone and install
git clone https://github.com/brightdata/open-enrich.git
cd open-enrich
pnpm install

# 2. Configure
cp packages/web/.env.example packages/web/.env.local
# fill in:
#   BRIGHT_DATA_API_KEY
#   BRIGHT_DATA_SERP_ZONE
#   BRIGHT_DATA_UNLOCKER_ZONE
#   OPENROUTER_API_KEY

# 3. Run the web app
pnpm dev:web
# open http://localhost:3000

Or skip the UI

# CLI: describe what you want in plain English
npx @brightdata/enrich leads.csv \
  --describe "company size, recent funding, work email of contact"

# Or pick from presets
npx @brightdata/enrich leads.csv \
  --fields employee_count,industry,funding_stage,person_email

# Dry-run cost estimate before kicking off a 5k-row run
npx @brightdata/enrich leads.csv --describe "..." --dry-run
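
Because the CLI streams NDJSON, downstream scripts can consume its output one line at a time. The parser below is a sketch under one loud assumption: that each output line mirrors the engine's event objects (`type`, `row`, `fields`). This README does not document the CLI's line format, so check real output before relying on it. `ResultEvent` and `parseNdjsonResults` are hypothetical names.

```typescript
// Assumed line shape, borrowed from the engine's "result" events.
type ResultEvent = { type: "result"; row: unknown; fields: Record<string, unknown> };

// Parse NDJSON text and keep only the "result" events.
function parseNdjsonResults(text: string): ResultEvent[] {
  const results: ResultEvent[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    const event = JSON.parse(trimmed) as { type?: string };
    if (event.type === "result") results.push(event as ResultEvent);
  }
  return results;
}
```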

Or embed the engine

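import { runEnrichment } from "@brightdata/enrich-core";

// Assumes csvRows (the parsed CSV rows) and the four credential values
// are already in scope, e.g. read from the env vars under Configuration.
for await (const event of runEnrichment({
  rows: csvRows,
  fields: ["employee_count", "funding_stage", "person_email"],
  identifierColumn: "email", // column used to identify each row
  credentials: { brightDataApiKey, serpZone, unlockerZone, openrouterKey },
})) {
  // Events stream in as rows finish; "result" carries the enriched fields.
  if (event.type === "result") console.log(event.row, event.fields);
}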

Or drive it from your coding agent

Open Enrich ships an Agent Skill at skills/enriching-tables that teaches Claude Code, Cursor, Codex, Gemini CLI, and other agents how to enrich any table with the @brightdata/enrich CLI. It covers onboarding, prerequisites, the dry-run-then-confirm workflow, the full field catalog, and troubleshooting.

# Install just this skill into your agent
npx skills add brightdata/open-enrich/skills/enriching-tables

# Or every skill in the repo
npx skills add brightdata/open-enrich

It's listed on skills.sh — discovery is automatic from the skills/<name>/SKILL.md layout, so no submission step is needed. After it's installed, just ask your agent to "enrich this CSV with company size and funding" and it takes over.


CSV-aware from the first row

Open Enrich fingerprints the file before it asks you anything. Drop a Salesforce export, a HubSpot contact list, a LinkedIn Sales Navigator dump, or any CRM CSV. It detects the format, picks the best identifier column, and skips fields you already have.

Detected: HubSpot Export
Identifier: Company Domain (98% coverage)
Skipping 4 fields already present in your CSV
Enriching 2,847 of 2,900 rows with 9 new fields
Estimated cost: $42.71  |  Estimated time: 18m

Coverage gauges, format-specific presets, smart column mapping. The boring part of enrichment, handled.
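
The coverage figure in the sample output above (98%) can be read as the share of rows with a non-empty value in a column. A minimal sketch of coverage-based identifier selection follows. `coverage`, `pickIdentifier`, and the candidate list are hypothetical, and the real detector presumably weighs more than coverage alone.

```typescript
type CsvRow = Record<string, string>;

// Share of rows where the column holds a non-empty value, from 0 to 1.
function coverage(rows: CsvRow[], column: string): number {
  if (rows.length === 0) return 0;
  const filled = rows.filter((r) => (r[column] ?? "").trim() !== "").length;
  return filled / rows.length;
}

// Pick the candidate with the highest coverage; ties keep the earlier candidate.
function pickIdentifier(rows: CsvRow[], candidates: string[]) {
  let best: { column: string; coverage: number } | null = null;
  for (const column of candidates) {
    const c = coverage(rows, column);
    if (best === null || c > best.coverage) best = { column, coverage: c };
  }
  return best; // null when no candidates were given
}
```

Listing candidates in preference order (for example domain before email) makes the tie-break deterministic.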


Source transparency

The trust layer most commercial tools don't ship. Every enriched cell carries:

  • the source URL the value came from
  • the quote that supports it
  • a confidence score (0 to 1) factoring corroboration across sources
  • a freshness timestamp (amber after 24h, red after 7d)

Hover a cell, see the receipts. Click a row, get a full evidence panel side by side with the original CSV data. Export a Markdown audit report for sales ops.
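
The four items each cell carries map naturally onto a small record type, and the freshness colors onto two age thresholds. The sketch below assumes a hypothetical `CellEvidence` shape and treats "after 24h" and "after 7d" as strictly greater than those ages. Neither detail is specified in this README.

```typescript
type Freshness = "fresh" | "amber" | "red";

// Hypothetical per-cell evidence record mirroring the list above.
interface CellEvidence {
  value: string;
  sourceUrl: string;  // where the value came from
  quote: string;      // supporting snippet
  confidence: number; // 0 to 1
  fetchedAt: number;  // epoch milliseconds
}

const HOUR_MS = 60 * 60 * 1000;

// Amber after 24 hours, red after 7 days (boundaries assumed exclusive).
function freshness(cell: CellEvidence, now: number): Freshness {
  const ageMs = now - cell.fetchedAt;
  if (ageMs > 7 * 24 * HOUR_MS) return "red";
  if (ageMs > 24 * HOUR_MS) return "amber";
  return "fresh";
}
```

Passing `now` explicitly (for example `Date.now()`) keeps the function deterministic and easy to test.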


Outbound-ready, not just enriched

Post-enrichment is where most tools dump you. Open Enrich segments the run into three queues:

  1. Fully enriched — every requested field at high confidence. Ship today.
  2. Has buying signal — funding, hiring spike, leadership change. Prioritize.
  3. Needs review — partial data or low confidence. Manual triage.

Filter, search, segment. Then export with field mappings that drop into Salesforce or HubSpot without column-rename gymnastics. Or copy as Google Sheets / TSV / JSON.
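
One way to express the three-queue split is a small classifier over each enriched row. This is a sketch with stated assumptions: the 0.8 "high confidence" threshold, the `EnrichedRow` shape, and checking the queues in the order listed above (so a fully enriched row with a signal lands in the first queue) are all illustrative choices, not the app's documented behavior.

```typescript
type Queue = "fully_enriched" | "has_buying_signal" | "needs_review";

// Hypothetical summary of one enriched row.
interface EnrichedRow {
  requested: string[];                 // fields the user asked for
  confidences: Record<string, number>; // field -> 0..1; absent means not found
  signals: string[];                   // e.g. "funding", "hiring_spike"
}

const HIGH_CONFIDENCE = 0.8; // assumed threshold, not from the README

function assignQueue(row: EnrichedRow): Queue {
  const complete = row.requested.every(
    (field) => (row.confidences[field] ?? 0) >= HIGH_CONFIDENCE,
  );
  if (complete) return "fully_enriched";
  if (row.signals.length > 0) return "has_buying_signal";
  return "needs_review";
}
```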


Tech stack

| Layer | Choice | Why |
| --- | --- | --- |
| Data access | Bright Data (SERP API, Web Unlocker, structured scrapers, Deep Lookup) | 150M+ residential IPs, dedicated LinkedIn / Crunchbase scrapers, AI search across 1000+ sources |
| Orchestration | LangGraph 1.2 + Deep Agents 1.9 | stateful agent graphs, parallel fan-out, auto-eviction for large scrapes, subagent task delegation |
| LLM layer | OpenRouter (default), works with OpenAI / Anthropic / any compatible provider | model-agnostic, swap providers via env var |
| Validation | Zod 4 | typed structured output on every agent response |
| Web app | Next.js 16 + React 19 + Tailwind v4 + Framer Motion | App Router, RSC, SSE streaming |
| CLI | commander + tsup | bin distribution, NDJSON output, resumable runs |
| Demo quota | Turso libSQL | edge-friendly per-IP lifetime limits |

Cost

Bright Data bills per tool call at roughly $0.0015 each, uniform across SERP, Unlocker, and structured scrapers. A typical row touches 5 to 15 tool calls depending on what you ask for. LLM cost is on top, paid to your provider.

| Field set | Tool calls / row | Bright Data cost | LLM cost (OpenRouter / GPT-4o-mini) |
| --- | --- | --- | --- |
| Quick CRM fill (5 fields) | ~4 | $0.006 | ~$0.003 |
| Startup prospecting (14 fields) | ~10 | $0.015 | ~$0.012 |
| Enterprise research (11 fields, deep) | ~15 | $0.023 | ~$0.018 |
| With person enrichment | +6 | +$0.009 | +$0.005 |

Compare to $0.50 to $2.00 per row at commercial enrichment vendors. Run enrich --dry-run for a per-run estimate before you commit.
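
The per-row figures above are simple multiplication, so a back-of-the-envelope estimate is easy to reproduce. The sketch below uses the roughly $0.0015 per-call rate quoted in this section. `estimateRunCost` is a hypothetical helper, not the CLI's estimator, and actual billing may differ.

```typescript
const BRIGHT_DATA_COST_PER_CALL = 0.0015; // "roughly", per this section

// Estimate a run in dollars from row count and per-row averages.
function estimateRunCost(rows: number, callsPerRow: number, llmCostPerRow: number) {
  const brightData = rows * callsPerRow * BRIGHT_DATA_COST_PER_CALL;
  const llm = rows * llmCostPerRow;
  return { brightData, llm, total: brightData + llm };
}

// 5,000 rows at the "Startup prospecting" averages from the table:
// ~10 tool calls and ~$0.012 of LLM spend per row.
// Roughly $75 of Bright Data calls plus $60 of LLM spend, about $135 in total.
const startupRun = estimateRunCost(5000, 10, 0.012);
```

At the commercial floor of $0.50 per row, the same 5,000 rows would cost $2,500.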


Configuration

All config lives in env vars. The CLI also accepts a stored config via enrich login.

# Required
BRIGHT_DATA_API_KEY=          # https://brightdata.com/cp/setting/users
BRIGHT_DATA_SERP_ZONE=        # zone name from your Bright Data dashboard
BRIGHT_DATA_UNLOCKER_ZONE=    # zone name from your Bright Data dashboard
OPENROUTER_API_KEY=           # https://openrouter.ai/keys

# Optional (web app)
NEXT_PUBLIC_BASE_PATH=        # serve under /open-enrich for subpath deploys
DEMO_MODE=1                   # enable 50-row-per-IP lifetime cap (requires Turso)
TURSO_DATABASE_URL=
TURSO_AUTH_TOKEN=

Provider-agnostic LLM: swap OPENROUTER_API_KEY for OPENAI_API_KEY or ANTHROPIC_API_KEY and createLLM() picks the right adapter.
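
Conceptually, key-based provider selection looks something like the sketch below. It is not the source of createLLM(): the function name `pickProvider`, the precedence order when several keys are set, and the error message are assumptions made for illustration.

```typescript
type Provider = "openrouter" | "openai" | "anthropic";
type Env = Record<string, string | undefined>;

// Return the first provider whose API key is present.
// Assumed precedence: OpenRouter (the documented default), then OpenAI, then Anthropic.
function pickProvider(env: Env): Provider {
  if (env["OPENROUTER_API_KEY"]) return "openrouter";
  if (env["OPENAI_API_KEY"]) return "openai";
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";
  throw new Error("Set OPENROUTER_API_KEY, OPENAI_API_KEY, or ANTHROPIC_API_KEY");
}
```

In a Node process this would be called as `pickProvider(process.env)`.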


Project layout

open-enrich/
├── packages/
│   ├── core/                 @brightdata/enrich-core   ← the engine
│   │   ├── agents/           LangGraph nodes, Deep Agents, prompts
│   │   ├── tools/            Bright Data tool wrappers
│   │   ├── services/         SERP, Web Unlocker, structured scrapers
│   │   ├── schemas/          Zod schemas for structured output
│   │   ├── csv/              parsing, format detection, CRM export
│   │   └── run-enrichment.ts public API entrypoint
│   │
│   ├── cli/                  @brightdata/enrich        ← terminal UX
│   │   └── src/              commander commands, login, NDJSON streaming
│   │
│   └── web/                  open-enrich               ← Next.js 16 app
│       ├── app/              App Router pages + API routes (SSE)
│       ├── components/       upload, config, table, modals, panels
│       ├── hooks/            useEnrichment, useCSVParser, useFilters
│       └── lib/              demo-mode, MCP, quality metrics
│
└── docs/
    └── assets/               README assets (logos, demo GIF)

Roadmap

Done and shipping today: company enrichment, person enrichment, source attribution, confidence scoring, CSV format detection, CRM export, MCP server integration, cost tracking, demo quota.

Next:

  • batch resumable runs for 100k+ row CSVs from the web UI
  • a webhook / queue mode for always-on enrichment from a CRM source
  • richer signal detection (G2 mentions, podcast appearances, conference rosters)
  • agent observability via LangSmith traces in the UI

See TASKS.md for the full historical changelog.


Contributing

PRs welcome. Three things worth knowing before you open one:

  1. Run typecheck and build before pushing: pnpm typecheck && pnpm build.
  2. The web app uses Next.js 16, which has breaking API changes from older versions. Read the relevant doc in node_modules/next/dist/docs/ before touching App Router internals.
  3. Don't add agents speculatively. New agents should solve a named extraction problem the current set can't handle, with a benchmark to prove it.

Issues are the place to file bugs and propose features.


License

MIT. Use it, fork it, ship it.


Bright Data

Built on Bright Data's web infrastructure. The internet is the dataset.
