mcp-datahub-v1.7.0
mcp-datahub v1.7.0 — GraphQL Schema Alignment & Validation Infrastructure
Corrects GraphQL query field paths across four client modules by validating every query against the upstream DataHub schema source files. Adds automated schema validation infrastructure to prevent future drift — all 59 query/mutation constants are now checked against the official .graphql definitions from datahub-project/datahub.
+25,426 lines | -318 lines | 46 files changed
Highlights
GraphQL Query Corrections (#121)
Cross-referenced all GraphQL queries with the upstream DataHub schema files (datahub-graphql-core/src/main/resources/*.graphql) and corrected field paths that did not match the actual API:
| Module | Issue | Fix |
|---|---|---|
| documents.go | DocumentRelatedAsset, DocumentRelatedDocument, DocumentParentDocument queried a direct urn field that doesn't exist on these wrapper types | Changed to relatedAssets { asset { urn } }, relatedDocuments { document { urn } }, parentDocument { document { urn } } per upstream documents.graphql |
| structured_properties.go | Fragment targeted non-existent type EntityStructuredPropertiesResult | Changed to StructuredProperties per upstream entity.graphql |
| data_contracts.go | Queried contract { result(refresh: false) { type assertionResults { ... } } }; the result field and its nested structure don't exist on DataContract | Rewrote to contract { properties { freshness/schema/dataQuality { assertion { urn } } } status { state } } per upstream contract.graphql |
| semantic_search.go | Used non-existent input type SemanticSearchInput | Changed to SearchAcrossEntitiesInput with searchAcrossEntities query per upstream search.graphql |
All corrections were verified against both the upstream .graphql source files (v1.4.0.3 and v1.5.0.1) and a live DataHub v1.4.0.3 instance.
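As an illustration of the documents.go correction, the sketch below decodes a payload in which each URN sits one level beneath its wrapper object rather than on the wrapper itself. Only the nested field paths (relatedAssets.asset.urn, relatedDocuments.document.urn, parentDocument.document.urn) come from these notes. The struct names, the helper collectURNs, the sample payload, and the assumption that the two "related" fields are lists are hypothetical and are not the library's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// urnRef is a hypothetical helper type for any object that carries a urn.
type urnRef struct {
	URN string `json:"urn"`
}

// documentLinks mirrors the corrected field paths: the wrapper types do not
// expose urn directly, it lives on the nested asset/document object.
// Assumption: relatedAssets and relatedDocuments are lists and
// parentDocument may be absent.
type documentLinks struct {
	RelatedAssets []struct {
		Asset urnRef `json:"asset"`
	} `json:"relatedAssets"`
	RelatedDocuments []struct {
		Document urnRef `json:"document"`
	} `json:"relatedDocuments"`
	ParentDocument *struct {
		Document urnRef `json:"document"`
	} `json:"parentDocument"`
}

// collectURNs decodes the payload and flattens every nested urn it finds.
func collectURNs(payload []byte) ([]string, error) {
	var links documentLinks
	if err := json.Unmarshal(payload, &links); err != nil {
		return nil, err
	}
	var urns []string
	for _, a := range links.RelatedAssets {
		urns = append(urns, a.Asset.URN)
	}
	for _, d := range links.RelatedDocuments {
		urns = append(urns, d.Document.URN)
	}
	if links.ParentDocument != nil {
		urns = append(urns, links.ParentDocument.Document.URN)
	}
	return urns, nil
}

func main() {
	// Made-up sample payload shaped like the corrected selection set.
	sample := []byte(`{
		"relatedAssets": [{"asset": {"urn": "urn:li:dataset:example"}}],
		"relatedDocuments": [],
		"parentDocument": {"document": {"urn": "urn:li:document:parent"}}
	}`)
	urns, err := collectURNs(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(urns)
}
```

A payload that carries urn directly on the wrapper would decode to empty strings here, which is the mismatch the corrected queries avoid.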
Schema Validation Infrastructure (#121)
Adds automated, offline-capable validation of GraphQL queries against the upstream DataHub schema:
- testdata/datahub-schema/: 31 .graphql schema files synced from datahub-project/datahub at tag v1.5.0.1, checked into the repo so CI can run without network access
- testdata/datahub-schema/sync.sh: downloads schema files from any tagged DataHub release
- pkg/client/schema_validation_test.go: validates all 59 query/mutation constants against the schema by checking fragment targets, top-level query/mutation fields, inline fragment type names, and input type references
- make schema-sync: download schema files for a target version
- make schema-check: run schema validation (now part of make verify)
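One of the checks listed above, fragment targets, can be sketched roughly as follows. This is not the code in schema_validation_test.go: the function names schemaTypes and unknownFragmentTargets, the regular expressions, and the toy schema are all assumptions for illustration. A regex scan is also far cruder than a real GraphQL parser; for example, it can misread text inside schema descriptions.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

var (
	// Matches definitions such as "type Dataset", "input SearchInput",
	// or "extend type Query" at the start of a line.
	typeDefRe = regexp.MustCompile(`(?m)^\s*(?:extend\s+)?(?:type|input|interface|union|enum)\s+(\w+)`)
	// Matches fragment targets: "fragment name on Type" and "... on Type".
	fragmentOnRe = regexp.MustCompile(`(?:fragment\s+\w+|\.\.\.)\s*on\s+(\w+)`)
)

// schemaTypes collects every type-like name defined in the schema text.
func schemaTypes(schema string) map[string]bool {
	known := make(map[string]bool)
	for _, m := range typeDefRe.FindAllStringSubmatch(schema, -1) {
		known[m[1]] = true
	}
	return known
}

// unknownFragmentTargets returns the sorted, de-duplicated fragment target
// names in query that are not defined in the schema.
func unknownFragmentTargets(query string, known map[string]bool) []string {
	seen := make(map[string]bool)
	var missing []string
	for _, m := range fragmentOnRe.FindAllStringSubmatch(query, -1) {
		name := m[1]
		if !known[name] && !seen[name] {
			seen[name] = true
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Toy stand-in for the checked-in schema files, not the real definitions.
	schema := "type StructuredProperties {\n  placeholder: String\n}\n"
	known := schemaTypes(schema)

	stale := "query { entity(urn: $urn) { ... on EntityStructuredPropertiesResult { placeholder } } }"
	fixed := "query { entity(urn: $urn) { ... on StructuredProperties { placeholder } } }"

	fmt.Println("stale query, unknown targets:", unknownFragmentTargets(stale, known))
	fmt.Println("fixed query, unknown targets:", unknownFragmentTargets(fixed, known))
}
```

Because the schema text is read from local input, a check of this shape needs no network access, which is the property the checked-in testdata directory provides.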
Workflow for targeting a new DataHub version:
DATAHUB_VERSION=v1.5.0.1 make schema-sync   # pull schema files
make schema-check                           # validate all queries
DataHub Version Compatibility Matrix
Updated CLAUDE.md with a verified compatibility matrix:
| DataHub Version | Features Available | Schema Validated |
|---|---|---|
| 1.3.x+ (minimum) | All read tools, all write operations except documents | No (pre-dates schema sync) |
| 1.4.x+ (full) | + Documents (create/update/delete), semantic search | Yes (v1.4.0.3) |
| 1.5.x+ (current) | + Batch data product operations | Yes (v1.5.0.1) |
Schema files were diffed between v1.4.0.3 and v1.5.0.1; the only change is the new batchAddToDataProducts/batchRemoveFromDataProducts mutation pair in entity.graphql. All types used by this library are identical across both versions.
Breaking Changes
types.AssertionResult simplified
The AssertionResult type in pkg/types/data_contracts.go was simplified to match the actual DataHub DataContract schema:
Removed fields:
- ResultType string: the real API does not expose per-assertion result types through the contract query
- NativeResults map[string]string: the real API does not expose native result details through the contract query
Before:
type AssertionResult struct {
    AssertionURN  string            `json:"assertion_urn"`
    Type          string            `json:"type"`
    ResultType    string            `json:"result_type"`
    NativeResults map[string]string `json:"native_results,omitempty"`
}
After:
type AssertionResult struct {
    AssertionURN string `json:"assertion_urn"`
    Type         string `json:"type"`
}
If you were reading ResultType or NativeResults from AssertionResult, those fields were never populated by the actual DataHub API.
DataContract.Status values changed
The Status field now contains the DataContractState enum value from the status.state field (e.g., "ACTIVE", "PENDING") rather than the previously unpopulated result.type field (which was intended to contain "PASSING" / "FAILING").
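Callers that previously branched on "PASSING" / "FAILING" may want to handle the new values explicitly. The sketch below is one possible shape for that migration. The helper describeContractStatus and its messages are hypothetical, and only "ACTIVE" and "PENDING" are named in these notes, so any other DataContractState value falls through to the default branch.

```go
package main

import "fmt"

// describeContractStatus is a hypothetical helper that maps the Status string
// of a data contract to a human-readable message.
func describeContractStatus(status string) string {
	switch status {
	case "ACTIVE":
		return "contract is active"
	case "PENDING":
		return "contract is pending"
	case "PASSING", "FAILING":
		// Values the old result.type path was meant to carry; per the release
		// notes that field was never populated by the real API.
		return "legacy value, never returned by the contract query"
	case "":
		return "no status reported"
	default:
		// Other DataContractState values are not enumerated in these notes.
		return "unrecognized state: " + status
	}
}

func main() {
	for _, s := range []string{"ACTIVE", "PENDING", "PASSING", ""} {
		fmt.Printf("%q: %s\n", s, describeContractStatus(s))
	}
}
```

Keeping a default branch avoids silently treating an unlisted state as healthy.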
Compatibility
| Requirement | Version |
|---|---|
| Go | 1.25+ |
| DataHub (minimum) | 1.3.x |
| DataHub (full feature set incl. documents) | 1.4.x+ |
| DataHub (schema validated against) | v1.5.0.1 |
Installation
Claude Desktop (macOS/Windows)
Download the .mcpb bundle for your platform and double-click to install:
- macOS Apple Silicon (M1/M2/M3/M4): mcp-datahub_1.7.0_darwin_arm64.mcpb
- macOS Intel: mcp-datahub_1.7.0_darwin_amd64.mcpb
- Windows: mcp-datahub_1.7.0_windows_amd64.mcpb
Homebrew (macOS)
brew install txn2/tap/mcp-datahub
Claude Code CLI
claude mcp add datahub \
  -e DATAHUB_URL=https://your-datahub.example.com/api/graphql \
  -e DATAHUB_TOKEN=your-token \
  -- mcp-datahub
Docker
docker pull ghcr.io/txn2/mcp-datahub:v1.7.0
Verification
All release artifacts are signed with Cosign. Verify with:
cosign verify-blob --bundle mcp-datahub_1.7.0_linux_amd64.tar.gz.sigstore.json \
mcp-datahub_1.7.0_linux_amd64.tar.gz