16 commits
70ebc47
feat(dataplex): add tools to support metadata enrichment workflow
harmonisha-wq May 21, 2026
5036178
feat(dataplex): add data profile, discovery, and quality tools
harmonisha-wq May 29, 2026
48a0227
Fix description for get_run_status reference
harmonisha-wq Jun 3, 2026
c13e5d8
Update dataplex.yaml
harmonisha-wq Jun 3, 2026
4f9662c
docs: add tools documentation and update integration tests
harmonisha-wq Jun 4, 2026
d7ffc5b
test: add integration tests for remaining Dataplex lifecycle tools
harmonisha-wq Jun 5, 2026
adecd14
docs: remove unsupported projectID parameter from Dataplex tool refer…
harmonisha-wq Jun 5, 2026
f7ab9a7
docs: correct lookup_context description to remove unsupported name p…
harmonisha-wq Jun 5, 2026
8d51770
docs(dataplex): separate tool invocation parameters from tools.yaml r…
harmonisha-wq Jun 10, 2026
ad89107
docs(dataplex): replace empty Requirements heading with direct IAM Pe…
harmonisha-wq Jun 10, 2026
0f4dc36
refactor(dataplex): remove unused tokenSource field and oauth2 import…
harmonisha-wq Jun 10, 2026
6dab865
refactor(dataplex): remove unnecessary projectID parameter from scan …
harmonisha-wq Jun 10, 2026
3030e1b
refactor(dataplex): streamline parameter descriptions by removing red…
harmonisha-wq Jun 10, 2026
d4e67ee
refactor(dataplex): enforce compile-time tools.Tool interface verific…
harmonisha-wq Jun 10, 2026
9c1d029
refactor(dataplex): refactor all 10 write tools to use BaseTool and C…
harmonisha-wq Jun 11, 2026
c30a8db
Fix dataplex search dq scans parameters and update unit & integration…
harmonisha-wq Jun 11, 2026
4 changes: 4 additions & 0 deletions cmd/internal/config_test.go
@@ -1846,6 +1846,10 @@ func TestPrebuiltTools(t *testing.T) {
Name: "discovery",
ToolNames: []string{"search_entries", "lookup_entry", "search_aspect_types", "lookup_context", "search_dq_scans"},
},
"enrich": tools.ToolsetConfig{
Name: "enrich",
ToolNames: []string{"search_entries", "lookup_entry", "lookup_context", "generate_data_insights", "get_data_insights", "generate_data_profile", "get_data_profile", "discover_metadata", "get_discovery_results", "check_data_quality", "get_data_quality_results", "get_operation", "get_run_status"},
},
},
},
{
100 changes: 55 additions & 45 deletions cmd/internal/imports.go
@@ -18,6 +18,51 @@ import (
// Import prompt packages for side effect of registration
_ "github.com/googleapis/mcp-toolbox/internal/prompts/custom"

_ "github.com/googleapis/mcp-toolbox/internal/sources/alloydbadmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/alloydbpg"
_ "github.com/googleapis/mcp-toolbox/internal/sources/bigquery"
_ "github.com/googleapis/mcp-toolbox/internal/sources/bigtable"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cassandra"
_ "github.com/googleapis/mcp-toolbox/internal/sources/clickhouse"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudgda"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudhealthcare"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudloggingadmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudmonitoring"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqladmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlmssql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlmysql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlpg"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudstorage"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cockroachdb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/couchbase"
_ "github.com/googleapis/mcp-toolbox/internal/sources/datalineage"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dataplex"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dataproc"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dgraph"
_ "github.com/googleapis/mcp-toolbox/internal/sources/elasticsearch"
_ "github.com/googleapis/mcp-toolbox/internal/sources/firebird"
_ "github.com/googleapis/mcp-toolbox/internal/sources/firestore"
_ "github.com/googleapis/mcp-toolbox/internal/sources/http"
_ "github.com/googleapis/mcp-toolbox/internal/sources/looker"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mindsdb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mongodb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mssql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mysql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/neo4j"
_ "github.com/googleapis/mcp-toolbox/internal/sources/oceanbase"
_ "github.com/googleapis/mcp-toolbox/internal/sources/oracle"
_ "github.com/googleapis/mcp-toolbox/internal/sources/postgres"
_ "github.com/googleapis/mcp-toolbox/internal/sources/redis"
_ "github.com/googleapis/mcp-toolbox/internal/sources/serverlessspark"
_ "github.com/googleapis/mcp-toolbox/internal/sources/singlestore"
_ "github.com/googleapis/mcp-toolbox/internal/sources/snowflake"
_ "github.com/googleapis/mcp-toolbox/internal/sources/spanner"
_ "github.com/googleapis/mcp-toolbox/internal/sources/sqlite"
_ "github.com/googleapis/mcp-toolbox/internal/sources/tidb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/trino"
_ "github.com/googleapis/mcp-toolbox/internal/sources/valkey"
_ "github.com/googleapis/mcp-toolbox/internal/sources/yugabytedb"

// Import tool packages for side effect of registration
_ "github.com/googleapis/mcp-toolbox/internal/tools/alloydb/alloydbcreatecluster"
_ "github.com/googleapis/mcp-toolbox/internal/tools/alloydb/alloydbcreateinstance"
@@ -113,6 +158,16 @@ import (
_ "github.com/googleapis/mcp-toolbox/internal/tools/couchbase"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataform/dataformcompilelocal"
_ "github.com/googleapis/mcp-toolbox/internal/tools/datalineage/datalineagesearchlineage"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexcheckdataquality"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexdiscovermetadata"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgeneratedatainsights"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgeneratedataprofile"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetdatainsights"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetdataprofile"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetdataqualityresults"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetdiscoveryresults"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetoperation"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexgetrunstatus"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexlookupcontext"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexlookupentry"
_ "github.com/googleapis/mcp-toolbox/internal/tools/dataplex/dataplexsearchaspecttypes"
@@ -268,49 +323,4 @@ import (
_ "github.com/googleapis/mcp-toolbox/internal/tools/utility/wait"
_ "github.com/googleapis/mcp-toolbox/internal/tools/valkey"
_ "github.com/googleapis/mcp-toolbox/internal/tools/yugabytedbsql"

_ "github.com/googleapis/mcp-toolbox/internal/sources/alloydbadmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/alloydbpg"
_ "github.com/googleapis/mcp-toolbox/internal/sources/bigquery"
_ "github.com/googleapis/mcp-toolbox/internal/sources/bigtable"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cassandra"
_ "github.com/googleapis/mcp-toolbox/internal/sources/clickhouse"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudgda"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudhealthcare"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudloggingadmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudmonitoring"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqladmin"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlmssql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlmysql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudsqlpg"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cloudstorage"
_ "github.com/googleapis/mcp-toolbox/internal/sources/cockroachdb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/couchbase"
_ "github.com/googleapis/mcp-toolbox/internal/sources/datalineage"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dataplex"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dataproc"
_ "github.com/googleapis/mcp-toolbox/internal/sources/dgraph"
_ "github.com/googleapis/mcp-toolbox/internal/sources/elasticsearch"
_ "github.com/googleapis/mcp-toolbox/internal/sources/firebird"
_ "github.com/googleapis/mcp-toolbox/internal/sources/firestore"
_ "github.com/googleapis/mcp-toolbox/internal/sources/http"
_ "github.com/googleapis/mcp-toolbox/internal/sources/looker"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mindsdb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mongodb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mssql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/mysql"
_ "github.com/googleapis/mcp-toolbox/internal/sources/neo4j"
_ "github.com/googleapis/mcp-toolbox/internal/sources/oceanbase"
_ "github.com/googleapis/mcp-toolbox/internal/sources/oracle"
_ "github.com/googleapis/mcp-toolbox/internal/sources/postgres"
_ "github.com/googleapis/mcp-toolbox/internal/sources/redis"
_ "github.com/googleapis/mcp-toolbox/internal/sources/serverlessspark"
_ "github.com/googleapis/mcp-toolbox/internal/sources/singlestore"
_ "github.com/googleapis/mcp-toolbox/internal/sources/snowflake"
_ "github.com/googleapis/mcp-toolbox/internal/sources/spanner"
_ "github.com/googleapis/mcp-toolbox/internal/sources/sqlite"
_ "github.com/googleapis/mcp-toolbox/internal/sources/tidb"
_ "github.com/googleapis/mcp-toolbox/internal/sources/trino"
_ "github.com/googleapis/mcp-toolbox/internal/sources/valkey"
_ "github.com/googleapis/mcp-toolbox/internal/sources/yugabytedb"
)
@@ -18,6 +18,20 @@ aliases:
* **Tools:**
* `search_entries`: Searches for entries in Knowledge Catalog.
* `lookup_entry`: Retrieves a specific entry from Knowledge Catalog.
* `search_aspect_types`: Finds aspect types relevant to the query.
* `lookup_context`: Retrieves rich metadata regarding one or more data assets along with their relationships.
* `search_dq_scans`: Searches for data quality scans in Dataplex.
* `generate_data_insights`: Creates a new Dataplex Data Documentation scan template and triggers the run.
* `get_data_insights`: Retrieves the final generated data insights for a completed scan.
* `generate_data_profile`: Creates a new Dataplex Data Profile scan template and triggers the run.
* `get_data_profile`: Retrieves the final generated data profile results.
* `discover_metadata`: Creates a new Dataplex Data Discovery scan template and triggers the run.
* `get_discovery_results`: Retrieves the final generated data discovery results.
* `check_data_quality`: Creates a new Dataplex Data Quality scan template and triggers the run.
* `get_data_quality_results`: Retrieves the final generated data quality results.
* `get_operation`: Retrieves the status of a Dataplex long-running operation.
* `get_run_status`: Retrieves the execution status of the latest background job run.
* **Toolsets:**
* `discovery`: Metadata discovery and search toolset (`search_entries`, `lookup_entry`, `search_aspect_types`, `lookup_context`, `search_dq_scans`).
* `enrich`: Metadata enrichment pipeline orchestration and execution toolset (`search_entries`, `lookup_entry`, `lookup_context`, `generate_data_insights`, `get_data_insights`, `generate_data_profile`, `get_data_profile`, `discover_metadata`, `get_discovery_results`, `check_data_quality`, `get_data_quality_results`, `get_operation`, `get_run_status`).

@@ -0,0 +1,68 @@
---
title: "dataplex-check-data-quality"
type: docs
weight: 1
description: >
Creates a new Dataplex Data Quality scan template for a specified BigQuery table and triggers the initial asynchronous execution run using custom-defined quality rules.
aliases:
- /integrations/dataplex/tools/dataplex-check-data-quality/
---

## About

A `dataplex-check-data-quality` tool triggers a new Data Quality scan to evaluate rules (e.g. non-null, value range limits, custom SQL assertions) against table rows.

Since scan template creation is asynchronous, this tool returns the name of a long-running operation (LRO). Poll `dataplex-get-operation` with this name until the operation is done, extract the `scanId`, then poll `dataplex-get-run-status` with the `scanId` until the job is `SUCCEEDED`. Only then call `dataplex-get-data-quality-results` to fetch the results.


## Compatible Sources

{{< compatible-sources >}}

## IAM Permissions

Knowledge Catalog uses [Identity and Access Management (IAM)][iam-overview] to control
user and group access to Knowledge Catalog resources. Toolbox will use your
[Application Default Credentials (ADC)][adc] to authorize and authenticate when
interacting with [Knowledge Catalog][dataplex-docs].

In addition to [setting the ADC for your server][set-adc], you need to ensure
the IAM identity has been given the correct IAM permissions for the tasks you
intend to perform. See [Knowledge Catalog IAM permissions][iam-permissions]
and [Knowledge Catalog IAM roles][iam-roles] for more information on
applying IAM permissions and roles to an identity.

[iam-overview]: https://cloud.google.com/dataplex/docs/iam-and-access-control
[adc]: https://cloud.google.com/docs/authentication#adc
[set-adc]: https://cloud.google.com/docs/authentication/provide-credentials-adc
[iam-permissions]: https://cloud.google.com/dataplex/docs/iam-permissions
[iam-roles]: https://cloud.google.com/dataplex/docs/iam-roles
[dataplex-docs]: https://cloud.google.com/dataplex/docs

## Parameters

The `dataplex-check-data-quality` tool accepts the following parameters:

| **field** | **type** | **required** | **description** |
| --------- | :------: | :----------: | --------------- |
| resourcePath | string | true | The resource path of the target BigQuery table (format: `projects/{project}/datasets/{dataset}/tables/{table}`). |
| location | string | true | The Google Cloud region where the scan should be executed (e.g. `us-central1`). |
| publish | boolean | false | If true, publishes the quality results directly to the Dataplex Universal Catalog. Defaults to false. |
| specJSON | string | true | A raw JSON string defining the quality check rules (e.g. `{"rules": [{"column": "age", "nonNullExpectation": {}}]}`). Maps directly to `dataplexpb.DataQualitySpec`. |
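
Because `specJSON` is passed as a string, it can be easier to build it from typed values than to write the JSON by hand. The sketch below shows one way to do that with the standard `encoding/json` package. The `rule` and `spec` types and the `BuildNonNullSpec` helper are hypothetical and mirror only the two fields shown in the example above; they are not the `dataplexpb` types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rule mirrors only the fields used in the documented example (assumption:
// the full DataQualitySpec schema has many more fields).
type rule struct {
	Column             string    `json:"column"`
	NonNullExpectation *struct{} `json:"nonNullExpectation,omitempty"`
}

// spec is a hypothetical, minimal stand-in for the quality spec document.
type spec struct {
	Rules []rule `json:"rules"`
}

// BuildNonNullSpec returns a specJSON string with one non-null rule per column.
func BuildNonNullSpec(columns ...string) (string, error) {
	s := spec{Rules: make([]rule, 0, len(columns))}
	for _, c := range columns {
		s.Rules = append(s.Rules, rule{Column: c, NonNullExpectation: &struct{}{}})
	}
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	specJSON, err := BuildNonNullSpec("age")
	if err != nil {
		panic(err)
	}
	fmt.Println(specJSON)
}
```

The resulting string is what would go into the `specJSON` parameter, alongside `resourcePath` and `location`.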

## Example

```yaml
kind: tool
name: check_data_quality
type: dataplex-check-data-quality
source: my-dataplex-source
description: Trigger a new data quality scan.
```

## Reference

| **field** | **type** | **required** | **description** |
|-------------|:--------:|:------------:|----------------------------------------------------|
| type | string | true | Must be "dataplex-check-data-quality". |
| source | string | true | Name of the source the tool should execute on. |
| description | string | true | Description of the tool that is passed to the LLM. |
@@ -0,0 +1,66 @@
---
title: "dataplex-discover-metadata"
type: docs
weight: 1
description: >
Creates a new Dataplex Data Discovery scan template for a specified Cloud Storage bucket and triggers the initial asynchronous execution run to crawl files, infer schemas, and register tables in BigQuery.
aliases:
- /integrations/dataplex/tools/dataplex-discover-metadata/
---

## About

A `dataplex-discover-metadata` tool triggers a new Data Discovery scan to automatically crawl GCS directories, infer schemas/partitions, and publish them as BigQuery tables.

Since scan template creation is asynchronous, this tool returns the name of a long-running operation (LRO). Poll `dataplex-get-operation` with this name until the operation is done, extract the `scanId`, then poll `dataplex-get-run-status` with the `scanId` until the job is `SUCCEEDED`. Only then call `dataplex-get-discovery-results` to fetch the results.
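
A loop that waits only for `SUCCEEDED` never ends if the run fails, so a caller may want to treat some other states as terminal too. The sketch below shows one possible classification. The `FAILED` and `CANCELLED` state names are assumptions about what `dataplex-get-run-status` reports, and `RunOutcome` is a hypothetical helper, not part of Toolbox.

```go
package main

import "fmt"

// RunOutcome reports whether polling should stop for the given run state.
// Only SUCCEEDED is documented above; FAILED and CANCELLED are assumed
// terminal states, and every other value is treated as still in progress.
func RunOutcome(state string) (done bool, err error) {
	switch state {
	case "SUCCEEDED":
		return true, nil
	case "FAILED", "CANCELLED":
		return true, fmt.Errorf("scan run ended in state %s", state)
	default:
		return false, nil
	}
}

func main() {
	for _, s := range []string{"RUNNING", "SUCCEEDED", "FAILED"} {
		done, err := RunOutcome(s)
		fmt.Printf("%s: done=%t err=%v\n", s, done, err)
	}
}
```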


## Compatible Sources

{{< compatible-sources >}}

## IAM Permissions

Knowledge Catalog uses [Identity and Access Management (IAM)][iam-overview] to control
user and group access to Knowledge Catalog resources. Toolbox will use your
[Application Default Credentials (ADC)][adc] to authorize and authenticate when
interacting with [Knowledge Catalog][dataplex-docs].

In addition to [setting the ADC for your server][set-adc], you need to ensure
the IAM identity has been given the correct IAM permissions for the tasks you
intend to perform. See [Knowledge Catalog IAM permissions][iam-permissions]
and [Knowledge Catalog IAM roles][iam-roles] for more information on
applying IAM permissions and roles to an identity.

[iam-overview]: https://cloud.google.com/dataplex/docs/iam-and-access-control
[adc]: https://cloud.google.com/docs/authentication#adc
[set-adc]: https://cloud.google.com/docs/authentication/provide-credentials-adc
[iam-permissions]: https://cloud.google.com/dataplex/docs/iam-permissions
[iam-roles]: https://cloud.google.com/dataplex/docs/iam-roles
[dataplex-docs]: https://cloud.google.com/dataplex/docs

## Parameters

The `dataplex-discover-metadata` tool accepts the following parameters:

| **field** | **type** | **required** | **description** |
| --------- | :------: | :----------: | --------------- |
| resourcePath | string | true | The resource path of the target Cloud Storage bucket (format: `//storage.googleapis.com/{bucket_name}`). |
| location | string | true | The Google Cloud region where the scan should be executed (e.g. `us-central1`). |
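
The `resourcePath` format above is easy to get wrong when starting from a `gs://` URL. The helper below is an illustrative sketch of one way to normalise a bucket name into that format. `BucketResourcePath` is a hypothetical function, and accepting a `gs://` prefix is a convenience assumed for the example, not behaviour of the tool.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// gcsPrefix is the prefix from the documented resourcePath format.
const gcsPrefix = "//storage.googleapis.com/"

// BucketResourcePath formats a bucket name as //storage.googleapis.com/{bucket_name}.
// It accepts an optional gs:// prefix and trailing slash, and rejects object paths.
func BucketResourcePath(bucket string) (string, error) {
	bucket = strings.TrimPrefix(bucket, "gs://")
	bucket = strings.Trim(bucket, "/")
	if bucket == "" || strings.Contains(bucket, "/") {
		return "", errors.New("expected a bare bucket name without an object path")
	}
	return gcsPrefix + bucket, nil
}

func main() {
	path, err := BucketResourcePath("gs://my-bucket/")
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```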

## Example

```yaml
kind: tool
name: discover_metadata
type: dataplex-discover-metadata
source: my-dataplex-source
description: Trigger a new metadata discovery scan.
```

## Reference

| **field** | **type** | **required** | **description** |
|-------------|:--------:|:------------:|----------------------------------------------------|
| type | string | true | Must be "dataplex-discover-metadata". |
| source | string | true | Name of the source the tool should execute on. |
| description | string | true | Description of the tool that is passed to the LLM. |