feat(dataplex): add tools to support metadata enrichment workflow #3270

Open
harmonisha-wq wants to merge 16 commits into googleapis:main from harmonisha-wq:feat/dataplex-enrichment

@harmonisha-wq

Contributor

Description

Add the insights-related tools for the enrich toolset in Dataplex.

PR Checklist

Thank you for opening a Pull Request! Before submitting your PR, there are a
few things you can do to make sure it goes smoothly:

  • Make sure you reviewed
    CONTRIBUTING.md
  • Make sure to open an issue as a
    bug/issue
    before writing your code! That way we can discuss the change, evaluate
    designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
  • Make sure to add ! if this involves a breaking change

🛠️ Fixes #3269

@harmonisha-wq harmonisha-wq requested review from a team as code owners May 21, 2026 07:01

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces support for Dataplex Data Insights by adding four new tools: generate_data_insights, get_data_insights, get_operation, and get_run_status. These tools allow for asynchronous generation and retrieval of data documentation for BigQuery resources. The reviewer identified several performance and security improvements in the internal/sources/dataplex implementation, including the need for input validation on operation names to prevent path traversal, more efficient token management, and optimizing the retrieval of the latest scan job by using server-side ordering and pagination instead of client-side iteration.
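The operation-name validation mentioned in the review could look roughly like the following. This is a minimal sketch, not the code in this PR: `validateOperationName` and `operationNamePattern` are hypothetical names, and the `projects/*/locations/*/operations/*` shape is assumed from the usual Google long-running operation naming. The character set is deliberately strict (no dots or slashes inside a segment), so it would also reject domain-scoped project IDs.

```go
package main

import (
	"fmt"
	"regexp"
)

// operationNamePattern is an ASSUMED resource-name shape, not taken from the PR.
// Each segment allows only letters, digits, '_' and '-', so "." and ".."
// segments and embedded slashes can never match.
var operationNamePattern = regexp.MustCompile(
	`^projects/[A-Za-z0-9_-]+/locations/[A-Za-z0-9_-]+/operations/[A-Za-z0-9_-]+$`)

// validateOperationName (hypothetical helper) rejects any name that does not
// match the expected shape before it is used to build a request path.
func validateOperationName(name string) error {
	if !operationNamePattern.MatchString(name) {
		return fmt.Errorf("invalid operation name %q", name)
	}
	return nil
}

func main() {
	names := []string{
		"projects/my-project/locations/us-central1/operations/operation-123",
		"projects/my-project/locations/us-central1/operations/../../secrets",
	}
	for _, n := range names {
		fmt.Printf("%q -> %v\n", n, validateOperationName(n))
	}
}
```

An allow-list pattern is used here rather than searching for `..`, since it fails closed on any input shape that was not anticipated.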

Comment thread internal/sources/dataplex/dataplex.go Outdated
Comment thread internal/sources/dataplex/dataplex.go Outdated
Comment thread internal/sources/dataplex/dataplex.go Outdated
Comment thread internal/sources/dataplex/dataplex.go
Comment thread internal/sources/dataplex/dataplex.go Outdated
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch 2 times, most recently from 83016d5 to 2e89480 Compare May 21, 2026 10:47
Comment thread internal/sources/dataplex/dataplex.go Outdated
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch 2 times, most recently from e3a91fa to ce3547a Compare May 27, 2026 05:36
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch 9 times, most recently from 776be3f to 651f2ce Compare June 1, 2026 04:26
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml
Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated

@Yuan325 Yuan325 left a comment

Contributor

Hi @harmonisha-wq thank you for adding these tools. Please also update (1) docs, and (2) integration tests for these tools. Thank you!

Comment thread internal/prebuiltconfigs/tools/dataplex.yaml Outdated

@Yuan325 Yuan325 left a comment

Contributor

Hi, please see the following feedback in addition to the inline comments:

  1. Please update the prebuilt-config docs as well in docs/en/integrations/
  2. Please apply the comments across all newly added tools.

Comment thread internal/prebuiltconfigs/tools/dataplex.yaml
Comment thread internal/sources/dataplex/dataplex.go Outdated
Comment thread tests/dataplex/dataplex_integration_test.go Outdated
Comment thread tests/dataplex/dataplex_integration_test.go Outdated
Comment thread tests/dataplex/dataplex_integration_test.go Outdated
Comment thread tests/dataplex/dataplex_integration_test.go Outdated
Comment thread tests/dataplex/dataplex_integration_test.go Outdated
@Yuan325 Yuan325 added the "release candidate" label (Use label to signal PR should be included in the next release.) Jun 9, 2026
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch 2 times, most recently from b2384ea to 37b6cb9 Compare June 11, 2026 08:44
This commit adds 6 new MCP tools to support Dataplex Data Profile,
Data Discovery, and Data Quality workflows, completely integrated
using standard gRPC client connections:
- generate_data_profile / get_data_profile
- generate_data_discovery / get_data_discovery
- generate_data_quality / get_data_quality

It also reuses get_operation and get_run_status to track these scans,
and refactors GetDataInsights to a generic GetDataScan method.

TAG=agy
CONV=74c80935-9552-4038-b5b9-5c0d69b81a8d
Corrected the reference to 'get_run_status' for clarity.
This commit addresses reviewer feedback for the Dataplex enrichment workflow:
1. Moves the 'enrich' toolset configuration block to the bottom of dataplex.yaml.
2. Adds documentation pages (markdown files) under docs/en/integrations/knowledge-catalog/tools/ for all 10 newly introduced/modified Dataplex tools.
3. Updates dataplex_integration_test.go to register the new tools and verify the parameters of the get endpoints, and implements runDataplexDataScanLifecycleIntegrationTest to verify asynchronous scan creation, LRO polling (get_operation), job run checking (get_run_status), and scan result retrieval (get_data_profile) end-to-end.

TAG=agy
CONV=74c80935-9552-4038-b5b9-5c0d69b81a8d
This commit registers and adds complete end-to-end lifecycle integration tests for:
- generate_data_insights / get_data_insights
- discover_metadata / get_discovery_results
- check_data_quality / get_data_quality_results

This covers all 10 newly introduced Dataplex tools in the integration test suite.

TAG=agy
CONV=74c80935-9552-4038-b5b9-5c0d69b81a8d
…eference tables across all Knowledge Catalog tools
…rmissions heading across all Knowledge Catalog tools
…generation and lookup methods across Dataplex source and tools
…undant 'Required.' prefixes and ensuring optional parameters start with 'Optional.'
…ation, ensure correct annotations, deduplicate path normalization into dataplexcommon, and streamline integration tests into a table-driven suite
…onfigBase

This refactors the remaining 10 Dataplex write tools (checkdataquality, discovermetadata, generatedatainsights, generatedataprofile, getdatainsights, getdataprofile, getdataqualityresults, getdiscoveryresults, getoperation, getrunstatus) to match the new BaseTool framework design, and fixes the configuration initialization in their corresponding unit test files.

TAG=agy
CONV=74c80935-9552-4038-b5b9-5c0d69b81a8d
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch 6 times, most recently from eb9ddd3 to 8f56181 Compare June 11, 2026 16:03
@harmonisha-wq harmonisha-wq force-pushed the feat/dataplex-enrichment branch from 8f56181 to c30a8db Compare June 11, 2026 16:11

Labels

release candidate Use label to signal PR should be included in the next release.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add enrichment tools to dataplex

4 participants