dataplex-discover-metadata

Creates a new Dataplex Data Discovery scan template for a specified Cloud Storage bucket and triggers its first asynchronous run, which crawls files, infers schemas, and registers tables in BigQuery.

About

The dataplex-discover-metadata tool triggers a new Data Discovery scan that automatically crawls GCS directories, infers schemas and partitions, and publishes them as BigQuery tables.

Because scan template creation is asynchronous, this tool returns the name of a long-running operation (LRO). Poll dataplex-get-operation with that name until the operation is done, extract the scanId from the result, then poll dataplex-get-run-status with the scanId until the job state is SUCCEEDED. Only then call dataplex-get-discovery-results to fetch the results.
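The polling sequence described above can be sketched in Python. This is a minimal illustration, not the Toolbox client API: `call_tool` is a hypothetical callable that you supply to invoke a tool by name, and the parameter keys (`name`, `scanId`), the response fields (`name`, `done`, `scanId`, `state`), and the failure states `FAILED` and `CANCELLED` are all assumptions about the payload shapes. The sketch also addresses tools by their type; in a real setup you would use the tool names from your own configuration.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, params) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def poll_until(fetch, is_finished, interval_s=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch() repeatedly until is_finished(result) is true."""
    for _ in range(max_attempts):
        result = fetch()
        if is_finished(result):
            return result
        sleep(interval_s)
    raise TimeoutError("gave up after %d polls" % max_attempts)


def discover_and_fetch(call_tool: ToolCaller, resource_path: str, location: str,
                       interval_s=5.0, max_attempts=60, sleep=time.sleep):
    # Step 1: start the scan. The "name" response key is an assumption.
    started = call_tool(
        "dataplex-discover-metadata",
        {"resourcePath": resource_path, "location": location},
    )
    operation_name = started["name"]

    # Step 2: poll the LRO until it is done, then read the scan ID.
    # The "done" and "scanId" response keys are assumptions.
    operation = poll_until(
        lambda: call_tool("dataplex-get-operation", {"name": operation_name}),
        lambda op: op.get("done") is True,
        interval_s, max_attempts, sleep,
    )
    scan_id = operation["scanId"]

    # Step 3: poll the run until it reports SUCCEEDED.
    # Treating FAILED and CANCELLED as terminal errors is an assumption.
    def run_finished(run):
        state = run.get("state")
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError("discovery run ended in state %s" % state)
        return state == "SUCCEEDED"

    poll_until(
        lambda: call_tool("dataplex-get-run-status", {"scanId": scan_id}),
        run_finished,
        interval_s, max_attempts, sleep,
    )

    # Step 4: only now fetch the discovery results.
    return call_tool("dataplex-get-discovery-results", {"scanId": scan_id})
```

Bounding the number of polls and raising on a terminal failure state keeps a stalled or failed scan from looping forever. The `sleep` argument is injectable so the flow can be exercised without real delays.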

Compatible Sources

This tool can be used with the following database sources:

Source Name
Knowledge Catalog (formerly known as Dataplex) Source

Requirements

IAM Permissions

Knowledge Catalog uses Identity and Access Management (IAM) to control user and group access to Knowledge Catalog resources. Toolbox will use your Application Default Credentials (ADC) to authorize and authenticate when interacting with Knowledge Catalog.

In addition to setting the ADC for your server, you need to ensure the IAM identity has been given the correct IAM permissions for the tasks you intend to perform. See Knowledge Catalog IAM permissions and Knowledge Catalog IAM roles for more information on applying IAM permissions and roles to an identity.

Parameters

The dataplex-discover-metadata tool accepts the following parameters:

| field | type | required | description |
|---|---|---|---|
| resourcePath | string | true | The resource path of the target Cloud Storage bucket (format: //storage.googleapis.com/{bucket_name}). |
| location | string | true | The Google Cloud region where the scan should be executed (e.g. us-central1). |
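As an illustration of the resourcePath format, the following Python sketch builds the two parameters from a bucket name. The helper `build_discovery_params` is hypothetical and not part of Toolbox. Accepting a `gs://` prefix and rejecting object paths are convenience assumptions, and the check is deliberately lighter than the full Cloud Storage bucket naming rules.

```python
def build_discovery_params(bucket_name: str, location: str) -> dict:
    """Return the parameters expected by dataplex-discover-metadata."""
    bucket = bucket_name.strip()
    # Convenience assumption: tolerate a gs:// URI and a trailing slash.
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.rstrip("/")
    # The scan targets a whole bucket, so an object path is rejected here.
    if not bucket or "/" in bucket:
        raise ValueError("expected a bare bucket name, got %r" % bucket_name)
    if not location:
        raise ValueError("location must be a region such as us-central1")
    return {
        "resourcePath": "//storage.googleapis.com/" + bucket,
        "location": location,
    }
```

For example, `build_discovery_params("gs://my-bucket/", "us-central1")` yields a resourcePath of `//storage.googleapis.com/my-bucket`, where `my-bucket` is a placeholder name.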

Example

kind: tool
name: discover_metadata
type: dataplex-discover-metadata
source: my-dataplex-source
description: Trigger a new metadata discovery scan.

Reference

| field | type | required | description |
|---|---|---|---|
| type | string | true | Must be "dataplex-discover-metadata". |
| source | string | true | Name of the source the tool should execute on. |
| description | string | true | Description of the tool that is passed to the LLM. |