Data is growing at a dizzying pace, but how do you…
Automate data governance, extend your data fabric with Dataplex-BigLake integration
Unlocking the full potential of data requires breaking down the silos between open-source data formats and data warehouses. At the same time, it is critical to enable data governance teams to apply policies regardless of where the data happens to reside, whether in file or columnar storage.
Today, data governance teams must become subject matter experts on each storage system on which a company's data resides. Since February 2022, Google Dataplex has provided a unified place to apply policies, which are propagated across both the raw storage and the data warehouse in GCP. Instead of specifying policies in multiple places and carrying the cognitive load of translating policies from “what you want the storage system to do” to “how your data will behave”, you get a single point for clear policy management in Dataplex. Now, Google is making this easier to use with BigLake.
Earlier this year, Google made BigLake generally available. BigLake unifies the data fabric between data lakes and data warehouses by extending BigQuery storage to open file formats. Today, we announce BigLake integration with Dataplex (available in preview). This integration eliminates the configuration steps for admins who want to take advantage of BigLake, and lets them manage policies across GCS and BigQuery from a unified console.
Previously, you could point Dataplex at a Google Cloud Storage (GCS) bucket, and Dataplex would discover and extract all metadata from the data lake and register this metadata in BigQuery (as well as Dataproc Metastore and Data Catalog) for analysis and search. The BigLake integration builds on this capability by allowing an “upgrade” of a bucket asset: instead of just creating external tables in BigQuery for analysis, Dataplex will create policy-capable BigLake tables!
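To make the difference concrete, here is a minimal sketch of the kind of BigQuery DDL that defines a BigLake table by hand: an external table that reads from GCS through a BigQuery connection. Dataplex creates these tables for you after the upgrade, so this is only an illustration of what is being automated, not the statement Dataplex itself issues. The helper name `biglake_table_ddl` and every dataset, connection, and bucket name below are hypothetical.

```python
from typing import Sequence


def biglake_table_ddl(
    table: str,
    connection: str,
    uris: Sequence[str],
    file_format: str = "PARQUET",
) -> str:
    """Build a CREATE EXTERNAL TABLE ... WITH CONNECTION statement.

    The WITH CONNECTION clause is what distinguishes a BigLake table
    from a plain external table: reads go through the connection's
    service account rather than the end user's own GCS permissions.
    """
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{file_format}', uris = [{uri_list}])"
    )


# Hypothetical names, for illustration only.
ddl = biglake_table_ddl(
    table="lake_dataset.orders",
    connection="my-project.us.lake-connection",
    uris=["gs://example-bucket/orders/*"],
)
print(ddl)
```

The function only builds the statement text; running it requires a BigQuery client and an existing connection, both of which Dataplex sets up on your behalf in the integrated flow.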
The immediate implication is that admins can now assign column, row, and table policies to the BigLake tables auto-created by Dataplex, because with BigLake the infrastructure (GCS) layer is separate from the analysis layer (BigQuery). Dataplex will handle the creation of a BigQuery connection and a BigQuery publishing dataset, and ensure the BigQuery service account has the correct permissions on the bucket.
But wait - there’s more. With this release of Dataplex, we are also introducing advanced logging called governance logs. Governance logs allow tracking the exact state of policy propagation to tables and columns, going beyond the high-level “status” for the bucket to provide fine-grained status and logs for individual tables and columns.
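As an illustration of a row-level policy on such a table, the sketch below builds a BigQuery `CREATE ROW ACCESS POLICY` statement. It assumes the policy is written directly in BigQuery DDL; the article does not say which mechanism Dataplex uses internally. The helper name `row_access_policy_ddl` and the policy, table, group, and column names are all hypothetical.

```python
from typing import Sequence


def row_access_policy_ddl(
    policy: str,
    table: str,
    grantees: Sequence[str],
    filter_expr: str,
) -> str:
    """Build a CREATE ROW ACCESS POLICY statement for a BigQuery table.

    Grantees use IAM principal syntax such as "group:name@example.com".
    Only rows for which filter_expr is true are visible to the grantees.
    """
    grantee_list = ", ".join(f'"{g}"' for g in grantees)
    return (
        f"CREATE ROW ACCESS POLICY {policy}\n"
        f"ON `{table}`\n"
        f"GRANT TO ({grantee_list})\n"
        f"FILTER USING ({filter_expr})"
    )


# Hypothetical names: restrict a sales group to rows for one region.
print(
    row_access_policy_ddl(
        policy="apac_only",
        table="lake_dataset.orders",
        grantees=["group:sales-apac@example.com"],
        filter_expr="region = 'APAC'",
    )
)
```

Because the policy is attached to the table in BigQuery rather than to the files in GCS, it applies no matter which objects in the bucket back the table.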
What’s next?
- We have updated our documentation for managing buckets and have additional detail regarding policy propagation and the upgrade process.
- Stay tuned for an exciting roadmap ahead, with more automation around policy management.
For more information, please visit:
- Google Cloud Dataplex
Contact Gimasys for advice on a transformation strategy that fits your business situation and to try out the free Google Cloud Platform service:
- Hotline: Hanoi: 0987 682 505 – Ho Chi Minh: 0974 417 099
- Email: gcp@gimasys.com
Source: Gimasys