{"id":14272,"date":"2023-06-07T16:12:11","date_gmt":"2023-06-07T09:12:11","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=14272"},"modified":"2023-08-16T13:25:15","modified_gmt":"2023-08-16T06:25:15","slug":"how-a-green-energy-provider-used-dataplex-for-its-data-governance-and-quality","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/how-a-green-energy-provider-used-dataplex-for-its-data-governance-and-quality\/","title":{"rendered":"How a green energy provider used Dataplex for its data governance and quality"},"content":{"rendered":"<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14279 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex1.png\" alt=\"How a green energy provider used Dataplex for its data governance and quality\" width=\"600\" height=\"337\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex1.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex1-18x10.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Brazil is one of the world\u2019s most promising renewables markets, in which Casa Dos Ventos is a leading pioneer and investor. With our innovation and investments, we are leading the transition into a more competitive and sustainable future.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We rely on \u201cBig Data\u201d to support big and important business decisions. Most of our data is stored in BigQuery, a serverless enterprise data warehouse. 
We continuously use the tools and services of <a href=\"https:\/\/gcloudvn.com\/en\/google-cloud-platform\/\">Google Cloud (GCP)<\/a> to run our business operations more efficiently.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">For example, in wind farm operations, data is used to quantify energy production, losses, and efficiency. For meteorological masts (also known as metmasts), sensor data and configurations are constantly ingested and analyzed to monitor the masts\u2019 health. In newer and greenfield projects, we use data to make investment decisions.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We need trusted data to make these decisions so that we do not miss our goals around uptime, efficiency, and investment returns. However, controlling data quality has been a challenge for us, frequently leading to data firefighting.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Previously, we built homegrown solutions that did not work as well as we needed, such as setting rules and alerts in BI tools or writing custom Python scripts. These approaches were hard to scale and standardize, and often costly.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">To solve these problems, we turned to Dataplex, an intelligent data fabric that unifies distributed data, to achieve better data governance in our organization and build trust in the data. 
With Dataplex, we now have a streamlined way of organizing, securing, and monitoring our data for quality.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We started our Dataplex implementation with three key goals:\u00a0<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Define a data governance framework for the organization.\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Create reports that routinely measure adherence to the framework.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Create reports that routinely measure the quality of the data.<\/span><\/li>\n<\/ul>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Xac_dinh_khung_quan_tri_du_lieu_cho_to_chuc\"><\/span><b>Define a data governance framework for the organization.\u00a0<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We started by organizing the data in alignment with the business and then using Dataplex to set policies for this 
organization.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Dataplex abstracts away the underlying data storage systems by using constructs such as lakes, data zones, and assets. We decided to map these constructs to our business with the following framework:\u00a0<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Lake - One lake per department in the company<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Data zone - Separates the data within a lake into subareas\u00a0<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Raw zone - Contains datasets used for raw tables or tables with few modifications or aggregations<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Curated zone - Contains datasets with aggregate tables or prediction tables (used by ML models)<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14278 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex2.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 2\" width=\"600\" height=\"178\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex2.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex2-18x5.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Tao_bao_cao_cho_noi_dung_du_lieu_va_quan_ly_noi_dung_du_lieu\"><\/span><b>Create reports for data assets and govern data assets\u00a0<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">To monitor our data governance posture, we have two 
reports that capture the current state.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">The first report tracks the entire data estate. We used the BigQuery APIs and Python scripts (scheduled by Cloud Composer) to extract the metadata of all <a href=\"https:\/\/gcloudvn.com\/en\/bigquery\/\">BigQuery<\/a> tables in the organization. It also measures critical aspects such as the number of documented tables and views.<\/span><\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14277 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex3.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 3\" width=\"600\" height=\"351\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex3.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex3-18x12.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">The second report tracks our progress in bringing our data estate under Dataplex governance. We followed the same process (API + Python code) to build the following dashboard. Currently, 71.6% of our datasets are under Dataplex according to this dashboard. 
Our goal is to get to 100% and then maintain that.<\/span><\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14276 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex4.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 4\" width=\"600\" height=\"455\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex4.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex4-16x12.png 16w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Quet_chat_luong_du_lieu_va_tao_cac_report_ve_chat_luong_du_lieu\"><\/span><b>Create data quality scans and reports\u00a0<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Once data is under management in Dataplex, we build data quality reports and a dashboard in Dataplex with a few simple clicks.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Multiple data quality scans run within Dataplex, one for each critical table.\u00a0<\/span><\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14275 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex5.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 5\" width=\"600\" height=\"131\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex5.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex5-18x4.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">To create rules we used the built-in rules but also created our own using custom SQL statements. 
For example, to make sure we never have any rows that match a particular condition, we created a SQL rule that returns FALSE when even a single row matches the condition.<\/span><\/p>\n<p style=\"text-align: justify;\">(SELECT COUNT(*) AS count_values<br \/>\nFROM `metmastDB.TableX`<br \/>\nWHERE `columnX` IS NULL AND columnY &lt;&gt; &#039;some string&#039;<br \/>\n) = 0<\/p>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14274 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex6.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 6\" width=\"600\" height=\"149\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex6.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex6-18x4.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">When these checks fail, we rely on the query shown by Dataplex AutoDQ to find the rows that failed.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">To build a dashboard for data quality, we use the logs in Cloud Logging and set up a sink to BigQuery. 
Once the data lands in BigQuery, we create a view with the following query:<\/span><\/p>\n<p style=\"text-align: justify;\">SELECT<br \/>\ntimestamp,<br \/>\nresource.type,<br \/>\nresource.labels.datascan_id,<br \/>\nresource.labels.location,<br \/>\njsonpayload_v1_datascanevent.scope,<br \/>\njsonpayload_v1_datascanevent.type AS type_scan_event,<br \/>\njsonpayload_v1_datascanevent.trigger,<br \/>\nSPLIT(jsonpayload_v1_datascanevent.datasource, &#039;\/&#039;)[OFFSET(1)] AS datasource_project,<br \/>\nSPLIT(jsonpayload_v1_datascanevent.datasource, &#039;\/&#039;)[OFFSET(3)] AS datasource_location,<br \/>\nSPLIT(jsonpayload_v1_datascanevent.datasource, &#039;\/&#039;)[OFFSET(5)] AS datasource_lake,<br \/>\nSPLIT(jsonpayload_v1_datascanevent.datasource, &#039;\/&#039;)[OFFSET(7)] AS datasource_zone,<br \/>\njsonpayload_v1_datascanevent.dataquality.dimensionpassed.uniqueness,<br \/>\njsonpayload_v1_datascanevent.dataquality.dimensionpassed.completeness,<br \/>\njsonpayload_v1_datascanevent.dataquality.dimensionpassed.validity,<br \/>\njsonpayload_v1_datascanevent.dataquality.rowcount,<br \/>\njsonpayload_v1_datascanevent.dataquality.passed<br \/>\nFROM `datalake-cver.Analytics_Data_Quality_cdv.dataplex_googleapis_com_data_scan` DATA_SCAN<\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">This view lets us break down data quality scan results by lake and zone.<\/span><\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">We then use Tableau to:\u00a0<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Create a dashboard\u00a0<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Send email notifications to the responsible users through Tableau alerts<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">Here is our Tableau dashboard:<\/span><\/p>\n<p 
style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-14273 size-full\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex7.png\" alt=\"Case Study: How Google Dataplex helps Casa Dos Ventos manage and ensure data quality 7\" width=\"600\" height=\"212\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex7.png 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2023\/06\/dataplex7-18x6.png 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Y_dinh_tuong_lai\"><\/span><b>Looking ahead<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">While we have achieved a much better governance posture, we also look forward to expanding our usage of Dataplex further. We are starting to use the Lineage feature for BigQuery tables and learning how to integrate Data Quality with Lineage. This will enable us to easily check which dashboards and views are impacted by data quality issues. We are also planning to manage our SQL scripts in our GitHub account.<\/span><\/p>\n<p style=\"text-align: justify;\">Cloud adoption has been, and remains, an inevitable trend in how enterprises develop and optimize their technology. Gimasys, a Premier Partner of Google in Vietnam, provides consulting, architecture, and design of the optimal Cloud solution for you. 
For technical support, you can contact Gimasys, a Premier Partner of Google in Vietnam, using the following details:<\/p>\n<ul style=\"text-align: justify;\">\n<li aria-level=\"1\"><b>Hotline:\u00a0<\/b>0974 417 099 (HCM) | 0987 682 505 (HN)<\/li>\n<li aria-level=\"1\"><b>Email:\u00a0<\/b><a href=\"mailto:gcp@gimasys.com\" target=\"_blank\" rel=\"nofollow noopener\">gcp@gimasys.com<\/a><\/li>\n<\/ul>\n<p style=\"text-align: right;\"><b>Source:\u00a0<a href=\"https:\/\/gcloudvn.com\/en\/\">Gimasys<\/a><\/b><\/p>","protected":false},"excerpt":{"rendered":"<p>Brazil is one of the world\u2019s most promising renewable energy markets, in which Casa Dos Ventos is a leading investor and pioneer. With their innovation and investment, they&hellip;<\/p>","protected":false},"author":2,"featured_media":14279,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1,135],"tags":[],"class_list":["post-14272","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/14272","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=14272"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/14272\/revisions"}],"wp:featuredmedia":[{"embeddable":true
,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/14279"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=14272"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=14272"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=14272"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}