{"id":19528,"date":"2024-07-24T15:38:15","date_gmt":"2024-07-24T08:38:15","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=19528"},"modified":"2024-07-31T09:57:09","modified_gmt":"2024-07-31T02:57:09","slug":"cach-lay-mau-du-lieu-product-tu-google-bigquery","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/","title":{"rendered":"Get your BigQuery production sample, all self-served"},"content":{"rendered":"<section class=\"wpb-content-wrapper\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p><span style=\"font-weight: 400;\">A recap from <\/span><a href=\"https:\/\/cloud.google.com\/blog\/products\/data-analytics\/speed-up-your-data-science-with-bigquery-sampling-from-prod\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">part 1 on BigQuery sampling from PROD<\/span><\/a><span style=\"font-weight: 400;\">: Google proposed a solution to the problem of getting fresh PROD samples from BigQuery. The solution includes safety measures against accidental data exfiltration and, at the same time, it\u2019s self-service. You get a fresh sample every day. 
No more outdated schemas or stale samples.<\/span><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Cach_thuc_hoat_dong_chi_tiet\" >How it works in detail<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#DevOps\" >DevOps<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Chinh_sach\" >Policy<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Gia_dinh_su_dung_voi_vai_tro_la_nha_khoa_hoc_du_lieu\" >Usage as a data scientist<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Gioi_han_khong_phai_la_kich_thuoc\" >The limit is not the size<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Cach_hoat_dong_cua_gioi_han_va_kich_thuoc_trong_BigQuery\" >How limits and sizes work in BigQuery<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Chu_ky_lay_mau\" >Sampling cycle<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Gioi_han\" >Limitations<\/a><ul class='ez-toc-list-level-4' ><li class='ez-toc-heading-level-4'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#JOINs_va_WHEREs\" >JOINs and WHEREs<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Lam_mo_du_lieu\" >Obfuscate data<\/a><\/li>
<li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Co_xay_ra_viec_dong_nhat_khi_thuc_hien_khong\" >Is the sampling uniform?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Luu_y_ve_cac_cau_lenh_TABLESAMPLE\" >A note on TABLESAMPLE<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Truong_hop_nguoi_dung_can_nhieu_hon_cac_bang_phan_phoi_khac\" >What if users need other distributions?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Views_giai_phap_thay_the\" >Views: the workaround<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Lieu_viec_nay_co_gian_lan_hay_khong\" >Is this cheating?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/#Thiet_ke_giai_phap\" >Solution design<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Cach_thuc_hoat_dong_chi_tiet\"><\/span><b>How it works in detail<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Wondering whether this method will work for you, and whether this solution is in line with your 
organization's security policy? <\/span><span style=\"font-weight: 400;\">Google assumes that DevOps is not interested in preparing samples and that it is better to let data scientists serve themselves. The main reason is that reasoning about data is not the responsibility of DevOps, whereas data scientists are the subject matter experts.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"DevOps\"><\/span><b>DevOps<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Here Google assumes that you, as DevOps, want to evaluate only once whether a particular production table may be sampled, and that you don't want to mediate each sample request manually. You can therefore encode your review in a simple JSON file that Google calls a policy.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Chinh_sach\"><\/span><b>Policy<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">A policy has two parts, limit and default_sample:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">limit: the maximum amount of data that can be retrieved from the table. You can specify a count, a percentage, or both. 
If you specify both, the percentage is converted to a row count and the smaller of the two values is used.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">default_sample: the sample used when there is no request for the table, or when the request is invalid, for example an empty file or a file that is not valid JSON.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For example:<\/span><\/p>\n<pre><code>{\n  \"limit\": {\n    \"count\": 300000,\n    \"percentage\": 40.1\n  },\n  \"default_sample\": {\n    \"size\": {\n      \"count\": 9,\n      \"percentage\": 6.5\n    },\n    \"spec\": {\n      \"type\": \"random\"\n    }\n  }\n}<\/code><\/pre>\n<h2><span class=\"ez-toc-section\" id=\"Gia_dinh_su_dung_voi_vai_tro_la_nha_khoa_hoc_du_lieu\"><\/span><b>Usage as a data scientist<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Now assume that you are a data scientist who wants to determine whether you have access to production data. Once you have access, you can request different samples whenever you need them, and when you wake up the next day, your samples are ready. Here is the request format.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A request has the same structure as the <span style=\"text-decoration: underline;\"><strong>default_sample<\/strong><\/span> item in the policy:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">size: how much of the table you want. You can specify a count, a percentage, or both. If you specify both, the percentage is converted to a row count and the larger of the two values is used.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">spec: how to sample the production data, given by the following fields:<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">type: either sorted or random.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">properties: for a sorted sample, the column to sort by and the sort direction:<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"3\"><span style=\"font-weight: 400;\">by: the column name.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"3\"><span style=\"font-weight: 400;\">direction: the sort direction, ASC or DESC.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">For example:<\/span><\/p>\n<pre><code>{\n  \"__doc__\": \"Full sample request\",\n  \"__table_source__\": \"bigquery-public-data.new_york_taxi_trips.tlc_green_trips_2015\",\n  \"size\": {\n    \"count\": 3000,\n    \"percentage\": 11.7\n  },\n  \"spec\": {\n    \"type\": \"sorted\",\n    \"properties\": {\n      \"by\": \"dropoff_datetime\",\n      \"direction\": \"DESC\"\n    }\n  }\n}<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">To make limits and sizes concrete, Google walks through a BigQuery example below.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Gioi_han_khong_phai_la_kich_thuoc\"><\/span><b>The limit is not the size<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">There is a small but important difference between limit and size. The limit is set in the policy, takes the smaller of its count and percentage, and caps the amount of data that can be sampled. Size is used in requests and in the default sample. 
It takes the larger of its count and percentage, but it can never exceed the limit.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cach_hoat_dong_cua_gioi_han_va_kich_thuoc_trong_BigQuery\"><\/span><b>How limits and sizes work in BigQuery<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The table in this scenario has 50,000 rows.<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Field<\/b><\/td>\n<td><b>Where<\/b><\/td>\n<td><b>count<\/b><\/td>\n<td><b>percentage<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>limit<\/b><\/td>\n<td><b>Policy<\/b><\/td>\n<td><b>30,000<\/b><\/td>\n<td><b>10<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>size<\/b><\/td>\n<td><b>Request<\/b><\/td>\n<td><b>10,000<\/b><\/td>\n<td><b>40<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The percentages are then converted to row counts and resolved:<\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Field<\/b><\/td>\n<td><b>Where<\/b><\/td>\n<td><b>count<\/b><\/td>\n<td><b>percentage<\/b><\/td>\n<td><b>percentage as rows<\/b><\/td>\n<td><b>Final value<\/b><\/td>\n<td><b>Rule<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>limit<\/b><\/td>\n<td><b>Policy<\/b><\/td>\n<td><b>30,000<\/b><\/td>\n<td><b>10<\/b><\/td>\n<td><b>5,000<\/b><\/td>\n<td><b>5,000<\/b><\/td>\n<td><b>min(30000, 5000)<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>size<\/b><\/td>\n<td><b>Request<\/b><\/td>\n<td><b>10,000<\/b><\/td>\n<td><b>40<\/b><\/td>\n<td><b>20,000<\/b><\/td>\n<td><b>20,000<\/b><\/td>\n<td><b>max(10000, 20000)<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">The requested size (20,000 rows) exceeds the limit (5,000 rows), so the sample is capped at 5,000 rows, that is, 10% of 50,000 rows.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Chu_ky_lay_mau\"><\/span><b>Sampling cycle<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">The figure below shows how data flows through the sampling infrastructure:<\/span><\/p>\n<figure id=\"attachment_19535\" 
aria-describedby=\"caption-attachment-19535\" style=\"width: 644px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/cach-lay-mau-du-lieu-product-tu-google-bigquery\/attachment\/screenshot-2024-07-24-at-14-39-23\/\" rel=\"attachment wp-att-19535\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-19535\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/07\/Screenshot-2024-07-24-at-14.39.23.png\" alt=\"BigQuery sampling flow\" width=\"644\" height=\"1028\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/07\/Screenshot-2024-07-24-at-14.39.23.png 644w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/07\/Screenshot-2024-07-24-at-14.39.23-8x12.png 8w\" sizes=\"auto, (max-width: 644px) 100vw, 644px\" \/><\/a><figcaption id=\"caption-attachment-19535\" class=\"wp-caption-text\"><em>Figure: BigQuery sampling flow<\/em><\/figcaption><\/figure>\n<p><span style=\"font-weight: 400;\">The figure above may look complex, but the design must ensure that:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">There is no sample inflation, i.e. your sample does not grow with each sampling cycle. This means the policies must be respected.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Invalid requests are handled.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Schemas stay in sync with production.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">In detail, the sampler works as follows:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Cloud Scheduler publishes a START message to the Pub\/Sub COMMAND topic. 
This message tells the sampler function to start sampling.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The sampler function then does the following:<\/span>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Deletes all previous samples in the Data Science environment.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Lists all available policies in the policy bucket.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">For each table it finds, sends a SAMPLE_START command with the corresponding policy.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">For each SAMPLE_START command, checks whether a corresponding request file exists in the request bucket.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Checks the request against the policy.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Sends a compliant sampling query to the BigQuery source and inserts the result into the corresponding table in the Data Science environment.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Reports every error it finds to the Pub\/Sub ERROR topic.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">Any message on this topic triggers the error handler function, which sends an email notification about the error.<\/span><\/li>\n<\/ul>\n<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Suppose the sampler function does not run within 24 hours. 
An alert is then sent to the Pub\/Sub ERROR topic.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">If either the sampler function or the error handler function hits a \u201cfatal\u201d error, an email alert is sent.<\/span><\/li>\n<\/ol>\n<h3><span class=\"ez-toc-section\" id=\"Gioi_han\"><\/span><b>Limitations<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The following sections cover each point in detail. For reference, here is a short list of what the solution does not support:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">JOINs of any kind<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">WHERE clauses<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Automatic obfuscation (anonymizing the data before the sample is inserted)<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Column exclusion<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Row exclusion<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Truly uniform sampling<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Non-uniform sampling distributions (such as Gaussian, Power, and Pareto)<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Google explains the reasons for each of these limitations below. Most fall into one of the following categories:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">It is too complex and time-consuming to implement.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">You can 
use views.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">It would be too expensive for you.<\/span><\/li>\n<\/ul>\n<h4><span class=\"ez-toc-section\" id=\"JOINs_va_WHEREs\"><\/span><span style=\"font-weight: 400;\">JOINs and WHEREs<\/span><span class=\"ez-toc-section-end\"><\/span><\/h4>\n<p><span style=\"font-weight: 400;\">The problem with JOINs and WHEREs is that enforcing a sampling policy on them is too complex to implement. Here is a simple example:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Table TYPE_OF_AIRCRAFT maps a simple ID to a specific aircraft type; for example, the Airbus A320neo has ID ABC123. Policy: 100% of the data may be sampled, that is, you can copy the whole table.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Table FLIGHT_LEG holds a single flight on a specific day, for example, London Heathrow to Berlin at 14:50 on Sunday. Policy: 10% may be sampled.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Table PASSENGER_FLIGHT_LEG records which passenger sits where on a particular FLIGHT_LEG. Policy: only 10 rows are allowed.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">You can now construct a query that joins all of these tables, for example, to ask for all passengers flying on a particular aircraft type on a particular day. 
To honor the policies in this case, the sampler would have to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Execute the query.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Check how much data from each table is pulled through it.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Cap the result according to each table's \"allowance\".<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">This process would be:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Hard to implement without parsing the SQL into an abstract syntax tree (AST).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Probably very expensive for you, because the query is executed in full and only then \"trimmed\", so you pay for the full query.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Prone to many edge cases that violate the policies.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A data exfiltration risk.<\/span><\/li>\n<\/ul>\n<h3><span class=\"ez-toc-section\" id=\"Lam_mo_du_lieu\"><\/span><b>Obfuscate data<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Google understands that many users need to obfuscate data for security reasons. Cloud DLP addresses this, and there are also more capable solutions on the market that you can use. 
See the blog post: <\/span><a href=\"https:\/\/cloud.google.com\/blog\/products\/identity-security\/taking-charge-of-your-data-using-cloud-dlp-to-de-identify-and-obfuscate-sensitive-information\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Taking charge of your data: using Cloud DLP to de-identify and obfuscate sensitive information<\/span><\/a>.<\/p>\n<p><b>Column &amp; row exclusion<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Google agrees that excluding columns and rows is simple, but using views or Cloud DLP is even easier and safer. Google did not implement it here because it is hard to create a generic specification that works for every use case, and much better approaches such as Cloud DLP already exist. It all depends on why you want to remove the column or row.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Co_xay_ra_viec_dong_nhat_khi_thuc_hien_khong\"><\/span><b>Is the sampling uniform?<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">No. Except for views, the sampler relies on TABLESAMPLE clauses for cost reasons. A truly random sample means using the ORDER BY RAND() strategy, which requires a full table scan. With TABLESAMPLE, you pay only a little more than the cost of the data you actually want.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Luu_y_ve_cac_cau_lenh_TABLESAMPLE\"><\/span><b>A note on TABLESAMPLE<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">TABLESAMPLE lets you sample a table without reading all of it. But there is an important caveat: it is neither truly random nor uniform. Your sample inherits any bias present across your table's storage blocks. 
Here is how it works, according to <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/table-sampling\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">the BigQuery table sampling documentation<\/span><\/a>:<\/p>\n<p><em><span style=\"font-weight: 400;\">The following example reads approximately 20% of the data blocks from storage and then randomly selects 10% of the rows in those blocks:<\/span><\/em><\/p>\n<pre><code>SELECT * FROM dataset.my_table TABLESAMPLE SYSTEM (20 PERCENT)\nWHERE rand() &lt; 0.1<\/code><\/pre>\n<p><span style=\"font-weight: 400;\">An example makes this clearer. Let us build a deliberately skewed one to show what TABLESAMPLE does. Imagine that your table has a single integer column and that its blocks have the following characteristics:<\/span><\/p>\n<table>\n<tbody>\n<tr>\n<td><b>Block ID<\/b><\/td>\n<td><b>Average<\/b><\/td>\n<td><b>Distribution<\/b><\/td>\n<td><b>Description<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>1<\/b><\/td>\n<td><b>10<\/b><\/td>\n<td><b>Single value<\/b><\/td>\n<td><b>All values are 10<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>2<\/b><\/td>\n<td><b>9<\/b><\/td>\n<td><b>Single value<\/b><\/td>\n<td><b>All values are 9<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>3<\/b><\/td>\n<td><b>5<\/b><\/td>\n<td><b>Uniform from 0 to 10<\/b><\/td>\n<td><b>\u00a0<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>4<\/b><\/td>\n<td><b>4<\/b><\/td>\n<td><b>Uniform from -1 to 9<\/b><\/td>\n<td><b>\u00a0<\/b><\/td>\n<\/tr>\n<tr>\n<td><b>5<\/b><\/td>\n<td><b>0<\/b><\/td>\n<td><b>Uniform from -5 to 5<\/b><\/td>\n<td><b>\u00a0<\/b><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-weight: 400;\">Now look at what happens to the average of your sample when you use TABLESAMPLE. For simplicity, assume:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Each block has 1,000 records. 
This puts the true average of all values in the table at about 5.6.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">You choose a 40% sample.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">TABLESAMPLE samples 40% of the blocks, so you get two of the five blocks. Suppose blocks 1 and 2 were selected: your sample average is now (10 + 9) \/ 2 = 9.5, far from the true 5.6. Even if you add the row-level downsampling suggested in the documentation, you still end up with a biased sample. Simply put, if your blocks are biased, so is your sample.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Again, removing this potential bias means raising the sampling cost to that of a full table scan.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Truong_hop_nguoi_dung_can_nhieu_hon_cac_bang_phan_phoi_khac\"><\/span><b>What if users need other distributions?<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">There are several reasons these are not supported. The main one is that the SQL engine does not support other distributions, and there is no workaround for the missing feature: the only way to get it is to implement it. This is where things get complicated. 
Fair warning: if your statistics are rusty, this part is going to be rough.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Everything below relies on the following weird property of the cumulative distribution function (CDF), the basis of inverse transform sampling: if U is uniformly distributed on [0, 1] and F is a CDF with inverse F<sup>-1<\/sup>, then F<sup>-1<\/sup>(U) is distributed according to F.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For it to work, you need to:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Read all the data in the target column (the column the distribution applies to).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Compute the column's CDF.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Sample the CDF uniformly at random.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Translate each sampled point into a row number or ID.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Add those rows to the sample.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">This can be done, but it has some implications:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">You need a full table scan.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">You need a \"beefier\" instance to hold all of the data (think billions of rows) and to compute the CDF.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">This means that you pay for:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The already expensive query (a full table scan).<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Time on the expensive instance to compute the sample.<\/span><\/li>\n<\/ul>\n<h3><span 
class=\"ez-toc-section\" id=\"Views_giai_phap_thay_the\"><\/span><b>Views: the workaround<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Google's solution supports sampling from views. This means that you can always encapsulate this logic in a view and let the sampler do its job. However, views do not support BigQuery's TABLESAMPLE clause, so random samples from a view need a full table scan using the ORDER BY RAND() strategy. Non-random samples do not trigger a full table scan.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Lieu_viec_nay_co_gian_lan_hay_khong\"><\/span><b>Is this cheating?<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">To some extent, yes, and Google admits as much: the views alternative pushes responsibility onto SecOps and DataOps, who need to define compliant views and sampling policies. It can also be expensive, since querying a view means executing its underlying query and then sampling the result. 
Be especially careful with random samples from views, because they always require a full table scan.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Thiet_ke_giai_phap\"><\/span><b>Solution design<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Google settled on a very simple solution with the following components:<\/span><\/p>\n<ul>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/bigquery\" target=\"_blank\" rel=\"noopener\"><b>BigQuery<\/b><\/a><b>: The source and destination of the data.<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/scheduler\" target=\"_blank\" rel=\"noopener\"><b>Cloud Scheduler<\/b><\/a><b>: The crontab that triggers sampling daily or on another regular schedule.<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/pubsub\" target=\"_blank\" rel=\"noopener\"><b>Cloud Pub\/Sub<\/b><\/a><b>: Coordinates triggering, errors, and sampling steps.<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/storage\" target=\"_blank\" rel=\"noopener\"><b>Cloud Storage<\/b><\/a><b>: Stores the policies and requests (in two separate buckets).<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/functions\" target=\"_blank\" rel=\"noopener\"><b>Cloud Functions<\/b><\/a><b>: The workhorse that runs the logic.<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/secret-manager\" target=\"_blank\" rel=\"noopener\"><b>Secret Manager<\/b><\/a><b>: Stores sensitive information.<\/b><\/li>\n<li aria-level=\"1\"><a href=\"https:\/\/cloud.google.com\/monitoring\" target=\"_blank\" rel=\"noopener\"><b>Cloud Monitoring<\/b><\/a><b>: Monitors the health of the system.<\/b><\/li>\n<\/ul>\n<p style=\"text-align: right;\"><strong>Source: Gimasys<\/strong><\/p>\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div><div class=\"vc_row wpb_row 
vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div class=\"templatera_shortcode\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div class=\"vc_message_box vc_message_box-standard vc_message_box-rounded vc_color-blue\" ><div class=\"vc_message_box-icon\"><i class=\"vc-mono vc-mono-technorati\"><\/i><\/div><p><a href=\"https:\/\/gcloudvn.com\/en\/main-logo-1\/\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-664\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png\" alt=\"\" width=\"221\" height=\"72\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png 214w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-18x6.png 18w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-183x60.png 183w\" sizes=\"auto, (max-width: 221px) 100vw, 221px\" \/><\/a>As a senior Google partner in Vietnam, Gimasys has more than 10 years of experience consulting on digital transformation for more than 2,000 domestic corporations. 
Typical customers include Jetstar, Dien Quan Media, Heineken, Jollibee, Vietnam Airlines, HSC, SSI, and others.<\/p>\n<p>Gimasys is currently a strategic partner of many major technology companies worldwide, such as Salesforce, Oracle NetSuite, Tableau, and MuleSoft.<\/p>\n<p>Contact Gimasys - Google Cloud Premier Partner for advice on strategic solutions suited to the specific needs of your business:<\/p>\n<ul>\n<li>Email: gcp@gimasys.com<\/li>\n<li>Hotline: 0974 417 099<\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div>\n<\/section>","protected":false},"excerpt":{"rendered":"A recap from part 1 about how to self-serve BigQuery samples: Google is proposing a solution to the problem of getting fresh PROD samples from BigQuery. The solution also provides safety measures to&hellip;","protected":false},"author":2,"featured_media":19550,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[135],"tags":[],"class_list":["post-19528","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/19528","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=19528"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/19528\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\
/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/19550"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=19528"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=19528"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=19528"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}