{"id":3161,"date":"2021-06-21T12:09:47","date_gmt":"2021-06-21T05:09:47","guid":{"rendered":"http:\/\/gcloudvn.wam.vn\/?p=3161"},"modified":"2024-05-24T11:03:54","modified_gmt":"2024-05-24T04:03:54","slug":"traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/","title":{"rendered":"Traveloka: Switch to Google Cloud Platform for Powerful Big Data Analytics"},"content":{"rendered":"<section class=\"wpb-content-wrapper\"><p style=\"text-align: justify;\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-4 wpex-vc_col-has-fill\"><div class=\"vc_column-inner vc_custom_1624252324037\"><div class=\"wpb_wrapper\"><div class=\"vcex-heading vcex-module wpex-text-2xl wpex-font-normal wpex-m-auto wpex-max-w-100 vcex-heading-bottom-border-w-color wpex-block wpex-border-b-2 wpex-border-solid wpex-border-gray-200\" style=\"font-size:20px;font-weight:600;\"><span class=\"vcex-heading-inner wpex-inline-block wpex-clr wpex-relative wpex-pb-5 wpex-border-b-2 wpex-border-solid wpex-border-accent\">SUMMARY<\/span><\/div><div class=\"vcex-spacing wpex-w-100 wpex-clear\" style=\"height:30px\"><\/div>\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p><strong>Business problem:\u00a0<\/strong><\/p>\n<ul style=\"text-align: justify;\">\n<li>Debugging problems in Kafka clusters proved difficult and time-consuming<\/li>\n<li>Adding more nodes to MongoDB required a lengthy rebalancing process \u2013 and the pool quickly ran out of disk space<\/li>\n<li>The business could store only 14 days of data in MemSQL due to memory limitations, and queries sometimes returned out-of-memory errors<\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><strong>Solution:<\/strong><\/p>\n<ul style=\"text-align: justify;\">\n<li><a 
href=\"https:\/\/gcloudvn.com\/en\/google-cloud-platform\/\">Google Cloud Platform<\/a><\/li>\n<li><a href=\"https:\/\/gcloudvn.com\/en\/bigquery\/\">BigQuery<\/a><\/li>\n<li>Cloud Pub\/Sub<\/li>\n<li>Cloud Dataflow<\/li>\n<li><a href=\"https:\/\/gcloudvn.com\/en\/google-kubernetes-engine-gke\/\">Kubernetes Engine<\/a><\/li>\n<li><a href=\"https:\/\/gcloudvn.com\/en\/cloud-storage\/\">Cloud Storage<\/a><\/li>\n<li>Cloud Composer<\/li>\n<li><a href=\"https:\/\/gcloudvn.com\/en\/cloud-sql\/\">Cloud SQL<\/a><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><strong>Result:<\/strong><\/p>\n<ul style=\"text-align: justify;\">\n<li>Engineers are freed up to spend their time delivering value to the business<\/li>\n<li>More than 99.9% availability recorded<\/li>\n<li>400TB (about 500 billion rows) of data warehoused<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><div class=\"wpb_column vc_column_container vc_col-sm-8\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div  class=\"wpb_single_image wpb_content_element vc_align_\">\n\t\t\n\t\t<figure class=\"wpb_wrapper vc_figure\">\n\t\t\t<div class=\"vc_single_image-wrapper   vc_box_border_grey\"><img loading=\"lazy\" decoding=\"async\" width=\"1000\" height=\"563\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/featured-traveloka.jpg\" class=\"vc_single_image-img attachment-full\" alt=\"Traveloka: Switch to Google Cloud Platform for Powerful Big Data Analytics\" title=\"Traveloka: Switch to Google Cloud Platform for Powerful Big Data Analytics\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/featured-traveloka.jpg 1000w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/featured-traveloka-300x169.jpg 300w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/featured-traveloka-768x432.jpg 768w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/featured-traveloka-18x10.jpg 18w\" sizes=\"auto, (max-width: 1000px) 100vw, 1000px\" 
\/><\/div>\n\t\t<\/figure>\n\t<\/div>\n<div class=\"vcex-spacing wpex-w-100 wpex-clear\" style=\"height:30px\"><\/div>\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p style=\"text-align: justify;\">With Google Cloud technologies such as BigQuery, Traveloka has established a data architecture that meets all performance and availability requirements and enables the business to gain meaningful, actionable insights from large volumes of data, collected and analyzed in real time for enterprise-wide decision making.<\/p>\n<p style=\"text-align: justify;\">Founded in 2012, Traveloka is a unicorn business that provides booking services for travel, dining and other options. The organization has grown to establish a presence in six ASEAN countries and employs more than 2,000 people, including 400 engineers. Traveloka aims to be a one-stop travel and lifestyle platform for Indonesians and is diversifying and personalizing its services. The business introduced features like car rental bookings and travel destination guides in 2018, and added a host of extras to existing services, such as flight status notifications for its flight booking service. A notable launch last year was an online credit service.<\/p>\n<p style=\"text-align: justify;\">Traveloka relies on data analytics to provide tailored, personalized services to consumers. This poses a major challenge to the enterprise data analytics team. This team must support the growing business need for actionable insights by collecting data from multiple sources, choosing the right framework for data analysis, managing multiple use cases, and delivering real-time data for streaming analytics and reporting. At the same time, the business must scale infrastructure while reducing costs.<\/p>\n<p style=\"text-align: justify;\">The analytics team&#039;s work must also support the business goals of greater agility and faster time to market for new features and apps. 
From a technology standpoint, this means speeding up development and delivery without compromising security.<\/p>\n<p style=\"text-align: justify;\"><strong>\u201cAs part of the Google Cloud Platform data transfer analytics solution, support from BigQuery for streaming data is a key advantage for us in supporting the real-time analytics use case.\u201d<\/strong><\/p>\n<p style=\"text-align: right;\"><strong>\u2014<i>Rendy Bambang, Data Engineering Lead, Traveloka\u2014<\/i><\/strong><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_80 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-1\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#Phan_tich_du_lieu_khong_theo_kip_toc_do\" >Data analytics not keeping pace<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#Do_tre_thap_va_co_so_ha_tang_duoc_quan_ly_day_du\" >Low latency and fully managed infrastructure<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#Google_Cloud_Platform_lam_nen_tang\" >Google Cloud Platform as a platform<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#400TB_du_lieu_duoc_luu_tru_thanh_cong\" >400TB of data successfully stored<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#Phan_phat_du_lieu_chuan_hoa\" >Normalized data distribution<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/gcloudvn.com\/en\/cau-chuyen-thanh-cong\/traveloka-chuyen-sang-google-cloud-platform-de-phan-tich-big-data-manh-me\/#Cac_van_de_da_duoc_giai_quyet\" >Issues solved<\/a><\/li><\/ul><\/nav><\/div>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Phan_tich_du_lieu_khong_theo_kip_toc_do\"><\/span><b>Data analytics not keeping pace<\/b><span 
class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">However, as the business expanded, Traveloka&#039;s existing data analytics environment could not keep up. This affected online data processing that supports a number of use cases \u2013 including fraud detection, personalization, ad optimization, cross-selling, A\/B testing and ad-eligibility calculation \u2013 and allows business analysts to track performance.<\/p>\n<p style=\"text-align: justify;\">To run the data analytics pipeline, Traveloka relied on an architecture that included Apache Kafka for ingesting user events, sharded MongoDB to provide an operational datastore spanning multiple machines, and sharded MemSQL for real-time analytics queries. Traveloka processed data from Kafka through a Java consumer and stored it in MongoDB with the user ID as the primary key. For analysis, Traveloka consumed event data from Kafka and stored it in MemSQL, which was accessible to business intelligence tools.<\/p>\n<p style=\"text-align: justify;\"><strong>\u201cCloud Pub\/Sub is particularly convenient for us because \u2013 unlike the previous architecture, which required capacity planning for event ingestion \u2013 we can rely on its automation to handle volume and throughput changes without having to do anything.\u201d<\/strong><\/p>\n<p style=\"text-align: right;\"><strong>\u2014<i>Rendy Bambang, Data Engineering Lead, Traveloka\u2014<\/i><\/strong><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Do_tre_thap_va_co_so_ha_tang_duoc_quan_ly_day_du\"><\/span><b>Low latency and fully managed infrastructure<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">The business decided to explore the market for an alternative that would provide:<\/p>\n<ul style=\"text-align: justify;\">\n<li>Low end-to-end data latency with a guaranteed service level agreement<\/li>\n<li>Fully managed infrastructure to free engineers to 
solve business problems (and spend less time on maintenance and firefighting), including resilience, 99.9% end-to-end system availability, and automatic scaling of storage and compute<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">These requirements distilled into the need for a fully managed technology with low end-to-end latency, high performance and availability, and minimal operational demands.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Google_Cloud_Platform_lam_nen_tang\"><\/span><b>Google Cloud Platform as a platform<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">Traveloka conducted an assessment and concluded that Google Cloud Platform provided the services and performance to act as the foundation of the data architecture.<\/p>\n<p style=\"text-align: justify;\">For its data pipeline project, Traveloka implemented a cross-cloud environment combining the Cloud Pub\/Sub (<a href=\"https:\/\/cloud.google.com\/pubsub\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/cloud.google.com\/pubsub<\/a>) real-time messaging service for event data ingestion, Cloud Dataflow (<a href=\"https:\/\/cloud.google.com\/dataflow\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/cloud.google.com\/dataflow<\/a>) for processing streaming data, and the BigQuery analytics data warehouse to store historical and real-time data generated by customer activity, as well as processed data. Each Google Cloud Platform service has helped overcome previous pipeline bottlenecks.<\/p>\n<p style=\"text-align: justify;\">The BigQuery analytics data warehouse is key to the new architecture. \u201cAs part of the Google Cloud Platform data transfer analytics solution, support from BigQuery for streaming data is a key advantage for us in supporting the real-time analytics use case,\u201d said Rendy Bambang, Data Engineering Lead, Traveloka. 
\u201cFurthermore, we no longer have to worry about storing historical data for only 14 days because BigQuery stores all that data for us, with computing resources that automatically scale as we need.\u201d<\/p>\n<p style=\"text-align: justify;\"><strong>\u201cCloud Dataflow&#039;s ability to create new pipelines and auto-scale without user intervention is a big plus for us, especially when we have to backfill a pipeline to process historical data.\u201d<\/strong><\/p>\n<p style=\"text-align: right;\"><strong>\u2014<i>Rendy Bambang, Data Engineering Lead, Traveloka\u2014<\/i><\/strong><\/p>\n<p style=\"text-align: justify;\">\u201cCloud Pub\/Sub is particularly convenient for us because \u2013 unlike our previous architecture, which required capacity planning for event ingestion \u2013 we can rely on its automated partitioning to handle volume and throughput changes without any work,\u201d added Bambang. \u201cFinally, Cloud Dataflow&#039;s ability to create new pipelines and automatically scale without user intervention was a big plus for us, especially when we had to backfill a pipeline to process historical data.\u201d<\/p>\n<p style=\"text-align: justify;\">Cloud Dataflow&#039;s Apache Beam-based unified programming model eases the transition between batch and stream data processing, while its windowing and triggering functions make it easy to handle late-arriving data.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"400TB_du_lieu_duoc_luu_tru_thanh_cong\"><\/span><b>400TB of data successfully stored<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">The Google Cloud Platform infrastructure now handles large data volumes quickly and reliably across the organization, with over 99.9% end-to-end availability. Over 4TB of data per day is published to Cloud Pub\/Sub, while BigQuery stores around 400TB (about 500 billion rows) of data. 
Approximately 250TB of data resides in Cloud Storage, while 60,000 jobs are executed per day. Cloud Dataflow processes about 2,500 jobs per day, while about 1,500 charts using BigQuery are generated with business intelligence tools.<\/p>\n<p style=\"text-align: justify;\">The BigQuery warehouse is also integral to changes in how Traveloka gives its product teams access to data. \u201cIn the past, when a product team requested data from our data warehouse, we simply gave them direct read access to the groups or tables they needed,\u201d said Imre Nagi, Software Engineer, Data Team, Traveloka.<\/p>\n<p style=\"text-align: justify;\">However, this approach required the client system to be tightly coupled to the data storage technology and format, which meant that any change to the technology or format required updating the client system. Furthermore, because access was granted at the group level, the data team could not be sure that product teams were not accessing columns they were not authorized to see. Finally, the data team found it difficult to track and audit what users were doing with the data.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Phan_phat_du_lieu_chuan_hoa\"><\/span><b>Normalized data distribution<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">\u201cBased on issues across the enterprise, we decided to build a standardized way to serve our data, which would later become our data provisioning API,\u201d said Nagi.<\/p>\n<p style=\"text-align: justify;\">The API currently delivers millions of records totaling several gigabytes from the BigQuery warehouse to production systems on demand. 
Cloud Composer (<a href=\"https:\/\/cloud.google.com\/composer\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/cloud.google.com\/composer<\/a>) schedules BigQuery queries that convert raw data into summarized and reworked versions, which are written to intermediate and final processed tables.<\/p>\n<p style=\"text-align: justify;\">Cloud Storage provides temporary storage for query results and handles sending results to clients, and Cloud SQL keeps track of associations, state, and other metadata, while the API is hosted in Kubernetes clusters managed by Google Kubernetes Engine. The Kubernetes clusters (<a href=\"https:\/\/cloud.google.com\/kubernetes\/\" rel=\"nofollow noopener\" target=\"_blank\">https:\/\/cloud.google.com\/kubernetes<\/a>) communicate with Cloud Storage and Cloud SQL to store the results and job metadata of queries made by the requester.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Cac_van_de_da_duoc_giai_quyet\"><\/span><b>Issues solved<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">\u201cWith Google Cloud Platform technology, our new data provisioning API has successfully solved a number of issues during data delivery,\u201d said Nagi. \u201cWe now have a clear API contract that standardizes how product teams access our data warehouse.\u201d<\/p>\n<p style=\"text-align: justify;\">Using the API means that product teams no longer access the physical layer of Traveloka&#039;s infrastructure, improving the data team&#039;s ability to audit data usage. The data team can also define column-level access controls, ensuring product teams use only the columns they need. Additionally, the API provides a standard yet flexible definition that other teams can use to query data. \u201cWe can now restrict how product teams access our data, while still allowing a wide variety of queries,\u201d said Nagi. 
\u201cOverall, we now have the flexibility, along with the security and control we need.\u201d<\/p>\n<p style=\"text-align: justify;\">\n\n\t\t<\/div>\n\t<\/div>\n<\/div><\/div><\/div><\/div>\n<\/section>","protected":false},"excerpt":{"rendered":"T\u00d3M T\u1eaeT B\u00e0i to\u00e1n doanh nghi\u1ec7p:\u00a0 C\u00e1c v\u1ea5n \u0111\u1ec1 debug trong c\u1ee5m Kafka t\u1ecf ra kh\u00f3 kh\u0103n v\u00e0 t\u1ed1n th\u1eddi gian Vi\u1ec7c th\u00eam nhi\u1ec1u nodes h\u01a1n v\u00e0o MongoDB y\u00eau c\u1ea7u m\u1ed9t qu\u00e1 tr\u00ecnh t\u00e1i c\u00e2n b\u1eb1ng k\u00e9o d\u00e0i \u2013 v\u00e0&hellip;","protected":false},"author":1,"featured_media":3162,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[96],"tags":[],"class_list":["post-3161","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cau-chuyen-thanh-cong","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/3161","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=3161"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/3161\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/3162"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=3161"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=3161"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\
/wp\/v2\/tags?post=3161"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}