{"id":9192,"date":"2022-07-06T11:07:14","date_gmt":"2022-07-06T04:07:14","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=9192"},"modified":"2023-03-22T16:48:24","modified_gmt":"2023-03-22T09:48:24","slug":"gioi-thieu-firehose-cong-cu-ma-nguon-mo-cua-gojek","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/gioi-thieu-firehose-cong-cu-ma-nguon-mo-cua-gojek\/","title":{"rendered":"Introducing Firehose: An open source tool from Gojek for seamless data ingestion to BigQuery and Cloud Storage"},"content":{"rendered":"<p style=\"text-align: justify;\">Indonesia\u2019s largest hyperlocal company, Gojek has evolved from a motorcycle ride-hailing service into an on-demand mobile platform, providing a range of services that include transportation, logistics, food delivery, and payments. A total of 2 million driver-partners collectively cover an average distance of 16.5 million kilometers each day, making Gojek Indonesia\u2019s de-facto transportation partner.<\/p>\n<p style=\"text-align: justify;\">To continue supporting this growth, Gojek runs hundreds of microservices that communicate across multiple data centers. Applications are based on an event-driven architecture and produce billions of events every day. 
To empower data-driven decision-making, Gojek uses these events across products and services for analytics, machine learning, and more.<\/p>\n<h2 
style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Cac_thach_thuc_khi_nhap_kho_du_lieu\"><\/span>Data warehouse ingestion challenges<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">To make sense of large amounts of data \u2013 and to better understand customers for application development, customer support, growth, and marketing purposes \u2013 data must first be imported into a data warehouse. Gojek uses <a href=\"https:\/\/gcloudvn.com\/en\/bigquery\/\">BigQuery<\/a> as its main data warehouse. But importing events at Gojek&#039;s scale, with its rapid changes, poses the following challenges:<\/p>\n<ul style=\"text-align: justify;\">\n<li>With many products and microservices on offer, Gojek releases new Kafka topics almost every day, and they need to be imported for analytics purposes. This can quickly lead to significant overhead for the data engineering team, which has to roll out new jobs to load data into BigQuery and <a href=\"https:\/\/gcloudvn.com\/en\/cloud-storage\/\">Cloud Storage<\/a>.<\/li>\n<li>Frequent schema changes in Kafka topics require consumers of those topics to load the new schema to avoid data loss and capture more recent changes.<\/li>\n<li>Data volumes can vary and grow exponentially as people start building new products and logging new activities on top of a new topic. Each topic can also have a different load during peak business hours. Customers need to handle the rising volume of data and scale quickly to meet their business needs.<\/li>\n<\/ul>\n<p style=\"text-align: justify;\">To address these challenges, Gojek uses Firehose, a cloud-native service that delivers real-time streaming data to destinations such as service endpoints, managed databases, data lakes, and data warehouses such as Cloud Storage and BigQuery. Firehose is part of the Open Data Ops Foundation (ODPF) and is completely open source. 
Gojek is one of the main contributors to ODPF.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Duoi_day_la_cac_tinh_nang_chinh_cua_Firehose\"><\/span>Here are Firehose\u2019s key features:<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<ul style=\"text-align: justify;\">\n<li><strong>Sinks<\/strong> - Firehose supports sinking stream data to the log console, HTTP, gRPC, PostgresDB (JDBC), InfluxDB, Elasticsearch, Redis, Prometheus, MongoDB, GCS, and BigQuery.<\/li>\n<li><strong>Extensibility<\/strong> - Firehose allows users to add a custom sink with a clearly defined interface, or choose from existing sinks.<\/li>\n<li><strong>Scale<\/strong> - Firehose scales in an instant, both vertically and horizontally, for a high-performance streaming sink with zero data drops.<\/li>\n<li><strong>Runtime<\/strong> - Firehose can run inside containers or virtual machines in a fully managed runtime environment like <a href=\"https:\/\/gcloudvn.com\/en\/google-kubernetes-engine-gke\/\">Kubernetes<\/a>.<\/li>\n<li><strong>Metrics<\/strong> - Firehose always lets you know what\u2019s going on with your deployment, with built-in monitoring of throughput, response times, errors, and more.<\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-9206\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-1024x663.png\" alt=\"Introducing Firehose: Gojek&#039;s open source tool for seamlessly ingesting data into BigQuery and Cloud Storage\" width=\"600\" height=\"389\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-1024x663.png 1024w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-300x194.png 300w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-768x497.png 768w, 
https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-1536x995.png 1536w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2-18x12.png 18w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2022\/07\/Firehose.max-2000x2000-2.png 2000w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Cac_uu_diem_chinh\"><\/span>Key advantages<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">Using Firehose to ingest data into BigQuery and Cloud Storage has multiple advantages.<\/p>\n<h3 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Do_tin_cay\"><\/span>Reliability<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"text-align: justify;\">Firehose is battle-tested for large-scale data ingestion. At Gojek, Firehose streams 600 Kafka topics into BigQuery and 700 Kafka topics into Cloud Storage. On average, 6 billion events are ingested into BigQuery daily, amounting to more than 10 terabytes of data per day.<\/p>\n<h3 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Nhap_truc_tuyen\"><\/span>Streaming ingestion<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"text-align: justify;\">A single Kafka topic can produce up to billions of records in a day. Depending on the nature of the business, scalability and data freshness are key to ensuring the usability of that data, regardless of the load. Firehose uses BigQuery streaming ingestion to load data in near real time. 
This allows analysts to query data within five minutes of it being produced.<\/p>\n<h3 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Su_phat_trien_cua_schema\"><\/span>Schema evolution<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"text-align: justify;\">With multiple products and microservices offered, new Kafka topics are released almost every day, and the schema of Kafka topics constantly evolves as new data is produced. A common challenge is ensuring that as these topics evolve, their schema changes are adjusted in BigQuery tables and Cloud Storage. Firehose tracks schema changes by integrating with Stencil, a cloud-native schema registry, and automatically updates the schema of BigQuery tables without human intervention. This reduces data errors and saves developers hundreds of hours.<\/p>\n<h3 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Co_so_ha_tang_dan_hoi\"><\/span>Elastic infrastructure<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"text-align: justify;\">Firehose can be deployed on Kubernetes and runs as a stateless service. This allows Firehose to scale horizontally as data volumes vary.<\/p>\n<h3 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"To_chuc_du_lieu_trong_luu_tru_dam_may\"><\/span>Organizing data in cloud storage<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p style=\"text-align: justify;\">Firehose GCS Sink provides capabilities to store data based on specific timestamp information, allowing users to customize how their data is partitioned in Cloud Storage.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Ho_tro_nhieu_loai_phan_mem_nguon_mo\"><\/span>Supporting a wide range of open source software<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\">Built for flexibility and reliability, Google Cloud products like BigQuery and Cloud Storage are made to support a multi-cloud architecture. 
Open source software like Firehose is just one of many examples that can help developers and engineers optimize productivity. Taken together, these tools can deliver a seamless data ingestion process, with less maintenance and better automation.<\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\">If your business is interested in <a href=\"https:\/\/gcloudvn.com\/en\/google-cloud-platform\/\">Google Cloud Platform<\/a>, you can contact Gimasys, a Google Cloud Premier Partner, for consulting on solutions tailored to the unique needs of your business.<\/span><span style=\"font-weight: 400;\"> Contact now:<\/span><\/p>\n<ul style=\"text-align: justify;\">\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Gimasys \u2013 Google Cloud Premier Partner<\/b><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Hotline: <\/b><span style=\"font-weight: 400;\">Hanoi: <\/span><span style=\"font-weight: 400;\">0987 682 505<\/span><span style=\"font-weight: 400;\"> - Ho Chi Minh: <\/span><span style=\"font-weight: 400;\">0974 417 099<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Email: <\/b>gcp@gimasys.com<\/li>\n<\/ul>\n<p style=\"text-align: right;\"><strong>Source: Gimasys<\/strong><\/p>","protected":false},"excerpt":{"rendered":"<p>Indonesia\u2019s largest hyperlocal company, Gojek has evolved from a motorcycle ride-hailing service into an on-demand mobile platform, providing a range of services that include transportation, 
logistics&hellip;<\/p>","protected":false},"author":2,"featured_media":9212,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1,135],"tags":[],"class_list":["post-9192","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/9192","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=9192"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/9192\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/9212"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=9192"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=9192"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=9192"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}