{"id":21241,"date":"2025-01-15T14:39:16","date_gmt":"2025-01-15T07:39:16","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=21241"},"modified":"2025-01-20T09:28:10","modified_gmt":"2025-01-20T02:28:10","slug":"scaling-to-zero-on-google-kubernetes-engine-with-keda","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/","title":{"rendered":"Scaling to zero on Google Kubernetes Engine with KEDA"},"content":{"rendered":"<section class=\"wpb-content-wrapper\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\">\n\t<div class=\"wpb_text_column wpb_content_element\" >\n\t\t<div class=\"wpb_wrapper\">\n\t\t\t<p><span style=\"font-weight: 400;\">For developers and enterprises running applications on Google Kubernetes Engine (GKE), scaling deployment infrastructure to zero when idle can provide significant financial savings. GKE Cluster Autoscaler efficiently manages the size of a node cluster, but for applications that require a full shutdown and startup (scaling a node cluster all the way to zero and vice versa), you\u2019ll need an alternative, as GKE doesn\u2019t provide scale to zero functionality out of the box. This is important for applications with intermittent workloads or variable traffic patterns. <\/span><span style=\"font-weight: 400;\">In this blog post, Gimasys will help you learn how to integrate the open source Kubernetes Event-driven Autoscaler (KEDA) to achieve this. 
With KEDA, you can tie your costs directly to your needs, paying only for the resources you use.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-21211\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-12.jpg\" alt=\"\" width=\"600\" height=\"375\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-12.jpg 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-12-18x12.jpg 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_80 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li 
class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/#Tai_sao_can_scale_to_zero\" >Why scale to zero?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/#Gioi_thieu_KEDA_cho_GKE\" >Introducing KEDA for GKE<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/#Cac_truong_hop_su_dung\" >Use cases<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/#Bat_dau_voi_KEDA_tren_GKE\" >Get started with KEDA on GKE<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/gcloudvn.com\/en\/kienthuc\/scaling-to-zero-on-google-kubernetes-engine-with-keda\/#Ket_luan\" >Conclusion<\/a><\/li><\/ul><\/nav><\/div>\n<h2><span class=\"ez-toc-section\" id=\"Tai_sao_can_scale_to_zero\"><\/span><b>Why scale to zero?<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Minimizing costs is a primary driver for scaling to zero, and applies to a wide variety of scenarios. 
For technical experts, this is particularly crucial when dealing with:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400;\">GPU-intensive workloads: AI\/ML workloads often require powerful GPUs, which are expensive to keep running while idle.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Applications with predictable downtime: Internal tools used only during business hours or on specific days of the week can be scaled down outside those windows.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Seasonal applications: Scale to zero during the off-season for applications with predictable periods of low activity.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">On-demand staging environments: Replicate production environments for testing and validation, scaling them to zero after testing is complete.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Development, demo and proof-of-concept environments:<\/span>\n<ul>\n<li><span style=\"font-weight: 400;\">Short-term demonstrations: Showcase applications or features to clients or stakeholders, scaling down resources after the demonstration.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Temporary proof-of-concept deployments: Test new ideas or technologies in a live environment, scaling to zero after evaluation.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Development environments: Spin up resources for testing, code reviews, or feature branches and scale them down to zero when not needed, optimizing costs for temporary workloads.<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span style=\"font-weight: 400;\">Event-driven applications:<\/span>\n<ul>\n<li><span style=\"font-weight: 400;\">Microservices with sporadic traffic: Scale individual services to zero when they are idle and automatically scale them up when requests arrive, optimizing resource utilization for unpredictable traffic patterns.<\/span><\/li>\n<li><span 
style=\"font-weight: 400;\">Serverless functions: Execute code in response to events without managing servers, automatically scaling to zero when inactive.<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span style=\"font-weight: 400;\">Disaster recovery and business continuity: Maintain a minimal set of core resources in a standby state, ready to scale up rapidly in case of a disaster, minimizing costs while ensuring business continuity.<\/span><\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Gioi_thieu_KEDA_cho_GKE\"><\/span><b>Introducing KEDA for GKE<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">KEDA is an open-source, Kubernetes-native solution that enables you to scale deployments based on a variety of metrics and events. KEDA can trigger scaling actions based on external events such as message queue depth or incoming HTTP requests. Unlike the current implementation of the Horizontal Pod Autoscaler (HPA), KEDA also supports scaling workloads to zero, making it a strong choice for handling intermittent jobs or applications with fluctuating demand.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-21210\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-13.jpg\" alt=\"\" width=\"600\" height=\"375\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-13.jpg 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2025\/01\/Thang-72024-13-18x12.jpg 18w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Cac_truong_hop_su_dung\"><\/span><b>Use cases<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Let's explore two common scenarios where KEDA's ability to scale to zero is beneficial:<\/span><\/p>\n<ul>\n<li><span style=\"font-weight: 400;\">Scaling a Pub\/Sub worker<\/span>\n<ul>\n<li><span style=\"font-weight: 400;\">Scenario: A deployment 
processes messages from a Pub\/Sub topic. When no messages are available, scaling down to zero saves resources and costs.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Solution: KEDA's Pub\/Sub scaler monitors the message queue and triggers scaling actions accordingly. By configuring a ScaledObject resource, you can specify that the deployment scales down to zero replicas when the queue is empty.<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span style=\"font-weight: 400;\">Scaling a GPU-dependent workload, such as an Ollama deployment for LLM serving<\/span>\n<ul>\n<li><span style=\"font-weight: 400;\">Scenario: An Ollama-based large language model (LLM) performs inference tasks. To minimize GPU usage and costs, the deployment needs to scale down to zero when there are no inference requests.<\/span><\/li>\n<li><span style=\"font-weight: 400;\">Solution: Combining HTTP-KEDA (a beta feature of KEDA) with Ollama enables scale-to-zero functionality. HTTP-KEDA scales deployments based on HTTP request metrics, while Ollama serves the LLM.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h2><span class=\"ez-toc-section\" id=\"Bat_dau_voi_KEDA_tren_GKE\"><\/span><b>Get started with KEDA on GKE<\/b><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">KEDA offers a powerful and flexible solution for achieving scale-to-zero functionality on GKE. By leveraging KEDA's event-driven scaling capabilities, you can optimize resource utilization, minimize costs, and improve the efficiency of your Kubernetes deployments. Validate your usage scenarios first, because the scale-to-zero mechanism can affect workload performance. In particular, scaling to zero can increase latency due to cold starts: once an application has scaled to zero there are no running instances, so when a request comes in a new instance has to start before the request can be served. <\/span><span style=\"font-weight: 400;\">There are also considerations about state management. 
When instances are terminated, any in-memory state is lost.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Ket_luan\"><\/span>Conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Scaling to zero on Google Kubernetes Engine with KEDA both optimizes resource utilization and delivers significant cost savings. By automatically shutting down pods when there is no load, enterprises can reduce operational costs and stop paying for idle capacity. KEDA is a useful tool for building flexible, adaptive, and cost-effective cloud-native systems.<\/p>\n\n\t\t<\/div>\n\t<\/div>\n<div class=\"templatera_shortcode\"><div class=\"vc_row wpb_row vc_row-fluid\"><div class=\"wpb_column vc_column_container vc_col-sm-12\"><div class=\"vc_column-inner\"><div class=\"wpb_wrapper\"><div class=\"vc_message_box vc_message_box-standard vc_message_box-rounded vc_color-blue\" ><div class=\"vc_message_box-icon\"><i class=\"vc-mono vc-mono-technorati\"><\/i><\/div><p><a href=\"https:\/\/gcloudvn.com\/en\/main-logo-1\/\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft wp-image-664\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png\" alt=\"\" width=\"221\" height=\"72\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1.png 214w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-18x6.png 18w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2021\/06\/main-logo-1-183x60.png 183w\" sizes=\"auto, (max-width: 221px) 100vw, 221px\" \/><\/a>As a senior partner of Google in Vietnam, Gimasys has more than 10 years of experience consulting on digital transformation for 2000+ domestic corporations. 
Typical customers include Jetstar, Dien Quan Media, Heineken, Jollibee, Vietnam Airline, HSC, SSI...<\/p>\n<p>Gimasys is currently a strategic partner of many major technology companies around the world, such as Salesforce, Oracle Netsuite, Tableau, and Mulesoft.<\/p>\n<p>Contact Gimasys, a Google Cloud Premier Partner, for advice on solutions suited to the specific needs of your business:<\/p>\n<ul>\n<li>Email: gcp@gimasys.com<\/li>\n<li>Hotline: 0974 417 099<\/li>\n<\/ul>\n<\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div><\/div>\n<\/section>","protected":false},"excerpt":{"rendered":"For developers and enterprises running applications on Google Kubernetes Engine (GKE), scaling deployment infrastructure to zero when idle can deliver savings&hellip;","protected":false},"author":2,"featured_media":21210,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1,135],"tags":[],"class_list":["post-21241","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/21241","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=21241"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/21241\/revisions"}],"wp:featuredmedia":[{"em
beddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/21210"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=21241"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=21241"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=21241"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}