{"id":25086,"date":"2026-04-13T14:38:47","date_gmt":"2026-04-13T07:38:47","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=25086"},"modified":"2026-04-13T14:38:47","modified_gmt":"2026-04-13T07:38:47","slug":"introducing-gemma-4-on-google-cloud-our-most-capable-open-models-yet","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/introducing-gemma-4-on-google-cloud-our-most-capable-open-models-yet\/","title":{"rendered":"Introducing Gemma 4 on Google Cloud: Our most capable open models yet"},"content":{"rendered":"<p><b>Today, we are releasing Gemma 4 on Google Cloud.<\/b><\/p>\n<p><a href=\"https:\/\/gcloudvn.com\/en\/ransomware-detection-and-file-restoration-for-google-drive-now-generally-available\/attachment\/gemma_4_cloud_blog_header-max-2000x2000\/\" rel=\"attachment wp-att-25019\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-25019\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2026\/04\/Gemma_4_Cloud_Blog_Header.max-2000x2000-1.png\" alt=\"\" width=\"1920\" height=\"1080\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2026\/04\/Gemma_4_Cloud_Blog_Header.max-2000x2000-1.png 1920w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2026\/04\/Gemma_4_Cloud_Blog_Header.max-2000x2000-1-768x432.png 768w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2026\/04\/Gemma_4_Cloud_Blog_Header.max-2000x2000-1-1536x864.png 1536w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2026\/04\/Gemma_4_Cloud_Blog_Header.max-2000x2000-1-18x10.png 18w\" sizes=\"auto, (max-width: 1920px) 100vw, 1920px\" \/><\/a>What\u2019s new: It is, byte for byte, the most capable family of open models. Built from the same research as Gemini 3 and released under a commercially permissive Apache 2.0 license, these models move beyond chat. With context windows up to 256K, native vision and audio processing, and fluency in over 140 languages, they excel at complex logic, offline code generation, and agentic workflows. 
Learn more about the model here.<\/p>\n<p>Why it matters for your business: Enterprise AI requires models that execute complex logic while keeping data within secure boundaries. Gemma 4 gives you this balance. Organizations can deploy these models across Google Cloud to meet strict compliance guarantees, including Sovereign Cloud solutions. This provides a foundation for digital sovereignty, granting teams complete control over their data, infrastructure, and models.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_83 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/gcloudvn.com\/en\/introducing-gemma-4-on-google-cloud-our-most-capable-open-models-yet\/#Ban_co_the_bat_dau_su_dung_Gemma_4_o_dau\" >Where you can get started with Gemma 4<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/gcloudvn.com\/en\/introducing-gemma-4-on-google-cloud-our-most-capable-open-models-yet\/#Hay_bat_dau_ngay_hom_nay\" >Get started today<\/a><\/li><\/ul><\/nav><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Ban_co_the_bat_dau_su_dung_Gemma_4_o_dau\"><\/span><b>Where you can get started with Gemma 4<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><b>Vertex AI<\/b><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">Deploy Gemma 4 to your own Vertex AI endpoints. Select the model from Model Garden and provision the specific compute resources your application requires. This self-deployment approach gives you direct control over your serving infrastructure and costs while keeping your data within your Google Cloud environment.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can also fine-tune Gemma 4 using <\/span><a href=\"https:\/\/docs.cloud.google.com\/vertex-ai\/docs\/training\/training-clusters\/overview\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Vertex AI Training Clusters<\/span><\/a><span style=\"font-weight: 400;\"> (VTC), which offer optimized SFT recipes and high-scale resiliency through NVIDIA NeMo Megatron. This lets you efficiently adapt any variant, from the effective 2B (E2B) model for edge tasks to the 31B dense model for complex enterprise orchestration. 
Here\u2019s an end-to-end guide for efficient fine-tuning and serving of the Gemma 4 31B model on Vertex AI.\u00a0<\/span><\/p>\n<p>Additionally, we're committed to empowering customer choice and innovation through our curated collection of first-party, open, and third-party models available on Vertex AI. That\u2019s why we're thrilled to announce that the Gemma 4 26B MoE model will be available as a fully managed, serverless offering on Model Garden in the coming days.<\/p>\n<p><b>Agent Development Kit (ADK) <\/b><span style=\"font-weight: 400;\"><br \/>\n<\/span>ADK is a flexible and modular open-source framework for developing and deploying AI agents. Gemma 4 offers advanced agentic capabilities, including reasoning, function calling, code generation, and structured output. ADK helps you build fully functional AI agents with Gemma 4. Start building with Gemma 4 and Google ADK today.<\/p>\n<p><b>Cloud Run<\/b><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">You can now run demanding Gemma 4 inference workloads efficiently on Cloud Run, leveraging the power of NVIDIA RTX PRO 6000 (Blackwell) GPUs. With 96 GB of vGPU memory, you can easily deploy models like Gemma-4-31B-it on serverless GPUs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Cloud Run handles the underlying infrastructure, allowing you to focus on your applications. Your models scale to zero when inactive and adjust dynamically with demand, so you only pay for what you use. Plus, you have the flexibility to tailor CPU and memory configurations for each inference workload. Try it out now, on demand with no reservations, in us-central1 or europe-west4.<\/span><\/p>\n<p><b>Google Kubernetes Engine (GKE) <\/b><b><br \/>\n<\/b><span style=\"font-weight: 400;\">GKE provides a highly scalable and customizable environment for deploying Gemma 4, perfect for teams that require fine-grained control over their AI infrastructure. 
By managing your own infrastructure on GKE, you gain the flexibility to tailor compute resources, select specific GPU or TPU accelerators, and implement custom autoscaling metrics that match your exact traffic patterns.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This level of control also ensures your AI workloads can seamlessly integrate with your existing microservices while adhering to your organization's strict security and data compliance requirements.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Starting today, you can efficiently serve Gemma 4 models on GKE using vLLM, a high-throughput and memory-efficient LLM serving engine. By leveraging GKE, you can seamlessly scale your inference workloads from zero to peak demand while optimizing your resource utilization and costs. To help you get started, check out our newly updated tutorial on how to serve <\/span><a href=\"https:\/\/docs.cloud.google.com\/kubernetes-engine\/docs\/tutorials\/serve-gemma-gpu-vllm\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Gemma 4 on GKE<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>Looking ahead, Gemma 4 is uniquely positioned to power the next generation of agentic applications on Google Cloud. By pairing Gemma 4\u2019s multi-step planning capabilities with the new GKE Agent Sandbox, developers can safely execute LLM-generated code and tool calls within highly isolated, Kubernetes-native environments that offer sub-second cold starts and up to 300 sandboxes per second.<\/p>\n<p><span style=\"font-weight: 400;\">Furthermore, with the GKE Inference Gateway and advanced distributed inference features in llm-d, such as predicted-latency-based scheduling, these complex workflows benefit from intelligent routing that dynamically balances cache reuse and server load. 
GKE Inference Gateway with Predictive Latency Boost can cut time-to-first-token (TTFT) latency by up to 70% by replacing heuristic guesswork with real-time capacity-aware routing, with no manual tuning required.<\/span><\/p>\n<p><b>Google Cloud TPUs<\/b><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">Gemma 4 will be available on TPUs across Google Cloud through GKE, GCE, and Vertex AI. Starting today, you can use a number of popular open-source TPU projects to serve, pretrain, and post-train Gemma-4-31B dense and Gemma-4-26B-A4B MoE.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\">For pretraining and post-training experimentation, you can use MaxText to post-train and customize the models for text analysis and generation, reasoning, and image analysis use cases.<\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\">For online serving and batch inference, you\u2019ll be able to run production workloads on vLLM TPU using our prebuilt Docker containers, quickstart vision, and text demo tutorials.<\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Stay tuned for community-contributed SGLang-JAX tutorials.<\/span><\/p>\n<p><b>Sovereign Cloud<\/b><span style=\"font-weight: 400;\"><br \/>\n<\/span>Gemma 4 will be available across all our Sovereign Cloud offerings, including public cloud with Data Boundary, Google Cloud Dedicated (such as S3NS in France), and Google Distributed Cloud for air-gapped and on-premises deployments. This expansion reinforces our commitment to an open, sovereign digital world where organizations maintain total control over their data, encryption, and operational environment.<\/p>\n<p><span style=\"font-weight: 400;\">By providing open weights, Gemma 4 empowers developers to build specialized solutions for highly sensitive environments. 
Enterprises and government agencies can now deploy localized services that respect regional nuances and domain expertise while meeting strict data residency and sovereignty rules. This approach ensures that organizations can innovate rapidly with AI while remaining fully compliant with national and industry requirements.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Hay_bat_dau_ngay_hom_nay\"><\/span><b>Get started today<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">From Vertex AI to Sovereign Cloud, you can start building with Gemma 4 today. By choosing Gemma 4 on Google Cloud, enterprises and sovereign organizations gain a trusted, transparent foundation that delivers state-of-the-art capabilities while meeting the highest standards for security and reliability.<\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Today, Google officially releases Gemma 4 on Google Cloud. What\u2019s new: Byte for byte, this is the most capable family of open models. 
Built from the same research as&hellip;<\/p>","protected":false},"author":2,"featured_media":25020,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1,135],"tags":[],"class_list":["post-25086","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/25086","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=25086"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/25086\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/25020"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=25086"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=25086"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/tags?post=25086"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}