{"id":20175,"date":"2024-09-26T10:21:48","date_gmt":"2024-09-26T03:21:48","guid":{"rendered":"https:\/\/gcloudvn.com\/?p=20175"},"modified":"2024-10-01T09:44:09","modified_gmt":"2024-10-01T02:44:09","slug":"accelerate-ray-in-production-with-new-ray-operator-on-gke","status":"publish","type":"post","link":"https:\/\/gcloudvn.com\/en\/kienthuc\/accelerate-ray-in-production-with-new-ray-operator-on-gke\/","title":{"rendered":"Accelerate Ray in production with new Ray Operator on GKE"},"content":{"rendered":"<p><b>L\u0129nh v\u1ef1c AI li\u00ean t\u1ee5c ph\u00e1t tri\u1ec3n,<\/b> D<span style=\"font-weight: 400;\">\u1eb7c bi\u1ec7t v\u1edbi nh\u1eefng ti\u1ebfn b\u1ed9 g\u1ea7n \u0111\u00e2y trong tr\u00ed tu\u1ec7 nh\u00e2n t\u1ea1o sinh (generative AI), c\u00e1c m\u00f4 h\u00ecnh ng\u00e0y c\u00e0ng l\u1edbn v\u00e0 ph\u1ee9c t\u1ea1p h\u01a1n, bu\u1ed9c c\u00e1c t\u1ed5 ch\u1ee9c ph\u1ea3i ph\u00e2n ph\u1ed1i hi\u1ec7u qu\u1ea3 c\u00e1c t\u00e1c v\u1ee5 tr\u00ean nhi\u1ec1u m\u00e1y h\u01a1n. M\u1ed9t c\u00e1ch ti\u1ebfp c\u1eadn m\u1ea1nh m\u1ebd l\u00e0 ch\u1ea1y ray.io, m\u1ed9t khung ph\u00e2n t\u00e1n m\u00e3 ngu\u1ed3n m\u1edf cho kh\u1ed1i l\u01b0\u1ee3ng c\u00f4ng vi\u1ec7c AI\/ML ph\u00e2n t\u00e1n, tr\u00ean Google Kubernetes Engine (GKE), d\u1ecbch v\u1ee5 orchestration container \u0111\u01b0\u1ee3c qu\u1ea3n l\u00fd c\u1ee7a Google Cloud. 
\u0110\u1ec3 bi\u1ebfn m\u00f4 h\u00ecnh n\u00e0y tr\u1edf n\u00ean si\u00eau d\u1ec5 d\u00e0ng \u0111\u1ec3 tri\u1ec3n khai, gi\u1edd \u0111\u00e2y b\u1ea1n c\u00f3 th\u1ec3 b\u1eadt c\u00e1c API khai b\u00e1o \u0111\u1ec3 qu\u1ea3n l\u00fd c\u00e1c c\u1ee5m Ray tr\u00ean GKE ch\u1ec9 v\u1edbi m\u1ed9t t\u00f9y ch\u1ecdn c\u1ea5u h\u00ecnh!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Ray cung c\u1ea5p m\u1ed9t API \u0111\u01a1n gi\u1ea3n \u0111\u1ec3 ph\u00e2n ph\u1ed1i v\u00e0 song song h\u00f3a c\u00e1c t\u00e1c v\u1ee5 h\u1ecdc m\u00e1y m\u1ed9t c\u00e1ch li\u1ec1n m\u1ea1ch, trong khi GKE cung c\u1ea5p m\u1ed9t n\u1ec1n t\u1ea3ng c\u01a1 s\u1edf h\u1ea1 t\u1ea7ng c\u00f3 th\u1ec3 m\u1edf r\u1ed9ng v\u00e0 linh ho\u1ea1t gi\u00fap \u0111\u01a1n gi\u1ea3n h\u00f3a qu\u1ea3n l\u00fd \u1ee9ng d\u1ee5ng v\u00e0 c\u1ea3i thi\u1ec7n vi\u1ec7c s\u1eed d\u1ee5ng t\u00e0i nguy\u00ean. C\u00f9ng nhau, GKE v\u00e0 Ray cung c\u1ea5p kh\u1ea3 n\u0103ng m\u1edf r\u1ed9ng, ch\u1ecbu l\u1ed7i v\u00e0 d\u1ec5 s\u1eed d\u1ee5ng \u0111\u1ec3 x\u00e2y d\u1ef1ng, tri\u1ec3n khai v\u00e0 qu\u1ea3n l\u00fd c\u00e1c \u1ee9ng d\u1ee5ng Ray. H\u01a1n n\u1eefa, Ray Operator t\u00edch h\u1ee3p s\u1eb5n tr\u00ean GKE gi\u00fap \u0111\u01a1n gi\u1ea3n h\u00f3a thi\u1ebft l\u1eadp ban \u0111\u1ea7u v\u00e0 h\u01b0\u1edbng d\u1eabn ng\u01b0\u1eddi d\u00f9ng t\u1edbi c\u00e1c th\u1ef1c ti\u1ec5n t\u1ed1t nh\u1ea5t \u0111\u1ec3 ch\u1ea1y Ray trong m\u00f4i tr\u01b0\u1eddng s\u1ea3n xu\u1ea5t. N\u00f3 \u0111\u01b0\u1ee3c x\u00e2y d\u1ef1ng v\u1edbi c\u00e1c ho\u1ea1t \u0111\u1ed9ng ng\u00e0y th\u1ee9 2, v\u1edbi h\u1ed7 tr\u1ee3 t\u00edch h\u1ee3p cho Cloud Logging v\u00e0 Cloud Monitoring \u0111\u1ec3 n\u00e2ng cao kh\u1ea3 n\u0103ng quan s\u00e1t c\u00e1c \u1ee9ng d\u1ee5ng Ray c\u1ee7a b\u1ea1n tr\u00ean GKE.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">\"Ray Operator on GKE has transformed our workflow. 
We've slashed maintenance time and now spin up Ray clusters in 30 minutes \u2013 a task that used to take days. It's a game-changer.\" - Mengliao (Mike) Wang, Geotab<\/span><\/p>\n<p><b>Getting started<\/b><\/p>\n<p><span style=\"font-weight: 400;\">In the Google Cloud console, when creating a new GKE Cluster, select the feature checkbox to \u2018Enable Ray Operator.\u2019 With a GKE Autopilot Cluster, this can be found in \u2018Advanced Settings\u2019 under \u2018AI and Machine Learning.\u2019<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-20176\" src=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/09\/Untitled-design-11.jpg\" alt=\"\" width=\"600\" height=\"600\" srcset=\"https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/09\/Untitled-design-11.jpg 600w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/09\/Untitled-design-11-300x300.jpg 300w, https:\/\/gcloudvn.com\/wp-content\/uploads\/2024\/09\/Untitled-design-11-12x12.jpg 12w\" sizes=\"auto, (max-width: 600px) 100vw, 600px\" \/>With a Standard Cluster, you can find the Enable Ray Operator feature checkbox in the \u2018Features\u2019 menu under \u2018AI and Machine Learning.\u2019<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To use the gcloud CLI, you can set an addons flag as follows:<\/span><\/p>\n<pre>gcloud container clusters create CLUSTER_NAME \\\n    --cluster-version=VERSION \\\n    --addons=RayOperator<\/pre>\n<p><span style=\"font-weight: 400;\">To use Terraform, you can enable the add-on as follows:<\/span><\/p>\n<pre>resource &quot;google_container_cluster&quot; &quot;ray-cluster&quot; {\n  name     = &quot;gke-standard-regional-ray-operator&quot;\n  location = &quot;us-west1&quot;\n\n  initial_node_count = 1\n\n  release_channel {\n    channel = &quot;RAPID&quot;\n  }\n\n  addons_config {\n    ray_operator_config {\n      enabled = true\n      ray_cluster_logging_config {\n        enabled = true\n      }\n      ray_cluster_monitoring_config {\n        enabled = true\n      }\n    }\n  }\n}<\/pre>\n<p><span style=\"font-weight: 400;\">Once enabled, GKE hosts and manages the Ray Operator on your behalf. 
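<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As an example, you can then define a Ray cluster declaratively through the KubeRay RayCluster custom resource and apply it with kubectl. The manifest below is a minimal sketch: the cluster name, Ray image tag, and replica count are illustrative assumptions, not prescribed values.<\/span><\/p>\n<pre># A minimal RayCluster manifest (illustrative values)\napiVersion: ray.io\/v1\nkind: RayCluster\nmetadata:\n  name: example-raycluster\nspec:\n  headGroupSpec:\n    rayStartParams: {}\n    template:\n      spec:\n        containers:\n        - name: ray-head\n          image: rayproject\/ray:2.9.0\n  workerGroupSpecs:\n  - groupName: worker-group\n    replicas: 2\n    rayStartParams: {}\n    template:\n      spec:\n        containers:\n        - name: ray-worker\n          image: rayproject\/ray:2.9.0<\/pre>\n<p><span style=\"font-weight: 400;\">Save this as raycluster.yaml, run kubectl apply -f raycluster.yaml, and check its status with kubectl get rayclusters.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">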
Your cluster will be ready to create Ray Clusters and run Ray applications after cluster creation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can find examples for using Ray to serve large language models in our documentation, or see how to train a model with Ray and PyTorch.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Logging_and_monitoring\"><\/span><b>Logging and monitoring<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Effective logging and metrics are essential when deploying Ray in production. The GKE Ray Operator offers optional features that automate the collection of logs and metrics, seamlessly storing them in Cloud Logging and Cloud Monitoring for easy access and analysis.<\/span><\/p>\n<p><b>Enabling log collection<\/b><span style=\"font-weight: 400;\"> ensures Ray logs are automatically captured and stored in Cloud Logging, encompassing all logs from both the Ray cluster Head node and Worker nodes. This feature centralizes log aggregation across all your Ray clusters, ensuring that even if the Ray cluster is shut down \u2014 intentionally or unexpectedly \u2014 the generated logs are preserved and readily searchable.<\/span><\/p>\n<p><b>Enabling metrics collection<\/b><span style=\"font-weight: 400;\"> allows GKE to gather all system metrics exported by Ray by leveraging Managed Service for Prometheus. 
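<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For a quick sketch of what this gives you (assuming the standard KubeRay pod labels, such as ray.io\/node-type), you can tail head-node logs directly with kubectl while Cloud Logging keeps a centralized, searchable copy even after the cluster is gone:<\/span><\/p>\n<pre># Tail logs from the Ray head node (labels assume KubeRay conventions)\nkubectl logs -l ray.io\/node-type=head --tail=50\n\n# List Ray clusters managed by the operator\nkubectl get rayclusters<\/pre>\n<p><span style=\"font-weight: 400;\">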
System metrics are vital for monitoring the performance of your resources and quickly identifying errors. This comprehensive visibility is particularly crucial when dealing with expensive infrastructure such as GPUs. Cloud Monitoring makes it simple to create dashboards and set alerts, keeping you informed about the health of your Ray resources.<\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"TPU_support\"><\/span>TPU support<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Tensor Processing Units (TPUs) are purpose-built hardware accelerators that dramatically speed up the training and inference of large machine learning models. With our AI Hypercomputer architecture, it\u2019s easy to combine Ray with TPUs, to seamlessly scale your high-performance ML applications.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The GKE Ray Operator streamlines TPU integration by managing admission webhooks for TPU Pod scheduling and adding the necessary TPU environment variables for frameworks like JAX. It also supports autoscaling for both single-host and multi-host Ray clusters.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Decrease_startup_latency\"><\/span><b>Decrease startup latency<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Minimizing start-up latency is crucial when running AI workloads in production, both for maintaining uptime and maximizing the use of costly hardware accelerators. The GKE Ray Operator, combined with other GKE features, can dramatically reduce this start-up time.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Hosting your Ray images on Artifact Registry and enabling image streaming can lead to substantial reductions in the time it takes to pull images for your Ray clusters. Large dependencies, often necessary for machine learning, can result in bulky container images that take several minutes to pull. 
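<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a sketch of the remedy, image streaming can be enabled when creating a cluster with the gcloud CLI (CLUSTER_NAME is a placeholder, and images must be hosted in Artifact Registry to benefit):<\/span><\/p>\n<pre>gcloud container clusters create CLUSTER_NAME \\\n    --image-type=COS_CONTAINERD \\\n    --enable-image-streaming<\/pre>\n<p><span style=\"font-weight: 400;\">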
Image streaming can cut this image pull time significantly; see \u2018Use Image streaming to pull container images\u2019 for more details.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">You can also enable GKE secondary boot disks to preload model weights or container images onto new nodes. This capability, in combination with image streaming, can lead to a 29x faster start-up time for your Ray applications and better utilization of your hardware accelerators. See \u2018Use secondary boot disks to preload data or container images\u2019 for more details.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Scale_Ray_in_production_today\"><\/span><b>Scale Ray in production today<\/b><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Keeping up with the rapid advancements in AI requires a platform that scales alongside your workloads while offering the streamlined Pythonic experience your AI developers are familiar with. Ray on GKE delivers this powerful combination of usability, scalability, and reliability. 
With the GKE Ray Operator, getting started and implementing best practices for scaling Ray in production is easier than ever.<\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>The field of AI is constantly evolving, especially with recent advances in generative AI. Models are growing ever larger and more complex, forcing organizations to distribute&hellip;<\/p>","protected":false},"author":2,"featured_media":20177,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[1,135],"tags":[],"class_list":["post-20175","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-kienthuc","category-google-cloud-platform","entry","has-media"],"_links":{"self":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/20175","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/comments?post=20175"}],"version-history":[{"count":0,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/posts\/20175\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media\/20177"}],"wp:attachment":[{"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/media?parent=20175"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2\/categories?post=20175"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gcloudvn.com\/en\/wp-json\/wp\/v2
\/tags?post=20175"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}