Accelerate Ray in production with new Ray Operator on GKE

26/09/2024

The field of AI is constantly evolving, especially with recent advances in generative AI. Models are growing larger and more complex, pushing organizations to distribute tasks efficiently across more machines. One powerful approach is to run ray.io, an open-source framework for distributed AI/ML workloads, on Google Kubernetes Engine (GKE), Google Cloud's managed container orchestration service. To make this pattern super easy to implement, you can now enable declarative APIs to manage Ray clusters on GKE with a single configuration option!

Ray provides a simple API for seamlessly distributing and parallelizing machine learning tasks, while GKE offers a scalable, flexible infrastructure platform that simplifies application management and improves resource utilization. Together, GKE and Ray provide scalability, fault tolerance, and ease of use for building, deploying, and managing Ray applications. Moreover, the built-in Ray Operator on GKE simplifies initial setup and guides users toward best practices for running Ray in production. It's built with day-2 operations in mind, with integrated support for Cloud Logging and Cloud Monitoring to improve the observability of your Ray applications on GKE.

"Ray Operator on GKE has transformed our workflow. We've slashed maintenance time and now spin up Ray clusters in 30 minutes – a task that used to take days. It's a game-changer." - Mengliao (Mike) Wang, Geotab

Getting started

In the Google Cloud console, when creating a new GKE Cluster, select the feature checkbox to ‘Enable Ray Operator.’ With a GKE Autopilot Cluster, this can be found in ‘Advanced Settings’ under ‘AI and Machine Learning.’

With a Standard Cluster, you can find the Enable Ray Operator feature checkbox in the ‘Features’ Menu under ‘AI and Machine Learning.’

To use the gcloud CLI, set the addons flag as follows:

gcloud container clusters create CLUSTER_NAME \
    --cluster-version=VERSION \
    --addons=RayOperator

To use Terraform, you can enable the addon as follows:

resource "google_container_cluster" "ray-cluster" {
  name     = "gke-standard-regional-ray-operator"
  location = "us-west1"

  initial_node_count = 1

  release_channel {
    channel = "RAPID"
  }

  addons_config {
    ray_operator_config {
      enabled = true

      ray_cluster_logging_config {
        enabled = true
      }

      ray_cluster_monitoring_config {
        enabled = true
      }
    }
  }
}

Once enabled, GKE hosts and manages the Ray Operator on your behalf. After cluster creation, your cluster is ready to create Ray clusters and run Ray applications.
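The Ray Operator on GKE is based on KubeRay, so the declarative API you work with is a RayCluster custom resource. As a minimal sketch, assuming the `ray.io/v1` API version and using purely illustrative names, image tag, and CPU/memory sizes (none of these come from this post), the Python below builds a RayCluster manifest with one head group and one worker group and prints it as JSON.

```python
import json


def container(name: str, image: str, cpu: str, memory: str) -> dict:
    """Return a container spec whose resource requests equal its limits."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "name": name,
        "image": image,
        "resources": {"requests": dict(resources), "limits": dict(resources)},
    }


def ray_cluster_manifest(name: str, image: str, workers: int, max_workers: int) -> dict:
    """Build a RayCluster custom resource with one head group and one worker group.

    Assumptions: the ray.io/v1 API version, container names, group name, and
    CPU/memory sizes are illustrative placeholders.
    """
    if not 0 <= workers <= max_workers:
        raise ValueError("expected 0 <= workers <= max_workers")
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": name},
        "spec": {
            "headGroupSpec": {
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    container("ray-head", image, "2", "8Gi"),
                ]}},
            },
            "workerGroupSpecs": [{
                "groupName": "workers",      # hypothetical group name
                "replicas": workers,         # initial number of worker Pods
                "minReplicas": 0,            # bounds used if autoscaling is enabled
                "maxReplicas": max_workers,
                "rayStartParams": {},
                "template": {"spec": {"containers": [
                    container("ray-worker", image, "4", "16Gi"),
                ]}},
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical cluster name and image tag.
    manifest = ray_cluster_manifest("raycluster-sample", "rayproject/ray:2.9.0", 2, 4)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, you could pipe this output to `kubectl apply -f -`. Generating the manifest in code is only one option; keeping it as a checked-in YAML file works just as well.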

You can find examples of using Ray to serve large language models in our documentation, or see how to train a model with Ray and PyTorch.


Logging and monitoring

Effective logging and metrics are essential when deploying Ray in production. The GKE Ray Operator offers optional features that automate the collection of logs and metrics, seamlessly storing them in Cloud Logging and Cloud Monitoring for easy access and analysis.

Enabling log collection ensures Ray logs are automatically captured and stored in Cloud Logging, encompassing all logs from both the Ray cluster Head node and Worker nodes. This feature centralizes log aggregation across all your Ray clusters, ensuring that even if the Ray cluster is shut down — intentionally or unexpectedly — the generated logs are preserved and readily searchable.
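Once logs land in Cloud Logging, you typically narrow them to one Ray cluster with a query. The sketch below assembles such a query from the standard `k8s_container` resource labels. It assumes, as a hypothetical convention, that KubeRay Pod names contain the RayCluster name; the project, cluster, and namespace values are placeholders, and this is not a query format prescribed by the Ray Operator.

```python
from typing import Optional


def ray_log_filter(project: str, gke_cluster: str, namespace: str,
                   ray_cluster: str, min_severity: Optional[str] = None) -> str:
    """Return a Logging query matching container logs from one Ray cluster's Pods."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.project_id="{project}"',
        f'resource.labels.cluster_name="{gke_cluster}"',
        f'resource.labels.namespace_name="{namespace}"',
        # ":" is the Logging "has" operator; assumes Pod names contain the RayCluster name.
        f'resource.labels.pod_name:"{ray_cluster}"',
    ]
    if min_severity is not None:
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    print(ray_log_filter("my-project", "my-gke-cluster", "default",
                         "raycluster-sample", "ERROR"))
```

The resulting string can be pasted into the Logs Explorer or passed to `gcloud logging read`. Because the logs persist in Cloud Logging, the same query keeps working after the Ray cluster itself is gone.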

Enabling metrics collection lets GKE use Managed Service for Prometheus to gather all system metrics exported by Ray. System metrics are vital for monitoring the performance of your resources and quickly identifying errors. This comprehensive visibility is particularly crucial when dealing with expensive infrastructure such as GPUs. Cloud Monitoring makes it simple to create dashboards and set alerts, keeping you informed about the health of your Ray resources.
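Ray exports its system metrics in the Prometheus text format, and an alert is essentially a condition evaluated over those samples. To illustrate that idea only (this is not how Cloud Monitoring alert policies are configured), the sketch below parses Prometheus text output and checks whether the mean of one metric falls below a threshold, for example to flag idle GPUs. The metric name `ray_node_gpus_utilization` and the scrape text are assumptions; check the metric names your Ray version actually exports.

```python
def metric_values(exposition: str, metric: str) -> list:
    """Collect sample values for one metric from Prometheus text-format output."""
    values = []
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            # Label values may contain spaces, so split around the braces.
            name = line[:line.index("{")]
            rest = line[line.rindex("}") + 1:].split()
        else:
            name, *rest = line.split()
        if name == metric and rest:
            values.append(float(rest[0]))  # rest[1], if present, is a timestamp
    return values


def below_threshold(values: list, threshold: float) -> bool:
    """True when the mean of the samples is below threshold; no data never fires."""
    return bool(values) and sum(values) / len(values) < threshold


if __name__ == "__main__":
    # Hypothetical scrape output; the metric name is an assumption.
    scrape = ('ray_node_gpus_utilization{GpuIndex="0"} 12.5\n'
              'ray_node_gpus_utilization{GpuIndex="1"} 7.5\n')
    gpu = metric_values(scrape, "ray_node_gpus_utilization")
    print(below_threshold(gpu, 20.0))
```

Treating "no data" as not firing is a deliberate choice here; a production alert would usually handle missing data as its own condition.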

TPU support

Tensor Processing Units (TPUs) are purpose-built hardware accelerators that dramatically speed up the training and inference of large machine learning models. With our AI Hypercomputer architecture, it’s easy to combine Ray with TPUs, to seamlessly scale your high-performance ML applications.

The GKE Ray Operator streamlines TPU integration by managing admission webhooks for TPU Pod scheduling and adding the necessary TPU environment variables for frameworks like JAX. It also supports autoscaling for both single-host and multi-host Ray clusters.
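The webhook and TPU environment variables are handled by the operator, but you still declare the TPU worker group yourself. A multi-host slice spans several VMs, and the number of hosts follows from simple arithmetic: total chips in the topology divided by chips per host (for example, a 2x4 topology has 8 chips, and at 4 chips per host that is 2 hosts). The sketch below computes this and builds a worker group accordingly. It assumes KubeRay's `numOfHosts` field, the GKE node labels `cloud.google.com/gke-tpu-accelerator` and `cloud.google.com/gke-tpu-topology`, and the `google.com/tpu` resource name; the accelerator type and chips per host are illustrative, so check the GKE TPU documentation for your machine type.

```python
import math


def hosts_for_topology(topology: str, chips_per_host: int) -> int:
    """Number of VMs (and Ray worker Pods) in one TPU slice: total chips / chips per host."""
    chips = math.prod(int(dim) for dim in topology.lower().split("x"))
    if chips % chips_per_host:
        raise ValueError(f"{chips} chips do not split evenly into hosts of {chips_per_host}")
    return chips // chips_per_host


def tpu_worker_group(name: str, image: str, accelerator: str, topology: str,
                     chips_per_host: int, slices: int) -> dict:
    """Build a KubeRay worker group in which each replica is one (possibly multi-host) slice."""
    tpu = {"google.com/tpu": str(chips_per_host)}  # TPU chips requested per Pod
    return {
        "groupName": name,
        "replicas": slices,
        "minReplicas": 0,
        "maxReplicas": slices,
        "numOfHosts": hosts_for_topology(topology, chips_per_host),
        "rayStartParams": {},
        "template": {"spec": {
            # Schedule Pods onto nodes of the requested TPU type and slice shape.
            "nodeSelector": {
                "cloud.google.com/gke-tpu-accelerator": accelerator,
                "cloud.google.com/gke-tpu-topology": topology,
            },
            "containers": [{
                "name": "ray-worker",
                "image": image,
                "resources": {"requests": dict(tpu), "limits": dict(tpu)},
            }],
        }},
    }
```

For instance, `tpu_worker_group("tpu-group", "IMAGE", "tpu-v5-lite-podslice", "2x4", 4, 1)` yields `numOfHosts` of 2, so one replica corresponds to one two-host slice. Scaling `replicas` then adds or removes whole slices rather than individual hosts.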

Decrease startup latency

Minimizing start-up latency is crucial when running AI workloads in production, both for maintaining uptime and maximizing the use of costly hardware accelerators. The GKE Ray Operator, combined with other GKE features, can dramatically reduce this start-up time.

Hosting your Ray images on Artifact Registry and enabling image streaming can substantially reduce the time it takes to pull images for your Ray clusters. Large dependencies, often necessary for machine learning, can result in bulky container images that take several minutes to pull. Image streaming can cut this image pull time significantly; see Use Image streaming to pull container images for more details.

You can also enable GKE secondary boot disks to preload model weights or container images onto new nodes. This capability, in combination with image streaming, can lead to 29x faster start-up times for your Ray applications, leading to better utilization of your hardware accelerators. See Use secondary boot disks to preload data or container images for more details.

Scale Ray in production today

Keeping up with the rapid advancements in AI requires a platform that scales alongside your workloads while offering a streamlined Pythonic experience that your AI developers are familiar with. Ray on GKE delivers this powerful combination of usability, scalability, and reliability. With the GKE Ray Operator, getting started and implementing best practices for scaling Ray in production is easier than ever.
