Introduction to Data on EKS (DoEKS)
Overview
Data on Amazon EKS (DoEKS) is a project for building, scaling, and optimizing data and AI/ML workloads on Amazon Elastic Kubernetes Service (Amazon EKS). It provides Terraform blueprints and best practices for deploying data solutions efficiently, and it covers a broad range of data workloads, from distributed data processing to real-time stream processing and machine learning.
Main Features
- Data Analytics on EKS: DoEKS provides tools and blueprints for running data analytics on Amazon EKS using frameworks like Apache Spark for distributed data processing and Apache Flink for real-time data streaming.
- AI/ML on EKS: Leveraging the capabilities of the Ray ecosystem, the platform allows for the management of AI/ML workloads with enhanced distributed computing capabilities. DoEKS also supports NVIDIA Triton Server for serving AI models and uses AWS specialized hardware like AWS Trainium and AWS Inferentia for efficient model training and inference.
- Streaming Platforms: For high-throughput messaging, DoEKS integrates Apache Kafka for managing data streams efficiently on EKS.
- Scheduler Workflow Platforms: The platform supports the automation of complex workflows with tools like Apache Airflow, allowing users to orchestrate data analytics and processing tasks seamlessly.
- Distributed Databases & Query Engine: DoEKS also supports the deployment of distributed databases and query engines, enabling scalable data storage and querying capabilities.
Getting Started
DoEKS offers a set of blueprints as starting points for building your own data and ML platforms on Amazon EKS clusters. Examples include deploying Amazon EMR on EKS with Karpenter, managing Spark jobs with Apache YuniKorn, setting up self-managed Apache Airflow, and deploying Apache Kafka with the Strimzi Kafka operator.
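As a rough illustration of the kind of Kubernetes resource a Strimzi-based Kafka deployment works with, the sketch below builds a Strimzi `KafkaTopic` manifest in Python and prints it as JSON, which `kubectl apply -f -` accepts. It is not taken from the DoEKS blueprints: the helper name `kafka_topic_manifest`, the cluster name `my-cluster`, the namespace `kafka`, the topic name, and the partition and replica counts are all hypothetical, and the `kafka.strimzi.io/v1beta2` API version is an assumption that depends on the Strimzi release installed.

```python
import json


def kafka_topic_manifest(name, cluster, namespace="kafka", partitions=3, replicas=3):
    """Build a Strimzi KafkaTopic custom resource as a plain dict.

    Hypothetical helper for illustration; all default values are assumptions.
    """
    if partitions < 1 or replicas < 1:
        raise ValueError("partitions and replicas must be >= 1")
    return {
        # Assumed API version; check the version served by your Strimzi install.
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaTopic",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # This label tells the Strimzi Topic Operator which Kafka cluster owns the topic.
            "labels": {"strimzi.io/cluster": cluster},
        },
        "spec": {"partitions": partitions, "replicas": replicas},
    }


if __name__ == "__main__":
    # Example values only; replace with names from your own deployment.
    manifest = kafka_topic_manifest("orders", cluster="my-cluster")
    print(json.dumps(manifest, indent=2))
```

Piping the printed JSON to `kubectl apply -f -` would create the topic only on a cluster where the Strimzi operator and a Kafka cluster with the matching name are already running.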
Architecture
The DoEKS architecture brings together open-source data tools, Kubernetes operators, and frameworks, and integrates them with AWS managed data analytics services to give users a robust and scalable platform.
Motivation
The core motivation behind DoEKS is to address the challenges users face when deploying and scaling data and AI workloads on Kubernetes. By offering open-source blueprints, DoEKS simplifies deployment so that users can manage their Kubernetes environments efficiently without being slowed down by that complexity.
Support & Community
DoEKS is not an AWS service; it is supported by AWS Solutions Architects on a best-effort basis. Feedback and participation from the community are encouraged.
Conclusion
Data on EKS is an open-source platform that simplifies and optimizes the deployment of data and AI workloads on Amazon EKS. Its range of tools and integrations gives users the flexibility to scale and manage complex data environments efficiently. Whether you are new to data and AI workloads or experienced with them, DoEKS offers tools and infrastructure to improve your productivity and operational performance.
For more details and to explore deployment blueprints, visit the Data on EKS website.