Introducing Caelus
Caelus improves the utilization of computing resources in a Kubernetes environment. It takes advantage of underutilized resources on nodes by running extra batch jobs during low-demand periods, for example when online services receive less traffic.
Key Features
- Metrics Collection: Caelus gathers metrics on node resources, cgroup resources, and the latency of online jobs, which it uses to assess current resource usage and performance.
- Versatile Job Compatibility: Batch jobs can run on either YARN or Kubernetes, depending on the user's setup.
- Resource Usage Prediction: Caelus forecasts the total resource usage of a node, accounting for online applications and kernel components such as slab.
- Dynamic Resource Management: It manages resource isolation for CPU, memory, and disk space to prevent contention between online and batch jobs.
- Abnormality Detection: Caelus continuously checks metrics for anomalies, such as unusual spikes in CPU usage or latency.
- Intervention Mechanisms: If interference from batch jobs is detected, Caelus can throttle or even terminate those jobs to keep online services stable.
- Integration & Support: Caelus supports Prometheus metrics for monitoring and includes a built-in alarm system to alert users.
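As a rough illustration of the abnormality detection idea above, the sketch below flags the newest sample of a metric as a spike when it exceeds a multiple of the mean of the earlier samples. The is_spike helper, its arguments, and the threshold rule are assumptions made for this example only; this overview does not specify the detection logic Caelus actually uses.

```shell
# Hypothetical illustration, not Caelus code.
# Usage: is_spike FACTOR SAMPLE... (samples ordered oldest first)
# Succeeds (exit status 0) when the newest sample is greater than
# FACTOR times the mean of all earlier samples.
is_spike() {
  local factor=$1
  shift
  printf '%s\n' "$@" | awk -v factor="$factor" '
    { v[NR] = $1 + 0 }                 # store each sample as a number
    END {
      if (NR < 2) exit 1               # not enough history to judge
      for (i = 1; i < NR; i++) sum += v[i]
      mean = sum / (NR - 1)            # mean of the earlier samples
      exit !(v[NR] > factor * mean)    # status 0 means "spike"
    }'
}

# Example: CPU usage percentages, newest last.
if is_spike 2 10 11 9 30; then
  echo "spike detected: a real agent might throttle batch jobs here"
fi
```

A check like this only signals that something looks unusual. Deciding whether to throttle or terminate batch jobs is a separate policy step.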
Usage Overview
Detailed usage instructions are in Tutorial.md. The associated tools and the steps for building and running Caelus are described below.
nm_operator
The nm_operator tool executes YARN commands through remote APIs, which simplifies the management of batch tasks.
Getting Started
Building and running Caelus is straightforward:
Build
Run the following commands to build the binaries and images, and to perform unit tests:
# Produce binary files under _output/bin/
$ make build
# Create docker image
$ make image
# Execute unit tests
$ make test
Run
Caelus is best run on the nodes where the kubelet process is running. Make sure the kubelet_root_dir setting in caelus.json matches the kubelet's "root-dir".
# Setting up configuration
$ mkdir -p /etc/caelus/
$ cp hack/config/rules.json /etc/caelus/
# Run with YARN batch jobs
$ caelus --config=hack/config/caelus.json --v=2
# Run with Kubernetes batch jobs (--kubeconfig is optional)
$ caelus --config=hack/config/caelus.json --hostname-override=xxx --v=2 --kubeconfig=xxx
# Running in a container
$ docker run -it --cap-add SYS_ADMIN ... ccr.ccs.tencentyun.com/caelus/caelus:v1.0.0 /bin/bash
# Deploying on Kubernetes
$ kubectl create -f hack/yaml/caelus.yaml
$ kubectl label node <node-name> colation=true
$ kubectl -n kube-system get daemonset
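Because kubelet_root_dir in caelus.json has to match the kubelet's "root-dir", it can help to check what the kubelet is actually using before starting Caelus. The sketch below is a hypothetical helper, not part of Caelus: it extracts the --root-dir flag from a kubelet command line and falls back to /var/lib/kubelet, the kubelet default, when the flag is absent. The function name and the way the command line is obtained are assumptions for illustration.

```shell
# Hypothetical helper, not shipped with Caelus.
# Usage: kubelet_root_dir "<full kubelet command line>"
# Prints the value of --root-dir, or the kubelet default if the flag is absent.
kubelet_root_dir() {
  local arg expect_value=0
  # Word splitting of $1 is intentional: each flag becomes one loop item.
  for arg in $1; do
    if [ "$expect_value" -eq 1 ]; then
      printf '%s\n' "$arg"              # value given as "--root-dir <path>"
      return 0
    fi
    case "$arg" in
      --root-dir=*) printf '%s\n' "${arg#--root-dir=}"; return 0 ;;
      --root-dir)   expect_value=1 ;;
    esac
  done
  printf '%s\n' /var/lib/kubelet        # kubelet default root directory
}

# On a Linux node with procps, the command line can be read from the
# running process and compared by eye with the value in caelus.json:
#   kubelet_root_dir "$(ps -o args= -C kubelet)"
#   grep kubelet_root_dir hack/config/caelus.json
```

If the two values differ, update kubelet_root_dir in caelus.json before launching Caelus.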
Further Information
For a comprehensive guide to getting started with Caelus, see the DETAIL document.
Contributing to Caelus
To contribute to the project, see Contributing to Caelus for instructions on submitting issues and pull requests.
License
Caelus is distributed under the Apache License 2.0. For more information, see the License file.