Event-Driven Autoscaling with KEDA: Scaling Kubernetes Applications in Real Time with AWS SQS, Kafka, and CPU Metrics
In today’s dynamic application environments, scaling based on real-time demand is critical. Traditional autoscaling methods in Kubernetes, which typically rely on CPU or memory usage, might not always be sufficient, especially when workload demand is driven by external events like incoming messages, metrics, or data streams. This is where KEDA (Kubernetes-based Event Driven Autoscaling) steps in.
KEDA is a powerful Kubernetes operator that extends the native autoscaling capabilities of Kubernetes by allowing applications to scale based on external events. Whether you’re dealing with message queues, event streams, or custom metrics, KEDA ensures that your applications can automatically scale in response to these triggers, optimizing resource usage and ensuring responsiveness.
In this article, we’ll explore why KEDA is a game-changer for Kubernetes environments. We’ll cover its installation, walk through real-world use cases with AWS SQS, Apache Kafka, and CPU utilization, and demonstrate how KEDA can be seamlessly integrated into your scaling strategy. By the end of this article, you’ll be equipped with the knowledge to implement event-driven autoscaling in your Kubernetes clusters effectively.
Prerequisites
- Familiarity with Kubernetes fundamentals
- Experience with Helm
- Familiarity with AWS SQS and Kafka
Road Map
- Key Reasons to Use KEDA
- Reviewing KEDA Architecture, CRDs of KEDA and Use Cases
- Setting Up the Environment
- Implementing Use Case — AWS SQS
- Implementing Use Case — Kafka
- Implementing Use Case — CPU Utilization
Key Reasons to Use KEDA
- Event-Driven Scaling: KEDA allows your applications to scale up or down in response to external events, ensuring they can handle changes in demand as they happen. For example, if a large number of messages suddenly appear in a queue, KEDA can automatically increase the number of pods processing those messages to keep up with the demand.
- Custom Scaling Triggers: Unlike traditional autoscaling methods that rely only on CPU or memory usage, KEDA enables you to define custom triggers based on the specific needs of your application. Whether it’s the length of a message queue, a metric from a monitoring system, or an event from a cloud service, KEDA lets you set up scaling based on what really matters to your application.
- Optimized Resource Usage: By scaling only when necessary and based on the actual demands of your application, KEDA helps you optimize resource usage. This means that during periods of low demand, your application can scale down, saving resources and reducing costs.
- Broad Event Source Support: KEDA supports a wide range of event sources, from cloud services like AWS SQS and Azure Queue Storage to on-premises solutions like Apache Kafka and Prometheus. This flexibility allows you to use KEDA in various environments and scenarios, making it a versatile tool for managing application scaling.
- Seamless Integration with Kubernetes: KEDA works alongside Kubernetes’ native autoscaling features, providing a seamless experience. It extends the functionality of Kubernetes without requiring you to change how you manage your applications, making it easy to integrate into your existing infrastructure.
Using KEDA in your Kubernetes environment ensures that your applications are not only responsive to changes in demand but also run efficiently, making the most of your resources.
Reviewing KEDA Architecture, CRDs of KEDA and Use Cases
Architecture
Based on the architecture (Diagram-1.0.0), KEDA performs three key roles in a Kubernetes cluster:
- Agent: KEDA enables applications to scale down to zero when there are no events and automatically scales them back up when events occur. This functionality is managed by the keda-operator container, which runs as part of KEDA's deployment.
- Metrics: KEDA acts as a metrics server within Kubernetes, providing important event-based data like queue lengths or stream lag to the Horizontal Pod Autoscaler (HPA). This data helps drive scaling decisions. The actual event processing is handled by the application, preserving direct integration with the event sources. The metrics serving role is performed by the keda-operator-metrics-apiserver container.
- Admission Webhooks: KEDA uses admission webhooks to automatically validate configuration changes and prevent errors. For example, it can block multiple ScaledObject resources from targeting the same application, ensuring configurations are optimized and best practices are followed.
Key KEDA CRDs
KEDA provides several important Custom Resource Definitions (CRDs) that define how the scaling will work. The two we will rely on throughout the examples are:
- ScaledObject: This is the primary resource used in KEDA. It connects your workload (like a Deployment) to an external event source and defines the scaling behavior. For each use case, we’ll create a ScaledObject that links to AWS SQS, Kafka, or CPU metrics. It includes settings such as the minimum and maximum number of replicas and the specific scaling triggers.
- TriggerAuthentication: This resource is used to securely provide credentials and secrets for external services (e.g., AWS SQS, Kafka) that your application will interact with. Instead of hard-coding credentials into your ScaledObject, you can reference a TriggerAuthentication resource to safely manage sensitive information.
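To make these CRDs concrete, here is a minimal, hedged sketch of a TriggerAuthentication and ScaledObject pair for AWS SQS. The resource names, secret references, queue URL, and thresholds are illustrative placeholders and may differ from the manifests in the demo repository.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: aws-sqs-auth          # placeholder name
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: sqs-secrets       # Kubernetes Secret holding the credentials (placeholder)
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: sqs-secrets
      key: AWS_SECRET_ACCESS_KEY
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler   # placeholder name
spec:
  scaleTargetRef:
    name: consumer            # the Deployment to scale (placeholder)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
        queueLength: "5"      # target number of messages per replica
        awsRegion: <region>
      authenticationRef:
        name: aws-sqs-auth
The ScaledObject defines the scaling boundaries and the trigger, while the TriggerAuthentication keeps the AWS credentials out of the trigger definition itself.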
Use Cases
In our environment, we’ll be implementing three use cases to showcase how KEDA can handle different event sources and scaling scenarios:
- AWS SQS: Autoscaling based on the number of messages in an AWS SQS queue.
- Kafka: Autoscaling based on the lag in a Kafka topic.
- CPU Utilization: Scaling workloads based on CPU usage to handle heavy computational loads.
Setting Up the Environment
To get started with event-driven autoscaling using KEDA, we’ll be working with a demo repository that contains everything you need to simulate real-world use cases. You can find the repository here: KEDA Scaling Demo (https://github.com/menendes/keda-scaling-demo).
This repository is built using Python and includes two main scripts:
- Producer Script: This script is responsible for generating data and sending it to AWS SQS and Apache Kafka. It simulates an incoming stream of messages that our applications will consume.
- Consumer Script: The consumer script handles reading data from both AWS SQS and Kafka. It also includes a CPU load test feature, which allows us to scale the application based on CPU usage, simulating a heavy workload scenario.
In addition to the Python scripts, the repository contains all the necessary Kubernetes YAML files to set up and test the use cases. These include resources for AWS SQS, Apache Kafka, and scaling based on CPU load.
Steps to Follow:
1. Check out the repository.
git clone https://github.com/menendes/keda-scaling-demo.git
cd keda-scaling-demo
2. Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
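Once the installation completes, you can verify that the KEDA components described earlier (the operator, the metrics API server, and the admission webhooks) are running and that the CRDs are registered:
kubectl get pods -n keda
kubectl get crd | grep keda.sh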
Implementing Use Case — AWS SQS
Steps to Implement AWS SQS Scaling:
1. Configure the AWS SQS queue: First, ensure that you have an SQS queue created in your AWS account. You will need the SQS queue URL and AWS credentials for the setup. Make sure the account is allowed to read from and write to the queue; otherwise you will not be able to access AWS SQS.
2. In the kubernetes/config-map.yaml file, set the SQS_QUEUE_URL, AWS_REGION, and RUN_TYPE parameters. The RUN_TYPE value should be “sqs” for this use case.
3. Set your AWS credentials in the kubernetes/sqs-secrets.yaml file. The values must be base64-encoded; you can encode them with the commands below (a sketch of both files follows the commands).
echo -n <access_key_id> | base64
echo -n <secret_access_key> | base64
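For reference, here is a hedged sketch of what the two files could look like; the resource names and key names are illustrative and may differ slightly from the files in the repository.
# kubernetes/config-map.yaml (illustrative)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config            # placeholder name
data:
  SQS_QUEUE_URL: https://sqs.<region>.amazonaws.com/<account-id>/<queue-name>
  AWS_REGION: <region>
  RUN_TYPE: sqs
# kubernetes/sqs-secrets.yaml (illustrative)
apiVersion: v1
kind: Secret
metadata:
  name: sqs-secrets           # placeholder name
type: Opaque
data:
  AWS_ACCESS_KEY_ID: <base64-encoded access key id>
  AWS_SECRET_ACCESS_KEY: <base64-encoded secret access key>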
4. Now you are ready to simulate the scaling scenario. Apply the resources with the kustomization file and observe the result.
kubectl apply -k .
Initially, according to the first line of the output, the Active status of our ScaledObject is 'False', meaning the scaling mechanism is not active at that moment. After some time, the status changes to 'True', indicating that the scaling mechanism has been activated. Now, let's continue by reviewing the other resources, including the HPA (HorizontalPodAutoscaler) and the pods.
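You can follow this yourself with standard kubectl commands; the exact resource names depend on the manifests in the repository:
kubectl get scaledobject
kubectl get hpa
kubectl get pods -w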
The HPA resource is automatically created and managed by KEDA. Since we specified a minimum pod count of 1 and a maximum pod count of 10 in the ScaledObject resource, the HPA configuration is adjusted based on this. After some time, you'll observe that the replica count increases and eventually reaches the maximum pod count of 10. As the load in SQS decreases, you'll also notice that the replica count drops accordingly. Congratulations! You've built an event-driven autoscaling mechanism :)
New pods are created as data is loaded into SQS, and they are terminated once the load decreases. Be aware that the creation and termination steps are handled by the HPA, which is managed by KEDA through the ScaledObject.
Implementing Use Case — Kafka
For the Kafka scenario, just change the RUN_TYPE parameter value in the config-map.yaml file to 'kafka' and reapply the resources using the command below.
kubectl apply -k .
When you adjust the producer’s replica count, for example by increasing it, you will observe that the consumer pods automatically scale up. Similarly, when you reduce the data production to Kafka, the number of consumer pods will decrease as well. The overall behavior will be similar to the AWS SQS example.
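For reference, a Kafka trigger in a ScaledObject generally looks like the hedged sketch below; the bootstrap server address, consumer group, topic, and lag threshold are placeholders and may not match the demo repository exactly.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-consumer-scaler # placeholder name
spec:
  scaleTargetRef:
    name: consumer            # the Deployment to scale (placeholder)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.default.svc:9092   # placeholder broker address
        consumerGroup: demo-consumer-group         # placeholder consumer group
        topic: demo-topic                          # placeholder topic
        lagThreshold: "10"                         # target lag per replica
Here the scaler watches the consumer group's lag on the topic and adds replicas when the lag per replica exceeds the threshold.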
Implementing Use Case — CPU Utilization
To simulate the autoscaling mechanism based on CPU utilization, change the RUN_TYPE value to 'cpu' in the config-map.yaml file and reapply the resources using the same command. Note that in this scenario we don't need the producer, so you can delete the producer deployment.
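In KEDA, CPU-based scaling is expressed as a cpu trigger on the ScaledObject, roughly as in the illustrative sketch below; the resource names and the target utilization value are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: cpu-consumer-scaler   # placeholder name
spec:
  scaleTargetRef:
    name: consumer            # the Deployment to scale (placeholder)
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: cpu
      metricType: Utilization # percentage of the container's CPU request
      metadata:
        value: "50"           # target average CPU utilization
Keep in mind that utilization-based scaling only works if the target containers define CPU resource requests; otherwise the HPA cannot compute a utilization percentage.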
Since we are monitoring CPU utilization to manage autoscaling, you will need a metrics server to observe CPU usage for your target resource. The ScaledObject reads data from the metrics server and scales up or down based on that data. If you are using Minikube, you can easily install the metrics server with the command below.
minikube addons enable metrics-server
After applying the kustomization.yaml file and observing the HPA resource, your output should look like the screenshot below. The replica count will increase based on CPU utilization.
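To watch the CPU-driven scaling in action, these commands are useful (kubectl top requires the metrics server to be running):
kubectl get hpa -w
kubectl top pods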
Conclusion
We explored how KEDA enables event-driven autoscaling in Kubernetes, allowing applications to dynamically adjust based on real-world demands. By integrating with external systems like AWS SQS, Apache Kafka, and leveraging CPU utilization metrics, KEDA ensures that your Kubernetes workloads scale efficiently and responsively. Through practical examples, we demonstrated how to set up KEDA in your environment and how to configure scaling for different use cases. As businesses grow and applications evolve, event-driven autoscaling with KEDA is a powerful tool that provides both flexibility and resource optimization in modern cloud-native environments.