Apache Kafka benchmarks
This tutorial shows you how to run OpenMessaging benchmarks for Apache Kafka. You can currently deploy to the following platforms:
- Amazon Web Services (AWS)
Initial setup
To begin, you’ll need to clone the benchmark repo from the openmessaging organization on GitHub:
$ git clone https://github.com/openmessaging/openmessaging-benchmark
$ cd openmessaging-benchmark
You’ll also need to have Maven installed.
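If you’re not sure whether Maven is available, a quick version check will tell you (any reasonably recent Maven 3.x release should work; the exact minimum version isn’t specified here):
$ mvn -version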
Create local artifacts
The current Kafka client version is 2.8.1. To downgrade it, edit the ./driver-kafka/pom.xml file.
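If you want to confirm which client version the driver is currently built against before editing, a quick search of the POM works (the exact property name in the file may vary between versions of the repo):
$ grep -n kafka driver-kafka/pom.xml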
Once you have the repo cloned locally, you can create all the artifacts necessary to run the benchmarks with a single Maven command:
$ mvn install
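If the build trips over tests or license checks on your machine, you can usually skip them with standard Maven flags (these are generic Maven and license-plugin options, not something specific to this repo):
$ mvn clean install -DskipTests -Dlicense.skip=true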
Deploy a Kafka cluster on Amazon Web Services
You can deploy a Kafka cluster on AWS (for benchmarking purposes) using Terraform and Ansible. You’ll need to have both of those tools installed, as well as the terraform-inventory plugin for Terraform.
In addition, you’ll need to:
- Create an AWS account (or use an existing account)
- Install the aws CLI tool
- Configure the aws CLI tool (see the example below)
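Installing the CLI is platform specific, but configuring it usually comes down to one interactive command, which prompts for your access key ID, secret access key, default region, and output format:
$ aws configure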
SSH keys
Once you’re all set up with AWS and have the necessary tools installed locally, you’ll need to create both a public and a private SSH key at ~/.ssh/kafka_aws
(private) and ~/.ssh/kafka_aws.pub
(public), respectively.
$ ssh-keygen -f ~/.ssh/kafka_aws
When prompted to enter a passphrase, simply hit Enter twice. Then, make sure that the keys have been created:
$ ls ~/.ssh/kafka_aws*
Create resources using Terraform
With SSH keys in place, you can create the necessary AWS resources using just a few Terraform commands:
$ cd driver-kafka/deploy
$ terraform init
$ terraform apply
This will install the following EC2 instances (plus some other resources, such as a Virtual Private Cloud (VPC)):
Resource | Description | Count |
---|---|---|
Kafka instances | The VMs on which a Kafka broker will run | 3 |
ZooKeeper instances | The VMs on which a ZooKeeper node will run | 3 |
Client instances | The VMs from which the benchmarking suite itself will be run | 4 |
When you run terraform apply, you will be prompted to type yes. Type yes to continue with the installation or anything else to quit.
Once the installation is complete, you will see a confirmation message listing the resources that have been installed.
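Once Terraform finishes, you can also inspect the output values it recorded; the client_ssh_host output is the one used later in this guide:
$ terraform output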
Variables
There are a handful of configurable parameters related to the Terraform deployment that you can alter by modifying the defaults in the terraform.tfvars file.
Variable | Description | Default |
---|---|---|
region | The AWS region in which the Kafka cluster will be deployed | us-west-2 |
public_key_path | The path to the SSH public key that you’ve generated | ~/.ssh/kafka_aws.pub |
ami | The Amazon Machine Image (AMI) to be used by the cluster’s machines | ami-9fa343e7 |
instance_types | The EC2 instance types used by the various components | i3.4xlarge (Kafka brokers), t2.small (ZooKeeper), c4.8xlarge (benchmarking client) |
If you modify the public_key_path, make sure that you point to the appropriate SSH key path when running the Ansible playbook.
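If you’d rather not edit terraform.tfvars, you can also override individual variables at apply time using Terraform’s standard -var flag (the values below are just illustrations):
$ terraform apply -var 'region=us-east-1' -var 'public_key_path=~/.ssh/kafka_aws.pub'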
Running the Ansible playbook
With the appropriate infrastructure in place, you can install and start the Kafka cluster using Ansible with just one command:
$ ansible-playbook \
--user ec2-user \
--inventory `which terraform-inventory` \
deploy.yaml
If you’re using an SSH private key path different from ~/.ssh/kafka_aws, you can specify that path using the --private-key flag, for example --private-key=~/.ssh/my_key.
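Putting that together, a full invocation with a non-default key path might look like this (the key path is just an example):
$ ansible-playbook \
--user ec2-user \
--inventory `which terraform-inventory` \
--private-key=~/.ssh/my_key \
deploy.yaml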
SSHing into the client host
In the output produced by Terraform, there’s a client_ssh_host variable that provides the IP address for the client EC2 host from which benchmarks can be run. You can SSH into that host using this command:
$ ssh -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host)
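If you’re running a newer Terraform release (0.14 or later), terraform output wraps string values in quotes by default, which breaks the interpolation above; in that case, add the -raw flag:
$ ssh -i ~/.ssh/kafka_aws ec2-user@$(terraform output -raw client_ssh_host)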
Running the benchmarks from the client hosts
The benchmark scripts can be run from the /opt/benchmark working directory.
Once you’ve successfully SSHed into the client host, you can run any of the existing benchmarking workloads by specifying the YAML file for that workload when running the benchmark executable. All workloads are in the workloads folder. Here’s an example:
$ sudo bin/benchmark \
--drivers driver-kafka/kafka.yaml \
workloads/1-topic-16-partitions-1kb.yaml
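To see the full set of workload definitions available on the client host, you can simply list the workloads folder from the /opt/benchmark working directory:
$ ls workloads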
Although benchmarks are run from a specific client host, the benchmarks are run in distributed mode, across multiple client hosts.
There are multiple Kafka “modes” for which you can run benchmarks. Each mode has its own YAML configuration file in the driver-kafka folder.
Mode | Description | Config file |
---|---|---|
Standard | Kafka with message idempotence disabled (at-least-once semantics) | kafka.yaml |
Exactly once | Kafka with message idempotence enabled (“exactly-once” semantics) | kafka-exactly-once.yaml |
Sync | Kafka with durability enabled (all published messages synced to disk) | kafka-sync.yaml |
The example above used the “standard” mode as configured in driver-kafka/kafka.yaml. Here are some examples for the other modes:
# Exactly once
$ sudo bin/benchmark \
--drivers driver-kafka/kafka-exactly-once.yaml \
workloads/1-topic-16-partitions-1kb.yaml
# Sync
$ sudo bin/benchmark \
--drivers driver-kafka/kafka-sync.yaml \
workloads/1-topic-16-partitions-1kb.yaml
Specify client hosts
By default, benchmarks will be run from the set of hosts created by Terraform. You can also specify a comma-separated list of client hosts using the --workers flag (or -w for short):
$ sudo bin/benchmark \
--drivers driver-kafka/kafka-exactly-once.yaml \
--workers 1.2.3.4:8080,4.5.6.7:8080 \
workloads/1-topic-16-partitions-1kb.yaml
Downloading your benchmarking results
The OpenMessaging benchmarking suite stores results in JSON files in the /opt/benchmark folder on the client host from which the benchmarks are run. You can download those results files onto your local machine using scp. This command downloads all generated JSON results files:
$ scp -i ~/.ssh/kafka_aws ec2-user@$(terraform output client_ssh_host):/opt/benchmark/*.json .
Tearing down your benchmarking infrastructure
Once you’re finished running your benchmarks, you should tear down the AWS infrastructure you deployed to avoid incurring unnecessary costs. You can do that with one command:
$ terraform destroy -force
Make sure to let the process run to completion (it could take several minutes). Once the teardown is complete, all AWS resources that you created for the Kafka benchmarking suite will have been removed.
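Note that newer Terraform releases have removed the -force flag; if your version rejects it, use -auto-approve instead to skip the confirmation prompt:
$ terraform destroy -auto-approve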