Emran Talukder (emrantalukder), GitHub gists
emrantalukder / avro-producer.sh
Created August 18, 2023 18:50
Produce AVRO Messages with RecordNameStrategy and Sink to MongoDB
kafka-avro-console-producer \
--bootstrap-server $BOOTSTRAP_SERVER \
--producer.config client.config \
--topic orders-avro \
--property value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy \
--property auto.register.schemas=true \
--property schema.registry.url=$SR_URL \
--property basic.auth.credentials.source=USER_INFO \
--property basic.auth.user.info=$SR_KEY:$SR_SECRET \
--property value.schema.file=orders-avro-schema.json < sample-data.json
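The producer reads its value schema from orders-avro-schema.json. A minimal sketch of what such a schema file could contain (the record and field names here are illustrative assumptions, not taken from the gist):

```json
{
  "type": "record",
  "name": "Order",
  "namespace": "io.confluent.examples",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
```

With RecordNameStrategy, the Schema Registry subject is derived from the record's fully qualified name (here io.confluent.examples.Order) rather than the topic name, which is what lets multiple record types share the orders-avro topic.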
emrantalukder / deploy-replicator.sh
Created August 14, 2023 21:36
Script used to deploy connectors
#!/bin/bash
curl -X POST http://localhost:8083/connectors \
-H 'Content-Type: application/json' \
-d @- << EOF
{
"name": "replicator",
"config": {
"connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
"topic.whitelist": "demo-topic-1",
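The preview above is cut off after topic.whitelist. A complete Replicator source-connector config also needs source and destination cluster settings, along these lines (hostnames and values are illustrative assumptions, not the gist's actual config):

```json
{
  "name": "replicator",
  "config": {
    "connector.class": "io.confluent.connect.replicator.ReplicatorSourceConnector",
    "topic.whitelist": "demo-topic-1",
    "src.kafka.bootstrap.servers": "source-broker:9092",
    "dest.kafka.bootstrap.servers": "dest-broker:9092",
    "key.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "value.converter": "io.confluent.connect.replicator.util.ByteArrayConverter",
    "tasks.max": "1"
  }
}
```

Replicator copies raw bytes, hence the ByteArrayConverter for both keys and values.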
emrantalukder / confluent-kafka-mqtt-install.sh
Created July 31, 2023 19:29
confluent-kafka-mqtt install for ubuntu
wget -qO - https://packages.confluent.io/deb/7.4/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/7.4 stable main"
sudo add-apt-repository "deb https://packages.confluent.io/clients/deb $(lsb_release -cs) main"
sudo apt-get update
sudo apt-get install confluent-kafka-mqtt
emrantalukder / ansible-benchmarks.md
Created June 26, 2023 19:35 — forked from cjmatta/ansible-benchmarks.md
A method for orchestrating distributed Kafka benchmarks using Ansible

Kafka benchmarks are typically run using a single producer and consumer against a single topic, and the producer and consumer are run at close to maximum write/read speeds. In the real world, a Kafka cluster is more often serving many lower throughput producers and consumers. Ansible allows for a benchmarking method that sets up any number of topics and many producers and consumers.

Ansible playbooks allow us to run a number of tasks against a distributed set of clients both synchronously and asynchronously.

Topic setup

Before we can run tests we need topics to test against. This play sets up a number of topics with various partition configurations:

- name: Setup
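The play preview is truncated at the task list. A sketch of what a topic-setup play along these lines could look like, invoking kafka-topics.sh in a loop (the host group, partition counts, and replication factor are assumptions for illustration):

```yaml
- name: Setup
  hosts: kafka-admin
  tasks:
    - name: Create benchmark topics with varying partition counts
      command: >
        kafka-topics.sh --bootstrap-server localhost:9092
        --create --if-not-exists
        --topic bench-{{ item }}p
        --partitions {{ item }}
        --replication-factor 3
      loop: [1, 6, 12]
```

Running the create task on a single admin host is enough; the producer and consumer plays can then fan out across the client hosts synchronously or asynchronously.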
emrantalukder / JsonSRProducer.java
Created January 19, 2023 19:45
Json Schema Serializer with JsonNode and JsonSchema objects
package io.confluent.developer;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import random
from pprint import pprint
n = 100
event = 10
data = []
seen_events = 0
for i in range(n):
emrantalukder / k8_pod_hostnames.sh
Created April 2, 2020 23:52
Get K8 Pod Hostnames and generate YAML... but just use discovery instead.
#!/usr/bin/env bash
NAMESPACE=$1
# fetch hostnames
cnfl_hosts=$(kubectl get pods -n "$NAMESPACE" --selector clusterId=operator -o=json | jq '.items[].spec.hostname')
# output yaml string
YAML_OUTPUT="cnfl_hosts:"
CUSTOM_TAB=' '
emrantalukder / d3.sankey.js
Created December 4, 2019 21:16 — forked from emeeks/d3.sankey.js
Sankey Particles IV
d3.sankey = function() {
var sankey = {},
nodeWidth = 24,
nodePadding = 8,
size = [1, 1],
nodes = [],
links = [];
sankey.nodeWidth = function(_) {
if (!arguments.length) return nodeWidth;
emrantalukder / AWSSdkSample.scala
Created September 18, 2019 18:12
AWS Samples in Scala
import java.nio.file.Paths
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.transfer.TransferManager

/** upload directory after ETL job completes */
def uploadDirectory(bucketName: String, bucketPath: String, path: String): Unit = {
  val s3client = new AmazonS3Client()
  val transferManager = new TransferManager(s3client)
  // uploadDirectory is asynchronous; block until the transfer finishes
  val upload = transferManager.uploadDirectory(bucketName, bucketPath, Paths.get(path).toFile, true)
  upload.waitForCompletion()
  transferManager.shutdownNow()
}
emrantalukder / postgres-table-locking.sql
Created September 13, 2019 02:45
Postgres Table Locking
-- show locks
SELECT *
FROM pg_locks l
JOIN pg_class t ON l.relation = t.oid AND t.relkind = 'r';
-- kill a blocking backend (pg_terminate_backend takes the integer pid reported by pg_locks)
SELECT pg_terminate_backend(the_pid);