Knut Nesheim knutin

libaio_queue_depth_512: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
libaio_queue_depth_512_numjobs_2: (g=1): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
libaio_queue_depth_512_numjobs_2: (g=1): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
pread_numjobs_1: (g=2): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
pread_numjobs_8: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
pread_numjobs_8: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
pread_numjobs_16: (g=4): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
...
pread_numjobs_16: (g=4): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=psync, iodepth=1
i3.2xlarge, 250M keys, values of 1400-1600 bytes (so the average is ~1500).
The writes just update a value in place, so the file stays the same size.
Throughput is fairly stable at around 29k writes per second. With O_DIRECT and aligned reads/writes
we first need to read the value to get the padding; then we can write the new value with the old padding.
CPU usage for the aio thread is fairly stable at around 35% system, 7.5% user. Adding more client connections
does not make protostore handle more writes. Using fio (https://gist.github.com/knutin/a8de6443514222fa7ab276bfc853c61d)
I saw this instance doing 14K writes per second at a value size of 1536 bytes.
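The read-before-write step above exists because O_DIRECT requires offsets and lengths aligned to the device's logical block size. A minimal sketch of the alignment arithmetic, assuming a 512-byte block (the helper name and the block size are mine; NVMe drives may require 4096):

```python
BLOCK = 512  # assumed logical block size; check the device, NVMe often wants 4096

def aligned_window(offset, length, block=BLOCK):
    """Round a byte range out to block boundaries, as O_DIRECT requires.

    Returns (aligned_offset, aligned_length) covering [offset, offset + length).
    """
    start = offset - (offset % block)
    end = ((offset + length + block - 1) // block) * block
    return start, end - start

# A 1536-byte value at offset 1000 must be accessed as 2048 bytes from offset 512
print(aligned_window(1000, 1536))  # (512, 2048)
```

Because the aligned window is wider than the value, an in-place update has to read the surrounding blocks first, splice the new value in, and write the whole window back.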
$ fio random-readwrite.fio
libaio_32: (g=0): rw=randread, bs=1536-1536/1536-1536/1536-1536, ioengine=libaio, iodepth=32
libaio_256: (g=1): rw=randread, bs=1536-1536/1536-1536/1536-1536, ioengine=libaio, iodepth=256
libaio_write_32: (g=2): rw=randwrite, bs=1536-1536/1536-1536/1536-1536, ioengine=libaio, iodepth=32
fio-2.1.5
Starting 3 threads
libaio_32: Laying out IO file(s) (1 file(s) / 73728MB)
libaio_256: Laying out IO file(s) (1 file(s) / 73728MB)
libaio_write_32: Laying out IO file(s) (1 file(s) / 73728MB)
Jobs: 1 (f=1): [__w] [60.1% done] [0KB/21535KB/0KB /s] [0/14.4K/0 iops] [eta 04m:00s]
[global]
time_based
runtime=120s
rw=randread
size=72g
directory=/mnt/fio/
blocksize=1536
invalidate=1
thread
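The per-job sections are not shown above; judging from the job names and parameters echoed in the fio output (libaio_32, libaio_256, libaio_write_32, one group each), they presumably look something like:

```ini
[libaio_32]
ioengine=libaio
iodepth=32

[libaio_256]
stonewall
ioengine=libaio
iodepth=256

[libaio_write_32]
stonewall
ioengine=libaio
iodepth=32
rw=randwrite
```

The stonewall option makes each job wait for the previous one and start a new reporting group, which would match the g=0/g=1/g=2 numbering in the output.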
Similar benchmark to protostore, same client, but hacked to speak to Redis.
10M keys, value sizes randomly distributed between 1024 and 4096 bytes. Generated with mk_redis_data.py
using Python's random.randrange. I think the average size is higher than in the previous benchmarks;
I need to redo the benchmark with a similar distribution.
The server runs on an r3.4xlarge. Redis uses 27G of RAM. With 50 clients it ends up using 100% CPU on one core,
although it seems we are also limited by bandwidth at this point: 243MB/s is the max. With two Redis
server instances on the same machine, each with 50 clients, we no longer use all the CPU,
but still cannot go higher than 243MB/s. The test client also runs on an r3.4xlarge, so I should redo the
test with more client instances; maybe it's possible to squeeze more network throughput out of this machine.
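mk_redis_data.py itself is not included here; a minimal sketch of the value generation as described (the function name is mine, only the random.randrange(1024, 4096) call is from the notes):

```python
import random

def mk_value(lo=1024, hi=4096):
    # randrange(lo, hi) includes lo but excludes hi, so sizes fall in
    # [1024, 4095] with a mean of about 2559.5 bytes
    return b"x" * random.randrange(lo, hi)

sizes = [len(mk_value()) for _ in range(10000)]
print(min(sizes), max(sizes))
```

A uniform 1024-4095 distribution averages ~2560 bytes, noticeably above the ~1500-byte average of the earlier 1400-1600 runs, which is consistent with the note that the average size is higher here.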
i3.2xlarge, 250M keys, values randomly distributed between 1024 and 4096 bytes. 596GB on NVMe drive.
Two r3.4xlarge clients are able to fully saturate the CPU on the server. iotop shows ~500MB/s read from the drive, iftop shows the same sent over the network.
With a thread for each physical core (the i3.2xlarge has 4 physical cores and 8 vCPUs in total):
============================
Concurrent clients: 100
Runtime: 120 s
Total requests: 9468513
knutin / results.txt
Last active March 24, 2017 00:47
Protostore results, i3.2xlarge, 250M keys, 5GB toc, 596GB data, key size randomly distributed 1k-4k. Client is r3.4xlarge, same az. Best of three runs.
============================
Concurrent clients: 1
Runtime: 60 s
Total requests: 279127
Avg rps: 4652.12
Bytes transferred: 681.34 MB
Bytes per second: 11.36 MB
Roundtrip latencies: 50th: 207us 75th: 218us 90th: 235us 95th: 245us 99th: 266us 99.9: 398us
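The derived numbers in the run above can be cross-checked from the totals; a quick sanity check, assuming MB means MiB:

```python
total_requests = 279_127
runtime_s = 60
transferred_mb = 681.34

rps = total_requests / runtime_s                        # matches "Avg rps"
mb_per_s = transferred_mb / runtime_s                   # matches "Bytes per second"
avg_bytes = transferred_mb * 1024**2 / total_requests   # implied average value size
print(round(rps, 2), round(mb_per_s, 2), round(avg_bytes))  # → 4652.12 11.36 2560
```

The implied ~2560-byte average is exactly what a uniform 1k-4k value-size distribution would produce, so the throughput and transfer figures are internally consistent.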
knutin / fio-i3.2xlarge-output.txt
Last active February 24, 2017 12:11
i3.2xlarge
libaio_16: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
libaio_128: (g=1): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
libaio_128_numjobs2: (g=2): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
...
libaio_256: (g=3): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=256
libaio_512: (g=4): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=512
libaio_1024: (g=5): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=1024
libaio_4096: (g=6): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=4096
libaio_16384: (g=7): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16384
mmap: (g=8): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=mmap, iodepth=1
knutin / fio_output.txt
Created February 21, 2017 16:48
i2.xlarge
ubuntu@ip-172-18-219-174:~$ fio random-read.fio
libaio: (g=0): rw=randread, bs=4K-100K/4K-100K/4K-100K, ioengine=libaio, iodepth=16
mmap: (g=1): rw=randread, bs=4K-100K/4K-100K/4K-100K, ioengine=mmap, iodepth=1
pread: (g=2): rw=randread, bs=4K-100K/4K-100K/4K-100K, ioengine=psync, iodepth=1
pread_direct: (g=3): rw=randread, bs=4K-100K/4K-100K/4K-100K, ioengine=psync, iodepth=1
fio-2.2.10
Starting 4 threads
libaio: Laying out IO file(s) (1 file(s) / 73728MB)
mmap: Laying out IO file(s) (1 file(s) / 73728MB)
pread: Laying out IO file(s) (1 file(s) / 73728MB)
knutin / genc.ex
Last active April 21, 2016 11:06
defmodule Foo do
  use GenC

  @spec sum([SomeInt], u64) :: integer
  defcfun sum(array, n) do
    i = uint64_t(0)
    sum = uint64_t(0)
    for 1..n do
      splice do