Benchmarks

The Benchmarking Suite comes with a set of third-party benchmarking tools, each of them with a set of different test configurations ready to be executed. The tools are:

  • CFD: a tool realized in the CloudPerfect EU project [1] that uses OpenFOAM to run a waterbox simulation. It can be configured with different solvers, numbers of iterations and write-to-disk strategies. It is primarily a CPU-intensive benchmark;
  • DaCapo: a tool for Java benchmarking that simulates real-world applications with non-trivial memory loads. It is mainly a CPU- and memory-intensive benchmark;
  • Filebench: a powerful and flexible tool able to generate and execute a variety of filesystem workloads to simulate applications like web servers, file servers and video services. It is mainly a disk-intensive benchmark;
  • Iperf: a tool for active measurements of the maximum achievable bandwidth on IP networks;
  • Sysbench: a tool to test CPU, memory, file I/O, mutex performance and MySQL on Linux systems;
  • YCSB: a tool for database benchmarking that supports several database technologies. In the Benchmarking Suite, tests for MySQL and MongoDB are provided. It is primarily a disk-intensive benchmark;
  • WebFrameworks: a tool that tests common web framework workloads like fetching and inserting data in a database or creating/parsing JSON objects. It is mainly a memory- and network-intensive benchmark.

The following table summarizes the available tools and their compatibility with different operating systems.

Test-OS compatibility matrix
Tool Version CentOS Ubuntu 14 Ubuntu 16 Ubuntu 18 Ubuntu 20 Debian
CFD 1.0      
DaCapo 9.12      
Filebench 1.4.9.1      
Iperf 2.0.5      
Sysbench 2.1.0      
YCSB-MySQL 0.12.0      
YCSB-MongoDB 0.11.0      
WebFrameworks master      

CFD

The CFD benchmarking tool was realized in the context of the CloudPerfect EU project [1] and released as open source on GitHub [2]. The tool executes a CFD simulation on a waterbox geometry and allows several parameters to be customized in order to run different simulation scenarios.

The following combinations of parameters are used in the Benchmarking Suite tests:

100iterGAMG 100 iterations using the GAMG solver
100iterWriteAtLast 100 iterations using the GAMG solver and not writing intermediate results on the disk
500iterGAMG 500 iterations using the GAMG solver
500iterGAMGWriteAtLast 500 iterations using the GAMG solver and not writing intermediate results on the disk
500iterICCG 500 iterations using the ICCG solver
500iterPCG 500 iterations using the PCG solver

All the tests use all the CPUs available on the machine.
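
As an illustration only, such a workload could be described with a configuration section in the style the Benchmarking Suite uses for its tools (see "Adding a new benchmarking tool" below); the parameter names here are hypothetical and are not taken from the actual tool:

[500iterGAMGWriteAtLast]
# hypothetical parameter names, for illustration only
solver = GAMG
iterations = 500
write_intermediate_results = false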

Metrics

Metric Unit Description
duration s The overall duration of the simulation

DaCapo

DaCapo [3] is a tool for Java benchmarking intended for the programming language, memory management and computer architecture communities. It consists of a set of open source, real-world applications with non-trivial memory loads. The tests implemented by the tool are:

DaCapo tests (source: http://www.dacapobench.org/)
avrora simulates a number of programs run on a grid of AVR microcontrollers
batik produces a number of Scalable Vector Graphics (SVG) images based on the unit tests in Apache Batik
eclipse executes some of the (non-gui) jdt performance tests for the Eclipse IDE
fop takes an XSL-FO file, parses it and formats it, generating a PDF file.
h2 executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application, replacing the hsqldb benchmark
jython interprets the pybench Python benchmark
luindex uses lucene to index a set of documents: the works of Shakespeare and the King James Bible
lusearch uses lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible
pmd analyzes a set of Java classes for a range of source code problems
sunflow renders a set of images using ray tracing
tomcat runs a set of queries against a Tomcat server retrieving and verifying the resulting webpages
tradebeans runs the daytrader benchmark via Java Beans to a GERONIMO backend with an in-memory h2 as the underlying database
tradesoap runs the daytrader benchmark via SOAP to a GERONIMO backend with an in-memory h2 as the underlying database
xalan transforms XML documents into HTML

Each test is executed multiple times, until the execution durations converge (variance <= 3.0 over the latest 3 executions).
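
A single converging run corresponds to an invocation of the DaCapo harness along the following lines; the convergence flags shown here are an assumption based on the 9.12 harness options and should be verified against the output of java -jar dacapo-9.12-bach.jar --help:

# repeat the avrora test until the execution time converges
java -jar dacapo-9.12-bach.jar --converge --window 3 --variance 3.0 avrora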

Metrics

Metric Unit Description
timed_duration ms the duration of the latest execution
warmup_iters num the number of executions that were necessary to converge

Filebench

Filebench [4] is a very powerful tool able to generate a variety of filesystem- and storage-based workloads. It implements a set of basic primitives like createfile, readfile, mkdir, fsync, … and provides a language (the Workload Model Language, WML) to combine these primitives into complex workloads.

In the Benchmarking Suite, a set of pre-defined workloads has been used to simulate different services:

Filebench workloads (source: https://github.com/filebench/filebench/wiki/Predefined-personalities)
fileserver Emulates simple file-server I/O activity. This workload performs a sequence of creates, deletes, appends, reads, writes and attribute operations on a directory tree. 50 threads are used by default. The workload generated is somewhat similar to SPECsfs.
webproxy Emulates I/O activity of a simple web proxy server. A mix of create-write-close, open-read-close, and delete operations of multiple files in a directory tree and a file append to simulate proxy log. 100 threads are used by default.
webserver Emulates simple web-server I/O activity. Produces a sequence of open-read-close on multiple files in a directory tree plus a log file append. 100 threads are used by default.
videoserver This workload emulates a video server. It has two filesets: one contains videos that are actively served, and the second one has videos that are available but currently inactive. One thread is writing new videos to replace no longer viewed videos in the passive set. Meanwhile $nthreads threads are serving up videos from the active video fileset.
varmail Emulates I/O activity of a simple mail server that stores each e-mail in a separate file (/var/mail/ server). The workload consists of a multi-threaded set of create-append-sync, read-append-sync, read and delete operations in a single directory. 16 threads are used by default. The workload generated is somewhat similar to Postmark but multi-threaded.
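
Each of these personalities is a WML script shipped with Filebench, so running a workload boils down to an invocation like the following sketch (the workload path is an assumption and depends on the installation):

# run the pre-defined fileserver personality
filebench -f /usr/local/share/filebench/workloads/fileserver.f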

Metrics

Metric Unit Description
duration s The overall duration of the test
ops num The sum of all operations (of any type) executed
ops_throughput ops/s The average number of operations executed per second
throughput MB/s The average amount of data written/read per second during the test
cputime µs The average CPU time taken by each operation
latency_avg µs The average duration of each operation

Iperf

IPerf [5] is a benchmarking tool to measure the maximum achievable bandwidth on IP networks. It provides statistics both for TCP and UDP protocols.

In the Benchmarking Suite, the following pre-defined workloads have been created:

tcp_10_1 transfer data over a single TCP connection for 10 seconds
tcp_10_10 transfer data over 10 parallel TCP connections for 10 seconds
udp_10_1_1 transfer UDP packets over a single connection for 10 seconds with the maximum bandwidth limited to 1MBit/s
udp_10_1_10 transfer UDP packets over a single connection for 10 seconds with the maximum bandwidth limited to 10MBit/s
udp_10_10_10 transfer UDP packets over 10 parallel connections for 10 seconds with the maximum bandwidth limited to 10MBit/s
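
These workloads map onto standard iperf 2 client invocations; a sketch, assuming an iperf server is already listening on the target machine:

iperf -s                           # started on the server side
iperf -c <server> -t 10            # tcp_10_1
iperf -c <server> -t 10 -P 10      # tcp_10_10
iperf -c <server> -u -t 10 -b 1M   # udp_10_1_1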

Metrics

For the TCP workloads:

Metric Unit Description
duration s The overall duration of the test
transferred_x bytes data transferred over connection x
bandwidth_x bit/s bandwidth of connection x
transferred_sum bytes sum of data transferred over all connections
bandwidth_sum bit/s sum of the bandwidth of all connections

For the UDP workloads:

Metric Unit Description
duration s The overall duration of the test
transferred_x bytes data transferred over connection x
bandwidth_x bit/s bandwidth of connection x
total_datagrams_x num number of UDP packets sent over connection x
lost_datagrams_x num number of lost UDP packets over connection x
jitter_x ms jitter measured on connection x
outoforder_x num number of packets received by the server in the wrong order
transferred_avg bytes average data transferred by each connection
bandwidth_avg bit/s average bandwidth of each connection
total_datagrams_avg num average number of packets sent over each connection
lost_datagrams_avg num average number of packets lost for each connection
jitter_avg ms average jitter over all connections
outoforder_avg num average number of packets received in the wrong order

Sysbench

SysBench [6] is a modular, cross-platform and multi-threaded benchmark tool for evaluating CPU, memory, file I/O, mutex and even MySQL performance. At the moment, only the CPU benchmarking capabilities are integrated in the Benchmarking Suite.

cpu_10000 Verifies prime numbers between 0 and 20000 by doing standard division of the number by all numbers between 2 and the square root of the number. This is repeated 10000 times, using 1, 2, 4, 8, 16 and 32 threads
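
Under the hood, each run corresponds to a sysbench invocation of the following shape, one run per thread count (--time=0 removes the time limit so that exactly the configured number of events is executed; the exact options used by the suite are defined in its configuration files [10]):

sysbench cpu run --cpu-max-prime=20000 --events=10000 --threads=4 --time=0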

Metrics

Metric Unit Description
events_rate_X num/s the number of times prime numbers between 0 and 20000 are verified each second with X threads
total_time_X s total number of seconds it took to execute the 10000 cycles with X threads
latency_min_X ms minimum time taken by a cycle
latency_max_X ms maximum time taken by a cycle
latency_avg_X ms average time taken by a cycle over the 10000 executions. It gives a good measure of the CPU speed
latency_95_X ms 95th percentile of the latency times

YCSB

YCSB [7] is a database benchmarking tool. It supports several database technologies and provides a configuration mechanism to simulate different usages.

In the Benchmarking Suite, YCSB is used to benchmark two of the most popular database servers: MySQL and MongoDB.

For each database, the following workloads are executed:

workloada Simulates an application that performs read and update operations with a ratio of 50/50 (e.g. recent actions recording)
workloadb Simulates an application that performs read and update operations with a ratio of 95/5 (e.g. photo tagging)
workloadc Simulates a read-only database (100% read operations)
workloadd Simulates an application that performs read and insert operations with a ratio of 95/5 (e.g. user status update)
workloade Simulates an application that performs scan and insert operations with a ratio of 95/5 (e.g. threaded conversations)
workloadf Simulates an application that performs read and read-modify-write operations with a ratio of 50/50 (e.g. user database)
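
Each workload follows YCSB's usual two-phase pattern: a load phase that populates the database, followed by a run phase that executes the transactions. A sketch for MongoDB (record and operation counts are assumptions; the actual values are defined in the suite's configuration files [10]):

bin/ycsb load mongodb -s -P workloads/workloada -p recordcount=100000
bin/ycsb run mongodb -s -P workloads/workloada -p operationcount=100000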

Metrics

Metric Unit Description
duration s The overall duration of the test
read_ops num The number of read operations executed
read_latency_avg µs The average latency of the read operations
read_latency_min µs The minimum latency of the read operations
read_latency_max µs The maximum latency of the read operations
read_latency_95 µs The maximum latency for 95% of the read operations
read_latency_99 µs The maximum latency for 99% of the read operations
insert_ops num The number of insert operations executed
insert_latency_avg µs The average latency of the insert operations
insert_latency_min µs The minimum latency of the insert operations
insert_latency_max µs The maximum latency of the insert operations
insert_latency_95 µs The maximum latency for 95% of the insert operations
insert_latency_99 µs The maximum latency for 99% of the insert operations
update_ops num The number of update operations executed
update_latency_avg µs The average latency of the update operations
update_latency_min µs The minimum latency of the update operations
update_latency_max µs The maximum latency of the update operations
update_latency_95 µs The maximum latency for 95% of the update operations
update_latency_99 µs The maximum latency for 99% of the update operations

WebFrameworks

This is an open source tool [8] used to compare many web application frameworks on fundamental tasks such as JSON serialization, database access, and server-side template composition. The tool is developed and used to run the tests that generate the results published at https://www.techempower.com/benchmarks/.

Currently, the frameworks supported in the Benchmarking Suite are: Django, Spring, CakePHP, Flask, FastHttp and NodeJS.

For each framework the following tests are executed:

Test types (source: https://www.techempower.com/benchmarks/#section=code&hw=ph)
json This test exercises the framework fundamentals including keep-alive support, request routing, request header parsing, object instantiation, JSON serialization, response header generation, and request count throughput.
query This test exercises the framework’s object-relational mapper (ORM), random number generator, database driver, and database connection pool.
fortunes This test exercises the ORM, database connectivity, dynamic-size collections, sorting, server-side templates, XSS countermeasures, and character encoding.
db This test uses a testing World table. Multiple rows are fetched to more dramatically punish the database driver and connection pool. At the highest queries-per-request tested (20), this test demonstrates all frameworks’ convergence toward zero requests-per-second as database activity increases.
plaintext This test is an exercise of the request-routing fundamentals only, designed to demonstrate the capacity of high-performance platforms in particular. Requests will be sent using HTTP pipelining.
update This test exercises the ORM’s persistence of objects and the database driver’s performance at running UPDATE statements or similar. The spirit of this test is to exercise a variable number of read-then-write style database operations.

For the types json, query, fortunes and db, the tool executes six different bursts of requests. Each burst lasts 15 seconds and has a different concurrency level (the number of requests performed concurrently): 16, 32, 64, 128, 256 and 512.

For the type plaintext, the tool executes four bursts of 15 seconds each with the following concurrency levels: 256, 1024, 4096 and 16384.

For the type update, the tool executes five bursts of 15 seconds each with a concurrency level of 512, but a different number of queries to perform: 1, 5, 10, 15 and 20.
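
The TechEmpower toolchain generates this kind of load with the wrk HTTP benchmarking tool; a single 15-second burst at concurrency level 256 against the json endpoint would look like the following sketch (host and port are assumptions):

wrk -t 8 -c 256 -d 15s http://<server>:8080/json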

Metrics

Metric Unit Description
duration s The overall duration of the test
duration_N s The overall duration for the N concurrency level*. It is fixed to 15 seconds by default
totalRequests_N num The overall number of requests processed during the 15 seconds test at the N concurrency level*
timeout_N num The number of requests that timed out for the N concurrency level*
latencyAvg_N s The average latency between a request and its response for the N concurrency level*
latencyMax_N s The maximum latency between a request and its response for the N concurrency level*
latencyStdev_N s The standard deviation of the latency for the N concurrency level*

Adding a new benchmarking tool

In addition to the benchmarking tests that come with the standard Benchmarking Suite release, it is possible to add new benchmarking tools by providing a configuration file that instructs the Benchmarking Suite on how to install, configure and execute the tool.

The configuration file must contain one [DEFAULT] section with the commands to install and execute the benchmarking tool, plus one or more sections that define different sets of input parameters for the tool. In this way, the same tool can be executed to generate multiple workloads.

[DEFAULT]
class = benchsuite.stdlib.benchmark.vm_benchmark.BashCommandBenchmark


#
# install, install_ubuntu, install_centos_7 are all valid keys
install_<platform> =
    echo "these are the..."
    echo "...install %(option1)s commands"

execute_<platform> =
    echo "execute commands"

cleanup =
    echo "commands to cleanup the %(option2)s environment"



[workload_1]
option1 = value1
option2 = value2

[workload_n]
option1 = value1
option2 = valueN
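
The %(option1)s and %(option2)s placeholders follow the Python configparser interpolation syntax: when a workload section is executed, its option values are substituted into the commands inherited from [DEFAULT].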

For instance, a very minimal configuration file to integrate the Sysbench [6] benchmarking tool is shown below:

[DEFAULT]
class = benchsuite.stdlib.benchmark.vm_benchmark.BashCommandBenchmark

install =
    curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.deb.sh | sudo bash
    sudo apt-get -yq install sysbench
    sysbench %(test)s prepare %(options)s

execute =
    sysbench %(test)s run %(options)s --time=0

cleanup =
    sysbench %(test)s cleanup %(options)s

[cpu_workload1]
test = cpu
options = --cpu-max-prime=20000 --events=10000

Configuration files of the benchmarks included in the Benchmarking Suite releases can be used as a starting point; they are available at [10].

Managing benchmarking tools through the GUI

The Benchmarking Suite comes with a set of third-party, widely-known, open source benchmarking workloads (e.g. SysBench, FileBench, DaCapo, Web Framework Benchmarking). These workloads are available to any registered user, and their use is encouraged to enable comparability of results over time and across providers and users. However, to support specific user requirements, custom workloads can be defined and, according to the user's choice, shared with others or kept private. As with the CLI, workloads registered in the Benchmarking Suite are typically wrappers around existing benchmarking applications, for which the registration process should provide installation, execution and results-parsing capabilities.

When the Benchmarking Suite is used through the web interface, benchmarking tools (a.k.a. workloads) can be added and edited as well.

From the ‘Workload’ panel, a new workload can be added via the ‘New Workload’ button, which shows a form asking for the metadata described below.

Similarly, once a workload has been selected, it can be modified, cloned into a new one or deleted (provided you have enough permissions on it).

Workload metadata

  • Workload name is a name, not necessarily unique, given to the workload;
  • Tool name is the name of the tool providing the given workload;
  • Workload ID is a unique identifier provided by the system, and is not modifiable;
  • Description is to tell what the workload does, what parameter is measured, and any other useful detail about the workload;
  • Categories allows specifying some tags to ease searching;
  • Abstract is to mark workload ‘templates’ not meant for execution but only for specialization;
  • Parent workload is a base workload definition from which properties/commands are inherited. The current workload only needs to specialize some of them. A typical usage of this feature is to define multiple workloads provided by a single tool;

Workload execution

  • Install scripts are executed just after the provisioning of the virtual machine, usually to download and install a benchmarking tool;
  • Post-create scripts are executed after the provisioning of the virtual machine to perform some general initialization (e.g. configure the DNS);
  • Execute scripts are executed to perform the benchmark of the environment;
  • Cleanup scripts are executed for multi-benchmark execution to ensure a clean environment for following benchmarks;
  • User and Support scripts and Workload parameters are meant to be used in the above scripts for readability and better coding style;

Install, post-create, execute and cleanup scripts can be specialized for a specific operating system, as well as for a specific version of the operating system. This is achieved through a mechanism that matches the environment of the target VM with the name of the script.

As an example, when executing the workload on a VM running Debian 10.8, the following install scripts are searched:

  1. install_debian_10_8
  2. install_debian_10
  3. install_debian
  4. install

The first matched script (i.e. the most-specific one) is executed; the others are ignored.
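
In a configuration file, this specialization simply takes the form of multiple keys with platform-suffixed names, as in the following sketch (the commands are placeholders):

[DEFAULT]
# the most specific matching key wins: on Debian 10.8 this one is executed
install_debian_10 =
    echo "Debian 10-specific install commands"

# used on any other Debian version
install_debian =
    echo "generic Debian install commands"

# fallback for any other platform
install =
    echo "install commands for any other platform"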

Sharing workloads

  • Sharing sets the level of visibility of the workload. A workload can be private to its creator or publicly visible and, thus, executable. Note that this applies to the workload only, not to the infrastructure being benchmarked nor to the produced results, which have their own visibility levels;

Exporting workloads

A JSON representation of the workload can be generated for offline inspection/editing. This can be done for all visible workloads (hit the ‘Export All’ button) or for individual workloads (hit the ‘Export’ button when viewing the workload).

When exported workloads are linked by some inheritance relationship, you can decide to export them so that:

  1. the hierarchy is preserved;
  2. inherited properties are collapsed into most-specific workloads (i.e. the hierarchy is lost).
[1] CloudPerfect project homepage: http://cloudperfect.eu/
[2] CFD Benchmark Case code: https://github.com/benchmarking-suite/cfd-benchmark-case
[3] DaCapo homepage: http://www.dacapobench.org/
[4] Filebench homepage: https://github.com/filebench/filebench/wiki
[5] IPerf homepage: https://iperf.fr/
[6] Sysbench homepage: https://github.com/akopytov/sysbench
[7] YCSB homepage: https://github.com/brianfrankcooper/YCSB/wiki
[8] Web Frameworks Benchmarking code: https://github.com/TechEmpower/FrameworkBenchmarks
[10] Benchmark configuration files: https://github.com/benchmarking-suite/benchsuite-stdlib/tree/master/data/benchmarks