Benchmarks¶
The Benchmarking Suite comes with a set of third-party benchmarking tools, each with a set of different test configurations ready to be executed. The tools are:
- CFD: a tool realized in the CloudPerfect EU project [1] that uses OpenFOAM to run a waterbox simulation. It can be configured with different solvers, numbers of iterations and write-to-disk strategies. It is primarily a CPU intensive benchmark;
- DaCapo: a tool for Java benchmarking that simulates real-world applications with non-trivial memory loads. It is mainly a CPU and memory intensive benchmark;
- Filebench: a powerful and flexible tool able to generate and execute a variety of filesystem workloads to simulate applications like Web servers, File servers and Video services. It is mainly a Disk intensive benchmark;
- Iperf: a tool for active measurements of the maximum achievable bandwidth on IP networks;
- Sysbench: a tool to test CPU, memory, file I/O, mutex performance and MySQL on Linux systems;
- YCSB: a tool for database benchmarking that supports several database technologies. In the Benchmarking Suite, tests for MySQL and MongoDB are provided. It is primarily a Disk intensive benchmark;
- WebFrameworks: tests common web framework workloads like fetching and inserting data in a database or creating/parsing JSON objects. It is mainly a Memory and Network intensive benchmark.
The following table summarizes the available tools and their compatibility with different operating systems.
Tool | Version | CentOS | Ubuntu 14 | Ubuntu 16 | Ubuntu 18 | Ubuntu 20 | Debian |
---|---|---|---|---|---|---|---|
CFD | 1.0 | ✗ | ✓ | ✗ | | | |
DaCapo | 9.12 | ✓ | ✓ | ✗ | | | |
Filebench | 1.4.9.1 | ✓ | ✓ | ✓ | | | |
Iperf | 2.0.5 | ✗ | ✓ | ✓ | | | |
Sysbench | 2.1.0 | ✗ | ✓ | ✓ | | | |
YCSB-MySQL | 0.12.0 | ✓ | ✓ | ✗ | | | |
YCSB-MongoDB | 0.11.0 | ✓ | ✓ | ✗ | | | |
WebFrameworks | master | ✗ | ✓ | ✓ | | | |
CFD¶
The CFD benchmarking tool has been realized in the context of the CloudPerfect EU project [1] and released open source on GitHub [2]. The tool executes a CFD simulation on a waterbox geometry and allows customizing several parameters in order to run different simulation scenarios.
The following combinations of parameters are used in the Benchmarking Suite tests:
Test | Description |
---|---|
100iterGAMG | 100 iterations using the GAMG solver |
100iterWriteAtLast | 100 iterations using the GAMG solver and not writing intermediate results on the disk |
500iterGAMG | 500 iterations using the GAMG solver |
500iterGAMGWriteAtLast | 500 iterations using the GAMG solver and not writing intermediate results on the disk |
500iterICCG | 500 iterations using the ICCG solver |
500iterPCG | 500 iterations using the PCG solver |
All the tests use all the CPUs available on the machine.
Metrics¶
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the simulation |
DaCapo¶
DaCapo [3] is a tool for Java benchmarking developed by the programming language, memory management and computer architecture communities. It consists of a set of open source, real-world applications with non-trivial memory loads. The tests implemented by the tool are:
Test | Description |
---|---|
avrora | simulates a number of programs run on a grid of AVR microcontrollers |
batik | produces a number of Scalable Vector Graphics (SVG) images based on the unit tests in Apache Batik |
eclipse | executes some of the (non-GUI) JDT performance tests for the Eclipse IDE |
fop | takes an XSL-FO file, parses it and formats it, generating a PDF file |
h2 | executes a JDBCbench-like in-memory benchmark, executing a number of transactions against a model of a banking application, replacing the hsqldb benchmark |
jython | interprets the pybench Python benchmark |
luindex | uses Lucene to index a set of documents: the works of Shakespeare and the King James Bible |
lusearch | uses Lucene to do a text search of keywords over a corpus of data comprising the works of Shakespeare and the King James Bible |
pmd | analyzes a set of Java classes for a range of source code problems |
sunflow | renders a set of images using ray tracing |
tomcat | runs a set of queries against a Tomcat server, retrieving and verifying the resulting webpages |
tradebeans | runs the daytrader benchmark via Java Beans to a GERONIMO backend with an in-memory h2 as the underlying database |
tradesoap | runs the daytrader benchmark via SOAP to a GERONIMO backend with an in-memory h2 as the underlying database |
xalan | transforms XML documents into HTML |
Each test is executed multiple times, until the execution durations converge (variance <= 3.0 over the latest 3 executions).
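The convergence criterion described above can be sketched as follows (a minimal illustration, not the Benchmarking Suite's actual code; function and variable names are made up):

```python
from statistics import pvariance

def converged(durations_ms, window=3, max_variance=3.0):
    """Return True when the variance of the last `window` execution
    durations is at or below `max_variance` (the stopping criterion
    described above for the repeated DaCapo executions)."""
    if len(durations_ms) < window:
        return False
    return pvariance(durations_ms[-window:]) <= max_variance

# Example: durations stabilize after a few warm-up executions.
runs = [2100, 1700, 1502, 1501, 1500]
print(converged(runs))  # True: variance of [1502, 1501, 1500] is well below 3.0
```

With such a check, the number of executions needed before the last three durations settle corresponds to the warmup_iters metric below.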
Metrics¶
Metric | Unit | Description |
---|---|---|
timed_duration | ms | the duration of the latest execution |
warmup_iters | num | the number of executions that were necessary to converge |
Filebench¶
Filebench [4] is a very powerful tool able to generate a variety of filesystem- and storage-based workloads. It implements a set of basic primitives like createfile, readfile, mkdir, fsync, … and provides a language (the Workload Model Language, WML) to combine these primitives into complex workloads.
In the Benchmarking Suite, a set of pre-defined workloads is used to simulate different services:
Workload | Description |
---|---|
fileserver | Emulates simple file-server I/O activity. This workload performs a sequence of creates, deletes, appends, reads, writes and attribute operations on a directory tree. 50 threads are used by default. The workload generated is somewhat similar to SPECsfs. |
webproxy | Emulates I/O activity of a simple web proxy server. A mix of create-write-close, open-read-close, and delete operations of multiple files in a directory tree and a file append to simulate the proxy log. 100 threads are used by default. |
webserver | Emulates simple web-server I/O activity. Produces a sequence of open-read-close on multiple files in a directory tree plus a log file append. 100 threads are used by default. |
videoserver | This workload emulates a video server. It has two filesets: one contains videos that are actively served, and the second one has videos that are available but currently inactive. One thread is writing new videos to replace no longer viewed videos in the passive set. Meanwhile $nthreads threads are serving up videos from the active video fileset. |
varmail | Emulates I/O activity of a simple mail server that stores each e-mail in a separate file (/var/mail/ server). The workload consists of a multi-threaded set of create-append-sync, read-append-sync, read and delete operations in a single directory. 16 threads are used by default. The workload generated is somewhat similar to Postmark but multi-threaded. |
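To give a flavour of how such workloads are defined, below is a rough sketch of a WML workload file (illustrative only: the names and attribute values are made up and may not match the exact WML syntax, for which see the Filebench wiki [4]):

```
define fileset name=testfiles,path=/tmp/fbtest,entries=100,size=16k,prealloc
define process name=filereader,instances=1
{
  thread name=readerthread,instances=2,memsize=4m
  {
    flowop openfile name=open1,filesetname=testfiles,fd=1
    flowop readwholefile name=read1,fd=1
    flowop closefile name=close1,fd=1
  }
}
run 60
```

Filebench runs such a file with `filebench -f <workload-file>` and reports per-operation statistics for the run.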
Metrics¶
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the test |
ops | num | The sum of all operations (of any type) executed |
ops_throughput | ops/s | The average number of operations executed per second |
throughput | MB/s | The average number of MB written/read per second during the test |
cputime | µs | The average cpu time taken by each operation |
latency_avg | µs | The average duration of each operation |
Iperf¶
IPerf [5] is a benchmarking tool to measure the maximum achievable bandwidth on IP networks. It provides statistics for both the TCP and UDP protocols.
In the Benchmarking Suite, the following pre-defined workloads have been created:
Test | Description |
---|---|
tcp_10_1 | transfer data over a single TCP connection for 10 seconds |
tcp_10_10 | transfer data over 10 parallel TCP connections for 10 seconds |
udp_10_1_1 | transfer UDP packets over a single connection for 10 seconds with the maximum bandwidth limited to 1 Mbit/s |
udp_10_1_10 | transfer UDP packets over a single connection for 10 seconds with the maximum bandwidth limited to 10 Mbit/s |
udp_10_10_10 | transfer UDP packets over 10 parallel connections for 10 seconds with the maximum bandwidth limited to 10 Mbit/s |
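The pre-defined workloads above correspond roughly to iperf client invocations of the following shape (a sketch assuming iperf 2.x and a server already listening on $SERVER; the actual commands generated by the suite may differ):

```
# tcp_10_1: a single TCP connection for 10 seconds
iperf -c $SERVER -t 10
# tcp_10_10: 10 parallel TCP connections (-P) for 10 seconds
iperf -c $SERVER -t 10 -P 10
# udp_10_1_1: UDP mode (-u), one connection, bandwidth capped (-b) at 1 Mbit/s
iperf -c $SERVER -u -t 10 -b 1M
```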
Metrics¶
For the TCP workloads:
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the test |
transferred_x | bytes | data transferred for the connection x |
bandwidth_x | bit/s | bandwidth of the connection x |
transferred_sum | bytes | sum of data transferred in all connections |
bandwidth_sum | bit/s | sum of bandwidth of all connections |
For the UDP workloads:
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the test |
transferred_x | bytes | data transferred over connection x |
bandwidth_x | bit/s | bandwidth of connection x |
total_datagrams_x | num | number of UDP packets sent over connection x |
lost_datagrams_x | num | number of lost UDP packets over connection x |
jitter_x | ms | jitter (latency variation) of connection x |
outoforder_x | num | number of packets received by the server in the wrong order |
transferred_avg | bytes | average data transferred by each connection |
bandwidth | bit/s | average bandwidth of each connection |
total_datagrams_avg | num | average number of packets sent over each connection |
lost_datagrams_avg | num | average number of packets lost for each connection |
jitter_avg | ms | average jitter |
outoforder_avg | num | average number of packets received in the wrong order |
Sysbench¶
SysBench [6] is a modular, cross-platform and multi-threaded benchmark tool for evaluating CPU, memory, file I/O, mutex performance, and even MySQL benchmarking. At the moment, only the CPU benchmarking capabilities are integrated into the Benchmarking Suite.
Test | Description |
---|---|
cpu_10000 | Verifies prime numbers between 0 and 20000 by doing standard division of the number by all numbers between 2 and the square root of the number. This is repeated 1000 times, using 1, 2, 4, 8, 16 and 32 threads |
Metrics¶
Metric | Unit | Description |
---|---|---|
events_rate_X | num/s | the number of times prime numbers between 0 and 20000 are verified each second with X threads |
total_time_X | s | total number of seconds it took to execute the 1000 cycles with X threads |
latency_min_X | ms | minimum time it took for a cycle |
latency_max_X | ms | maximum time it took for a cycle |
latency_avg_X | ms | average time a cycle took; it gives a good measure of the CPU speed |
latency_95_X | ms | 95th percentile of the latency times. |
YCSB¶
YCSB [7] is a database benchmarking tool. It supports several database technologies and provides a configuration mechanism to simulate different usage patterns.
In the Benchmarking Suite, YCSB is used to benchmark two of the most popular database servers: MySQL and MongoDB.
For each database, the following workloads are executed:
Workload | Description |
---|---|
workloada | Simulates an application that performs read and update operations with a ratio of 50/50 (e.g. recent actions recording) |
workloadb | Simulates an application that performs read and update operations with a ratio of 95/5 (e.g. photo tagging) |
workloadc | Simulates a read-only database (100% read operations) |
workloadd | Simulates an application that performs read and insert operations with a ratio of 95/5 (e.g. user status update) |
workloade | Simulates an application that performs scan and insert operations with a ratio of 95/5 (e.g. threaded conversations) |
workloadf | Simulates an application that performs read and read-modify-write operations with a ratio of 50/50 (e.g. user database) |
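Each YCSB benchmark consists of a load phase, which populates the database, and a run phase, which executes the workload's operation mix. With a database already running, the two phases look roughly like this (a sketch; the bindings and connection properties actually used by the suite depend on the deployment):

```
# Load phase: populate the target database with the workload's dataset
./bin/ycsb load mongodb -s -P workloads/workloada
# Run phase: execute the 50/50 read/update operation mix of workloada
./bin/ycsb run mongodb -s -P workloads/workloada
```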
Metrics¶
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the test |
read_ops | num | The number of read operations executed |
read_latency_avg | µs | The average latency of the read operations |
read_latency_min | µs | The minimum latency of the read operations |
read_latency_max | µs | The maximum latency of the read operations |
read_latency_95 | µs | The maximum latency for the 95% of the read operations |
read_latency_99 | µs | The maximum latency for the 99% of the read operations |
insert_ops | num | The number of insert operations executed |
insert_latency_avg | µs | The average latency of the insert operations |
insert_latency_min | µs | The minimum latency of the insert operations |
insert_latency_max | µs | The maximum latency of the insert operations |
insert_latency_95 | µs | The maximum latency for the 95% of the insert operations |
insert_latency_99 | µs | The maximum latency for the 99% of the insert operations |
update_ops | num | The number of update operations executed |
update_latency_avg | µs | The average latency of the update operations |
update_latency_min | µs | The minimum latency of the update operations |
update_latency_max | µs | The maximum latency of the update operations |
update_latency_95 | µs | The maximum latency for the 95% of the update operations |
update_latency_99 | µs | The maximum latency for the 99% of the update operations |
WebFrameworks¶
This is an open source tool [8] used to compare many web application frameworks executing fundamental tasks such as JSON serialization, database access, and server-side template composition. The tool was developed and is used to run the tests that generate the results available at https://www.techempower.com/benchmarks/.
Currently, the frameworks supported in the Benchmarking Suite are: Django, Spring, CakePHP, Flask, FastHttp and NodeJS.
For each framework the following tests are executed:
Test | Description |
---|---|
json | This test exercises the framework fundamentals including keep-alive support, request routing, request header parsing, object instantiation, JSON serialization, response header generation, and request count throughput. |
query | This test exercises the framework’s object-relational mapper (ORM), random number generator, database driver, and database connection pool. |
fortunes | This test exercises the ORM, database connectivity, dynamic-size collections, sorting, server-side templates, XSS countermeasures, and character encoding. |
db | This test uses a testing World table. Multiple rows are fetched to more dramatically punish the database driver and connection pool. At the highest queries-per-request tested (20), this test demonstrates all frameworks’ convergence toward zero requests-per-second as database activity increases. |
plaintext | This test is an exercise of the request-routing fundamentals only, designed to demonstrate the capacity of high-performance platforms in particular. Requests will be sent using HTTP pipelining. |
update | This test exercises the ORM’s persistence of objects and the database driver’s performance at running UPDATE statements or similar. The spirit of this test is to exercise a variable number of read-then-write style database operations. |
For the json, query, fortunes and db test types, the tool executes six different bursts of requests. Each burst lasts 15 seconds and has a different concurrency level (the number of requests issued concurrently): 16, 32, 64, 128, 256 and 512.
For the plaintext type, the tool executes four bursts of 15 seconds each with the following concurrency levels: 256, 1024, 4096 and 16384.
For the update type, the tool executes five bursts of 15 seconds each with a concurrency level of 512, but a different number of queries to perform: 1, 5, 10, 15 and 20.
Metrics¶
Metric | Unit | Description |
---|---|---|
duration | s | The overall duration of the test |
duration_N | s | The overall duration for the N concurrency level*. It is fixed to 15 seconds by default |
totalRequests_N | num | The overall number of requests processed during the 15 seconds test at the N concurrency level* |
timeout_N | num | The number of requests that went in timeout for the N concurrency level* |
latencyAvg_N | s | the average latency between a request and its response for the N concurrency level* |
latencyMax_N | s | the maximum latency between a request and its response for the N concurrency level* |
latencyStdev_N | s | the standard deviation measure for the latency for the N concurrency level* |
Adding a new benchmarking tool¶
In addition to the benchmarking tests coming with the standard Benchmarking Suite release, it is possible to add new benchmarking tools by providing a configuration file that instructs the Benchmarking Suite how to install, configure and execute the tool.
The configuration file must contain one [DEFAULT] section with the commands to install and execute the benchmarking tool, plus one or more sections that define different sets of input parameters for the tool. In this way, it is possible to execute the same tool to generate multiple workloads.
[DEFAULT]
class = benchsuite.stdlib.benchmark.vm_benchmark.BashCommandBenchmark
#
# install, install_ubuntu, install_centos_7 are all valid keys
install_<platform> =
echo "these are the..."
echo "...install %(option1)s commands"
execute_<platform> =
echo "execute commands"
cleanup =
echo "commands to cleanup the %(option2)s environment"
[workload_1]
option1 = value1
option2 = value
[workload_n]
option1 = value1
option2 = valueN
For instance, a very minimal configuration file to integrate the Sysbench [6] benchmarking tool is shown below:
[DEFAULT]
class = benchsuite.stdlib.benchmark.vm_benchmark.BashCommandBenchmark
install =
curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.deb.sh | sudo bash
sudo apt-get -yq install sysbench
sysbench %(test)s prepare %(options)s
execute =
sysbench %(test)s run %(options)s --time=0
cleanup =
sysbench %(test)s cleanup %(options)s
[cpu_workload1]
test = cpu
options = --cpu-max-prime=20000 --events=10000
The configuration files of the benchmarks included in the Benchmarking Suite releases can be used as a starting point; they are available here [10].
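The %(name)s placeholders and the inheritance of [DEFAULT] values by the workload sections mirror the interpolation mechanism of Python's configparser, which can be sketched as follows (a minimal illustration; the Benchmarking Suite's own parsing code may differ):

```python
import configparser

# A stripped-down version of the Sysbench configuration above: the
# 'execute' command lives in [DEFAULT] and references options that each
# workload section fills in via %(name)s interpolation.
CONFIG = """
[DEFAULT]
class = benchsuite.stdlib.benchmark.vm_benchmark.BashCommandBenchmark
execute =
    sysbench %(test)s run %(options)s

[cpu_workload1]
test = cpu
options = --cpu-max-prime=20000
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Reading 'execute' from the workload section substitutes that section's
# option values into the [DEFAULT] command template.
cmd = parser.get("cpu_workload1", "execute").strip()
print(cmd)  # sysbench cpu run --cpu-max-prime=20000
```

Every workload section thus yields a complete command line without duplicating the [DEFAULT] template.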
Managing benchmarking tools through the GUI¶
The Benchmarking Suite comes with a set of third-party, widely-known, open source benchmarking workloads (e.g. SysBench, FileBench, DaCapo, Web Framework Benchmarking). These workloads are available to any registered user, who is encouraged to use them to enable comparability of results over time and across providers and users. However, to support specific user requirements, custom workloads can be defined and, according to the user's choice, shared with others or kept private. As with the CLI, workloads registered in the Benchmarking Suite are typically wrappers around existing benchmarking applications, for which the registration process should provide installation, execution and results-parsing capabilities.
When the Benchmarking Suite is used through the web interface, benchmarking tools (a.k.a. workloads) can be added and edited as well.
From the ‘Workload’ panel, a new workload can be added via the ‘New Workload’ button which shows a form asking for metadata already presented in the previous section.
Similarly, once a workload has been selected, it can be modified, cloned into a new one or deleted (provided you have enough permissions on it).
Workload metadata¶
- Workload name is a name, not necessarily unique, given to the workload;
- Tool name is the name of the tool providing the given workload;
- Workload ID is a unique identifier provided by the system, and is not modifiable;
- Description is to tell what the workload does, what parameter is measured, and any other useful detail about the workload;
- Categories allows specifying some tags to ease searching;
- Abstract is to mark workload ‘templates’ not meant for execution but only for specialization;
- Parent workload is a base workload definition from which properties/commands are inherited. The current workload only needs to specialize some of them. A typical usage of this feature is to define multiple workloads provided by a single tool;
Workload execution¶
- Install scripts are executed just after the provisioning of the virtual machine, usually to download and install a benchmarking tool;
- Post-create scripts are executed after the provisioning of the virtual machine to perform some general initialization (e.g. configure the DNS);
- Execute scripts are executed to perform the benchmark of the environment;
- Cleanup scripts are executed in multi-benchmark executions to ensure a clean environment for subsequent benchmarks;
- User and Support scripts and Workload parameters are meant to be used in the above scripts for readability and better coding style;
Install, post-create, execute and cleanup scripts can be specialized for a specific operating system, as well as for a specific version of the operating system. This is achieved through a mechanism matching the environment on the target VM with the name of the script.
As an example, when executing the workload on a VM running Debian 10.8, the following install scripts are searched:
- install_debian_10_8
- install_debian_10
- install_debian
- install
The first matched script (i.e. the most-specific one) is executed; the others are ignored.
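This most-specific-first lookup can be sketched in Python as follows (a hypothetical helper for illustration, not the Benchmarking Suite's actual implementation):

```python
def candidate_script_names(phase, os_name, os_version):
    """Yield script names from most to least specific, e.g. for
    ('install', 'debian', '10.8'): install_debian_10_8,
    install_debian_10, install_debian, install."""
    parts = os_version.split(".")
    for i in range(len(parts), 0, -1):
        yield f"{phase}_{os_name}_{'_'.join(parts[:i])}"
    yield f"{phase}_{os_name}"
    yield phase

def select_script(available, phase, os_name, os_version):
    # The first candidate present among the workload's scripts wins;
    # the less specific ones are ignored.
    for name in candidate_script_names(phase, os_name, os_version):
        if name in available:
            return name
    return None

scripts = {"install_debian_10": "...", "install": "..."}
print(select_script(scripts, "install", "debian", "10.8"))  # install_debian_10
```

With only a generic `install` script defined, the same call would fall back to it; with none of the candidates defined, no script is executed.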
Sharing workloads¶
- Sharing sets the level of visibility for the workload. A workload can be private to its creator or publicly visible and, thus, executable. Note that this applies to the workload only, not to the infrastructure being benchmarked nor to the produced results, which have their own visibility levels;
Exporting workloads¶
A JSON representation of the workload can be generated for offline inspection/editing. This can be done for all visible workloads (hit the ‘Export All’ button) or for individual workloads (hit the ‘Export’ button when viewing the workload).
When exported workloads are linked by some inheritance relationship, you can decide to export them so that:
- the hierarchy is preserved;
- inherited properties are collapsed into most-specific workloads (i.e. the hierarchy is lost).
[1] | (1, 2) CloudPerfect project homepage: http://cloudperfect.eu/ |
[2] | CFD Benchmark Case code: https://github.com/benchmarking-suite/cfd-benchmark-case |
[3] | DaCapo homepage: http://www.dacapobench.org/ |
[4] | Filebench homepage: https://github.com/filebench/filebench/wiki |
[5] | IPerf homepage: https://iperf.fr/ |
[6] | (1, 2) Sysbench homepage: https://github.com/akopytov/sysbench |
[7] | YCSB homepage: https://github.com/brianfrankcooper/YCSB/wiki |
[8] | Web Frameworks Benchmarking code: https://github.com/TechEmpower/FrameworkBenchmarks |
[10] | Benchmark configuration files: https://github.com/benchmarking-suite/benchsuite-stdlib/tree/master/data/benchmarks |