/*! \page likwid-bench likwid-bench
likwid-bench is a benchmark suite of low-level (assembly) kernels that measure bandwidth and instruction throughput for specific instruction sequences on x86 systems. The included kernels cover common data access patterns such as load and store, as well as computations such as the vector triad and sum. likwid-bench provides architecture-specific benchmarks for x86, x86_64 and the x86-based Intel Xeon Phi coprocessors. Performance values can either be calculated by likwid-bench itself or measured with hardware performance counters by running likwid-bench under likwid-perfctr. The latter requires building likwid-bench with instrumentation enabled (INSTRUMENT_BENCH in config.mk).
| Option | Description |
|---|---|
| -h | Print the help message |
| -a | List all available benchmarks |
| -p | List all available thread affinity domains |
| -i <iters> | Run the benchmark kernel for <iters> iterations |
| -d <delim> | Use <delim> instead of ‘,’ as the delimiter in the output of -p |
| -l <test> | List the characteristics of <test>, such as the number of streams, the data used per loop iteration, … |
| -t <test> | Run the assembly benchmark <test> |
| -s <min_time> | Minimum run time of the benchmark in seconds (default: 1). The iteration count is derived automatically from this time to provide reliable results. If the derived count is below 10, it is raised to 10. |
| -w <workgroup> | Define a workgroup for the benchmark. The general format is <domain>:<size>[:<nthreads>[:<chunk>:<stride>]]. It can optionally be followed by -<streamId>:<domain>[,<streamId>:<domain>…] to place individual streams in other affinity domains. If <nthreads> is omitted, all threads of the domain are used. The option can be given multiple times, and the results of all workgroups are combined. See the examples below. |

Workgroup examples:

| Command | Description |
|---|---|
| likwid-bench -t copy -w S0:100kB | Run the copy benchmark with all threads in affinity domain S0. The input and output streams of the copy benchmark sum up to 100 kB and are placed in affinity domain S0. The iteration count is determined automatically. |
| likwid-bench -t triad -i 100 -w S0:1GB:2:1:2 | Run the triad benchmark with 2 threads in affinity domain S0, skipping one CPU of the domain after each selected one. Assuming S0 = 0,4,1,5, the threads are pinned to CPUs 0 and 1. The streams of the triad benchmark sum up to 1 GB and are placed in affinity domain S0. The iteration count is explicitly set to 100. |
| likwid-bench -t update -w S0:100kB -w S1:100kB | Run the update benchmark with all threads in affinity domains S0 and S1. The threads on S0 work on streams that sum up to 100 kB and are placed in S0. Likewise, the threads on S1 work only on their socket-local streams in S1. The results of both workgroups are combined. |
| likwid-perfctr -c E:S0:4 -g MEM -m likwid-bench -t update -w S0:100kB:4 | Run the update benchmark with 4 threads in affinity domain S0. The streams of the update benchmark sum up to 100 kB and are placed in affinity domain S0. likwid-perfctr measures the MEM performance group on the first four CPUs of the S0 affinity domain. It uses the Marker API to restrict the measurement to the benchmark execution. The pinning is done by likwid-bench. For further information about hardware performance counters, see \ref likwid-perfctr. |
| likwid-bench -t copy -w S0:1GB:2:1:2-0:S1,1:S1 | Run the copy benchmark with 2 threads in affinity domain S0, skipping one CPU of the domain after each selected one. The two streams of the copy benchmark have the IDs 0 and 1 and sum up to 1 GB. Both streams are placed in affinity domain S1. |
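
The thread selection in the workgroups S0:1GB:2:1:2 can be reproduced with a small sketch. The function `select_cpus` below is hypothetical and not part of likwid. It assumes that the last two numeric workgroup fields are a chunk size and a stride. Under that assumption, `chunk` consecutive CPUs are taken from the ordered CPU list of the domain, and each new chunk starts `stride` entries after the previous one. It only illustrates the selection pattern. It is not likwid-bench's actual pinning code.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper, not part of likwid: picks the CPUs for a workgroup
// from the ordered CPU list of an affinity domain. It ASSUMES the
// <nthreads>:<chunk>:<stride> fields mean "take `chunk` consecutive entries,
// then start the next chunk `stride` entries after the previous one".
// `out` must have room for `nthreads` entries. Returns the number of CPUs
// written to `out`, or -1 if the arguments are invalid or the domain is
// too small for the requested pattern.
int select_cpus(const int *domain, int domain_len,
                int nthreads, int chunk, int stride, int *out)
{
    assert(domain != NULL && out != NULL);

    // Overlapping chunks (stride < chunk) would select a CPU twice.
    if (nthreads <= 0 || chunk <= 0 || stride < chunk)
        return -1;

    int selected = 0;
    for (int start = 0; selected < nthreads; start += stride) {
        // Take up to `chunk` consecutive CPUs beginning at index `start`.
        for (int i = 0; i < chunk && selected < nthreads; i++) {
            int idx = start + i;
            if (idx >= domain_len)
                return -1;  // not enough CPUs in the domain
            out[selected++] = domain[idx];
        }
    }
    return selected;
}
```

Take the domain list {0, 4, 1, 5} from the triad example. The call `select_cpus(domain, 4, 2, 1, 2, cpus)` returns 2 and stores CPUs 0 and 1, which matches the pinning described above. Rejecting `stride < chunk` is a design choice of this sketch, made so that no CPU is selected twice.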
*/