SSC - Tasks | GWDG - IT in Science

Participating teams must deliver the best solutions and I/O performance for the following tasks:

Task	Description
IO500	The IO500 benchmark is a comprehensive performance evaluation tool designed to assess the efficiency and scalability of HPC storage systems. It consists of several tests, including IOR for sequential read/write performance and mdtest for metadata operations, ensuring a thorough analysis of storage subsystems. The benchmark helps organizations and researchers to identify bottlenecks, compare different storage solutions, and guide the development of optimized storage architectures. Regularly updated results and rankings foster a competitive environment, encouraging continuous innovation in HPC storage technologies.
MD-Workbench	MD-Workbench is a specialized benchmark designed to evaluate the performance of metadata operations in HPC file systems. By simulating a range of metadata-intensive workloads, such as file creation, deletion, and attribute modification, MD-Workbench provides detailed insights into the efficiency and scalability of file system metadata handling.
Co-locating IO500/mdworkbench	Run IO500 and MD-Workbench in parallel. A hint on how to do this can be found here. Scoring is based on results.
Elbencho	Elbencho is a benchmark designed to evaluate the performance of storage systems under various workloads. The benchmark assesses the read and write capabilities of file systems, including network file systems, by generating synthetic workloads that simulate real-world usage patterns. Elbencho supports both single-threaded and multi-threaded operations, allowing for comprehensive performance analysis across different configurations and scales.
Secret Task	mlperf storage bench v2 resnet50
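
As a rough illustration of the co-location task, the following minimal sketch starts two commands at nearly the same time and waits for both. The helper name `run_colocated` and the placeholder commands are assumptions made for this example; they are not the official hint, and the real IO500 and MD-Workbench launch lines have to be substituted.

```python
import subprocess
import sys


def run_colocated(commands):
    """Start every command back to back, then wait for all of them.

    `commands` maps a label to an argv list (hypothetical interface).
    Returns a dict mapping each label to the exit code of its process.
    """
    # Launch all processes first so that they overlap in time.
    procs = {label: subprocess.Popen(argv) for label, argv in commands.items()}
    # Only then block until each one has finished.
    return {label: proc.wait() for label, proc in procs.items()}


if __name__ == "__main__":
    # Placeholder commands: replace with the real benchmark launch lines.
    demo = {
        "io500": [sys.executable, "-c", "print('io500 placeholder')"],
        "md-workbench": [sys.executable, "-c", "print('md-workbench placeholder')"],
    }
    print(run_colocated(demo))
```

Starting both processes before waiting on either is what makes the runs overlap; launching them one after the other with a blocking call would measure them in isolation instead.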

Scoring for tasks

The scoring for the SSC is outlined in the table below. For any score tied to a benchmark, a better result earns more points. The maximum number of points equals the number of teams in the competition, so for 5 teams the points are distributed as follows:

5 points for the highest-ranked team
4 points for the second best
…
1 point for the last team in the rankings
0 points if the benchmark was not run
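
The point distribution above can be sketched as a small function. This is only an illustration under stated assumptions: the function name `rank_points` is made up for this example, ties are broken by input order because the rules do not specify tie handling, and a team that did not run the benchmark gets 0 points while the remaining teams still count down from the total number of teams.

```python
def rank_points(results, higher_is_better=True):
    """Map each team to its points for one benchmark.

    `results` maps team name -> measured value, or None if the team
    did not run the benchmark (assumed representation).
    """
    n_teams = len(results)
    ran = {team: value for team, value in results.items() if value is not None}
    # Best result first; set higher_is_better=False for latency-style metrics.
    ordered = sorted(ran, key=ran.get, reverse=higher_is_better)
    points = {team: 0 for team in results}  # 0 points if not run
    for place, team in enumerate(ordered):
        points[team] = n_teams - place
    return points


if __name__ == "__main__":
    bandwidth = {"A": 12.0, "B": 30.5, "C": 20.1, "D": 5.0, "E": 17.3}
    print(rank_points(bandwidth))
```

For a metric where lower is better, such as the maximum latency in MD Workbench, the same function can be called with `higher_is_better=False`.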

The acronym DLIO refers to the “Deep learning IO Benchmark” and is used as a reference here.

Application	Task	Points
IO500	Submission to the IO500 webpage (Research section). Full submission (description may be partly missing): 2 points; reproducibility questionnaire: 1 point.	3
	10-client setup. Scoring based on results.	5
	Description of the configurations measured and the performance improvements made. At least 5 different node/process combinations, with reasoning, on one summary page.	5
MD Workbench	Lowest maximum latency for the fixed configuration. Scoring based on results.	5
Colocation	Run IO500 and MD Workbench in parallel. Scoring based on results.	5
Elbencho	Run for an arbitrary number of client nodes, 100 KByte files (same file size as DLIO). The goal is to find out whether a larger number of clients is automatically faster and a smaller number of clients automatically slower. Submit write and read results. Ranking based on read result.	2
	Run 10 client nodes, 100 KByte files (same file size as DLIO) vs. 100 KByte random reads in shared files. The goal is to compare the read performance of reading 100 KByte files directly versus randomly reading 100 KByte records from one or multiple large files. Submit write and read results. Ranking based on read result.	5
	Run 1 client node, single thread, 100 KByte random reads in large files. The goal is to see how many read IOPS a single thread can achieve. Submit write and read results.	5
	HINT: See the built-in help pages “elbencho --help-large” and “elbencho --help-multi” for examples of working with large shared files and many small files. See “elbencho --help-dist” for examples of working with multiple clients. The “--dryrun” option of elbencho can be helpful to see how big the resulting dataset will be, especially for distributed runs where each thread creates many small files.
	NOTE: In all cases, make the dataset large enough that the read test runs for at least 20 seconds without using test parameters that read the same data multiple times, such as “--infloop” or “--iterations”. The goal is to measure drive access performance instead of RAM cache performance. Large file means 1GB file size or more. Generate the large files by doing sequential writes within each file (i.e. write without the “--rand” parameter).
MLPerf Resnet50 benchmark	1. Get mlperf storage bench v2 resnet50 running on an SSC node (1P)	5
	2. Get the ResNet50 read test to pass: 1 node/4 H100 with over 90% GPU acceleration (2P)
	3. Pass the same test as in 2, but with 1 node/8 H100 (2P)
Performance analysis	Written report. Comparison between benchmark results and the theoretical maximum; 1-2 pages of description, analysis, reasoning, and conclusion.	10
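
The NOTE in the Elbencho rows asks for a dataset large enough that the read test runs for at least 20 seconds. A minimal sketch of that arithmetic follows. The function name, the expected read throughput, and the interpretation of 100 KByte as 100 * 1024 bytes are assumptions made for this example; the throughput has to be estimated or measured on the actual system.

```python
def min_files_for_runtime(expected_read_bytes_per_s,
                          file_size_bytes=100 * 1024,
                          min_runtime_s=20):
    """Return how many files are needed so that reading every file once
    takes at least `min_runtime_s` at the expected read throughput."""
    total_bytes = expected_read_bytes_per_s * min_runtime_s
    # Integer ceiling division: round up to whole files.
    return (total_bytes + file_size_bytes - 1) // file_size_bytes


if __name__ == "__main__":
    # Assumed aggregate read throughput of 10 GiB/s (hypothetical value).
    throughput = 10 * 1024**3
    print(min_files_for_runtime(throughput))
```

Because the estimate scales linearly with throughput, it is worth recomputing for each client count, and the “--dryrun” option mentioned in the HINT can then be used to check the resulting dataset size before writing it.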