HPCW 3.0
ICON requires multiple input files to execute correctly. The files are downloaded during the build step and stored in an object store created via CMake's ExternalData. However, some of the tests require more than 100GB of input data. Therefore, the larger test cases are disabled by default when building ICON with HPCW. As with the other tests in HPCW, the ICON test cases can be categorized by their resource usage: small (1 node; a good "health check" for the built model and a way to familiarize yourself with ICON), medium (10-20 nodes) and large (>50 nodes).
The tests that are included by default are the following:
icon-test-nwp-R02B04N06multi (small; a numerical weather prediction test case with nested domains, ideal as a "health check" for the build; ~160km)
icon-atm-tracer-Hadley (small; an idealized tracer transport test case; ~60km)
icon-aes-physics (small; a technical setup with land and atmosphere components, designed to fully utilize approximately one node; ~40km)
icon-LAM (large; an operational weather forecast setup covering Germany; ~2km with two nests (1km and 0.5km))
Although icon-LAM is considered a large test case, it can also run on about 15 nodes (taking nodes with 256 GB of main memory as reference). However, throughput suffers significantly in that case; increasing the number of nodes and/or using GPUs improves it considerably. CPU runs of this test case should therefore only be done with a large node count, which is what the "large" classification is meant to reflect. GPU-based runs of icon-LAM already achieve good throughput with 15-20 nodes.
Certain test cases require significantly larger inputs than the other cases. If you want to run these tests as well, set ENABLE_icon_large_inputs to ON, either via the command-line arguments to CMake or in the toolchain file. The test cases enabled this way are the following:
icon-NextGEMS-R2B8-2020 (medium; a high-resolution NextGEMS setup; ~10km)
All tests except the small NWP test have also been tested on GPUs. For advice on how to adjust the input parameters to run ICON properly on GPUs, consult the section "Adjusting the Input Files".
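If you invoke CMake directly rather than setting the option in a toolchain file, ENABLE_icon_large_inputs is passed as a -D cache definition. The Python sketch below only assembles such a command line to make the form explicit; the source and build directory names are placeholders, and the -S/-B form assumes CMake 3.13 or newer.

```python
import shlex


def cmake_configure_command(source_dir: str, build_dir: str, options: dict) -> list:
    """Build a `cmake -S <source> -B <build> -D<NAME>=<VALUE> ...` argument list.

    Python booleans are translated to CMake's ON/OFF; other values are
    passed through unchanged.
    """
    cmd = ["cmake", "-S", source_dir, "-B", build_dir]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        cmd.append(f"-D{name}={value}")
    return cmd


# "hpcw" and "build" are placeholder directories; ENABLE_icon_large_inputs is the HPCW option.
cmd = cmake_configure_command("hpcw", "build", {"ENABLE_icon_large_inputs": True})
print(shlex.join(cmd))  # cmake -S hpcw -B build -DENABLE_icon_large_inputs=ON
```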
At build time, the ICON input files are put into ${CMAKE_BINARY_DIR}/inputs/icon/experiments, which contains one directory per experiment. Note, however, that the input files in these directories are only symlinks to the actual files in the object store created by ExternalData at build time.
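Because editing a file through one of these symlinks edits the copy in the object store, it can help to check where each input actually lives. A small Python sketch, assuming only that the inputs are symbolic links as described above; the function name and the example paths are hypothetical.

```python
from pathlib import Path


def resolve_inputs(experiment_dir: Path) -> dict:
    """Map each symlinked input in an experiment directory to the file it points to.

    Regular files are skipped; only symlinks (as created by ExternalData) are reported.
    """
    links = {}
    for entry in sorted(experiment_dir.iterdir()):
        if entry.is_symlink():
            # resolve() follows the whole link chain to the real file in the object store.
            links[entry.name] = entry.resolve()
    return links


# Hypothetical usage; the build directory and experiment name are placeholders:
# experiments = Path("build") / "inputs" / "icon" / "experiments"
# for name, target in resolve_inputs(experiments / "<experiment>").items():
#     print(f"{name} -> {target}")
```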
Usually, the test cases consist of three phases:
The first phase creates a new directory for the given test (separate from ${CMAKE_BINARY_DIR}/inputs/icon/experiments). By default, this directory creation step copies the input files from ${CMAKE_BINARY_DIR}/inputs/icon/experiments WITHOUT following the symlinks. If you wish to change the input files (e.g. to adjust some namelist parameters), there are multiple options. The first option is to adjust the inputs in ${CMAKE_BINARY_DIR}/inputs/icon/experiments directly. However, be aware that this also changes the inputs in the object store, which may be reused later on. If you wish to isolate your changes from the object store, use the second (recommended) option: set the CMake option COPY_ICON_INPUTS to ON when building the tests. With this option set, the symlinks are dereferenced and the actual files are copied into the corresponding directories. After the directory creation step, you can then adjust the input files as desired without affecting the downloaded inputs in the object store.
You also do not need to worry about the directory creation step deleting your changes. By default, the copy command in this phase does not clobber existing files, so your changes persist across runs.
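The copy semantics described above (symlinks copied as symlinks by default, dereferenced when COPY_ICON_INPUTS is ON, and no clobbering of existing files) can be modeled in a short Python sketch. It is not how HPCW implements the step; it only reproduces the documented behavior, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def create_test_dir(src: Path, dst: Path, dereference: bool = False) -> list:
    """Model of the directory-creation copy step (not HPCW's actual implementation).

    dereference=False: symlinks are copied as symlinks (the default behavior).
    dereference=True:  the linked files are copied (like COPY_ICON_INPUTS=ON).
    Existing entries in dst are never overwritten, so local edits survive reruns.
    Returns the names of the entries that were copied.
    """
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for entry in sorted(src.iterdir()):
        if entry.is_dir():
            continue  # this sketch only handles files and links to files
        target = dst / entry.name
        # No clobber: skip anything already present, including dangling symlinks.
        if target.exists() or target.is_symlink():
            continue
        # follow_symlinks=False recreates a symlink with the same (possibly relative)
        # target text; follow_symlinks=True copies the linked file's content.
        shutil.copy2(entry, target, follow_symlinks=dereference)
        copied.append(entry.name)
    return copied


# Hypothetical usage (paths are placeholders):
# create_test_dir(Path("build/inputs/icon/experiments/<experiment>"), Path("run/<test>"), dereference=True)
```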
If you plan to change the files, you can first run the directory creation phase in isolation before running the actual test. To do so, pass the CTest flag -R <test_name>-directory-creation (if you are using the build wrapper, pass it via the --ctest-flags argument). After adjusting the inputs, run the test normally. There is no need to exclude the directory-creation step explicitly because, as mentioned above, it does not clobber your inputs.
If you run the stages of a test via the build wrapper in separate commands, supply the option --no-analysis to every invocation except the final one that runs the tests. This keeps the analysis script from running before there are actual logs to analyze.
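The two-step workflow described above can be written out as a pair of argument lists. This Python sketch only makes the ordering and the placement of --no-analysis explicit. The wrapper invocation is a placeholder, and passing --ctest-flags and its value as two separate arguments is an assumption to verify against the wrapper's own documentation.

```python
import shlex


def staged_commands(wrapper: list, test_name: str) -> list:
    """Return two build-wrapper invocations for one test, to be run in order.

    1. Only the directory-creation phase, with --no-analysis because no logs exist yet.
    2. The full test, without --no-analysis, so the analysis script runs at the end.
    """
    setup = wrapper + ["--ctest-flags", f"-R {test_name}-directory-creation", "--no-analysis"]
    # -R takes a regular expression, so this also matches the directory-creation
    # phase again; that is harmless because the phase does not clobber edited inputs.
    full = wrapper + ["--ctest-flags", f"-R {test_name}"]
    return [setup, full]


# "<build-wrapper>" is a placeholder for however you normally invoke the wrapper.
for cmd in staged_commands(["<build-wrapper>"], "icon-atm-tracer-Hadley"):
    print(shlex.join(cmd))
```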
ICON is controlled via namelists. Most of the parameters can be left as is, but if you want to tune or experiment with a specific test case, there are a few notable exceptions.
The experiment start and stop dates are set in icon_master.namelist, as part of the master_time_control_nml group. To extend the run time of an experiment, adjust the experimentStopDate variable accordingly. The model time step belongs to a different group, run_nml, which is typically defined in the namelist for the atmosphere component (NAMELIST_<...>_atm).
Be aware that the output start and end dates are controlled separately. Output is configured in groups called output_nml, each with its own output_start and output_end variables. The output frequency is controlled via the output_interval variable.
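Because the output window does not follow experimentStopDate automatically, lengthening a run means checking the output_nml groups as well. The Python sketch below illustrates the date arithmetic only. The ISO 8601 date format with a trailing "Z" is an assumption (check the format actually used in your icon_master.namelist), and the dates are examples.

```python
from datetime import datetime, timedelta

# ASSUMED date format: ISO 8601 with a trailing "Z", e.g. "2021-07-01T00:00:00Z".
ICON_DATE_FMT = "%Y-%m-%dT%H:%M:%SZ"


def parse_date(value: str) -> datetime:
    return datetime.strptime(value, ICON_DATE_FMT)


def new_stop_date(experiment_start: str, run_length: timedelta) -> str:
    """experimentStopDate for a run of the given length."""
    return (parse_date(experiment_start) + run_length).strftime(ICON_DATE_FMT)


def output_ends_early(output_end: str, experiment_stop: str) -> bool:
    """True if an output_nml group stops writing before the experiment ends.

    Extending experimentStopDate does not move output_end, so this is the
    case to look for after lengthening a run.
    """
    return parse_date(output_end) < parse_date(experiment_stop)


stop = new_stop_date("2021-07-01T00:00:00Z", timedelta(days=2))
print(stop)                                             # 2021-07-03T00:00:00Z
print(output_ends_early("2021-07-02T00:00:00Z", stop))  # True
```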
An important tuning parameter is nproma, part of the parallel_nml group, which controls the length of loop chunks. For standard x86 processors, it should be set to a small value, such as 16 or 32, depending on cache sizes. When running on GPUs, it should be considerably larger: the recommendation is to set it equal to the number of grid points handled by each MPI process, so that a single block loops over all cells. Instead of setting nproma manually, you can also control it indirectly via nblocks_c, which sets the number of loop chunks. Setting nblocks_c to 1 has the single-block effect described above, but depending on the setup you may run out of memory; in that case, set nblocks_c to a slightly larger value. Note that nproma and nblocks_c are mutually exclusive: if one is set to a non-zero value, the other must be set to 0. Otherwise, ICON will crash. The same namelist group also contains two analogous variables, nproma_sub and nblocks_sub, which control the loop chunk sizes in the radiation code.
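The relationship between the two variables is ceiling division: with nblocks_c chunks per MPI process, each chunk must hold at least the number of cells on that process divided by nblocks_c. A Python sketch of that arithmetic and of the mutual-exclusivity rule; it ignores any padding or alignment ICON may apply internally, and the cell count used below is hypothetical.

```python
import math


def nproma_for(cells_per_rank: int, nblocks_c: int = 1) -> int:
    """Smallest chunk length so that nblocks_c chunks cover one rank's cells.

    nblocks_c = 1 gives the single-block layout recommended for GPUs.
    """
    if cells_per_rank <= 0 or nblocks_c <= 0:
        raise ValueError("cells_per_rank and nblocks_c must be positive")
    return math.ceil(cells_per_rank / nblocks_c)


def check_parallel_nml(nproma: int, nblocks_c: int) -> None:
    """Reject the combination that the documentation says makes ICON crash."""
    if nproma != 0 and nblocks_c != 0:
        raise ValueError("nproma and nblocks_c are mutually exclusive; set one of them to 0")


# 20480 cells per rank is a hypothetical example value.
print(nproma_for(20480))     # 20480: one block per rank
print(nproma_for(20480, 3))  # 6827: three blocks, the last one slightly shorter
check_parallel_nml(0, 1)     # accepted: nproma is left to be derived from nblocks_c
```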
Not all parts of ICON support GPU execution. This is relevant for the icon-LAM test case, for example. By default, it uses a radiation solver that only supports execution on CPUs. If you try to run this setup as-is on GPUs, ICON will crash and report this as the reason. To change the solver, set the namelist parameter ecrad_isolver to 2. You may refer to this script as an example of how to change this namelist parameter. It also provides a similar example of changing the nproma values for a GPU run.
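Changes like this one are plain text edits to the namelist file. A minimal Python sketch of such an edit; it is not the script referenced above, the function name is hypothetical, and it only handles simple name = value assignments. The namelist excerpt in the example, including the group name and the old value, is illustrative only.

```python
import re


def set_namelist_value(text: str, name: str, value: str) -> str:
    """Set every `name = <old>` assignment in namelist text to `value`.

    Matching is case-insensitive, as in Fortran. Only simple scalar values are
    handled (no arrays, no quoted strings containing commas or blanks), and
    assignments inside comments are not filtered out.
    """
    pattern = re.compile(rf"(\b{re.escape(name)}\s*=\s*)([^,!\s]+)", re.IGNORECASE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, text)
    if count == 0:
        raise KeyError(f"{name} not found; add it to the appropriate namelist group")
    return new_text


# Hypothetical namelist excerpt; group name, old value and comment are placeholders.
nml = "&radiation_nml\n  ecrad_isolver = 0  ! old value\n/\n"
print(set_namelist_value(nml, "ecrad_isolver", "2"))
```

The word-boundary and "=" in the pattern keep a search for nproma from also rewriting nproma_sub on the same line.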
Another important namelist parameter is num_io_procs, which controls the number of processes that exclusively do I/O. Setting it properly and placing the processes accordingly is especially important for GPU tests. Consult this script for an example of how to set num_io_procs. For execution on Levante, there is also an example script that places the processes according to their purpose (computation vs. I/O); consult the Levante job launcher for examples of how to use it.
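A rough Python sketch of the rank bookkeeping involved. It rests on two assumptions, both labeled in the code: that ICON uses the last num_io_procs MPI ranks for I/O, and that a GPU run uses one compute rank per GPU. Verify both against your ICON version and the example scripts before relying on them; the node and GPU counts are examples.

```python
def split_ranks(total_ranks: int, num_io_procs: int) -> tuple:
    """Split MPI ranks into compute and I/O ranks.

    ASSUMPTION: the I/O ranks are the last num_io_procs ranks. Other kinds of
    dedicated processes are ignored in this sketch.
    """
    if not 0 <= num_io_procs < total_ranks:
        raise ValueError("need 0 <= num_io_procs < total_ranks")
    n_compute = total_ranks - num_io_procs
    return list(range(n_compute)), list(range(n_compute, total_ranks))


def ranks_for_gpu_run(nodes: int, gpus_per_node: int, num_io_procs: int) -> int:
    """Total MPI ranks to request, ASSUMING one compute rank per GPU plus the I/O ranks."""
    return nodes * gpus_per_node + num_io_procs


total = ranks_for_gpu_run(nodes=2, gpus_per_node=4, num_io_procs=2)
compute, io = split_ranks(total, num_io_procs=2)
print(total, compute, io)  # 10 [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

The placement script mentioned above then decides which nodes these two groups land on.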
An all-encompassing explanation of all namelist parameters and their effects is beyond the scope of this documentation. For a thorough explanation of the parameters and everything else regarding the model, consult this document.
If you want to run a certain ICON test with different resource configurations or slightly altered inputs concurrently instead of sequentially, you can pass an additional input file to HPCW at build time. Pass the ABSOLUTE path to this input file (~ is not expanded) via the CMake option ICON_TESTCASE_JSON. This JSON file specifies how many versions of a test case may run in parallel. An example input file may look like this:
This input would signal to HPCW that 3 runs of the icon-test-nwp-R02B04N06multi test may run in parallel. To enable this, the three phases mentioned above are duplicated for each specified test case. For the example input above, the following tests would be available after the build:
icon-test-nwp-R02B04N06multi_1
icon-test-nwp-R02B04N06multi_2
icon-test-nwp-R02B04N06multi_3
Each test runs in its own directory, which lets you change the input files for each individual test. When passing this additional input file, make sure your job script accommodates all available runs of a given test. This job launcher contains an example for icon-test-nwp-R02B04N06multi, providing one configuration for the first test and a catch-all for any additional versions of that test.