Running AceCAST

Before attempting to run, make sure you have installed AceCAST properly and acquired a valid license to use the software (see the Installation Guide).

Input Data

The first step in any AceCAST/WRF workflow is to generate input data for the model. AceCAST intentionally uses the exact same namelist.input, wrfbdy*, wrfinput*, etc. files that are used by the standard CPU-WRF model. The only restrictions are that they must be compatible with WRF version 3.8.1 and that the namelist options must be supported by AceCAST (see Generating Input Data and Creating A Namelist). Although you can use your own test data, to keep things simple for this exercise we highly recommend starting with the input data from one of our benchmark test cases. In this example we will use the Easter500 benchmark, which is a good test case for a small number of GPUs. For convenience we download and unpack the benchmark data in the AceCAST/benchmarks directory:

[acecast-user@localhost]$ cd AceCAST/benchmarks/
[acecast-user@localhost]$ wget https://tqi-public.s3.us-east-2.amazonaws.com/datasets/easter500.tar.gz
...
[acecast-user@localhost]$ tar -xf easter500.tar.gz
[acecast-user@localhost]$ ls
easter500  easter500.tar.gz
[acecast-user@localhost]$ tree easter500
easter500
├── namelist.input
├── namelist.wps
├── wrfbdy_d01
└── wrfinput_d01

Setting Up the Simulation Run Directory

Although you are free to run your simulations in the AceCAST/run directory, we suggest you create a new directory and link/copy any files required at runtime to that directory. This includes the following:

  • AceCAST executable (acecast.exe)

  • AceCAST license (something like acecast-trial.lic)

  • MPI wrapper script (gpu-launch.sh)

  • static runtime data files (CCN_ACTIVATE.BIN, GENPARM.TBL, LANDUSE.TBL, etc.)

  • namelist file (namelist.input)

  • input data (wrfbdy*, wrfinput*, etc.)

Tip

We consider it best practice to create a new directory for each simulation you run. This can help you avoid common mistakes when running large numbers of simulations and also allows you to run multiple simulations simultaneously if you have the compute resources to do so.

For our example we will be running with 4 GPUs on a single compute node (localhost). Given this, we create an appropriately named simulation run directory at AceCAST/test/easter500-4GPU and link the necessary runtime files into this directory:

[acecast-user@localhost]$ cd AceCAST
[acecast-user@localhost]$ mkdir -p test/easter500-4GPU
[acecast-user@localhost]$ cd test/easter500-4GPU/
[acecast-user@localhost]$ ln -s ../../run/* .
[acecast-user@localhost]$ ln -s ../../benchmarks/easter500/* .

At this point your easter500-4GPU directory contents should look something like this:

easter500-4GPU
├── acecast-advisor.sh -> ../../run/acecast-advisor.sh
├── acecast.exe -> ../../run/acecast.exe
├── acecast-trial.lic -> ../../run/acecast-trial.lic
├── CCN_ACTIVATE.BIN -> ../../run/CCN_ACTIVATE.BIN
├── diffwrfs -> ../../run/diffwrfs
├── GENPARM.TBL -> ../../run/GENPARM.TBL
├── gpu-launch.sh -> ../../run/gpu-launch.sh
├── LANDUSE.TBL -> ../../run/LANDUSE.TBL
├── namelist.input -> ../../benchmarks/easter500/namelist.input
├── namelist.wps -> ../../benchmarks/easter500/namelist.wps
├── ndown.exe -> ../../run/ndown.exe
├── ozone.formatted -> ../../run/ozone.formatted
├── ozone_lat.formatted -> ../../run/ozone_lat.formatted
├── ozone_plev.formatted -> ../../run/ozone_plev.formatted
├── README.namelist -> ../../run/README.namelist
├── real.exe -> ../../run/real.exe
├── RRTMG_LW_DATA -> ../../run/RRTMG_LW_DATA
├── RRTMG_SW_DATA -> ../../run/RRTMG_SW_DATA
├── SOILPARM.TBL -> ../../run/SOILPARM.TBL
├── VEGPARM.TBL -> ../../run/VEGPARM.TBL
├── wrfbdy_d01 -> ../../benchmarks/easter500/wrfbdy_d01
├── wrf.exe -> ../../run/wrf.exe
└── wrfinput_d01 -> ../../benchmarks/easter500/wrfinput_d01

Setting Up Your Runtime Environment

Prior to running AceCAST, we need to set up the runtime environment by sourcing the ~/tqi-build/20.7/env.sh script that was generated by the install_deps.sh script during the Installation:

[acecast-user@localhost]$ source ~/tqi-build/20.7/env.sh

Note

If you installed the AceCAST dependencies in a non-default location, the env.sh script will be located in the directory you specified during the installation.

This modifies your PATH and LD_LIBRARY_PATH environment variables so that acecast.exe can find and load the shared libraries for NetCDF, HDF5, etc.
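To confirm the environment is set up correctly, you can check that mpirun resolves to the NVIDIA HPC SDK build and that acecast.exe can locate all of its shared libraries. The commands below are a quick illustrative check (the exact paths will differ on your system):

[acecast-user@localhost]$ which mpirun
[acecast-user@localhost]$ ldd ./acecast.exe | grep "not found"

The which command should print a path inside the NVIDIA HPC SDK installation, and the ldd check should produce no output; any "not found" entries indicate the env.sh script has not been sourced in your current shell.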

Launching AceCAST with MPI

AceCAST uses MPI to run on multiple GPUs, just as WRF (when compiled for dmpar) uses MPI to run on multiple CPU cores. The standard AceCAST distribution uses an OpenMPI build that is included with the NVIDIA HPC SDK installation (see Installation) and typically uses the associated mpirun launcher to run acecast.exe.

Note

In some cases the NVIDIA HPC SDK build of OpenMPI may not be compatible with your system. If you run into any MPI-related issues or poor multi-GPU performance, please contact support@tempoquest.com to discuss alternative builds or other solutions.

General AceCAST usage can be summarized as follows:

Usage:  mpirun [MPIRUN_OPTIONS] ./gpu-launch.sh ./acecast.exe

We always recommend using one MPI task per GPU you intend to run on. This is accomplished through the proper choice of MPIRUN_OPTIONS as well as the gpu-launch.sh MPI wrapper script. The former launches the correct number of MPI tasks on each node. The gpu-launch.sh script (note that this is run by each MPI task independently) then sets the ACC_DEVICE_NUM environment variable (see NVHPC Environment Variables) for each task to ensure a one-to-one mapping of GPUs to their respective tasks. For the majority of users gpu-launch.sh can be used as-is, but there are some cases where it may need to be modified (for example, running four simulations simultaneously, each on its own GPU on a single node); see Modifying the gpu-launch.sh Script for more information.
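For reference, the essential job of such a wrapper is simply to map each task's node-local rank to a GPU index before launching the executable. The following is a minimal sketch of that idea using OpenMPI's OMPI_COMM_WORLD_LOCAL_RANK variable; it is an illustration only, not necessarily the exact contents of the distributed gpu-launch.sh:

#!/bin/bash
# Illustrative sketch only; the shipped gpu-launch.sh may differ.
# Each MPI task runs this wrapper independently. OpenMPI exposes the
# task's node-local rank, which we use to select a unique GPU.
export ACC_DEVICE_NUM=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
# Launch the real command (e.g. ./acecast.exe) passed as arguments.
exec "$@"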

Warning

Currently, AceCAST doesn’t prevent you from running with multiple MPI tasks per GPU, which can degrade performance as well as cause significant GPU memory limitations. It is important to make sure you are using a single MPI task per GPU.
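One quick way to verify the mapping is to run nvidia-smi on the compute node (from a second terminal) while the simulation is active; you should see exactly one acecast.exe process per GPU in the process list:

[acecast-user@localhost]$ nvidia-smi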

Note that although multi-node usage can vary significantly from system to system, the single-node use case can nearly always be generalized to:

Single Node Usage: mpirun -n <NUM_GPUS> ./gpu-launch.sh ./acecast.exe
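As an illustration only, a run across two nodes with 4 GPUs per node using OpenMPI might look like the following; the hostfile name here is hypothetical, and many clusters will instead require launching through their batch scheduler:

mpirun -np 8 --npernode 4 -hostfile my_hosts ./gpu-launch.sh ./acecast.exe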

For our example we are running with 4 GPUs on a single node and can therefore follow this single-node usage pattern:

[acecast-user@localhost]$ mpirun -n 4 ./gpu-launch.sh ./acecast.exe
 starting wrf task             0  of             4
 starting wrf task             1  of             4
 starting wrf task             2  of             4
 starting wrf task             3  of             4

If the run was successful, you should see a message stating SUCCESS COMPLETE WRF near the end of the rsl.error.0000 file.

[acecast-user@localhost]$ tail rsl.error.0000
Timing for main: time 2020-04-12_23:59:12 on domain   1:    0.13889 elapsed seconds
Timing for main: time 2020-04-12_23:59:24 on domain   1:    0.13829 elapsed seconds
Timing for main: time 2020-04-12_23:59:36 on domain   1:    0.13934 elapsed seconds
Timing for main: time 2020-04-12_23:59:48 on domain   1:    0.13824 elapsed seconds
Timing for main: time 2020-04-13_00:00:00 on domain   1:    0.14919 elapsed seconds
Timing for Writing wrfout_d01_2020-04-13_00_00_00 for domain        1:    1.76981 elapsed seconds
Timing for Writing restart for domain        1:    7.45465 elapsed seconds
d01 2020-04-13_00:00:00 wrf: SUCCESS COMPLETE WRF
Checking-in/releasing AceCAST Licenses
Successfully checked-in/released AceCAST Licenses.
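
Because each MPI task writes its own rsl.out.* and rsl.error.* log pair, it can also be useful to check which logs report success and to confirm that the expected output files were written. For example (output file names depend on your namelist settings):

[acecast-user@localhost]$ grep -l "SUCCESS COMPLETE WRF" rsl.error.*
[acecast-user@localhost]$ ls -lh wrfout_d01_* wrfrst_d01_*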

Summary and Next Steps

In this section we covered the basics of running AceCAST through an example where we ran the Easter500 benchmark test case with 4 GPUs on a single node. By using input data from one of our benchmark test cases, we were able to focus on the fundamental mechanics of running the AceCAST software before moving on to other critical topics such as generating input data and choosing a namelist. These will be covered in the next sections Generating Input Data and Creating A Namelist.