=======================
Summit (Decommissioned)
=======================

Summit_ was an IBM AC922 system located at the Oak Ridge Leadership Computing
Facility (OLCF). Each of the approximately 4,600 compute nodes on Summit
contained two IBM POWER9 processors and six NVIDIA Volta V100 accelerators.

Summit featured three tiers of nodes: login, launch, and compute nodes.

Users on login nodes submitted batch runs to the launch nodes.
Batch scripts and interactive sessions ran on the launch nodes. Only the launch
nodes could submit MPI runs to the compute nodes via ``jsrun``.

These docs are maintained to guide libEnsemble's usage on three-tier systems
and/or ``jsrun`` systems similar to Summit.

Configuring Python
------------------

Begin by loading the Python 3 Anaconda module::

    $ module load python

You can now create and activate your own custom conda_ environment::

    conda create --name my_env python=3.10
    export PYTHONNOUSERSITE=1  # Ensure Python packages come from the conda env
    . activate my_env

If you are installing any packages with extensions, ensure that the correct
compiler module is loaded. If using mpi4py_, it must be installed from source,
referencing the compiler. On Summit, mpi4py had to be built with gcc::

    module load gcc

With your environment activated, run::

    CC=mpicc MPICC=mpicc pip install mpi4py --no-binary mpi4py

Installing libEnsemble
----------------------

Obtaining libEnsemble is now as simple as ``pip install libensemble``.
Your prompt should be similar to the following line:

.. code-block:: console

    (my_env) user@login5:~$ pip install libensemble

.. note::
    If you encounter pip errors, run ``python -m pip install --upgrade pip`` first.

Or, you can install via ``conda``:

.. code-block:: console

    (my_env) user@login5:~$ conda config --add channels conda-forge
    (my_env) user@login5:~$ conda install -c conda-forge libensemble

See :doc:`here<../advanced_installation>` for more information on advanced
options for installing libEnsemble.
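
Before moving on, it can help to confirm that the interpreter comes from the
conda environment, that user site-packages are disabled, and that mpi4py and
libEnsemble are importable. The following is a minimal sketch using only the
standard library; the function name ``check_python_env`` is hypothetical, and it
only reports what it finds. It checks for the packages with
``importlib.util.find_spec`` rather than importing them, so MPI is never
initialized on the login node.

.. code-block:: python

    import importlib.util
    import os
    import sys


    def check_python_env():
        """Report basic facts about the active Python environment (hypothetical helper)."""
        report = {
            # Interpreter prefix; inside a conda env this points into the env directory.
            "prefix": sys.prefix,
            # CONDA_PREFIX is set by conda when an environment is activated (None otherwise).
            "conda_prefix": os.environ.get("CONDA_PREFIX"),
            # True when user site-packages are disabled (PYTHONNOUSERSITE=1 or python -s).
            "user_site_disabled": bool(sys.flags.no_user_site),
        }
        # Look for the packages without importing them (avoids initializing MPI).
        for pkg in ("mpi4py", "libensemble"):
            report[pkg + "_found"] = importlib.util.find_spec(pkg) is not None
        return report


    if __name__ == "__main__":
        for key, value in check_python_env().items():
            print(f"{key}: {value}")

Saved, for example, as ``check_env.py`` and run with ``my_env`` active,
``user_site_disabled`` should read ``True`` if ``PYTHONNOUSERSITE=1`` was
exported, and ``prefix`` should point inside the environment.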

Special note on resource sets and Executor submit options
---------------------------------------------------------

When using the portable MPI run configuration options (e.g., ``num_nodes``) with
the :doc:`MPIExecutor<../executor/mpi_executor>` ``submit`` function, note that,
because Summit uses resource sets, these options refer to resource sets as
follows:

- ``num_procs`` (int, optional): The total number of resource sets for this run.
- ``num_nodes`` (int, optional): The number of nodes on which to submit the run.
- ``procs_per_node`` (int, optional): The number of resource sets per node.

It is recommended that the user define a resource set as the minimal
configuration of CPU cores/processes and GPUs. The options describing this
configuration can be passed via the ``extra_args`` option of the *submit*
function. Alternatively, the portable options can be ignored and everything
expressed in ``extra_args``.

For example, the following *jsrun* line would run three resource sets, each
having one core (with one process) and one GPU, along with some extra options::

    jsrun -n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"

Expressing this line in the ``submit`` function may look something like the
following::

    from libensemble.executors.executor import Executor

    exctr = Executor.executor
    task = exctr.submit(
        app_name="mycode",
        num_procs=3,
        extra_args='-a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"',
        app_args="-i input",
    )

This would be equivalent to::

    exctr = Executor.executor
    task = exctr.submit(
        app_name="mycode",
        extra_args='-n 3 -a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"',
        app_args="-i input",
    )

The libEnsemble resource manager works out the resources available to each
worker, but unlike on some other systems, ``jsrun`` on Summit dynamically
schedules runs to available slots across and within nodes. It can also queue
tasks. This allows variable-size runs to be handled easily on Summit.
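
The mapping between a resource-set description and the ``jsrun`` flags used
above can be captured in a small helper. The sketch below is hypothetical
(``ResourceSet``, ``jsrun_extra_args``, and ``tasks_per_node`` are not part of
libEnsemble): it rebuilds the ``extra_args`` strings from the example and
estimates how many such tasks fit on one node by GPU count alone, assuming the
six GPUs per node stated at the top of this page and ignoring CPU-core limits.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class ResourceSet:
        """Hypothetical description of one jsrun resource set."""
        procs: int = 1  # MPI processes per resource set (jsrun -a)
        gpus: int = 1   # GPUs per resource set (jsrun -g)
        cores: int = 1  # CPU cores per resource set (jsrun -c)


    def jsrun_extra_args(rs, n_sets=None, extra='--bind=packed:1 --smpiargs="-gpu"'):
        """Build an extra_args string; "-n" is included only when n_sets is given."""
        parts = []
        if n_sets is not None:
            parts.append(f"-n {n_sets}")
        parts.append(f"-a {rs.procs} -g {rs.gpus} -c {rs.cores}")
        if extra:
            parts.append(extra)
        return " ".join(parts)


    def tasks_per_node(rs, n_sets, gpus_per_node=6):
        """Tasks of n_sets resource sets that fit on one node, counting GPUs only."""
        gpus_per_task = rs.gpus * n_sets
        if gpus_per_task == 0:
            raise ValueError("GPU-free tasks are limited by cores, not GPUs")
        return gpus_per_node // gpus_per_task


    one_gpu = ResourceSet()
    print(jsrun_extra_args(one_gpu))            # pair with num_procs=3
    print(jsrun_extra_args(one_gpu, n_sets=3))  # everything in extra_args
    print(tasks_per_node(one_gpu, n_sets=3))

For the three one-GPU resource sets in the example, ``tasks_per_node`` gives
``6 // 3 = 2``, i.e., two such tasks can share a node.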

If oversubscription to the ``jsrun`` system is desired, then libEnsemble's
resource manager can be disabled in the calling script via::

    libE_specs["disable_resource_manager"] = True

In the ``jsrun`` example above, each submitted task uses three GPUs, which is
half of those available on a Summit node; thus two such tasks (from different
workers) may be allocated to each node if they run at the same time.

Job Submission
--------------

Summit used LSF_ for job management and submission. For libEnsemble, the most
important command is ``bsub``, which submits batch scripts from the login nodes
to execute on the launch nodes.

It is recommended to run libEnsemble on the launch nodes (assuming workers are
submitting MPI applications) using the ``local`` communications mode
(multiprocessing).

Interactive Runs
^^^^^^^^^^^^^^^^

You can run interactively with ``bsub`` by specifying the ``-Is`` flag, similarly
to the following::

    $ bsub -W 30 -P [project] -nnodes 8 -Is

This will place you on a launch node.

.. note::
    You will need to reactivate your conda virtual environment.

Batch Runs
^^^^^^^^^^

Batch scripts specify run settings using ``#BSUB`` statements. The following
simple example depicts configuring and launching libEnsemble on a launch node
with multiprocessing. This script also assumes the user is using the
``parse_args()`` convenience function from libEnsemble's
:doc:`tools module<../utilities>`.

.. code-block:: bash

    #!/bin/bash -x
    #BSUB -P
    #BSUB -J libe_mproc
    #BSUB -W 60
    #BSUB -nnodes 128
    #BSUB -alloc_flags "smt1"

    # --- Prepare Python ---

    # Load conda module and gcc.
    module load python
    module load gcc

    # Name of conda environment
    export CONDA_ENV_NAME=my_env

    # Activate conda environment
    export PYTHONNOUSERSITE=1
    source activate $CONDA_ENV_NAME

    # --- Prepare libEnsemble ---

    # Name of calling script
    export EXE=calling_script.py

    # Communication method
    export COMMS="--comms local"

    # Number of workers
    export NWORKERS="--nworkers 128"

    hash -r  # Clear cached command paths so pip/python resolve to the conda env

    # Launch libE
    python $EXE $COMMS $NWORKERS > out.txt 2>&1

With this saved as ``myscript.sh``, allocating, configuring, and queueing
libEnsemble on Summit is achieved by running::

    $ bsub myscript.sh

Example submission scripts are also given in the :doc:`examples`.

Launching User Applications from libEnsemble Workers
----------------------------------------------------

On Summit, only the launch nodes could submit MPI runs to the compute nodes via
``jsrun``. This can be accomplished in user simulator functions directly.
However, it is highly recommended that the :doc:`Executor<../executor/ex_index>`
interface be used inside the simulator or generator, because it provides a
portable interface with many advantages, including automatic resource
detection, launch failure resilience, and ease of use.

.. _conda: https://conda.io/en/latest/
.. _LSF: https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_workshop_fuson.pdf
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
.. _Summit: https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
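
As a rough illustration of the recommended Executor pattern, the sketch below
shows the part of a simulator function that launches an application and waits
for it. It assumes an executor object offering the ``submit`` call used on this
page, returning a task with a ``wait()`` method and a ``state`` attribute, as
libEnsemble's executor tasks do in recent versions. In a real simulator function
the executor would be obtained as in the resource-set example
(``Executor.executor``); ``launch_mycode`` and the ``mycode`` application are
hypothetical names.

.. code-block:: python

    def launch_mycode(executor, n_sets=3, input_file="input"):
        """Hypothetical simulator-function fragment: launch "mycode" and wait for it.

        `executor` is assumed to be a libEnsemble MPIExecutor with an application
        registered under the name "mycode".
        """
        task = executor.submit(
            app_name="mycode",
            num_procs=n_sets,  # interpreted as resource sets on Summit
            extra_args='-a 1 -g 1 -c 1 --bind=packed:1 --smpiargs="-gpu"',
            app_args=f"-i {input_file}",
        )
        task.wait()  # block until the launched run completes
        return task.state

Blocking on ``wait()`` keeps the sketch short; a worker that must react to
timeouts or to instructions from the manager would poll the task instead.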