======================
libEnsemble with SLURM
======================

SLURM is a popular open-source workload manager.

libEnsemble can read SLURM node lists and partition these to workers. By
default, this is done by :ref:`reading an environment variable`.

Example SLURM submission scripts for various systems are given in the
:doc:`examples`. Further examples are given in some of the specific
platform guides (e.g., the :doc:`Perlmutter guide`).

By default, the :doc:`MPIExecutor<../executor/mpi_executor>` prefers
``mpirun`` to ``srun``, as it works better in some cases. If ``mpirun``
does not work well, try telling the MPIExecutor to use ``srun`` when it is
instantiated in the calling script::

    from libensemble.executors.mpi_executor import MPIExecutor

    exctr = MPIExecutor(custom_info={"mpi_runner": "srun"})

Common Errors
-------------

SLURM systems can have various configurations, which may affect what is
required when assigning more than one worker to any given node.

.. dropdown:: **srun: Job \*\*\*\*\*\* step creation temporarily disabled, retrying (Requested nodes are busy)**

    You may also see:
    ``srun: Job ****** step creation still disabled, retrying (Requested nodes are busy)``

    It is recommended to add the following lines to submission scripts to
    prevent resource conflicts::

        export SLURM_EXACT=1
        export SLURM_MEM_PER_NODE=0

    Alternatively, the ``--exact`` `option to srun`_, along with other
    relevant options, can be given on any ``srun`` lines, including the
    ``MPIExecutor`` submission lines via the ``extra_args`` option (from
    version 0.10.0, these are added automatically).

    Secondly, while many configurations are possible, it is recommended to
    **avoid** ``#SBATCH`` directives that may limit the resources available
    to srun job steps, such as::

        #SBATCH --ntasks-per-node=4
        #SBATCH --gpus-per-task=1

    Instead, provide these to sub-tasks via the ``extra_args`` option to
    the :doc:`MPIExecutor<../executor/mpi_executor>` ``submit`` function
    (see the sketch at the end of the resource binding notes below).

.. dropdown:: **GTL_DEBUG: [0] cudaHostRegister: no CUDA-capable device is detected**

    If using the environment variable ``MPICH_GPU_SUPPORT_ENABLED``, then
    ``srun`` commands may expect an option for allocating GPUs (e.g.,
    ``--gpus-per-task=1`` would allocate one GPU to each MPI task of the
    MPI run). It is recommended that tasks submitted via the
    :doc:`MPIExecutor<../executor/mpi_executor>` specify this in the
    ``extra_args`` option to the ``submit`` function (rather than using an
    ``#SBATCH`` directive). If running the libEnsemble calling script with
    ``srun``, then it is recommended that ``MPICH_GPU_SUPPORT_ENABLED`` is
    set in the user ``sim_f`` or ``gen_f`` function where GPU runs will be
    submitted, instead of in the batch script. For example::

        os.environ["MPICH_GPU_SUPPORT_ENABLED"] = "1"

Note on Resource Binding
------------------------

.. note::
    From version 0.10.0, it is recommended that GPUs are assigned
    automatically by libEnsemble. See the
    :doc:`forces_gpu<../tutorials/forces_gpu_tutorial>` tutorial as an
    example.

The use of ``CUDA_VISIBLE_DEVICES`` and other environment variables is
often a highly portable way of assigning specific GPUs to workers, and has
been known to work on some systems when other methods do not. See the
libEnsemble regression test
`test_persistent_sampling_CUDA_variable_resources.py`_ for an example of
setting ``CUDA_VISIBLE_DEVICES`` in the imported simulator function
(``CUDA_variable_resources``).
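As a minimal sketch of that pattern, a simulator function can set
``CUDA_VISIBLE_DEVICES`` to the GPU slots assigned to its worker before
launching any GPU work (the function name and output handling here are
illustrative; see the linked test for the full version)::

    import numpy as np
    from libensemble.resources.resources import Resources

    def my_sim(H, persis_info, sim_specs, libE_info):
        # Point CUDA_VISIBLE_DEVICES at this worker's assigned GPU slots
        resources = Resources.resources.worker_resources
        resources.set_env_to_slots("CUDA_VISIBLE_DEVICES")

        # ... launch or run the GPU application here ...

        out = np.zeros(1, dtype=sim_specs["out"])
        return out, persis_info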
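Where srun options such as ``--gpus-per-task`` are used instead, they can
be passed per job step through the ``extra_args`` option of the ``submit``
function, as referenced in the dropdowns above. A minimal sketch, in which
the application path and name are placeholders::

    from libensemble.executors.mpi_executor import MPIExecutor

    exctr = MPIExecutor(custom_info={"mpi_runner": "srun"})
    exctr.register_app(full_path="/path/to/my_app", app_name="my_app")  # placeholder path

    # Options here apply to this job step's srun line only (not via #SBATCH)
    task = exctr.submit(app_name="my_app", num_procs=4, extra_args="--exact --gpus-per-task=1")
    task.wait()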
On other systems, such as Perlmutter, using an option like
``--gpus-per-task=1`` or ``--gres=gpu:1`` in ``extra_args`` is sufficient
to allow SLURM to find the free GPUs. Note that ``srun`` options such as::

    --gpu-bind=map_gpu:2,3

do not necessarily provide absolute GPU slots when there is more than one
concurrent job step (``srun``) running on a node. If desired, such options
could be set using the :doc:`worker resources<../resource_manager/worker_resources>`
module in a similar manner to how ``CUDA_VISIBLE_DEVICES`` is set in the
example above.

Some useful commands
--------------------

Find the SLURM version::

    scontrol --version

Find the SLURM system configuration::

    scontrol show config

Find the SLURM configuration for a partition called "gpu"::

    scontrol show partition gpu

.. _option to srun: https://docs.nersc.gov/systems/perlmutter/running-jobs/#single-gpu-tasks-in-parallel
.. _test_persistent_sampling_CUDA_variable_resources.py: https://github.com/Libensemble/libensemble/blob/develop/libensemble/tests/functionality_tests/test_persistent_sampling_CUDA_variable_resources.py
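List the job steps running under an allocation (useful when diagnosing the
step-creation messages above; the job ID shown is a placeholder)::

    sacct --jobs=123456 --format=JobID,JobName,State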