The SLURM Resource Manager
TotalView supports the SLURM resource manager. The following description is taken from the SLURM website (https://hpc.llnl.gov/documentation/tutorials/livermore-computing-linux-commodity-clusters-overview-part-one).
SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates conflicting requests for resources by managing a queue of pending work.
SLURM is not a sophisticated batch system, but it does provide an Applications Programming Interface (API) for integration with external schedulers such as the Maui Scheduler. While other resource managers do exist, SLURM is unique in several respects:
*Its source code is freely available under the GNU General Public License.
*It is designed to operate in a heterogeneous cluster with up to thousands of nodes.
*It is portable; written in C with a GNU autoconf configuration engine. While initially written for Linux, other UNIX-like operating systems should be easy porting targets. A plugin mechanism exists to support various interconnects, authentication mechanisms, schedulers, etc.
*SLURM is highly tolerant of system failures, including failure of the node executing its control functions.
*It is simple enough for the motivated end user to understand its source and add functionality.
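As a rough illustration, the three key functions described above (allocating nodes, launching work on them, and managing the queue of pending work) correspond loosely to the standard SLURM commands salloc, srun, and squeue. The sketch below only assembles example command lines and does not run them. The helper name slurm_commands, the program path ./my_app, and the user name alice are hypothetical and chosen for illustration; the node and task counts are arbitrary.

```python
import shlex


def slurm_commands(nodes, tasks, program, user):
    """Build illustrative SLURM command lines (hypothetical helper).

    Nothing is executed here; the function only returns argument lists.
    """
    return {
        # Function 1: request an allocation of `nodes` compute nodes.
        "allocate": ["salloc", "-N", str(nodes)],
        # Function 2: launch `tasks` parallel tasks on the allocated nodes.
        "launch": ["srun", "-n", str(tasks), program],
        # Function 3: inspect the queue of pending and running work for a user.
        "queue": ["squeue", "-u", user],
    }


if __name__ == "__main__":
    # Example values are assumptions, not taken from the text above.
    for step, argv in slurm_commands(2, 4, "./my_app", "alice").items():
        print(f"{step}: {shlex.join(argv)}")
```

On a real cluster, site policy usually requires additional options such as a partition or time limit, so treat these command lines as a starting point rather than a recipe.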