PLEIADES

User documentation for the PLEIADES cluster at the University of Wuppertal

Getting Started: Monitoring

Our monitoring service can be accessed at zabbix.pleiades.uni-wuppertal.de with the credentials “pleiades” and “pleiades”. Zabbix collects various information regarding current and past resource usage and quotas.

Zabbix is only accessible from within the university network! If you are outside and need access, use the ZIM WebVPN.

Dashboard

After login, you are typically greeted by the user dashboard:

[Screenshot: Dashboard overview]

There are also multiple sub-pages available, covering an overview and login-node-specific metrics:

[Screenshot: Dashboard overview]

The overview dashboard presents aggregated metrics. You can select the time frame of the presented information at the top right. The visible widgets are:

More detailed pages about each class of login nodes provide information about:

You can change the displayed dashboard through Monitoring > Dashboard > All dashboards > User dashboard:

[Screenshot: Dashboard overview]

All available user dashboards are:

[Screenshot: Dashboard overview]

GPU Dashboard

Another dashboard shows detailed information for all five GPU nodes, gpu2100[1-5]:

[Screenshot: Dashboard overview]

You can select a separate page for each GPU node, which provides detailed information:

[Screenshot: Dashboard overview]

These pages can help you answer questions like:

If you know which GPUs your job is using, or if you use a whole node exclusively, these pages can help you assess the performance of your software.

Details of Specific Hosts

It is possible to show detailed information for specific hosts. Start on the “Monitoring > Hosts” subpage:

[Screenshot: Host overview]

Here you can search for specific hosts, e.g. a node that is currently processing your Slurm job. You can use squeue -u $USER or scontrol show job <jobid> to get the list of nodes that are processing your job.
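
The lookup described above can be wrapped in a small helper. The following is only a sketch: the function name my_job_nodes is made up for illustration, and it assumes that the standard Slurm tools squeue and scontrol are available on the login node. It prints one "<jobid> <hostname>" pair per line for all of your running jobs.

```shell
#!/bin/bash
# Hypothetical helper: list the nodes of all running jobs of the current user.
my_job_nodes() {
    # -h drops the header line, -t RUNNING keeps only jobs that already
    # have nodes assigned, %i is the job ID and %N the compact node list
    # (for example "gpu2100[1-5]").
    squeue -u "$USER" -h -t RUNNING -o "%i %N" | while read -r jobid nodelist; do
        # Expand the compact node list into one hostname per line.
        scontrol show hostnames "$nodelist" < /dev/null | while read -r host; do
            echo "$jobid $host"
        done
    done
}
```

After sourcing this snippet, call my_job_nodes and enter one of the printed hostnames in the host search.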

Keep in mind that nodes are shared between jobs! If you need an exact performance assessment, use the --exclusive flag during job submission to disallow concurrent jobs, at the cost of longer waiting times and being billed for the whole node(s).
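
As an illustration, a batch script requesting exclusive nodes could look like the sketch below. The job name and the sleep placeholder are assumptions made for this example. The script also prints the node list and the start and end times, so that the matching host and time frame can be selected in Zabbix afterwards.

```shell
#!/bin/bash
# Hypothetical job name, chosen for this example only.
#SBATCH --job-name=perf-check
#SBATCH --nodes=1
# Do not share the allocated node(s) with other jobs.
#SBATCH --exclusive

# Slurm sets SLURM_JOB_NODELIST inside a job; the fallback only matters
# when the script is run outside of Slurm.
nodes="${SLURM_JOB_NODELIST:-${HOSTNAME:-unknown}}"
start="$(date '+%Y-%m-%d %H:%M:%S')"
echo "Nodes: $nodes"
echo "Start: $start"

# Placeholder workload; replace this line with your actual program.
sleep 1

end="$(date '+%Y-%m-%d %H:%M:%S')"
echo "End:   $end"
```

Submit the script with sbatch as usual. The flag can also be passed on the command line (sbatch --exclusive) instead of being set in the script.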

For each host, you can list all available data, show all predefined diagrams, or show a simple dashboard:

[Screenshot: Host options]

These views can tell you how well your job utilizes the available CPU, memory, InfiniBand, Ethernet, or GPU resources.