Task division#

This page lists the RPHC administration tasks and the people responsible for each of them.

QoS Administration#

Task: Monitoring the request channel and asking the other admins whether each request is reasonable, then changing the QoS settings accordingly and setting a reminder to revert them.

People responsible: Lishan, Tianyu and Samuele

Project/home/data folder administration#

Task: Monitoring storage space allocation in project/home/data folders and snapshot management.

People responsible: Vanessa and Marek

Tape backups#

Task: Monitoring tape backups and ensuring that they are running correctly.

People responsible: Joren and Marek

Slurm quirk/crash investigations#

Task: Investigating Slurm quirks and crashes:

  1. "Kill task failed" errors

  2. Jobs on nearly-full machines not getting allocated

People responsible: No one assigned yet

Minor ansible tasks#

Task: Handling minor Ansible tasks.

People responsible: Marek and Joren

Patch Monday tasks#

Task: All tasks associated with patch Monday:

  1. Reservations

  2. Reviving cluster

  3. Wipe /processing disks

  4. Make playbook?

  5. Ask Ameer about patching workflow

  6. iptables

People responsible: Eduardo and Vanessa

GPU utilization monitoring#

Task: Monitoring GPU utilization and contacting users upon suboptimal usage.

People responsible: Tianyu and Joren

Documentation updates#

Task: Updating documentation when necessary.

People responsible: Iris

DeepOps upstream merge#

Task: DeepOps upstream merge and testing.

People responsible: Joren and the rest of the team