Task division
This page lists the RPHC administration tasks and the people responsible for each of them.
QoS Administration#
Task: Monitoring the request channel and asking the other admins whether each request is reasonable, then changing the QoS settings accordingly and setting a reminder to revert them.
People responsible: Lishan, Tianyu and Samuele
Project/home/data folder administration#
Task: Monitoring storage space allocation in project/home/data folders and snapshot management.
People responsible: Vanessa and Marek
Tape backups#
Task: Monitoring tape backups and ensuring that they are running correctly.
People responsible: Joren and Marek
Slurm quirk/crash investigations#
Task: Investigating Slurm quirks and crashes, such as:
"kill task failed" errors
Jobs on nearly-full machines not getting allocated
People responsible: No one assigned yet
Minor ansible tasks#
Task: Minor Ansible tasks.
People responsible: Marek and Joren
Patch Monday tasks#
Task: All tasks associated with Patch Monday, including:
Reservations
Reviving cluster
Wipe /processing disks
Make playbook?
Ask Ameer about patching workflow
Iptables
People responsible: Eduardo and Vanessa
GPU utilization monitoring#
Task: Monitoring GPU utilization and contacting users upon suboptimal usage.
People responsible: Tianyu and Joren
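The utilization check described above could be sketched roughly as follows. This is only an illustration, not the tooling the team actually uses: the 20% threshold and the function names are hypothetical, and it assumes `nvidia-smi` is available on the node being checked.

```python
import subprocess

# Hypothetical threshold: GPUs below this utilization (percent) are flagged.
LOW_UTIL_PERCENT = 20


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into a {gpu_index: percent} dict."""
    usage = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, util = (field.strip() for field in line.split(","))
        if not util.isdigit():
            continue  # skip GPUs that do not report a numeric utilization
        usage[int(index)] = int(util)
    return usage


def underused_gpus(usage, threshold=LOW_UTIL_PERCENT):
    """Return the sorted indices of GPUs whose utilization is below threshold."""
    return sorted(i for i, util in usage.items() if util < threshold)


def query_local_gpus():
    """Ask nvidia-smi for the current utilization of each GPU on this node."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_utilization(result.stdout)
```

Calling `underused_gpus(query_local_gpus())` on a GPU node returns the indices of GPUs currently below the threshold. A single sample says little about sustained usage, so in practice the check would need to be repeated over time before contacting a user.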
Documentation updates#
Task: Updating documentation when necessary.
People responsible: Iris
DeepOps upstream merge#
Task: DeepOps upstream merge and testing.
People responsible: Joren and the rest of the team