The InfiniBand fabric is up again and access to our clusters is restored.
We are experiencing problems with the InfiniBand fabric. The connection from the compute nodes to the storage is lost. Therefore, it is currently not possible to log...
We have finished the firmware update on our storage system. A few nodes of jess failed to reboot cleanly, and we are fixing them one by one.
We have regained access to all disk arrays and are now proceeding with the upgrade procedure.
Status of firmware upgrade
Firmware update of the Lustre storage and repair work on jess and gorm.
Gorm is back online. A leaf switch on the InfiniBand network needed to be restarted.
We are currently experiencing InfiniBand failures on gorm and are working to bring it back up.
Our compute clusters are up and running again. It looks like the batch system did not handle the crash well, so you might need to restart some jobs. There was no loss...
The cooling system in our server room failed today. All servers were shut down due to (the danger of) overheating. We might have lost a couple of nodes. The storage...