<?xml version="1.0"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>News and Service Announcements</title><link>https://hpc-risoe.dtu.dk</link><atom:link href="https://hpc-risoe.dtu.dk/news?rss=1" rel="self" type="application/rss+xml" /><description><![CDATA[]]></description><copyright>Copyright 2026 dtu.dk. All rights reserved.</copyright><item><title>Data loss on aiolos and home</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=c2880d91-40cd-487f-81c7-2e5ddda9c03f</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Data loss on aiolos and home" width="220" /><br />Data loss on aiolos and home, follow here for information and updates.]]></description><pubDate>Thu, 16 Apr 2020 18:56:44 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=c2880d91-40cd-487f-81c7-2e5ddda9c03f</guid></item><item><title>SAMBA v1 access closed</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=32c33a5f-3a63-49fb-99d6-dba49915f27b</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="SAMBA v1 access closed" width="220" /><br />]]></description><pubDate>Wed, 27 Nov 2019 12:41:19 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=32c33a5f-3a63-49fb-99d6-dba49915f27b</guid></item><item><title>Gorm home data restored</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=04b640ad-d57e-485e-8591-5fc7f4921f1b</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home data restored" width="220" /><br />
The latest backup of gorm home was successfully restored to mimer. The data is accessible from jess under /mnt/mimer/gormhome.
Gorm's Lustre storage system is still down, and it is very unlikely that we will be able to bring it back up. It is therefore also unlikely that gorm will become available again in its current configuration. Any users with accounts on gorm should get an account on jess, if they don't already have one.]]></description><pubDate>Tue, 20 Nov 2018 09:08:44 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=04b640ad-d57e-485e-8591-5fc7f4921f1b</guid></item><item><title>Gorm home inaccessible - internal Lustre problem</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=c1e45c2f-31dd-4f99-9c2e-aac52a8a513f</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible - internal Lustre problem" width="220" /><br />We are experiencing an internal Lustre error that prevents the head and compute nodes from connecting to gorm home. We are working on finding the root cause. Due to the nature of the problem, we are unable to provide an estimate for recovery. We will keep posting updates here.]]></description><pubDate>Tue, 30 Oct 2018 13:24:29 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=c1e45c2f-31dd-4f99-9c2e-aac52a8a513f</guid></item><item><title>Gorm home inaccessible - rebuild finished</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=abac521e-333e-49c2-9cb4-fabeecfffa83</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible - rebuild finished" width="220" /><br />The rebuild of the raid sets has finished, and we have already run a file system check and repair. 
We are currently running a final check and, if no errors are found, will proceed with restoring access to gorm.]]></description><pubDate>Mon, 29 Oct 2018 09:09:02 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=abac521e-333e-49c2-9cb4-fabeecfffa83</guid></item><item><title>Gorm home inaccessible - rebuild progress</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=41919a1e-5792-4439-863b-fe29e05f1955</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible - rebuild progress" width="220" /><br />We are making progress with restoring the raid sets of gorm home. The pool rebuild reached an improved redundancy level today and we are working on restoring access to gorm home.]]></description><pubDate>Fri, 26 Oct 2018 09:50:28 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=41919a1e-5792-4439-863b-fe29e05f1955</guid></item><item><title>Gorm home inaccessible - rebuild started as well</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=e5139942-6f9a-4f3f-8436-9bcb339d8710</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible - rebuild started as well" width="220" /><br />
We managed to restore internal access to the raid sets, which allows us to rebuild the missing disks. We started a rebuild and hope it will succeed without errors. Once we have a safe level of redundancy, we will work on getting the gorm home file system up.
We will also continue restoring gorm home from tape, in case the raid sets cannot be repaired.]]></description><pubDate>Thu, 25 Oct 2018 15:25:09 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=e5139942-6f9a-4f3f-8436-9bcb339d8710</guid></item><item><title>Gorm home inaccessible - restore started</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=2ff5c9ec-9be5-4a60-9e17-c1c0a7b71fd0</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible - restore started" width="220" /><br />]]></description><pubDate>Thu, 25 Oct 2018 09:00:26 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=2ff5c9ec-9be5-4a60-9e17-c1c0a7b71fd0</guid></item><item><title>Gorm home inaccessible</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=5b369b4c-0338-4169-8f47-48081ede1e34</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm home inaccessible" width="220" /><br />A large number of disks in our old DDN system failed. As a consequence, the raid sets for gorm home are in a critical state and have switched to read-only mode. It is therefore currently not possible to access gorm. We are trying to recover the raid sets. 
Since we are out of support for the old storage system, we cannot give an estimate for the recovery time, nor whether we will be able to recover at all.]]></description><pubDate>Mon, 22 Oct 2018 11:25:14 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=5b369b4c-0338-4169-8f47-48081ede1e34</guid></item><item><title>Access to storage restored</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=0a8255fb-bf76-42cc-b486-24b4866a8619</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Access to storage restored" width="220" /><br />Access to our Lustre storage has been restored. Some compute nodes crashed, but most jobs should continue from the point where they were blocked. If your job's wall-time needs to be increased as a result, please contact the HPC help desk.]]></description><pubDate>Tue, 16 Oct 2018 23:41:36 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=0a8255fb-bf76-42cc-b486-24b4866a8619</guid></item><item><title>Access to storage lost</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=2acd2ba5-c48c-4d1b-9fdc-dd92092b8d04</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Access to storage lost" width="220" /><br />
During the UPS replacement, a storage controller crashed, so access to the Lustre storage is only partially available. Access to gorm home is fine, but jess home, mimer and aiolos are experiencing a partial outage.
We are working with DDN to get access to storage back as soon as possible.]]></description><pubDate>Tue, 16 Oct 2018 15:46:11 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=2acd2ba5-c48c-4d1b-9fdc-dd92092b8d04</guid></item><item><title>Power supply upgrade for server room</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=2412faf4-0ef0-42b1-afde-68b40306c1ee</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Power supply upgrade for server room" width="220" /><br />
The power supply to our server room will be upgraded to a fully redundant setup with two power sources. The switch-over will take place over the weekend of May 5-6. During this time, only emergency services will be running.
We expect that the HPC clusters jess and gorm will be up again by Monday.]]></description><pubDate>Fri, 04 May 2018 16:54:58 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=2412faf4-0ef0-42b1-afde-68b40306c1ee</guid></item><item><title>Storage up</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=b712a416-db3a-4a04-b71d-fbb081aa0d96</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage up" width="220" /><br />
The storage and file system issues have been resolved together with DDN; after rebuilding the disk pools and running extended file system checks, the Jess cluster is online again.]]></description><pubDate>Mon, 16 Apr 2018 01:28:00 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=b712a416-db3a-4a04-b71d-fbb081aa0d96</guid></item><item><title>Storage down Update 6</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=76de3e50-f20c-4a34-9c22-ef479f3c5081</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 6" width="220" /><br />
Unfortunately, we have still not received a working solution to our problem from DDN support. We will continue to work on this on Monday.]]></description><pubDate>Fri, 13 Apr 2018 19:11:42 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=76de3e50-f20c-4a34-9c22-ef479f3c5081</guid></item><item><title>Storage down Update 5</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=7c650957-6f17-491e-9131-3bd802334039</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 5" width="220" /><br />The second repair run finished without any hiccups. Unfortunately, we are having problems with a pair of storage servers and cannot activate the disks. We are still in contact with DDN to resolve this issue.]]></description><pubDate>Fri, 13 Apr 2018 18:00:24 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=7c650957-6f17-491e-9131-3bd802334039</guid></item><item><title>Storage down Update 4</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=17b08718-fda4-4f97-94fc-b1f24709f495</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 4" width="220" /><br />We were asked by DDN to run a second repair on the storage, which we expect to finish very soon. Afterwards, we can bring the storage on-line and boot the clusters again. 
We are aiming for this afternoon, assuming that no further problems show up.]]></description><pubDate>Fri, 13 Apr 2018 12:15:06 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=17b08718-fda4-4f97-94fc-b1f24709f495</guid></item><item><title>Storage down Update 3</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=0db99f28-4151-427f-befc-4a8cbae07891</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 3" width="220" /><br />The file system check has completed. We had to run a "safe_repair" on the storage, and a small number of inconsistencies could not be fixed automatically. We are awaiting a decision from DDN on whether these inconsistencies are safe to ignore or require manual intervention.]]></description><pubDate>Fri, 13 Apr 2018 00:53:56 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=0db99f28-4151-427f-befc-4a8cbae07891</guid></item><item><title>Storage down Update 2</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=c5ebc9ac-59c7-446c-8539-8799bccd2804</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 2" width="220" /><br />The verify processes on our DDN disk arrays have finished and we are now running a file system check. So far, no major problems have been discovered. 
We will post an update after the check is finished.]]></description><pubDate>Thu, 12 Apr 2018 13:25:36 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=c5ebc9ac-59c7-446c-8539-8799bccd2804</guid></item><item><title>Storage down Update 1</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=15af0569-f667-48ef-ba24-563808197a4e</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down Update 1" width="220" /><br />
Last night we got the DDN disk arrays back up. Due to a potential loss of the write-back cache, the controllers are performing a forced verify on all pools to ensure data integrity. In addition, a small number of disks are being rebuilt. So far, all storage remains fully redundant.
DDN recommends waiting for the verify processes to finish before continuing with a file system check. The file system check will replay data that was in flight and report any actual errors. We have some indications that the file systems will turn out clean; we will post updates here.
The verify processes are expected to finish by lunchtime, Thursday April 12.]]></description><pubDate>Wed, 11 Apr 2018 12:37:24 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=15af0569-f667-48ef-ba24-563808197a4e</guid></item><item><title>Storage down</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=3f26f216-f56f-4292-b753-39d0e67518da</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage down" width="220" /><br />We currently experience a loss of access to storage on the new mimer system. We are working with DDN to get access back.]]></description><pubDate>Tue, 10 Apr 2018 14:33:45 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=3f26f216-f56f-4292-b753-39d0e67518da</guid></item><item><title>Infiniband problem solved</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=a04a1827-8a1a-48e6-b290-c722678d5dda</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Infiniband problem solved" width="220" /><br />The infiniband fabric is up again and access to our clusters is restored.]]></description><pubDate>Thu, 25 Jan 2018 17:34:26 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=a04a1827-8a1a-48e6-b290-c722678d5dda</guid></item><item><title>Infiniband problems</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=755a7903-5a27-4265-a2c3-1dcb70ff7698</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Infiniband problems" width="220" /><br />
We are experiencing problems with the infiniband fabric. The connection from the compute nodes to the storage is lost. Therefore, it is currently not possible to log in or submit jobs.]]></description><pubDate>Thu, 25 Jan 2018 11:10:20 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=755a7903-5a27-4265-a2c3-1dcb70ff7698</guid></item><item><title>Firmware upgrade finished</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=9fe05ca2-b036-4150-9163-b5648c64d22d</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Firmware upgrade finished" width="220" /><br />We finished the firmware update on our storage system. A few nodes of jess failed to reboot cleanly and we are working on fixing these step by step.
]]></description><pubDate>Wed, 23 Aug 2017 18:36:44 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=9fe05ca2-b036-4150-9163-b5648c64d22d</guid></item><item><title>Extended down time update</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=99d05a20-5d8a-4d28-9952-29431279a5a0</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Extended down time update" width="220" /><br />
We managed to get access to all disk arrays back and are now proceeding with the upgrade procedure.
]]></description><pubDate>Wed, 23 Aug 2017 13:01:59 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=99d05a20-5d8a-4d28-9952-29431279a5a0</guid></item><item><title>Extended down time</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=0956cb21-95fb-4a67-ab9e-43c91a3d7114</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Extended down time" width="220" /><br />Status of firmware upgrade]]></description><pubDate>Wed, 23 Aug 2017 09:34:57 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=0956cb21-95fb-4a67-ab9e-43c91a3d7114</guid></item><item><title>Service window with down time</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=a7ab639f-9df3-46c5-a4be-5fc8c0143360</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Service window with down time" width="220" /><br />Firmware update of the Lustre storage and repair work on jess and gorm.]]></description><pubDate>Fri, 11 Aug 2017 12:58:36 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=a7ab639f-9df3-46c5-a4be-5fc8c0143360</guid></item><item><title>Gorm back on-line</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=2080696c-b152-4c75-bd7e-ed693fb7657f</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm back on-line" width="220" /><br />Gorm is back on-line. 
A leaf switch on the infiniband network needed to be restarted.]]></description><pubDate>Tue, 23 May 2017 11:45:50 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=2080696c-b152-4c75-bd7e-ed693fb7657f</guid></item><item><title>Hardware fails on gorm</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=0bd9ac10-3ada-4de8-9286-28572613b484</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Hardware fails on gorm" width="220" /><br />We are currently experiencing infiniband failures on gorm and are working to bring it back up.
]]></description><pubDate>Tue, 23 May 2017 08:56:18 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=0bd9ac10-3ada-4de8-9286-28572613b484</guid></item><item><title>Compute clusters up again</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=b6444632-5d4d-44ee-8174-7cdb41e12536</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Compute clusters up again" width="220" /><br />
Our compute clusters are up and running again. It looks like the batch system did not take the crash too well. You might need to restart some jobs.
There was no loss of data.
Happy Easter!]]></description><pubDate>Wed, 12 Apr 2017 14:53:31 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=b6444632-5d4d-44ee-8174-7cdb41e12536</guid></item><item><title>Cooling system failed today - all servers down</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=c7db39aa-ad14-4a2e-8c1f-b8d4c850709a</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Cooling system failed today - all servers down" width="220" /><br />
The cooling system in our server room failed today. All servers were shut down due to (the danger of) overheating. We might have lost a couple of nodes.
The storage systems seem OK. We will check and fix any problems on the storage servers before starting the compute servers up again.
We expect the servers to be up in the afternoon. News about progress and failed nodes will be posted here.]]></description><pubDate>Wed, 12 Apr 2017 10:57:09 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=c7db39aa-ad14-4a2e-8c1f-b8d4c850709a</guid></item><item><title>Systems running, but under maintenance</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=7d417d17-d7b5-4db0-a83e-4f0bedb0a19a</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Systems running, but under maintenance" width="220" /><br />We implemented a temporary modification to the mimer storage system, which, hopefully, will keep the system available while we are investigating the problem.]]></description><pubDate>Tue, 14 Mar 2017 15:46:12 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=7d417d17-d7b5-4db0-a83e-4f0bedb0a19a</guid></item><item><title>Hardware problems, mimer and gorm down</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=12dce25c-e1f6-410d-a2dc-8417f9e04c76</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Hardware problems, mimer and gorm down" width="220" /><br />
We experienced repeated crashes of the storage servers that export the /home file system on gorm and the /mnt/oldjesshome file system on jess. As a consequence, it is currently not possible to access gorm. Jobs that do not access /home are still running and will finish their computations; jobs that do access /home may terminate prematurely.
We are currently investigating the source of the problem. Updates will be posted here.]]></description><pubDate>Tue, 14 Mar 2017 13:28:47 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=12dce25c-e1f6-410d-a2dc-8417f9e04c76</guid></item><item><title>Crash of storage system old-mimer</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=a0d43eec-a762-4794-9795-6fc836be46f3</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Crash of storage system old-mimer" width="220" /><br />We experienced a serious crash of our storage system old-mimer yesterday night, affecting both clusters jess and gorm. Connection was restored for jess shortly before midnight. Fixing gorm required work until today noon. The systems are now up and running again.]]></description><pubDate>Fri, 10 Mar 2017 11:27:13 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=a0d43eec-a762-4794-9795-6fc836be46f3</guid></item><item><title>Storage upgrade finished</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=26f1907a-1694-4e55-ade1-2e3660015e27</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage upgrade finished" width="220" /><br />
We successfully completed the storage upgrade for the file systems mimer and aiolos.]]></description><pubDate>Tue, 07 Mar 2017 17:31:17 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=26f1907a-1694-4e55-ade1-2e3660015e27</guid></item><item><title>Upgrading aiolos file system</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=7ff2bf8c-f304-4e81-8c12-491b36c9bba5</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Upgrading aiolos file system" width="220" /><br />We are experiencing problems adding the new disks for the file system /mnt/aiolos. We are in contact with DDN support.]]></description><pubDate>Tue, 07 Mar 2017 10:20:45 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=7ff2bf8c-f304-4e81-8c12-491b36c9bba5</guid></item><item><title>Storage added to mimer</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=dd2b1e97-ec56-4a5b-96e6-7aa5a0654118</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage added to mimer" width="220" /><br />We added 20 disks to mimer. We will add 80 disks to aiolos soon - the formatting of these disks is still running and we expect them to be ready tomorrow. Another 20 disks will be added to aiolos in the near future.]]></description><pubDate>Thu, 02 Mar 2017 07:52:40 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=dd2b1e97-ec56-4a5b-96e6-7aa5a0654118</guid></item><item><title>Storage back and jobs running again</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=cd4333e6-2291-4e8c-a86e-e78d5774cd5d</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Storage back and jobs running again" width="220" /><br />
Our storage system is back on-line and jobs are executed again.
Due to an accident, the job queue on jess was cleared. We apologise for the inconvenience.]]></description><pubDate>Tue, 28 Feb 2017 19:43:29 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=cd4333e6-2291-4e8c-a86e-e78d5774cd5d</guid></item><item><title>Outage of storage system</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=a9949018-8228-44a7-94af-e683d8cf5ffe</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Outage of storage system" width="220" /><br />
Our consultant from DDN arrived on campus. Unfortunately, it is not possible to add the new enclosures to the storage system without shutting the controllers off. We will wait for as many jobs as possible to finish and try to reschedule others before we shut the storage system down.
Affected file systems are mimer, aiolos and home on jess.]]></description><pubDate>Tue, 28 Feb 2017 12:50:28 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=a9949018-8228-44a7-94af-e683d8cf5ffe</guid></item><item><title>Queues on jess and gorm stopped</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=6e9e21f5-eed9-4893-baf9-567caaa87765</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Queues on jess and gorm stopped" width="220" /><br />
Execution of jobs on gorm and jess is temporarily suspended for a service window. We will upgrade the cluster storage and may need to unmount some file systems. Jobs accessing these file systems may need to be terminated and rescheduled.
It is possible to submit jobs. Please do not make mass-submissions (>50 jobs) while the queues are stopped. Please do not submit jobs if there are already more than 5000 in the queue.]]></description><pubDate>Tue, 28 Feb 2017 10:24:43 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=6e9e21f5-eed9-4893-baf9-567caaa87765</guid></item><item><title>Service announcement</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=d8576525-bf63-4536-840c-8a04d23dbcb3</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Service announcement" width="220" /><br />
During the period from Feb 28 to March 2 we will add new storage to aiolos. As part of this task we may need to shut down the entire cluster storage system. If this is necessary, we will aim at restricting the down time to the afternoon of Feb 28. We will post updates as soon as more information becomes available.
Please try to plan jobs to avoid the potential down time window.]]></description><pubDate>Fri, 17 Feb 2017 12:53:03 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=d8576525-bf63-4536-840c-8a04d23dbcb3</guid></item><item><title>HPC TechTalks</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=6a32bb5a-3ec4-4e89-858c-56a0542f6201</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="HPC TechTalks" width="220" /><br />
The next techtalk is dedicated to some very short tutorials and to discussions with users about observations in the batch system and about interest in an introductory course for new users. Time and place: 23rd of February at 13:30 in meeting room B109 - S22-28.
Please see the full text of the invitation here (intranet).]]></description><pubDate>Wed, 08 Feb 2017 16:34:40 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=6a32bb5a-3ec4-4e89-858c-56a0542f6201</guid></item><item><title>Gorm crashed</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=1f0b4847-bc7a-4353-8cb5-011e2c719516</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Gorm crashed" width="220" /><br />The login node of Gorm crashed today with a kernel panic. The node is back on-line.
]]></description><pubDate>Mon, 06 Feb 2017 14:38:10 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=1f0b4847-bc7a-4353-8cb5-011e2c719516</guid></item><item><title>Mimer: mounting file systems</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=6db25078-19ed-4011-9c41-98c0984f7e4f</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Mimer: mounting file systems" width="220" /><br />
Another computer on the Windows domain registered the host name mimer, with the effect that the alias mimer no longer points to our good old mimer here at Risø. This affects all connections that use the short name mimer instead of the full host name mimer.risoe.dk. If you experience problems mounting file systems etc., please substitute the full host name and try again.
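As an illustrative sketch only (the export path, mount point and file system type below are invented placeholders; the only detail taken from this notice is the host name mimer.risoe.dk), a mount entry in /etc/fstab that uses the full host name rather than the bare alias might look like:

```
# Hypothetical /etc/fstab entry -- full host name instead of the alias "mimer"
mimer.risoe.dk:/export/data  /mnt/mimer  nfs  defaults  0  0
```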
To avoid such problems in the future, please always use the full host name (for example, mimer.risoe.dk) instead of an alias in configuration files.]]></description><pubDate>Wed, 23 Nov 2016 16:29:38 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=6db25078-19ed-4011-9c41-98c0984f7e4f</guid></item><item><title>Problems on Jess - Resolved</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=cdacee31-508c-4fd1-b532-ca467d2cda77</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Problems on Jess - Resolved" width="220" /><br />We resolved the problem that occurred yesterday on Jess and added the affected nodes back to the queueing system. Jess should now operate normally.
]]></description><pubDate>Fri, 11 Nov 2016 15:44:05 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=cdacee31-508c-4fd1-b532-ca467d2cda77</guid></item><item><title>Problems on Jess, update 1</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=9e396617-6cf7-49d8-9a76-79c4cc7a3880</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Problems on Jess, update 1" width="220" /><br />The problem we observed appears to be under control, and we have enabled the queues again. A number of nodes have been taken offline; they will be fixed and restarted on Friday.
]]></description><pubDate>Thu, 10 Nov 2016 20:02:16 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=9e396617-6cf7-49d8-9a76-79c4cc7a3880</guid></item><item><title>Problems on Jess</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=4d9bff49-ce61-4c79-8b64-d4747271cfd9</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Problems on Jess" width="220" /><br />
We are currently seeing many nodes dropping out of the cluster, and we may need to take Jess down.
We are working on resolving the problem.
]]></description><pubDate>Thu, 10 Nov 2016 17:30:22 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=4d9bff49-ce61-4c79-8b64-d4747271cfd9</guid></item><item><title>HPC TechTalk</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=bb09029a-4601-4f96-a90f-1499577bc2fd</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="HPC TechTalk" width="220" /><br />
The next TechTalk, given by Neil Davis, will focus on environment modules and user-contributed modules under /home/MET. Time and place: 22 November at 10 AM in B111-124, R37-39.
If time permits, we will explain how to write personal module files and conclude with an introduction to the EasyBuild framework.
Please see the full text of the invitation here (intranet).]]></description><pubDate>Tue, 08 Nov 2016 20:02:59 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=bb09029a-4601-4f96-a90f-1499577bc2fd</guid></item><item><title>Unstable Network (Fixed)</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=8f15635c-295f-4fb8-897d-f44b095869bc</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Unstable Network (Fixed)" width="220" /><br />A source of networking problems on Jess has been found and fixed. Connections to license servers should not fail any more. In case of remaining problems, please file a support request.
]]></description><pubDate>Wed, 26 Oct 2016 18:15:18 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=8f15635c-295f-4fb8-897d-f44b095869bc</guid></item><item><title>Unstable Network</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=818295e2-3ab7-4db5-91bb-9bc2f9de6fa6</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="Unstable Network" width="220" /><br />
Since Thursday, October 20, we have been experiencing unstable network behaviour. This appears to affect connections from our clusters to license servers. The problem is under investigation.
In case you observe an error due to an unreachable license server, please retry after a few minutes. If the problem persists, please submit a ticket.]]></description><pubDate>Tue, 25 Oct 2016 16:23:00 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=818295e2-3ab7-4db5-91bb-9bc2f9de6fa6</guid></item><item><title>HPC TechTalks</title><link>https://hpc-risoe.dtu.dk/news/nyhed?id=f5db973b-72ad-4edd-b9c7-aaec5c31e5fb</link><description><![CDATA[<img src="https://hpc-risoe.dtu.dk/-/media/standardimages/nyhed_minus_billed2uk.ashx?mw=220&hash=37F2601F996D08BF5A8BF3F0DF60B3E0" alt="HPC TechTalks" width="220" /><br />
The HPC Risø support team plans to offer a series of brief technical talks on topics of urgent as well as long-term interest to HPC cluster users, and will publish the invitations and minutes here (intranet). The first introductory TechTalk will be held at 10 AM on 27 October 2016 in room S22, building 109, where users may bring up topics of interest. Further TechTalks are planned monthly, depending on demand.
Please see the full text of the invitation here (intranet).]]></description><pubDate>Fri, 21 Oct 2016 14:15:59 +0200</pubDate><guid>https://hpc-risoe.dtu.dk/news/nyhed?id=f5db973b-72ad-4edd-b9c7-aaec5c31e5fb</guid></item></channel></rss>