[Chtc-users] Final Reminder: HPC Cluster downtime starting Tuesday, Nov. 17 at 10am


Date: Mon, 16 Nov 2015 16:34:36 -0600
From: chtc-users@xxxxxxxxxxx
Subject: [Chtc-users] Final Reminder: HPC Cluster downtime starting Tuesday, Nov. 17 at 10am
Dear HPC Cluster Users,

If you only use CHTC's high-throughput computing (HTC) system via HTCondor submission, feel free to disregard the message below.

This is your final reminder that the HPC cluster will be unavailable starting at 10am tomorrow morning (November 17) for scheduled maintenance and filesystem upgrades. We will inform everyone through this email list when the cluster is back up, which could be as late as Friday, November 20.

Thank you,

Your CHTC team

---------- Forwarded message ----------
From: <chtc-users@xxxxxxxxxxx>
Date: Thu, Nov 12, 2015 at 9:59 AM
Subject: [Chtc-users] Reminder: HPC Cluster downtime starting Tuesday, Nov. 17; remove data ASAP
To: chtc-users@xxxxxxxxxxx


Greetings HPC Cluster Users,

... if you only use CHTC's high-throughput computing (HTC) system via HTCondor submission, feel free to disregard the message below ...


Please remember that the HPC Cluster will be taken off-line at 10am on Tuesday, Nov. 17, for an upgrade to the filesystem. The cluster may not be available again until Nov. 20, but we will email when it's back online.

After the cluster downtime, the /scratch location will have been deleted, and there will be a default 100 GB quota for all /home directories. If your home directory has more than 100 GB of data, we have already emailed you. After the cluster is back online, the filesystem will prevent you from adding more data until the amount of data in your home directory is reduced to a value below your quota.
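For users unsure how close they are to the quota, the following is a minimal sketch (not an official CHTC tool) that totals the size and number of regular files under a directory such as your home directory. The function names are hypothetical, and treating "100 GB" as 100 x 10^9 bytes is an assumption; the filesystem's own quota accounting may use different units or count allocated blocks rather than file sizes.

```python
import os
import stat

# Assumption: "100 GB" is read as 100 * 10**9 bytes. The filesystem's own
# quota accounting may use different units or count allocated blocks.
DEFAULT_QUOTA_BYTES = 100 * 10**9


def usage(root):
    """Return (total_bytes, file_count) for regular files under root.

    Uses each file's apparent size (st_size). Symlinks are neither followed
    nor counted, and entries that disappear mid-walk are skipped.
    """
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # removed or unreadable while walking
            if stat.S_ISREG(st.st_mode):
                total_bytes += st.st_size
                file_count += 1
    return total_bytes, file_count


def blocked_by_quota(total_bytes, quota_bytes=DEFAULT_QUOTA_BYTES):
    """True if usage is not below the quota, i.e. new writes would be refused."""
    return total_bytes >= quota_bytes


def report(root, quota_bytes=DEFAULT_QUOTA_BYTES):
    """Summarize usage under root relative to the quota."""
    total_bytes, file_count = usage(root)
    line = f"{root}: {total_bytes / 10**9:.2f} GB in {file_count} files"
    if blocked_by_quota(total_bytes, quota_bytes):
        line += " (at or over quota: remove data before adding more)"
    return line
```

For example, `print(report(os.path.expanduser("~")))` summarizes your home directory. The file count is reported alongside the size because large numbers of files also degrade filesystem performance.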

We ask that all HPC users remove as much data as possible prior to the downtime.

Users needing more than 100 GB of disk space for concurrently running jobs can send a request to chtc@xxxxxxxxxxx after the cluster downtime, with details of exactly how much space will be needed and for roughly how many jobs. CHTC staff will ask such users to reduce the amount of data in their home directories before an increase in quota will be granted.


Please send any questions or comments to chtc@xxxxxxxxxxx
Thank you for your patience during the downtime and for your cooperation in helping us to implement policies that encourage better data practices. It has always been our policy that data from completed jobs be removed from the cluster as soon as possible, and that no user should treat the cluster as persistent data storage. Data left on the cluster, especially large numbers of files, is the biggest contributor to user-reported performance issues.

Cheers,
Your CHTC Team


---------- Forwarded message ----------
From: CHTC
Date: Fri, Oct 23, 2015 at 5:43 PM
Subject: HPC Cluster downtime, Nov. 17-20
To: chtc-users@xxxxxxxxxxx


Greetings HPC Cluster users,

... if you only use CHTC's high-throughput computing (HTC) system via HTCondor submission, feel free to disregard the message below ...


In order to upgrade the filesystem version and modify configurations to improve filesystem performance, the HPC Cluster will be down for the upgrade starting on Tuesday, Nov. 17, with a return of functionality by Friday, Nov. 20.


To Prepare for and Reduce Cluster Downtime
We ask ALL users to do the following by Friday, Nov. 6:
  1. remove all data from the /scratch directory (otherwise, it will be deleted for you; see below)
  2. remove AS MUCH DATA AS POSSIBLE from the entire filesystem (the more data removed, the shorter the downtime)
  3. remember: we have no backups, and filesystem downtime may lead to file loss or corruption, so keep a copy of essential files elsewhere
The Following Policy Changes Will Take Effect after Nov. 17
  1. The entire /scratch location will be deleted, so make sure to remove ALL of your data in /scratch/user from the cluster, copying it to another location if you need it. Only the /home location will persist. (Neither /home nor /scratch has ever been automatically cleaned, and we don't intend to do so for /home in the future.)
  2. A default disk quota of 100 GB will be set for all users. However, current users with more than 200 GB will be contacted individually prior to Nov. 6 to discuss extra measures and quota arrangements. In the future, users needing more than their quota for current and active compute jobs may ask for a quota increase by emailing chtc@xxxxxxxxxxx and explaining the situation. Based upon past filesystem performance issues, we must all be committed to reducing the amount of leftover data and the overall number of files on the HPC Cluster filesystem, both of which affect filesystem performance for running jobs and for users who are logged in.
  3. The new quota will not delete any of your data in /home, but will keep you and your jobs from adding new data until you are using less than your quota.
CHTC Staff Are:
  1. updating the HPC Cluster Use Guide to reflect the above policy changes (already done).
  2. emailing researchers who have significant total data and/or a large number of files on the filesystem, in preparation for the upgrade.


Please understand that all of the above changes are aimed at improving filesystem performance for everyone, based upon common complaints in the last 6 months. We do our best to minimize cluster downtime (including quick response after recent power outages*) and have come to the above decision after considerable conversation with specific users and based upon suggestions from other cluster-providing organizations.

We appreciate your cooperation and feedback. As always, please send any questions or comments to chtc@xxxxxxxxxxx

Many Thanks,
Your CHTC Team


*P.S. While we would have preferred to perform this upgrade sooner, the three campus power outages this fall created significant delays, and we wanted to give you time to get some work done before yet another interruption.

_______________________________________________
Chtc-users mailing list
Chtc-users@xxxxxxxxxxx
To unsubscribe send an email to:
chtc@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/chtc-users