Timeline for data access and recovery on CHTC’s staging and projects folders


Date: Tue, 26 Nov 2024 00:05:25 +0000
From: chtc-users@xxxxxxxxxxx
Subject: Timeline for data access and recovery on CHTC’s staging and projects folders

Dear CHTC users, 

 

This message is for ALL users, especially:

  • HTC users using the /staging, /projects, /software, and /squid directories
  • HPC users using the /projects directory

 

On Thursday, November 21, CHTC personnel were alerted to data issues in our /staging and /projects directories. We took immediate action on Thursday afternoon and have continued working on the problem through the weekend and today. 

 

We have identified the underlying cause, which affected the /squid, /staging, /projects, and HTC /software directories. We are able to prevent it from recurring; however, it caused significant data loss in /staging, /projects, HTC /software, and /squid before CHTC personnel were able to react. All data in /squid is unrecoverable. Any remaining data in /projects and /staging is currently inaccessible while we work to recover whatever additional data we can. We hope to recover at least 50% of /staging and 60% of /projects. 

 

We recognize that this ongoing outage and the accompanying data loss are disruptive to important work. Our plan to bring systems and data back online is outlined in the email below and also on this page on the CHTC website: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024 

 

CHTC Next Steps and Timeline

 

  • Nov 25 - 27: New data backend for /staging and /projects
    • This week, we will create a new data store to serve the /staging and /projects directories. Initially, these directories will be empty. The new data store will be used for CHTC data storage moving forward and will be usable in jobs as soon as it is available. 
  • Nov 25 - Dec 9: Recovering data from /staging directories
    • We will run multiple recovery processes on the old data store over the next 1-2 weeks. As each recovery process completes, CHTC users will be able to access the recovered data and copy it to the new data store. CHTC will not overwrite or replace data created in the meantime. We are still developing the mechanism for this process and will provide more information as it becomes available. 
  • Dec 9 onward: Recovering data from /projects directories
    • This will be the same process as recovering data from /staging. 

 

Note that this timeline means that you will not know how much of your data from the previous data store was lost or recovered until after Dec 2, at the earliest. Consider re-transferring or reproducing the data instead of waiting for potential recovery.

 

Evaluation and restoration of HTC /software directories will happen after the process above is complete. 

 

Stay Informed

 

To find out when the new file system and the recovered data become available, we recommend following the relevant incident(s) on the CHTC status page: https://status.chtc.wisc.edu/

 

The web guide describing the recovery process will also be updated as changes occur: https://chtc.cs.wisc.edu/uw-research-computing/data-recovery-fall2024 

 

Please continue to check these links regularly, as we may not send all updates to the CHTC users list. 

 

Resume Running Jobs

 

The new data store serving /staging and /projects will be created and available by end of business tomorrow (Nov 26). Once this data store is created, all HTC users should have access to an empty /staging directory with a default quota of 100 GB / 1,000 items. This space can be used to run jobs exactly like the previous /staging directories. A few notes about special circumstances: 

 

  • Quota changes: Quotas from the previous data store will not transfer over, so if you anticipate needing space beyond the default quota, especially in the short term, please fill out our Quota Request Form.
  • Immediate deadlines: If you have a short-term deadline (within the next 2-3 weeks), please reach out to see how we can support you. Email chtc@xxxxxxxxxxx with the following information: 
    • Include your name and the deadline date in the subject line
    • Cc your PI or advisor
    • Describe the nature of the deadline (paper submission, thesis defense, conference deadline, etc.)
    • Briefly describe what specific computational or data capacity you need in order to meet the deadline (how many jobs, how many resources per job, how much data, etc.)

 

Reach Out

 

We understand the challenge of restarting your work after an event like this. If you have any questions or specific concerns after reading through this email or the linked web guide, please contact us at chtc@xxxxxxxxxxx. We will do our best to help all CHTC users get up and running again as soon as possible. 

 

Best, 

Christina Koch on behalf of the CHTC team

 

-- 

Christina Koch (she/her)

Lead Research Computing Facilitator

Center for High Throughput Computing (University of Wisconsin – Madison); OSG and PATh services

Email: ckoch5@xxxxxxxx // Calendar: https://go.wisc.edu/clk-calendar
