After working on this issue today, we believe we may be able to recover
much of the data in /home.
Our plan is to:
1. Copy /home data to known-good storage.
2. Get submit-1 back online (with empty home directories).
3. Make the original data available.
We expect to have submit-1 back in production (with empty home
directories) on Monday, July 9.
We hope to be able to provide access to pre-July 4th home directory data
on Tuesday, July 10.
If you have a critical need to submit jobs before Monday, please
contact us to arrange access to an alternate submit node.
We apologize for this inconvenience.
On 7/5/12 11:43 AM, "Todd Tannenbaum" <tannenba@xxxxxxxxxxx> wrote:
>On July 4th the general-purpose CHTC submit node
>suffered a hard drive failure. Because this machine uses RAID (redundant
>disk arrays), the failure of any single hard drive is not a problem.
>Unfortunately, while submit-1 was in the process of automatically
>recovering from the failed hard drive, a second hard drive failed.
>Two hard drives failing within five minutes of each other exceeds the
>level of redundancy that submit-1 was designed to tolerate.
>As a result, submit-1.chtc is currently unavailable and will likely
>remain unavailable for the bulk of today. We are currently working on
>it, and will send out further news as it develops this afternoon. It is
>too early to say how much data stored on submit-1 is salvageable.
>Groups that submit jobs from a server other than submit-1.chtc should
>not be impacted.
>Thank you for your patience,
>UW CHTC Staff
>Chtc-users mailing list