Re: [Chtc-users] submit-1.chtc.wisc.edu is down


Date: Thu, 5 Jul 2012 22:46:38 +0000
From: "Gore, Brooklin" <BGore@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: [Chtc-users] submit-1.chtc.wisc.edu is down
After working on this issue today, it looks like we may be able to recover
much of the data in /home.

Our plan is to:

1. Copy /home data to known-good storage.
2. Get submit-1 back online (with empty home directories).
3. Make the original data available.

We expect to have submit-1 back in production (with empty home
directories) on Monday, July 9.

We hope to be able to provide access to pre-July 4th home directory data
on Tuesday, July 10.

If you have a critical need to submit jobs before Monday, please
contact us to arrange access to an alternate submit node.

We apologize for this inconvenience.
UW CHTC Team

On 7/5/12 11:43 AM, "Todd Tannenbaum" <tannenba@xxxxxxxxxxx> wrote:

>
>Hello,
>
>On July 4th the general-purpose CHTC submit node
>    submit-1.chtc.wisc.edu
>suffered a hard drive failure.  As this machine uses RAID (a redundant
>array of disks), a single hard drive failure is not a problem.
>Unfortunately, while submit-1 was in the process of automatically
>recovering from the failed hard drive, a second hard drive failed.
>Two hard drives failing within five minutes of each other is not a
>failure that submit-1's redundancy was designed to tolerate.
>
>As a result, submit-1.chtc is currently unavailable and will likely
>remain unavailable for the bulk of today. We are currently working on
>it, and will send out further news as it develops this afternoon.  It is
>too early to say how much data stored on submit-1 is salvageable.
>
>Groups that submit jobs from a server other than submit-1.chtc should
>not be impacted.
>
>Thank you for your patience,
>UW CHTC Staff
>
>_______________________________________________
>Chtc-users mailing list
>Chtc-users@xxxxxxxxxxx
>To unsubscribe send an email to:
>chtc@xxxxxxxxxxx
>https://lists.cs.wisc.edu/mailman/listinfo/chtc-users
>
