[Chtc-users] HTC submit-5 is down - crashed by user with high-memory program running


Date: Mon, 1 Jun 2015 14:59:37 -0500
From: chtc-users@xxxxxxxxxxx
Subject: [Chtc-users] HTC submit-5 is down - crashed by user with high-memory program running
Greetings,

Users of submit-5.chtc.wisc.edu on our HTC System should know that submit-5 has crashed because a user ran a compute-intensive (high-memory) program directly on the submit server, which should not be done there. We will announce when submit-5 is back up and running.

We have yet to identify the individual, though we have some ideas. If you have been executing long-running programs on submit-5 that are not "condor" commands, the submit-5 crash may be your fault ... We'll follow up with those who we think may have contributed, but feel free to contact us first if you are worried you may be the perpetrator.

As a reminder of our Submit Node Policies, you should NOT run anything compute-intensive directly on the submit server.
Rather, such tasks (usually anything that takes more than several minutes to run) should be done before bringing data/work to CHTC, or should be run within a scheduled job that will execute on one of our higher-power execute servers.

Thank you,
Your CHTC Team