submit-3 issues resolved


Date: Fri, 15 Sep 2017 12:31:29 -0500
From: chtc-users@xxxxxxxxxxx
Subject: submit-3 issues resolved
Greetings CHTC users,

This message is for users of our high throughput system with accounts on submit-3.chtc.wisc.edu.

From Wednesday night through Thursday evening, any commands used to interact with the queue on submit-3 (including, but not limited to: condor_submit, condor_q, condor_ssh_to_job) were unresponsive or returned error messages. Last night, we traced the issue to a problematic job submission. That job has been put on hold and the queue should now be back to normal.

Due to these issues with the queue, all running jobs in the submit-3 queue were interrupted yesterday. All jobs are still in the queue and will be gradually restarted by HTCondor.

As a reminder to *all* users, please test your job submissions before submitting large batches of work. If you are scaling up beyond a few thousand jobs, get in touch with the research computing facilitators at chtc@xxxxxxxxxxx. We can recommend best practices that will get you the most throughput possible without negatively impacting other users.

Best,
Your CHTC team