Date: Thu, 7 Aug 2014 16:48:12 -0500
From: Lauren Michael <lmichael@xxxxxxxx>
Subject: [Chtc-users] URGENT FOR HTC USERS: Issue for users of ChtcRun and "squid"

Greetings,
This message pertains only to users of CHTC's HTCondor pool (via a condor submit node), and not to users of the HPC Cluster. We have just fixed a problem that has affected ANYONE using the ChtcRun package (for setting up batches of jobs with "mkdag"), as well as any users who have otherwise placed large files in "/squid/<username>" so that those files can be delivered to jobs.
In either case, it is likely that a significant number of your jobs have failed due to this issue, which is related to a hardware failure on one of our proxy servers. You will likely need to remove affected batches of jobs from the queue and re-run them (this applies to any batch with jobs that have been running over the last 3 days). We apologize for any inconvenience; this issue was completely unforeseen by CHTC staff.
If you are using ChtcRun, you can determine the number of job failures from the "mydag.dag.dagman.out" file in any batch's main output directory. Near the bottom of this file is a table of job completions and failures.
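For anyone checking many batches, the lookup described above could be scripted. The following is only a minimal sketch: it assumes the table has a header row with columns running from "Done" to "Failed", followed by a separator row and a row of integer counts, which may differ between HTCondor versions. The function name `last_status_counts` and the sample text are hypothetical, made up for illustration.

```python
def last_status_counts(text):
    """Return the counts from the last status table found in `text`, or None.

    Assumed layout (not guaranteed for every HTCondor version): a header
    row containing the columns "Done" ... "Failed", then a separator row,
    then a row whose trailing fields are integers.
    """
    lines = text.splitlines()
    result = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if "Done" in tokens and "Failed" in tokens:
            # Drop any leading timestamp fields; keep the column names.
            columns = tokens[tokens.index("Done"):]
            # Look a few lines ahead for the row of integer counts.
            for follow in lines[i + 1:i + 4]:
                values = follow.split()[-len(columns):]
                if len(values) == len(columns) and all(v.isdigit() for v in values):
                    result = dict(zip(columns, (int(v) for v in values)))
                    break
    return result


# Made-up sample text standing in for the end of a dagman.out file.
SAMPLE = """\
08/07/14 16:40:01  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
08/07/14 16:40:01   ===     ===      ===     ===     ===        ===      ===
08/07/14 16:40:01     3       0        4       0       0          0        0
08/07/14 16:48:12  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
08/07/14 16:48:12   ===     ===      ===     ===     ===        ===      ===
08/07/14 16:48:12     5       0        0       0       0          0        2
"""

if __name__ == "__main__":
    print(last_status_counts(SAMPLE))
```

To use it on a real batch, read the contents of "mydag.dag.dagman.out" into a string and pass it to the function; a nonzero "Failed" count suggests the batch was affected.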
If you would like help with removing and re-running your jobs, please send an email to chtc@xxxxxxxxxxx. We are happy to help.
Thank you,
Your CHTC Team