Re: [HTCondor-users] Restart from checkpoint failing for HTCondor 8.4.1
- Date: Wed, 11 Nov 2015 14:55:25 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Restart from checkpoint failing for HTCondor 8.4.1
On 11/10/2015 11:03 AM, Feldt, Andrew N. wrote:
> Todd,
> We have now reverted to condor-8.2.10-345812 for our production
> HTCondor pool. This is allowing our jobs to properly vacate as
> needed. (This is from the htcondor-previous repo.) I will be
> interested in future updates to the 8.4 series which may address the
> checkpoint-restart problem.
> Andy

Hi Andy,
We think we now know what is happening and how to fix it.
I am guessing that your v8.4 attempt was using binaries from the RPM
package?
Our thinking is that the v8.4 binaries contained in the tarball would
work, but the v8.4 binaries in the RPM packages would fail (with respect
to standard universe restart). This is because our tarball binaries are
built with cmake, whereas our RPM packages are built via rpmbuild calling
out to cmake. The issue is that rpmbuild adds a number of additional,
undesired compiler flags. We are working to fix this issue for the
upcoming HTCondor v8.4.2 release. Follow progress and see details at:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5382
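
To illustrate the kind of flag injection described above, here is a minimal Python sketch that lists which compiler flags appear in one build's flag string but not in another's. It is not HTCondor's build code, and the flag strings in the example are hypothetical, chosen only for illustration; the flags that actually broke standard universe restart are the ones discussed in the ticket. The helper `rpm_optflags` assumes an RPM-based system, where `rpm --eval '%{optflags}'` prints the distribution's default compiler flags, and returns None elsewhere.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional


def extra_flags(baseline: str, actual: str) -> List[str]:
    """Return flags present in `actual` but not in `baseline`, in order."""
    base = set(shlex.split(baseline))
    return [flag for flag in shlex.split(actual) if flag not in base]


def rpm_optflags() -> Optional[str]:
    """Ask rpm for its default compiler flags; None if rpm is unavailable."""
    if shutil.which("rpm") is None:
        return None
    result = subprocess.run(
        ["rpm", "--eval", "%{optflags}"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    # Hypothetical flag sets, for illustration only.
    plain_cmake = "-O2 -g"
    via_rpmbuild = "-O2 -g -fstack-protector-strong -Wp,-D_FORTIFY_SOURCE=2"
    print(extra_flags(plain_cmake, via_rpmbuild))
    print(rpm_optflags())
```

Comparing the flags recorded for a tarball build against those recorded for an RPM build in this way would show what the packaging step added on top of the plain cmake configuration.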
Thank you for bringing this to our attention!!
Also, I am always curious how folks are using standard universe... could
you share a brief description of the sort of jobs (e.g. what
application, what scientific domain, etc.) that are using standard
universe at the University of Oklahoma?
Best regards,
Todd