Rob Parrott wrote:
> We recently upgraded to condor 6.6.6, and I think this corresponded
> with a change in behavior of condor for some users.
What did you upgrade from? SOFT_UID_DOMAIN didn't change in the Condor
6.6 series, but in the Condor 6.5 series, there were bugs in looking
at the UID_DOMAIN. However, I don't think that the bug should have
caused it to work for you before the upgrade but not after it.
In Condor 6.6.5, we added TRUST_UID_DOMAIN, and it defaults to FALSE.
This may also have affected you.
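To make the TRUST_UID_DOMAIN point concrete, here is a rough Python model of the check as the Condor manual describes it. Unless TRUST_UID_DOMAIN is True, the UID_DOMAIN that a submit machine claims must be a substring of that machine's fully-qualified host name. This is a sketch, not Condor's code. The function name, the lowercasing, and the host names are assumptions for illustration only.

```python
def uid_domain_accepted(submit_uid_domain: str,
                        submit_fqdn: str,
                        trust_uid_domain: bool = False) -> bool:
    """Hypothetical model of the execute-side UID_DOMAIN sanity check."""
    if trust_uid_domain:
        # TRUST_UID_DOMAIN = True: accept whatever UID_DOMAIN the submitter claims.
        return True
    # Default (False): the claimed UID_DOMAIN must appear inside the submit
    # machine's fully-qualified host name. Lowercasing both is an assumption.
    return submit_uid_domain.lower() in submit_fqdn.lower()


# Hypothetical host names, for illustration only.
print(uid_domain_accepted("cs.example.edu", "node1.cs.example.edu"))       # True
print(uid_domain_accepted("cs.example.edu", "node1-priv.cluster.local"))   # False
print(uid_domain_accepted("cs.example.edu", "node1-priv.cluster.local",
                          trust_uid_domain=True))                          # True
```

A submit machine that shows up under a name outside its claimed domain (for instance, through a second network interface) fails the default test. That is one reason the host names matter.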
> One user started having runs hang in the queue, where they would
> be started and stopped immediately.
Is this a new user? Did anything else change? Is he submitting jobs
from a different computer than he used to?
When you upgraded, did you not preserve your configuration files?
Perhaps you had SOFT_UID_DOMAIN set in older config files?
> As a workaround, I've set SOFT_UID_DOMAIN=true and the runs have
> started and completed successfully.
This sounds like exactly the right thing to do--it's just not clear
why it stopped working before.
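Here is a similar rough model of what SOFT_UID_DOMAIN relaxes. As the Condor manual describes it, before running a job as the submitting user, the execute machine checks that the user's name is in its password data (local files or NIS) and maps to the same UID. Sites where not every user is visible on every machine may set SOFT_UID_DOMAIN to True to skip that requirement. The function and the passwd dictionary below are hypothetical stand-ins, not Condor internals.

```python
def can_run_as_owner(owner: str, uid: int,
                     passwd: dict,
                     soft_uid_domain: bool = False) -> bool:
    """Hypothetical model of the password-file check SOFT_UID_DOMAIN relaxes.

    passwd maps user names to UIDs as the execute machine sees them
    (e.g. /etc/passwd plus NIS).
    """
    if soft_uid_domain:
        # SOFT_UID_DOMAIN = True: trust the submitted UID even if the
        # execute machine cannot find a matching password entry.
        return True
    # Default (False): the user must be known locally with the same UID.
    return passwd.get(owner) == uid


# "bob" is missing, as if the NIS lookup for him were failing (hypothetical data).
local_passwd = {"root": 0, "alice": 1001}
print(can_run_as_owner("alice", 1001, local_passwd))                      # True
print(can_run_as_owner("bob", 1002, local_passwd))                        # False
print(can_run_as_owner("bob", 1002, local_passwd, soft_uid_domain=True))  # True
```

If this model is close, the workaround succeeding would suggest that the execute machines were not seeing the user's entry (or were seeing a different UID) at the time the jobs started.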
> Is there something further I can do to determine the root cause of
> the problem? Is condor expected to work with NIS? (I assume so, as it
> worked perfectly before.)
Condor is expected to work with NIS. We know of problems with Condor &
NIS if you aren't using the dynamically-linked libraries, but the
symptom is that the binaries crash very quickly. The solution is to
use the dynamically linked version of Condor.
What is the hostname of the submitter? What is the hostname of the
execution machine? Does either machine have multiple interfaces, which
may make it look as if it is in a different domain?
What version of Condor did you upgrade from?
-alain
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users