[Condor-users] Problem running Grid jobs using Condor.
- Date: Wed, 15 Apr 2009 14:48:18 -0600
- From: Balamurali Ananthan <bala@xxxxxxxxxx>
- Subject: [Condor-users] Problem running Grid jobs using Condor.
Hello,
I am trying to run a job in a Condor pool, submitted through the
Globus Gatekeeper, but the jobs are being held for this reason:
HoldReason = "Error from starter on slot1@xxxxxxxxxxxxxxxxxxx: Failed to open
'/home/research/bala/.globus/job/vulcan.txcorp.com/9128.1239817731/stdout'
as standard output: No such file or directory (errno 2)"
Here is what I already did:
1. Started the master daemon on the execute machine as root.
2. Set UID_DOMAIN in the condor_config on the execute machine to
txcorp.com.
3. Set TRUST_UID_DOMAIN = TRUE on the execute machine.
4. The account the job is supposed to run as is not in the /etc/passwd
file on the execute machine, so I set SOFT_UID_DOMAIN = TRUE on the
execute machine.
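For reference, steps 2 to 4 should correspond to roughly these entries in the execute machine's condor_config. This is only a sketch of the three settings named above; everything else in the file is omitted, and the comments are mine.

```
# condor_config on the execute machine (excerpt, other settings omitted)
UID_DOMAIN       = txcorp.com
TRUST_UID_DOMAIN = TRUE
SOFT_UID_DOMAIN  = TRUE
```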
However, the execute machine (10.0.0.2) cannot do DNS lookups, so it
has no way to resolve 10.0.0.105 to vulcan.txcorp.com (the submit
machine) through DNS, although /etc/hosts can be used to resolve
10.0.0.105 to vulcan.txcorp.com.
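One way to see what name the execute machine's system resolver returns for the submit machine's address is a reverse lookup along these lines. This is a minimal sketch and not something Condor itself runs: the helper name reverse_lookup is hypothetical, and I am assuming that Python's socket.gethostbyaddr goes through the normal system resolver, which may consult /etc/hosts as well as DNS depending on how the host is configured.

```python
import socket


def reverse_lookup(ip_address):
    """Return the primary host name for ip_address, or None if it does not resolve.

    reverse_lookup is a hypothetical helper for illustration only.
    """
    try:
        # gethostbyaddr returns (hostname, alias_list, ip_address_list).
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
```

Running reverse_lookup("10.0.0.105") on the execute machine would show whether that address maps to a name at all from there, independently of what Condor does with it.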
Questions:
1. Does the execute machine depend only on DNS to resolve an IP
address to a host name? And if the lookup fails, does it run the job as
nobody?
2. How can I see which account the job is being run as? I'm guessing
that the job is run as nobody when it is supposed to run as bala. How
do I check this?
Thanks much!
--
Balamurali Ananthan (bala@xxxxxxxxxx) (720.974.1843)
Tech-X Corp, 5621 Arapahoe Ave, Suite A, Boulder, CO 80303