RE: [Condor-users] Where to define "submitter name" ??
- Date: Wed, 25 May 2005 10:10:48 -0700
- From: "Michael Yoder" <yoderm@xxxxxxxxxx>
- Subject: RE: [Condor-users] Where to define "submitter name" ??
> Hi probably rtfm ... but a simple nod of what to change where to sort
> this niggle out might help... please ...
>
> On my submit machine (linux wbel) I get:
>
> [condor@WEREWOLF transit]$ condor_q
>
> -- Submitter: localhost.localdomain : <192.168.0.3:36869> : localhost.localdomain
> ID      OWNER   SUBMITTED    RUN_TIME    ST PRI SIZE CMD
> 30.0    condor  5/25 14:19   0+00:14:43  R  0   2.3  condor_dagman -f -
> 31.0    condor  5/25 14:21   0+00:06:08  R  0   0.0  spssdag.bat
>
> and ... (activity field chopped for readability)
>
> [root@WEREWOLF transit]# condor_status
>
> Name           OpSys    Arch   State      Activity  LoadAv  Mem
>
> localhost.loc  LINUX    INTEL  Unclaimed  Idle      0.000   373
> xpnode0        WINNT51  INTEL  Claimed    Busy      0.020   384
> xpnode1        WINNT51  INTEL  Unclaimed  Idle      0.010   384
> xpnode2        WINNT51  INTEL  Unclaimed  Idle      0.010   384
> xpnode3        WINNT51  INTEL  Unclaimed  Idle      0.010   384
>
>
> --- Question:
>
> what do I need to define so that Submitter is "werewolf" and not
> localhost.localdomain
>
> and the status listing also uses a sensible machine name.
>
> Note: I am NOT using DNS services for the private network, i.e. the
> inward interface on linux and 4 xpnodes (192.168.0.*). The outward
> interface on the linux box does indeed use the campuswide dns, and thus
> a lookup on its IP will return werewolf.york.ac.uk.
Are you sure your system has recovered from the recent full moon? We
just had one last Monday. (Sorry, couldn't resist.) While I can't
promise a silver bullet (ok, I'll stop for real this time :-) )...

Condor is probably getting confused by your internal/external
interfaces. Do you have NETWORK_INTERFACE set in your config? If so,
where does it point? It ought to be your internal IP. (Might the
daemons be querying DNS with the 192.* address?) You can find out
which daemon is using which IP address with 'condor_status -any -l'.
Be prepared for a lot of output.
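
As a rough illustration of the NETWORK_INTERFACE suggestion, a local
config entry might look like the fragment below. It assumes that
192.168.0.3 (the address shown in the condor_q output) is the internal
interface, and that the setting goes in the machine's local config
file; neither is confirmed in this thread.

```
# Sketch only: bind the Condor daemons to the private-network interface.
# 192.168.0.3 is assumed to be the internal address of the submit machine.
NETWORK_INTERFACE = 192.168.0.3
```

The daemons will most likely need a restart, not just a reconfig, before
they start using a different interface.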
To debug this issue further, please turn on D_HOSTNAME for the master
and schedd and look closely at the log files.
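
A minimal sketch of the debug settings, assuming they are added to the
submit machine's local config file and that the master and schedd write
to the usual MasterLog and SchedLog files in the LOG directory:

```
# Sketch only: log hostname and IP resolution details.
MASTER_DEBUG = D_HOSTNAME
SCHEDD_DEBUG = D_HOSTNAME
# For more detail from the schedd, D_FULLDEBUG can be added as well:
# SCHEDD_DEBUG = D_FULLDEBUG D_HOSTNAME
```

Running condor_reconfig afterwards should make the daemons pick up the
new debug levels, and 'condor_config_val LOG' prints the directory that
holds the log files.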
> [condor@WEREWOLF transit]$ condor_q
>
> -- Failed to fetch ads from: <192.168.0.3:36869> : localhost.localdomain
> [condor@WEREWOLF transit]$ condor_q
>
> -- Failed to fetch ads from: <192.168.0.3:36869> : localhost.localdomain
> [condor@WEREWOLF transit]$ condor_q
>
> -- Submitter: localhost.localdomain : <192.168.0.3:36869> : localhost.localdomain
> ID      OWNER   SUBMITTED    RUN_TIME    ST PRI SIZE CMD
> 30.0    condor  5/25 14:19   0+00:23:21  R  0   2.3  condor_dagman -f -
> 31.0    condor  5/25 14:21   0+00:14:46  R  0   0.0  spssdag.bat
>
> 2 jobs; 0 idle, 2 running, 0 held
>
> The system is as near a dammit quiescent ... why should I get these
> failures?
This might (?) be a symptom of the same problem. The schedd is
single-threaded, and if it's going out to lunch trying to figure out
its hostname, it can't answer condor_q in the meantime, so the query
fails. Just a hunch. Run 'tail -f' on the schedd log while this happens
and you'll be able to see what's going on. (I highly recommend turning
on D_FULLDEBUG and D_HOSTNAME for the schedd...)
Mike Yoder
Principal Member of Technical Staff
Direct : +1.408.321.9000
Fax : +1.408.904.5992
Mobile : +1.408.497.7597
yoderm@xxxxxxxxxx
Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com