Thanks Ian!
All the machines are now executing. Great!

/Sónia

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Ian Chesal

Sonia,

I notice they are all running different versions of Windows.
I suspect that's your problem. When you submit a job to Condor and you don't
tell it which OS you want to run on, it assumes you want to run on the exact
same OS as the one you submitted from. So you're submitting from a machine with
OpSys == "WINNT61", and that is what Condor is adding to your job requirements
automatically.

To tell Condor that you don't care which version of Windows you execute on, you
need to add OpSys settings to your job's requirements string:

    requirements = (OpSys == "WINNT51" || OpSys == "WINNT60" || OpSys == "WINNT61")

Add that to any other requirements you're already setting.

You can see what requirements are on a job by inspecting its ClassAd once
you've queued it up:

    condor_q -f "%s\n" requirements <cluster>.<jobid>

That'll print just the requirements attribute for a specific job in the queue.

If that doesn't work, let me know and I can write a longer post on good "my
jobs aren't running" debugging. But it really just looks like Windows versions
are messing things up for you.

- Ian

On Tue, Sep 7, 2010 at 6:53 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

Hi! Thanks for the answers Ian!

My condor pool consists of 4
machines (3 of them are SMP machines). The condor status lists the following:

Name               OpSys   Arch  State     Activity LoadAv Mem  ActvtyTime

O2F-sth-LAP-002.un WINNT51 INTEL Unclaimed Idle     0.000  1527 0+00:40:04
slot1@o2f-mbl-lap- WINNT51 INTEL Unclaimed Idle     0.000  1767 0+01:51:58
slot2@o2f-mbl-lap- WINNT51 INTEL Unclaimed Idle     0.000  1767 0+01:57:04
slot1@O2F-STH-LAP- WINNT60 INTEL Unclaimed Idle     0.810  1534 0+02:05:04
slot2@O2F-STH-LAP- WINNT60 INTEL Unclaimed Idle     0.000  1534 0+02:05:05
slot1@o2f-sth-lap- WINNT61 INTEL Unclaimed Idle     0.000  1767 0+01:21:24
slot2@o2f-sth-lap- WINNT61 INTEL Unclaimed Idle     0.000  1767 0+01:21:25

              Total Owner Claimed Unclaimed Matched Preempting Backfill

INTEL/WINNT51     3     0       0         3       0          0        0
INTEL/WINNT60     2     0       0         2       0          0        0
INTEL/WINNT61     2     0       0         2       0          0        0

Total             7     0       0         7       0          0        0

The different colors mark
different machines. The central manager is marked
with green. When I submit a job the only
machine that changes the status from unclaimed to claimed is the central
manager (condor_status below).

Name               OpSys   Arch  State     Activity LoadAv Mem  ActvtyTime

O2F-sth-LAP-002.un WINNT51 INTEL Unclaimed Idle     0.000  1527 0+00:45:04
slot1@o2f-mbl-lap- WINNT51 INTEL Unclaimed Idle     0.000  1767 0+01:51:58
slot2@o2f-mbl-lap- WINNT51 INTEL Unclaimed Idle     0.000  1767 0+01:57:04
slot1@O2F-STH-LAP- WINNT60 INTEL Unclaimed Idle     0.810  1534 0+02:05:04
slot2@O2F-STH-LAP- WINNT60 INTEL Unclaimed Idle     0.000  1534 0+02:05:05
slot1@o2f-sth-lap- WINNT61 INTEL Claimed   Busy     0.000  1767 0+00:00:05
slot2@o2f-sth-lap- WINNT61 INTEL Claimed   Busy     0.000  1767 0+00:00:05

              Total Owner Claimed Unclaimed Matched Preempting Backfill

INTEL/WINNT51     3     0       0         3       0          0        0
INTEL/WINNT60     2     0       0         2       0          0        0
INTEL/WINNT61     2     0       2         0       0          0        0

Total             7     0       2         5       0          0        0

Why is it only the central
manager that changes to Claimed?

I want all the machines to execute jobs, but only the central manager can submit jobs.

All the machines have START=TRUE and STARTD in the DAEMON_LIST.

>Just for some clarification: is this the condor_credd daemon
>running on your central manager machine?

Yes, condor_credd is running only on the central machine.

>You only need one credd daemon for an entire pool, not one on
>each machine.
>Every machine should be connecting to the condor_credd daemon on
>your central manager to get credentials for users.

Is this done by default? If not, how should I indicate it?

Another question: I have tried to run condor_birdwatcher, but it says that
condor is off, although I believe condor is running. How does
condor_birdwatcher work?

Cheers,
Sónia

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Ian Chesal
On Fri, Sep 3, 2010 at 8:50 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

>Hi again! The jobs are now running in the central manager.
>I added STARTD to the daemon_list.

Perfect. Nice work.

If the state of the machine is still Owner it means START = False on the box
and that's why it isn't running your jobs.

Just for some clarification: is this the condor_credd daemon running on your
central manager machine? You only need one credd daemon for an entire pool, not
one on each machine. Every machine should be connecting to the condor_credd
daemon on your central manager to get credentials for users.

This is from the machine where jobs are not running but you would like them to
run? That last line indicates the machine is Unclaimed -- so START != False and
the machine could potentially run jobs.

Can you show me the output of condor_status and indicate which machine you'd
like the jobs to be running on?
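
The two readings of the State column used in this reply (Owner means START is evaluating to False, Unclaimed means the machine could still take a job) can be checked against condor_status output with something like the Python sketch below. The function names are hypothetical and not part of Condor, and the parser assumes the eight-column row layout that condor_status prints elsewhere in this thread.

```python
# Hypothetical helpers, not part of Condor. They only encode the readings
# of the State column that are given in this thread.

def interpret_state(state):
    """Return a short explanation for a condor_status State value."""
    readings = {
        "Owner": "START evaluates to False on this machine, so it will not run jobs",
        "Unclaimed": "START is not False, so the machine could run a matching job",
        "Claimed": "the slot has been claimed to run a job",
    }
    return readings.get(state, "state not covered in this thread")


def states_from_status(text):
    """Map slot name -> State for whitespace-separated condor_status rows.

    Assumption: each slot row has eight columns in this order:
    Name OpSys Arch State Activity LoadAv Mem ActvtyTime.
    Header and summary rows are skipped.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Summary rows also have eight fields, but their fourth field is a number.
        if len(fields) == 8 and fields[0] != "Name" and fields[3].isalpha():
            result[fields[0]] = fields[3]
    return result
```

Feeding it the saved output of condor_status and calling interpret_state on each value gives a quick list of which slots are refusing jobs because of START and which are merely waiting for a match.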

It's hard to say at this point.

- Ian

Cycle Computing, LLC
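
For reference, the OpSys advice near the top of this thread (OR together every Windows version you are willing to run on, then combine that with whatever requirements you already set) could be generated along these lines. This is only a rough Python sketch: the helper names opsys_requirements and requirements_line are made up for illustration and are not part of Condor, and the Memory clause in the usage note is just a stand-in for an existing requirement.

```python
# Hypothetical helpers for building a submit-file requirements line.
# They only format text; they do not talk to Condor.

def opsys_requirements(opsys_values):
    """Return an expression that matches any of the given OpSys values."""
    if not opsys_values:
        raise ValueError("need at least one OpSys value")
    clauses = ['OpSys == "{}"'.format(value) for value in opsys_values]
    return "(" + " || ".join(clauses) + ")"


def requirements_line(opsys_values, other=None):
    """Build the full 'requirements = ...' line.

    'other' is an optional expression the job already uses; it is AND-ed
    with the OpSys clause so both conditions must hold.
    """
    expr = opsys_requirements(opsys_values)
    if other:
        expr = "{} && ({})".format(expr, other)
    return "requirements = " + expr
```

Calling requirements_line(["WINNT51", "WINNT60", "WINNT61"]) reproduces the requirements line quoted above; passing other="Memory >= 1024" appends that condition with &&.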