
Re: [Condor-users] Store a credential for a condor user



On Fri, Sep 3, 2010 at 6:24 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

I have another question.

 

I have configured the central manager so that it should be able to both submit and execute jobs (START = TRUE).

But when I submit a job the central manager does not execute it.


From reading your follow-on email it looks like you figured this out. To make a node an execute node you have to add STARTD to its DAEMON_LIST, so that the condor_master starts the condor_startd daemon on it.
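
As a minimal sketch, the change on the central manager could look like the fragment below. The file name and the daemons already in the list are assumptions for illustration; keep whatever your own DAEMON_LIST already contains and only append STARTD.

```
# Local configuration file on the central manager (file name is an assumption)
# MASTER, COLLECTOR, NEGOTIATOR and SCHEDD are assumed to be there already;
# STARTD is the addition that makes this machine an execute node.
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD
```

After Condor is restarted on that machine, it should show up in the condor_status output next to the other execute node.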
 

My “run.sub” file looks like this,

 

universe = vanilla
environment = path=C:\Windows;C:\Windows\System32
transfer_executable = true
requirements = LoadAvg < 0.3
executable = Run.bat
log = Run_condor.log
output = Run_condor.out
error = Run_condor.err
queue

 

 

What should happen when doing “condor_submit run.sub”?


It should put the job in the local queue (condor_schedd daemon) on the machine you made the condor_submit call on. You can verify it's in the queue with:

condor_q
 

Is the job only queued or should it also be executed?


It queues first. At the next negotiation cycle the negotiator tries to match the job with a machine, and once a match is made the job executes. With the default Condor settings this can take a few minutes.
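
How often a negotiation cycle starts is governed by the NEGOTIATOR_INTERVAL setting on the central manager. The fragment below is only a sketch for a small test pool: the value 60 is an illustrative assumption, not a recommendation, and the default for your version is listed in the manual.

```
# Central manager configuration (sketch; the value is an assumption)
# Number of seconds between the start of successive negotiation cycles.
NEGOTIATOR_INTERVAL = 60
```

The new value applies once the daemons have reread their configuration, for example after restarting Condor on the central manager.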
 

condor_status gives the following:

 

Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

O2F-sth-LAP-002.un WINNT51    INTEL  Unclaimed Idle     0.000  1527  0+00:50:19

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

       INTEL/WINNT51     1     0       0         1       0          0        0

               Total     1     0       0         1       0          0        0

 

 

O2F-sth-LAP-002.un is not the central manager but the other machine that I am trying to add to the pool.


By default condor_status shows you *execute* nodes in the pool. You don't see your central manager because it's not running a condor_startd daemon yet. It's not an execute node.
 

It is listed by condor_status although condor_store_cred still gives the answer

Operation failed.
    Make sure your ALLOW_WRITE setting includes this host.


That's probably okay. You only need to stash your credentials with the condor_credd daemon once, from one machine. You don't need to run condor_store_cred on every machine in your pool. But I'll follow up on this when I respond to your other email.
 

The activity status is “unclaimed”. What does this mean?


It means the machine is available but is not currently running a job. ("Unclaimed" is the machine's state; "Idle" is its activity.) A machine in the "Owner" state, by contrast, is one where START is evaluating to False, so it is not accepting any new work. The states and activities of a Condor node are explained in this diagram: http://www.cs.wisc.edu/condor/manual/v7.4/3_5Policy_Configuration.html#fig:machine-states and that section covers the transitions of an execute node: how they happen and what they mean.
 

Should the central manager also be listed in the condor_status since it is also an execute machine?


No. See above. But you can have condor_status show you information about non-execute nodes. See -help on the command for some options. To see all the daemons in your system you can do:

condor_status -any

- Ian


Cycle Computing, LLC
The Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com