
Re: [Condor-users] Store a credential for a condor user



On Fri, Sep 3, 2010 at 8:50 AM, Sónia Liléo <sonia.lileo@xxxxx> wrote:

Hi again!

 

The jobs are now running on the central manager. I added STARTD to DAEMON_LIST.


Perfect. Nice work.
 

But the other machine in my Condor pool is still not executing jobs.


If the machine is still in the Owner state, START is evaluating to False on that box, which is why it isn't running your jobs.
 

The CredLog looks like this,

 

09/03 13:38:20 Locale: English_United States.1252
09/03 13:38:20 WARNING: Config source is empty: C:\condor/condor_config.local
09/03 13:38:20 ******************************************************
09/03 13:38:20 ** condor_credd.exe (CONDOR_CREDD) STARTING UP
09/03 13:38:20 ** C:\condor\bin\condor_credd.exe
09/03 13:38:20 ** SubsystemInfo: name=CREDD type=DAEMON(11) class=DAEMON(1)
09/03 13:38:20 ** Configuration: subsystem:CREDD local:<NONE> class:DAEMON
09/03 13:38:20 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
09/03 13:38:20 ** $CondorPlatform: INTEL-WINNT50 $
09/03 13:38:20 ** PID = 756
09/03 13:38:20 ** Log last touched 9/3 12:38:13
09/03 13:38:20 ******************************************************
09/03 13:38:20 Using config source: C:\condor\condor_config
09/03 13:38:20 Using local config sources:
09/03 13:38:20    C:\condor/condor_config.local
09/03 13:38:20 DaemonCore: Command Socket at <10.110.44.212:1342>
09/03 13:38:20 Will use UDP to update collector o2f-sth-lap-016.un.dr.dgcsystems.net <10.110.44.76:9618>
09/03 13:38:20 main_init() called
09/03 13:38:20 Trying to update collector <10.110.44.76:9618>
09/03 13:38:20 Attempting to send update via UDP to collector o2f-sth-lap-016.un.dr.dgcsystems.net <10.110.44.76:9618>
09/03 13:38:20 File descriptor limits: max 1024, safe 820
09/03 13:38:20 Initialized the following authorization table:
09/03 13:38:20 Authorizations yet to be resolved:
09/03 13:38:20 allow NEGOTIATOR:  */10.110.44.76 */o2f-sth-lap-016.un.dr.dgcsystems.net


Just to clarify: is this the condor_credd daemon running on your central manager machine? You only need one condor_credd daemon for the entire pool, not one on each machine. Every machine should connect to the condor_credd daemon on your central manager to get users' credentials.
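
One quick way to confirm which machines are configured to start condor_credd is to look at each machine's DAEMON_LIST setting. The following is only a rough sketch in Python: the two config strings are hypothetical examples, not your actual configuration, and the helper does not expand $(MACRO) references or follow line continuations. It simply takes the last DAEMON_LIST assignment it finds.

```python
import re

def daemon_list(config_text):
    """Return the daemons named by the last DAEMON_LIST assignment found."""
    daemons = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments and anything that is not a NAME = value line.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip().upper() == "DAEMON_LIST":
            # Entries may be separated by commas, whitespace, or both.
            daemons = [d.upper() for d in re.split(r"[,\s]+", value.strip()) if d]
    return daemons

# Hypothetical config fragments, for illustration only.
central_manager = "DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, STARTD, CREDD"
execute_node = "DAEMON_LIST = MASTER, STARTD"

print("CREDD" in daemon_list(central_manager))  # True
print("CREDD" in daemon_list(execute_node))     # False
```

With one credd per pool, only the central manager's list would be expected to contain CREDD.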
 

And the StartLog,

 

09/03 14:29:10 Locale: English_United States.1252
09/03 14:29:10 ******************************************************
09/03 14:29:10 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
09/03 14:29:10 ** C:\condor\bin\condor_startd.exe
09/03 14:29:10 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)
09/03 14:29:10 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
09/03 14:29:10 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
09/03 14:29:10 ** $CondorPlatform: INTEL-WINNT50 $
09/03 14:29:10 ** PID = 3424
09/03 14:29:10 ** Log last touched 9/3 13:29:01
09/03 14:29:10 ******************************************************
09/03 14:29:10 Using config source: C:\condor\condor_config
09/03 14:29:10 Using local config sources:
09/03 14:29:10    C:\condor/condor_config.local
09/03 14:29:10 DaemonCore: Command Socket at <10.110.44.212:1479>
09/03 14:29:10 my_popen: CreateProcess failed
09/03 14:29:10 Failed to run hibernation plugin 'C:\condor/libexec/power_state ad'
09/03 14:29:16 my_popen: CreateProcess failed
09/03 14:29:16 Failed to execute C:\condor/bin/condor_starter.std.exe, ignoring
09/03 14:29:16 VM-gahp server reported an internal error
09/03 14:29:16 VM universe will be tested to check if it is available
09/03 14:29:16 History file rotation is enabled.
09/03 14:29:16   Maximum history file size is: 20971520 bytes
09/03 14:29:16   Number of rotated history files is: 2
09/03 14:29:16 New machine resource allocated
09/03 14:29:21 About to run initial benchmarks.
09/03 14:29:27 Completed initial benchmarks.
09/03 14:29:27 State change: IS_OWNER is false
09/03 14:29:27 Changing state: Owner -> Unclaimed


Is this from the machine where jobs are not running but where you would like them to run? The last line shows the machine is Unclaimed, so START is not False and the machine could potentially run jobs.

Can you show me the output of condor_status and indicate which machine you'd like the jobs to be running on?
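
If you want to check this on other machines without reading the whole log, the last "Changing state" line can be pulled out with a few lines of Python. This is a minimal sketch that assumes the line format shown in the StartLog above ("Changing state: Owner -> Unclaimed"); other state-change line formats are not handled, and the helper name last_state is just for illustration.

```python
import re

# Matches lines such as "09/03 14:29:27 Changing state: Owner -> Unclaimed".
STATE_CHANGE = re.compile(r"Changing state: (\w+) -> (\w+)")

def last_state(log_text):
    """Return the state named by the last 'Changing state' line, or None."""
    state = None
    for line in log_text.splitlines():
        match = STATE_CHANGE.search(line)
        if match:
            state = match.group(2)  # the state the machine moved into
    return state

# Tail of the StartLog quoted above.
sample = """\
09/03 14:29:21 About to run initial benchmarks.
09/03 14:29:27 Completed initial benchmarks.
09/03 14:29:27 State change: IS_OWNER is false
09/03 14:29:27 Changing state: Owner -> Unclaimed
"""

print(last_state(sample))  # Unclaimed
```

To run it against a real log, read the file's contents and pass the text to last_state.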
 

Do you think this machine is not executing jobs because of the problem with storing the credential, or might it be something else?


It's hard to say at this point.

- Ian


Cycle Computing, LLC
The Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools

http://www.cyclecomputing.com
http://www.cyclecloud.com