Hello all, I have a small issue with a type of job I am trying to submit. I
have a Condor pool of 20 nodes. I initially upgraded the whole pool to version
7.0.5, but after reading about the issues that version was having with
pre-empting jobs, I decided to downgrade the central manager to version 7.0.1. The
description file is as follows:

#########################################################################################
# Description file for Batch File for TESTING purposes
#########################################################################################
universe = vanilla
requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
getenv = True
notify_user=usename@xxxxxxxxxx
initialdir = c:\condor\execute_bk
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Transfer_input_files = c:\windows\system32\systeminfo.exe
run_as_owner = true
executable = Batch4testv2.bat
output = Batch4testv3.out.$(Process)
error = Batch4testv3.err.$(Process)
log = Batch4testv3.log
queue 10

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

If the job is submitted like that, it will only run on one
machine. If I omit the run_as_owner line, it runs fine on all the different
nodes, so with that line removed there is no problem. But this Condor project
was originally implemented to run jobs over network shares. For that, I
configured the pool to have a credd_host (which is the central manager), and
then I created a condoruser account with read access and limited rights to run
those jobs. I set the condor_pool and condoruser credentials/passwords on all
the computers set up as execute machines. When I run condor_store_cred
query -c and condor_store_cred query -u condoruser, all the computers come back
saying: A credential is stored and is valid. The description file is attached
below. When I try to run this type of job, it will only run on one computer,
the same computer as the other jobs. If I remove the RUN_AS_OWNER line, the
central manager will try to match the job with all the pool's nodes, but the
job errors out with: Logon failure: unknown user name or bad password. Does
anyone have any idea which log I should look into to find
answers? Any suggestions to solve this issue are more than welcome.

Thanks in advance for your input,
Alex

###################################################
## DESCRIPTION FILE FOR CONDOR JOBS
## PREPARED BY ALEX ALAS
###################################################
UNIVERSE = VANILLA
REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
GETENV = TRUE
NOTIFY_USER = username@xxxxxxxxxx
INITIALDIR = c:\condor\execute_bk
SHOULD_TRANSFER_FILES = YES
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe
RUN_AS_OWNER = TRUE
EXECUTABLE = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat
OUTPUT = Batchfile_lasEnvelop1.out.$(Process)
ERROR = Batchfile_lasEnvelop1.err.$(Process)
LOG = Batchfile_lasEnvelop1.log
QUEUE 25

Respectfully,
Alex Alas
Systems Administrator
Tel. 301-948-8550 x219
Fax 301-963-2064
E-mail: aalas@xxxxxxxxxxxxx
Website: http://www.fugroearthdata.com