Again, hello to all of you,

In addition to my previous e-mail, I ran condor_q -analyze and the results are:

    084.049:  Run analysis summary.  Of 20 machines,
         19 are rejected by your job's requirements
          0 reject your job because of their own requirements
          1 match but are serving users with a better priority in the pool
          0 match but reject the job for unknown reasons
          0 match but will not currently preempt their existing job
          0 are available to run your job

When I run condor_status I
have the following results:

    C:\WINDOWS\system32>condor_status

    Name                    OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime

    Computer1.domain.com    WINNT51  INTEL  Unclaimed  Idle      0.060   1022  0+00:45:03
    Computer2.domain.com    WINNT51  INTEL  Unclaimed  Idle      0.230   1022  0+00:00:49
    slot1@xxxxxxxxxxxxxxxx  WINNT51  INTEL  Unclaimed  Idle      0.000   1022  5+22:33:03
    slot2@xxxxxxxxxxxxxxxx  WINNT51  INTEL  Unclaimed  Idle      0.030   1022  0+02:30:05
    slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:21:17
    slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+00:20:05
    slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:21:19
    slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:21:20
    slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+21:24:31
    slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+21:28:45
    slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+02:30:06
    slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+21:33:45
    slot1@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:26:28
    slot2@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+00:25:05
    slot3@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:26:30
    slot4@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle      0.000   511   2+20:26:31
    slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+03:35:41
    slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+03:35:42
    slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.050   511   0+03:35:43
    slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle      0.000   511   0+00:25:07

                         Total Owner Claimed Unclaimed Matched Preempting Backfill

           INTEL/WINNT51     4     0       0         4       0          0        0
           INTEL/WINNT52    16     0       0        16       0          0        0

                   Total    20     0       0        20       0          0        0

Unfortunately, I am not a Condor
expert enough to fully understand what this output is trying to tell me, or what the best way to interpret it would be. Also, when I tried to run condor_q -better-analyze, I got the following message:

    Sorry, the -better-analyze option is not available on this platform.

From the analysis, I now know that something in my job's requirements is preventing the job from matching the other nodes, but I don't know what. If anyone has experienced a similar issue and knows roughly how to get it to work, I would really appreciate your input.

Alex

From:
condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]

Hello to all of you,

I have a little issue with a type of job I am trying to submit. I have a Condor pool of 20 nodes. I initially upgraded the whole pool to version 7.0.5, but after reading about all the issues that version was having with preempting jobs, I decided to downgrade the central manager to version 7.0.1. The submit description file is the following:

    #########################################################################################
    # Description file for Batch File for TESTING purposes
    #########################################################################################
    universe = vanilla
    requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \
                   (Arch == "INTEL" && OpSys == "WINNT52")
    getenv = True
    notify_user = usename@xxxxxxxxxx
    initialdir = c:\condor\execute_bk
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    Transfer_input_files = c:\windows\system32\systeminfo.exe
    run_as_owner = true
    executable = Batch4testv2.bat
    output = Batch4testv3.out.$(Process)
    error = Batch4testv3.err.$(Process)
    log = Batch4testv3.log
    queue 10

If the job is submitted like that, it will only run on one machine; if I omit the run_as_owner line, it runs fine on all the different nodes.
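One quick way to check whether the OpSys/Arch part of the Requirements is to blame: the Python sketch below (my own hand translation of the expression, not Condor's matchmaker) evaluates the Requirements line as written against the OpSys and Arch columns of the condor_status listing. Every one of the 20 slots passes, which suggests the 19 rejections reported by condor_q -analyze come from something in the job's final Requirements that is not in the file as written, possibly clauses added when run_as_owner is set. That last point is an assumption; condor_q -l on the job prints the full Requirements expression that is actually being matched.

```python
# Slots transcribed from the condor_status listing in this thread: (OpSys, Arch).
# Only the two attributes that the Requirements line refers to are kept.
SLOTS = [("WINNT51", "INTEL")] * 4 + [("WINNT52", "INTEL")] * 16


def written_requirements(opsys: str, arch: str) -> bool:
    """Hand translation of the Requirements line from the submit file.

    This mirrors only the expression as written. It is NOT Condor's
    matchmaker and ignores any clauses condor_submit may add on its own.
    """
    return (arch == "INTEL" and opsys == "WINNT51") or (
        arch == "INTEL" and opsys == "WINNT52"
    )


# True counts as 1 when summed, so this counts the matching slots.
matching = sum(written_requirements(opsys, arch) for opsys, arch in SLOTS)
print(f"{matching} of {len(SLOTS)} slots satisfy the written Requirements")
```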
As I said, that is not a problem once the line is removed. But this Condor project was originally implemented to run jobs over network shares. For that, I configured the pool to have a CREDD_HOST (which is the central manager), and then I created a condoruser account with read access and limited rights to run those jobs. I set the condor_pool and the condoruser credentials/passwords on all the different computers set up as execute machines. When I run condor_store_cred query -c and condor_store_cred query -u condoruser, all the computers come back saying: A credential is stored and is valid.

The description file is attached below. When I try to run this type of job, it will only run on one computer, the same computer as the other jobs. If I remove the RUN_AS_OWNER line, the central manager will try to match the job with all the pool's nodes, but the jobs error out with: Logon failure: unknown user name or bad password.

Any ideas on which log I should look into to find answers, or any suggestions to solve this issue, are more than welcome.

Thanks in advance for your input,
Alex

    ###################################################
    ## DESCRIPTION FILE FOR CONDOR JOBS
    ## PREPARED BY ALEX ALAS
    ###################################################
    UNIVERSE = VANILLA
    REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \
                   (Arch == "INTEL" && OpSys == "WINNT52")
    GETENV = TRUE
    NOTIFY_USER = username@xxxxxxxxxx
    INITIALDIR = c:\condor\execute_bk
    SHOULD_TRANSFER_FILES = YES
    WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
    TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe
    RUN_AS_OWNER = TRUE
    EXECUTABLE = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat
    OUTPUT = Batchfile_lasEnvelop1.out.$(Process)
    ERROR = Batchfile_lasEnvelop1.err.$(Process)
    LOG = Batchfile_lasEnvelop1.log
    QUEUE 25

Respectfully,
Alex Alas
Systems Administrator
Tel. 301-948-8550 x219
Fax 301-963-2064
E-mail: aalas@xxxxxxxxxxxxx
Website: http://www.fugroearthdata.com
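The two submit description files in this thread look different at first glance because one uses lowercase commands and the other uppercase, but submit command names are case-insensitive. The rough Python sketch below normalizes both files and lists the commands whose values actually differ. parse_submit is a deliberately tiny, hypothetical parser (it only handles `key = value` lines, `#` comments, backslash continuations and `queue`); it is not condor_submit and does not expand macros. Values are also compared case-insensitively, on the assumption that Windows paths and TRUE/true booleans are equivalent here.

```python
TEST_JOB = r"""
universe = vanilla
requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
getenv = True
notify_user = usename@xxxxxxxxxx
initialdir = c:\condor\execute_bk
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Transfer_input_files = c:\windows\system32\systeminfo.exe
run_as_owner = true
executable = Batch4testv2.bat
output = Batch4testv3.out.$(Process)
error = Batch4testv3.err.$(Process)
log = Batch4testv3.log
queue 10
"""

SHARE_JOB = r"""
## DESCRIPTION FILE FOR CONDOR JOBS
UNIVERSE = VANILLA
REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
GETENV = TRUE
NOTIFY_USER = username@xxxxxxxxxx
INITIALDIR = c:\condor\execute_bk
SHOULD_TRANSFER_FILES = YES
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe
RUN_AS_OWNER = TRUE
EXECUTABLE = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat
OUTPUT = Batchfile_lasEnvelop1.out.$(Process)
ERROR = Batchfile_lasEnvelop1.err.$(Process)
LOG = Batchfile_lasEnvelop1.log
QUEUE 25
"""


def parse_submit(text: str) -> dict[str, str]:
    """Hypothetical mini-parser for a submit description (a sketch only)."""
    commands: dict[str, str] = {}
    pending = ""
    for raw_line in text.splitlines():
        line = pending + raw_line.strip()
        if line.endswith("\\"):
            # A trailing backslash continues the command on the next line.
            pending = line[:-1].rstrip() + " "
            continue
        pending = ""
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("queue"):
            commands["queue"] = line[len("queue"):].strip() or "1"
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Command names are case-insensitive; collapse whitespace in values.
            commands[key.strip().lower()] = " ".join(value.split())
    return commands


def differing_commands(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Names of commands whose values differ, ignoring case."""
    keys = a.keys() | b.keys()
    return sorted(k for k in keys if a.get(k, "").lower() != b.get(k, "").lower())


test_cmds, share_cmds = parse_submit(TEST_JOB), parse_submit(SHARE_JOB)
for name in differing_commands(test_cmds, share_cmds):
    print(f"{name}: {test_cmds.get(name)!r} vs {share_cmds.get(name)!r}")
```

Run as is, it should list error, executable, log, notify_user, output, queue and transfer_input_files. run_as_owner, requirements and the file-transfer settings are identical in both files, which fits the observation that both jobs end up on the same single computer.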