Hello again to all Condor gurus and non-Condor gurus, I really could use some help
here! Has anybody encountered a situation like this and knows how to get
it to work? Thanks in advance for your help.

Alex

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]

To add more material to this
troubleshooting: I ran condor_status -l, which dumped the ClassAds of all the machines in my pool. After reviewing each compute node's
output, I noticed one thing: the only computer
that is able to run the jobs is the only one that includes the setting "LocalCredd =
centralmanager.domain.com:9620" in its config. That is one of the
requirements listed below as part of the
job's requirements. I then reviewed the
configuration files of all the compute nodes, and they all have the
following setting:

##Specify a remote credd server here
#
Credd_Host = $(CONDOR_HOST):$(CREDD_PORT)

I commented out this entry and replaced it with

Credd_Host = centralmanager.domain.com:$(CREDD_PORT)

to, in effect, force the registration of my central manager. Tomorrow I will
try commenting out this line on all the nodes and keeping the first
Credd_Host line that I changed initially. Let's see what happens. Is there
another way to change this setting in the local configuration file? Please
advise.

Alex

From: condor-users-bounces@xxxxxxxxxxx on behalf of Alas, Alex [FEDI]

More to add to this
troubleshooting: I cannot run condor_q -better-analyze. To see all the
requirements of my job, I intentionally introduced an error into the submit description file, and I
got the message below. As you can see, I never
specified a memory requirement in my submit description file.
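The Requirements expression in the error output in this message is longer than the submit file's own clause. The Python sketch below is a simplified model of how such an expression could be assembled. It is not condor_submit's actual code. In the model, the platform clause is wrapped and ANDed with the Disk, Memory and HasFileTransfer clauses that condor_submit normally appends by default. When run_as_owner is set, the model also adds a clause that embeds the credd host string verbatim. Two things here are assumptions: that this string is taken from the submit side's CREDD_HOST setting, and the hypothetical names USER_REQ and effective_requirements.

```python
# Simplified model, not condor_submit's real logic: it only reproduces the
# shape of the Requirements expression printed in the error output.

USER_REQ = ('(Arch == "INTEL" && OpSys == "WINNT51") || '
            '(Arch == "INTEL" && OpSys == "WINNT52")')


def effective_requirements(user_req: str, run_as_owner: bool, credd_host: str) -> str:
    """AND the submit file's clause with the extra clauses seen in the error."""
    clauses = [
        f"(({user_req}))",
        "(Disk >= DiskUsage)",
        # Memory is advertised in megabytes, ImageSize in kilobytes.
        "((Memory * 1024) >= ImageSize)",
        "(HasFileTransfer)",
    ]
    if run_as_owner:
        # Assumption: the credd host string is copied verbatim into the job.
        clauses.append(f'(HasWindowsRunAsOwner && (LocalCredd =?= "{credd_host}"))')
    return " && ".join(clauses)


print(effective_requirements(USER_REQ, True, "centralmanager.domain.com:9620"))
```

With run_as_owner set to False, the model drops the LocalCredd clause. That is consistent with the report further down the thread: the same kind of job runs on every node once the run_as_owner line is removed.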
Where are these settings coming from? Any input will be much
appreciated.

Alex

Please see below:

Submitting job(s)
ERROR: Parse error in expression:
Requirements = (((Arch == "INTEL" && OpSys == "WINNT51") || (Arch == "INTEL" && OpSys == "WINNT52"))) && (Disk >= DiskUsage) && ( (Memory * 1024) >= ImageSize ) && (HasFileTransfer) && (HasWindowsRunAsOwner && (LocalCredd =?= "centralmanager.domain.com:9620"))
^^^ Error in submit file

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]

Again, hello to all of you. In addition to my previous
e-mail, I ran condor_q -analyze and the results are:

084.049:  Run analysis summary.  Of 20 machines,
     19 are rejected by your job's requirements
      0 reject your job because of their own requirements
      1 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job

When I run condor_status, I
have the following results:

C:\WINDOWS\system32>condor_status

Name                    OpSys    Arch   State      Activity LoadAv Mem   ActvtyTime
Computer1.domain.com    WINNT51  INTEL  Unclaimed  Idle     0.060  1022  0+00:45:03
Computer2.domain.com    WINNT51  INTEL  Unclaimed  Idle     0.230  1022  0+00:00:49
slot1@xxxxxxxxxxxxxxxx  WINNT51  INTEL  Unclaimed  Idle     0.000  1022  5+22:33:03
slot2@xxxxxxxxxxxxxxxx  WINNT51  INTEL  Unclaimed  Idle     0.030  1022  0+02:30:05
slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:21:17
slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+00:20:05
slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:21:19
slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:21:20
slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+21:24:31
slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+21:28:45
slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+02:30:06
slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+21:33:45
slot1@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:26:28
slot2@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+00:25:05
slot3@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:26:30
slot4@xxxxxxxxxxxxx     WINNT52  INTEL  Unclaimed  Idle     0.000  511   2+20:26:31
slot1@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+03:35:41
slot2@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+03:35:42
slot3@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.050  511   0+03:35:43
slot4@xxxxxxxxxxxxxxxx  WINNT52  INTEL  Unclaimed  Idle     0.000  511   0+00:25:07

              Total Owner Claimed Unclaimed Matched Preempting Backfill

INTEL/WINNT51     4     0       0         4       0          0        0
INTEL/WINNT52    16     0       0        16       0          0        0

        Total    20     0       0        20       0          0        0

Unfortunately, I am not a Condor
expert, so I do not fully understand what this output is trying to tell me or
the best way to interpret it. Also, when I tried to run condor_q
-better-analyze, I got the following message: "Sorry, the -better-analyze
option is not available on this platform." From the analysis above, I now know
that something in my job's requirements is preventing the
job from matching other nodes, but I don't know what. If anyone has experienced
a similar issue and knows more or less how to get it to work, I would really
appreciate your input.

Alex

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alas, Alex [FEDI]

Hello to all of you,

I have a little issue with a type of job I am trying to
submit. I have a Condor pool of 20 nodes. I initially upgraded the whole pool to
version 7.0.5. After reading about the issues that version was having with
preempting jobs, I decided to downgrade the central manager to version 7.0.1. The
submit description file is as follows:

#########################################################################################
# Description file for Batch File for TESTING purposes
#########################################################################################
universe = vanilla
requirements = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
getenv = True
notify_user = usename@xxxxxxxxxx
initialdir = c:\condor\execute_bk
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Transfer_input_files = c:\windows\system32\systeminfo.exe
run_as_owner = true
executable = Batch4testv2.bat
output = Batch4testv3.out.$(Process)
error = Batch4testv3.err.$(Process)
log = Batch4testv3.log
queue 10

If the job is submitted like that, it will only run on one
machine. If I omit the run_as_owner line, it runs fine on all the different
nodes. As I said, that is not a problem once the line is removed. However, this Condor project
was originally implemented to run jobs from network shares. For that, I configured
the pool to have a CREDD_HOST (which is the central manager), and then I created
a condoruser account with read access and limited rights to run those jobs. I stored the
condor_pool and condoruser credentials/passwords on all the
computers configured as execute machines. When I run condor_store_cred query
-c and condor_store_cred query -u condoruser, all the computers come
back saying: "A credential is stored and is valid." The submit description file for this job is
attached below. When I try to run this type of job, it only runs on one
computer, the same computer as the other jobs. If I remove the
RUN_AS_OWNER line, the central manager tries to match the job with all of the
pool's nodes, but the job errors out with: "Logon failure: unknown
user name or bad password." Does anyone have any idea which log I should look into to find
answers? Any suggestions to solve this issue are more than welcome.

Thanks in advance for your input,
Alex

###################################################
## DESCRIPTION FILE FOR CONDOR JOBS
## PREPARED BY ALEX ALAS
###################################################
UNIVERSE = VANILLA
REQUIREMENTS = (Arch == "INTEL" && OpSys == "WINNT51") || \
               (Arch == "INTEL" && OpSys == "WINNT52")
GETENV = TRUE
NOTIFY_USER = username@xxxxxxxxxx
INITIALDIR = c:\condor\execute_bk
SHOULD_TRANSFER_FILES = YES
WHEN_TO_TRANSFER_OUTPUT = ON_EXIT
TRANSFER_INPUT_FILES = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\lasEnvelop.exe
RUN_AS_OWNER = TRUE
EXECUTABLE = \\fileserver\Sharedfolder1\Sharedfolder2\Sharedfolder3\Batchfile_lasEnvelop1.bat
OUTPUT = Batchfile_lasEnvelop1.out.$(Process)
ERROR = Batchfile_lasEnvelop1.err.$(Process)
LOG = Batchfile_lasEnvelop1.log
QUEUE 25

Respectfully,
Alex Alas
Systems Administrator
Tel. 301-948-8550 x219
Fax 301-963-2064
E-mail: aalas@xxxxxxxxxxxxx
7320 Executive Way, Frederick, MD 21704
Website: http://www.fugroearthdata.com
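The original report above says that with RUN_AS_OWNER = TRUE the job runs on only one machine. The follow-ups at the top of the thread found that only that machine advertises LocalCredd. Since condor_q -better-analyze was not available on this platform, one rough substitute is to check each top-level clause of the printed Requirements against a machine's attributes by hand. The Python sketch below does that under a simplified model of ClassAd evaluation. It is an assumption, not HTCondor's evaluator. The model follows these rules:

- A missing attribute is UNDEFINED.
- The "==" and ">=" operators propagate UNDEFINED.
- The "=?=" operator turns a missing attribute into plain FALSE.
- Any clause that is not TRUE fails the match.

All ads, helper names, and numbers are hypothetical. Only the clause texts and the credd string come from this thread.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Simplified model of ClassAd three-valued logic: True, False, or None (UNDEFINED).
Tri = Optional[bool]
Ad = Dict[str, object]

CREDD = "centralmanager.domain.com:9620"  # string from the job's Requirements


def attr_eq(ad: Ad, name: str, want: str) -> Tri:
    """Model of name == "want": UNDEFINED if missing; case-insensitive here."""
    if name not in ad:
        return None
    return str(ad[name]).lower() == want.lower()


def attr_is(ad: Ad, name: str, want: str) -> bool:
    """Model of name =?= "want": never UNDEFINED; missing means False."""
    return name in ad and ad[name] == want


def num_ge(left: Optional[float], right: Optional[float]) -> Tri:
    """Model of left >= right: UNDEFINED if either side is missing."""
    if left is None or right is None:
        return None
    return left >= right


def tri_and(*vals: Tri) -> Tri:
    if any(v is False for v in vals):
        return False
    return None if any(v is None for v in vals) else True


def tri_or(*vals: Tri) -> Tri:
    if any(v is True for v in vals):
        return True
    return None if any(v is None for v in vals) else False


def mem_kb(m: Ad) -> Optional[float]:
    # Memory is advertised in megabytes; ImageSize is in kilobytes.
    return None if m.get("Memory") is None else m["Memory"] * 1024


# One entry per top-level conjunct of the Requirements printed in this thread.
CLAUSES: List[Tuple[str, Callable[[Ad, Ad], Tri]]] = [
    ("platform", lambda m, j: tri_or(
        tri_and(attr_eq(m, "Arch", "INTEL"), attr_eq(m, "OpSys", "WINNT51")),
        tri_and(attr_eq(m, "Arch", "INTEL"), attr_eq(m, "OpSys", "WINNT52")))),
    ("Disk >= DiskUsage", lambda m, j: num_ge(m.get("Disk"), j.get("DiskUsage"))),
    ("(Memory * 1024) >= ImageSize", lambda m, j: num_ge(mem_kb(m), j.get("ImageSize"))),
    ("HasFileTransfer", lambda m, j: m.get("HasFileTransfer")),
    ("HasWindowsRunAsOwner && LocalCredd =?= CREDD", lambda m, j: tri_and(
        m.get("HasWindowsRunAsOwner"), attr_is(m, "LocalCredd", CREDD))),
]


def failing_clauses(machine: Ad, job: Ad) -> List[str]:
    """Clauses that are not TRUE for this machine; any entry rejects the match."""
    return [name for name, clause in CLAUSES if clause(machine, job) is not True]


# Hypothetical ads: the machines differ only in their LocalCredd attribute.
job = {"DiskUsage": 1000, "ImageSize": 5000}
base = {"Arch": "INTEL", "OpSys": "WINNT52", "Disk": 10_000_000, "Memory": 511,
        "HasFileTransfer": True, "HasWindowsRunAsOwner": True}
good = {**base, "LocalCredd": CREDD}
no_credd = dict(base)
other_credd = {**base, "LocalCredd": "centralmanager:9620"}  # hypothetical mismatch

for label, ad in [("good", good), ("no_credd", no_credd), ("other_credd", other_credd)]:
    print(label, failing_clauses(ad, job))
```

In this model, a machine whose ad lacks LocalCredd, or advertises any other string, fails only the last clause. Because "=?=" yields FALSE rather than UNDEFINED, such a machine is rejected by the job's own requirements. This is one way the -analyze summary above could arise.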