Re: [HTCondor-users] Misspelled requirement
- Date: Thu, 3 Feb 2022 13:12:42 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Misspelled requirement
On 2/3/2022 12:41 PM, Jacek Kominek wrote:
Hi Todd,
It's quite possible the typo was reported (we are running HTCondor
8.9.11). I only got a report about a forever-idle job without
further specifics, so it is likely that the user didn't catch or
understand it.
Thank you for the response, it clarifies it a bit. What bothers me
the most is that it did get processed as a valid condor job
requirement (not a variable/macro) even though the resource was
non-existent in the system. This is a very particular and limited
namespace, from what I know, since you can only request CPUs,
GPUs, memory or disk space. Correct me if I am wrong, but shouldn't
anything else be rejected outright in that context?
The "RequestX" namespace is not limited to just CPUs etc., since
execute nodes can define their own custom resources (fpgas, database
connections, electron microscopes, whatever), and then jobs can
request these custom resources with "RequestFPGAs = 1" or whatever.
See the HTCondor Manual for config knobs
MACHINE_RESOURCE_<name> and friends.
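As a minimal sketch of what that looks like (the resource name FPGAs and the counts are placeholders for illustration, not anything from this thread, and a real pool may need additional slot configuration), the execute node advertises the resource in its config and the job asks for it in its submit file:

```
# Execute node config: advertise two units of a custom resource.
# "FPGAs" is an assumed name; any site-defined name works the same way.
MACHINE_RESOURCE_FPGAs = 2

# Job submit file: request one unit of that resource.
# This becomes RequestFPGAs = 1 in the job ad.
request_fpgas = 1
executable = /bin/true
queue
```

Because any name can be defined this way, condor_submit has no fixed list to check "request_<name>" lines against, which is why a misspelling cannot simply be rejected.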
What you could do at your site, however, is force all users to
explicitly specify "RequestCpus" (spelled correctly) in every job,
or give an error and refuse to allow the job to be submitted. One
way you could accomplish this is by specifying a default value for
RequestCpus that makes no sense, then add a submit requirement that
refuses to submit and gives an error message if RequestCpus was not
modified. Here would be an example config snippet you could put in
the config file of your submit machine:
JOB_DEFAULT_REQUESTCPUS = 0
SUBMIT_REQUIREMENT_NAMES = $(SUBMIT_REQUIREMENT_NAMES) MustSpecifyCpus
SUBMIT_REQUIREMENT_MustSpecifyCpus = RequestCpus != 0
SUBMIT_REQUIREMENT_MustSpecifyCpus_REASON = "You must specify RequestCpus in your job submit file."
After adding the above to your configuration, you must do a
condor_reconfig.
Here is how things would look to your users after doing the above:
$ cat test.sub
requestCUPS = 8
executable = /bin/true
hold = true
queue
$ condor_submit test.sub
Submitting job(s).
ERROR: Failed to commit job submission into the queue.
ERROR: You must specify RequestCpus in your job submit file.
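For contrast, a submit file that spells the request correctly should satisfy the MustSpecifyCpus check, since RequestCpus would then be 8 rather than the nonsensical default of 0. A sketch, reusing the same test job with an assumed request of 8 CPUs:

```
# Same test job as above, with the resource request spelled correctly.
request_cpus = 8
executable = /bin/true
hold = true
queue
```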
Hope the above helps,
Todd
-Jacek
On 2/3/22 11:56, Todd Tannenbaum wrote:
On 2/3/2022 11:48 AM, Jacek Kominek via HTCondor-users wrote:
Hi all,
A user in our cluster submitted a job with a typo in its
requirements: requestCUPS rather than requestCPUS. Rather than
erroring out, the requirement was treated as valid and the job
was forever stuck in Idle (since we have no cups in our
cluster). Is this the expected behavior? Normally, if there
are some errors/typos with the classads or variables the
scheduler is pretty good at catching them and reporting shadow
exceptions etc. I wonder if the resource requests are treated
differently?
Hi Jacek,
Given that submit files can define custom macro names, it is a
bit challenging to detect typos like the above. However, upon
job submission, the user most definitely should have received a
prominent warning telling them they may have a typo in their
submit file. Did that warning not appear on your installation?
Here is what I see when I tried reproducing what you described
above:
$ cat test.sub
requestCUPS = 8
executable = /bin/true
queue
$ condor_submit test.sub
Submitting job(s).
1 job(s) submitted to cluster 2.
WARNING: the line 'requestCUPS = 8' was unused by condor_submit. Is it a typo?
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx>   University of Wisconsin-Madison
Center for High Throughput Computing     Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd   1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                    Madison, WI 53706-1685