Re: [HTCondor-users] Special characters within paths from queueing from file
- Date: Wed, 6 May 2026 16:12:30 +0000
- From: Zach McGrew <mcgrewz@xxxxxxx>
- Subject: Re: [HTCondor-users] Special characters within paths from queueing from file
Hi Martin,
I ran into this issue a while back with some of my users here as well. The issue is that you can't queue multiple jobs to the parallel universe from a single submission file. Each "job" that gets queued from the file describes a requirement for one or more EPs, not another job in that batch of jobs. This is mentioned in the MPI section of the documentation [1], but it is easy to miss. I can see the use case they were going for, but I do agree it's not very convenient when you need to submit a ton of parallel universe jobs. The only workaround I've used so far is a tiny shell script that invokes condor_submit multiple times and adjusts the submission variables as needed (using -a to append to, and override, whatever is already in the .sub):
for i in $(seq 1 10) ; do
    condor_submit my_parallel.sub -a arguments="$i"
done
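For the queue-from-file case in the original message, the same loop could read one value per line from the list file instead of counting. This is only a sketch: the submit_each function name and the SUBMIT_CMD override are hypothetical, introduced here so the loop can be dry-run, and it assumes each line of the list file holds exactly one argument value with no spaces.

```shell
# Submit one job per non-empty line of a list file.
# SUBMIT_CMD is a hypothetical override (default: condor_submit) for dry runs.
submit_each() {
    list_file=$1
    sub_file=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r job_arg || [ -n "$job_arg" ]; do
        # Skip blank lines.
        [ -z "$job_arg" ] && continue
        # Redirect stdin so the submit command cannot consume the list file.
        ${SUBMIT_CMD:-condor_submit} "$sub_file" -a "arguments=$job_arg" < /dev/null
    done < "$list_file"
}
```

Running it as SUBMIT_CMD=echo submit_each jobsList.txt my_parallel.sub prints the arguments that would be passed to condor_submit for each line without submitting anything.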
I haven't used the job-sets [2] feature yet, but that might also fit this use case well. If not, I could definitely see writing something with the Python API, which would allow for more flexibility and maintainability.
On the SELinux note, high-five! I do the same. A quick search through my Puppet codebase shows I set the 'condor_tcp_network_connect' bool to yes, then have some custom rules to allow the sshd to work for condor_ssh_to_job, and allow that ssh server to create new connections (socks-proxy; we had a very unique use case on one node), set some fcontexts on the separate scratch disk path, and one more to enable HTCondor to talk to esmtp to send email notifications. What are you setting?
-Zach
Reference URLs:
1. https://htcondor.readthedocs.io/en/latest/users-manual/env-of-job.html#differing-requirements-for-the-machines
2. https://htcondor.readthedocs.io/en/latest/users-manual/job-sets.html
________________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beaumont, Martin <Martin.Beaumont@xxxxxxxxxxxxxxx>
Sent: Wednesday, May 6, 2026 7:27 AM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] Special characters within paths from queueing from file
Hello,
This is more of an FYI, but in case you think this is a bug, or at least something that deserves better error handling, here’s what made me lose some more hair, courtesy of one of my users.
This is on HTCondor 25.8.2, which I upgraded from 9.0.17 while trying to fix this.
He was trying to queue several MPI jobs from a single .sub file, using “queue jobsList from jobsList.txt”.
Submission succeeded, but matching always ended with “no match found” and no further explanation, even with the logs in verbose mode.
The compute nodes are configured in DedicatedScheduler with auto partitionable slots, and the headnode has pre-emption configured to accelerate matchmaking. Nothing fancy.
Here’s an example of condor_q -better-analyze output:
The Requirements expression for job 63.000 is
(TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
(TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)
[0] : TARGET.Arch == "X86_64"
[1] : TARGET.OpSys == "LINUX"
[2] : [0] && [1]
[3] : TARGET.Disk >= RequestDisk
[4] : [2] && [3]
[5] : TARGET.Memory >= RequestMemory
[6] : [4] && [5]
[7] : TARGET.Cpus >= RequestCpus
[8] : [6] && [7]
[9] : TARGET.HasFileTransfer
[10] : [8] && [9]
Job 63.000 defines the following attributes:
RequestCpus = 64
RequestDisk = MAX({ 1024,(TransferInputSizeMB + 1) * 1.25 }) * 1024 (kb)
RequestMemory = 65536 (mb)
TransferInputSizeMB = 4
The Requirements expression for job 63.000 reduces to these conditions:
         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          23  TARGET.Arch == "X86_64"
[1]          23  TARGET.OpSys == "LINUX"
[3]          23  TARGET.Disk >= RequestDisk
[5]          23  TARGET.Memory >= RequestMemory
[7]          23  TARGET.Cpus >= RequestCpus
[9]          23  TARGET.HasFileTransfer
063.000: Run analysis summary ignoring user priority. Of 23 slots on 23 machines,
0 slots are rejected by your job's requirements
0 slots reject your job because of their own requirements
23 slots match and are willing to run your job
No successful match recorded.
Last failed match: Wed May 6 09:07:26 2026
Reason for last match failure: no match found
The problem turned out to be that the paths in jobsList.txt included several “+” characters.
Something like: /home/user/mainjob/job+xconfig+yconfig+zconfig
I’m unsure whether this is normal behavior, but the fact that condor_submit didn’t catch it, and that the system’s logs didn’t say why no match was found, is why I’m writing this email.
But if there’s a way for condor to handle those “+” with some better quoting, let me know.
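Until something like that exists, a pre-submit check on the list file can at least flag suspicious lines. The following is only a sketch: the check_queue_list function name is hypothetical, and the "safe" character set (letters, digits, and _ . / -) is a conservative assumption, not a rule taken from HTCondor. Whether “+” is really what triggers the failure is also not established here.

```shell
# Print (with line numbers) every line containing a character outside a
# conservative set: letters, digits, underscore, dot, slash, hyphen.
# Returns 1 if any such line exists, 2 if the file is unreadable, else 0.
check_queue_list() {
    [ -r "$1" ] || return 2
    # LC_ALL=C keeps the A-Z/a-z ranges limited to plain ASCII letters.
    if LC_ALL=C grep -n '[^A-Za-z0-9_./-]' "$1"; then
        return 1
    fi
    return 0
}
```

Calling check_queue_list jobsList.txt before condor_submit lists the offending lines, so they can be renamed or quoted by hand.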
On a side note, while I upgraded condor, I noticed the file /usr/share/condor/htcondor.pp, which I’m not sure existed back in version 9.
Yes, I have SELinux enabled. I used to make my own .te from testing and checking the prevention notices one by one. (pain)
So, as a suggestion, it’d be nice if, during installation or upgrades of the condor package, it would automatically detect if SELinux is enforced and apply your .pp.
(That sounded way more wrong than it should…)
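As a rough sketch of that suggestion, the check could look something like this. The apply_condor_selinux_module function name is hypothetical, it assumes the shipped .pp is an ordinary policy module that semodule -i can load, and it would need to run as root.

```shell
# Load the shipped SELinux policy module only when SELinux is enforcing.
# Returns 0 when there is nothing to do, 1 if the module file is missing.
apply_condor_selinux_module() {
    pp_file=${1:-/usr/share/condor/htcondor.pp}
    # No SELinux tooling installed: nothing to do.
    command -v getenforce >/dev/null 2>&1 || return 0
    # Only act in "Enforcing" mode (not "Permissive" or "Disabled").
    [ "$(getenforce)" = "Enforcing" ] || return 0
    # The module file must exist before trying to load it.
    [ -f "$pp_file" ] || return 1
    semodule -i "$pp_file"
}
```

A package post-install script could call apply_condor_selinux_module with no arguments to use the default path.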
Martin