Hello,

This is more of an FYI, but if you think this is a bug, or at least something that deserves better error handling, here is what a user of mine made me lose some more hair over.

This is on HTCondor 25.8.2, which I upgraded from 9.0.17 while trying to fix this.

He was trying to queue several MPI jobs from a single .sub file by using “queue jobsList from jobsList.txt”. Submission succeeded, but matching always ended with “no match found” and no further explanation, even with logs in verbose mode.

The compute nodes are configured with DedicatedScheduler and auto-partitionable slots, and the headnode has pre-emption configured to speed up matchmaking. Nothing fancy.

Here’s an example of the output of condor_q -better-analyze:

The Requirements expression for job 63.000 is

    (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.HasFileTransfer)

    [0]  : TARGET.Arch == "X86_64"
    [1]  : TARGET.OpSys == "LINUX"
    [2]  : [0] && [1]
    [3]  : TARGET.Disk >= RequestDisk
    [4]  : [2] && [3]
    [5]  : TARGET.Memory >= RequestMemory
    [6]  : [4] && [5]
    [7]  : TARGET.Cpus >= RequestCpus
    [8]  : [6] && [7]
    [9]  : TARGET.HasFileTransfer
    [10] : [8] && [9]

Job 63.000 defines the following attributes:

    RequestCpus = 64
    RequestDisk = MAX({ 1024,(TransferInputSizeMB + 1) * 1.25 }) * 1024 (kb)
    RequestMemory = 65536 (mb)
    TransferInputSizeMB = 4

The Requirements expression for job 63.000 reduces to these conditions:

           Slots
Step     Matched  Condition
-----  ---------  ---------
[0]           23  TARGET.Arch == "X86_64"
[1]           23  TARGET.OpSys == "LINUX"
[3]           23  TARGET.Disk >= RequestDisk
[5]           23  TARGET.Memory >= RequestMemory
[7]           23  TARGET.Cpus >= RequestCpus
[9]           23  TARGET.HasFileTransfer

063.000: Run analysis summary ignoring user priority. Of 23 slots on 23 machines,
      0 slots are rejected by your job's requirements
      0 slots reject your job because of their own requirements
     23 slots match and are willing to run your job

No successful match recorded.
Last failed match: Wed May 6 09:07:26 2026
Reason for last match failure: no match found

The problem was that the paths in jobsList.txt included several “+” characters. Something like:

/home/user/mainjob/job+xconfig+yconfig+zconfig

I’m unsure whether this is normal behavior, but the fact that condor_submit didn’t catch it, and that the system’s logs didn’t say why no match was found, is why I’m writing this email. If there’s a way for condor to handle those “+” characters with some better quoting, let me know.

On a side note, while upgrading condor I noticed the file /usr/share/condor/htcondor.pp, which I’m not sure was a thing back in version 9. Yes, I have SELinux enabled. I used to build my own .te from testing and checking the prevention notices one by one. (pain)

So, as a suggestion, it’d be nice if, during installation or upgrades of the condor package, it would automatically detect whether SELinux is enforcing and apply your .pp. (That sounded way more wrong than it should…)

Martin
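
For anyone who runs into the same symptom, here is a minimal sketch of a pre-submit check, written in Python. It is not part of HTCondor: the function name find_plus_entries is hypothetical, and the only thing it assumes is what is described above, namely that entries containing a “+” in the file read by “queue ... from” were the trigger. It simply lists the offending lines so they can be renamed before running condor_submit.

```python
from pathlib import Path


def find_plus_entries(list_file):
    """Return (line_number, entry) pairs for non-empty lines containing '+'.

    Hypothetical helper: it only scans the text file, it does not talk to
    HTCondor and does not know which characters HTCondor actually rejects.
    """
    hits = []
    lines = Path(list_file).read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if entry and "+" in entry:
            hits.append((number, entry))
    return hits
```

Calling find_plus_entries("jobsList.txt") before submitting returns an empty list when no entry contains a “+”, and otherwise the line numbers and entries to look at.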