Are the jobs parallel universe jobs? The purpose of ParallelSchedulingGroup is to ensure that all of the nodes of a parallel universe job run in the same "scheduling group" (usually
used to indicate that the machines have fast network access to each other). I think you just want to add Opsys=="WINDOWS" to your job's requirements expression.
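
As a rough sketch of what that could look like in a submit file (the executable name and the machine_count value are placeholders I made up, not taken from your setup):

```
# Minimal parallel universe submit file sketch.
# my_job.exe and machine_count = 2 are hypothetical; adjust to your job.
universe      = parallel
executable    = my_job.exe
machine_count = 2

# Restrict matching to Windows execute nodes.
requirements  = (OpSys == "WINDOWS")

queue
```

With a requirement like this, the job can only be matched to the Windows machines, so all of its nodes land in the scheduling group those machines share.
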
As for your question about -better-analyze: it is not saying that all 4 machines match.
This line

    [0] 2 ParallelSchedulingGroup is my.Matched_PSG

indicates that only two machines match that clause, whereas these lines

    1 ( ParallelSchedulingGroup is "windows-cluster" )
    2 ( ( ( Opsys == "Linux" ) || ( Opsys == "Windows" ) ) && ( Arch == "X86_64" ) && ( stringListMember("2017",TARGET.CST

(incorrectly) indicate that 0 machines match. There is a known problem with the "Suggestions:" clause of -better-analyze. It does not correctly analyze complex sub-clauses, and almost never makes useful suggestions; the suggestions clause
has been removed from HTCondor 8.6 and later for that reason.

-tj

From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx]
On Behalf Of Felix Wolfheimer

We have a mixed Windows/Linux setup managed by HTCondor. I configured parallel scheduling groups for all systems. In a test setup where I can reproduce the issues I experience in the production pool, I have four execution hosts (2x Windows, 2x Linux). The execution hosts have parallel scheduling groups as follows:

    # on both Linux machines
    ParallelSchedulingGroup = "linux-cluster"
    # on the Windows machines

After a while, jobs submitted to the parallel universe are no longer started, and condor_q -better-analyze for such a job gives the following somewhat inconsistent information:

    The Requirements expression for your job is:
    ( ParallelSchedulingGroup is my.Matched_PSG ) &&
    Your job defines the following attributes:
    DiskUsage = 75
    The Requirements expression for your job reduces to these conditions:
    Slots
    Suggestions:
    Condition Machines Matched Suggestion
    ---------------------------------------------------------------------------------------------------------

It's strange that on the one hand condor_q tells me that basically all four machines match my requirements expression, but on the other hand it tells me that no machine matches the condition, which is certainly not true, as I have also checked with condor_status:
Does anyone have an idea what may cause this strange behavior? I don't know whether this is relevant, but I've set NUM_CPUS=1 for all machines, as a job is supposed to have exclusive access to all resources on a compute node.