
[HTCondor-users] Parallel universe job with accounting groups



Hi,

I have configured a number of EPs with a dedicated scheduler and am now attempting to run a parallel universe job. The job submits successfully but never matches an EP.
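
For anyone unfamiliar with the setup, the documented dedicated-scheduler pattern on an EP looks roughly like the sketch below. The hostname is a placeholder and this is an illustration of the general pattern, not a copy of my actual configuration.

```
# Sketch of the usual dedicated-scheduler settings in an EP's configuration.
# "submit.example.org" is a hypothetical hostname, used only for illustration.
DedicatedScheduler = "DedicatedScheduler@submit.example.org"

# Advertise the attribute in the machine ad so the schedd's query
# (DedicatedScheduler == "...") can find this EP.
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```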

The Collector log appears to show the configured EPs.
Got QUERY_STARTD_ADS
(Sending 94 ads in response to query)
Query info: matched=94; skipped=154; query_time=0.000308; send_time=0.015235; type=Machine; requirements={((DedicatedScheduler == "DedicatedScheduler@#.#.#"))}; locate=0; limit=0; from=SCHEDD; peer=<#.#.#.#:51780>; projection={}; filter_private_ads=1

The Sched log also suggests the dedicated EPs are available.
SetAttribute modifying attribute Scheduler in non-active cluster cid=821 acid=-1
Found 94 potential dedicated resources in 0 seconds
Adding submitter DedicatedScheduler@#.#.# to the submitter map for default pool

condor_q -better-analyze shows that there are machines available which match the job's requirements. The 7 rejecting EPs only match jobs that request a GPU.
         Slots
Step    Matched  Condition
-----  --------  ---------
[0]         248  TARGET.Arch == "X86_64"
[1]         248  TARGET.OpSys == "WINDOWS"
[3]         248  TARGET.Disk >= RequestDisk
[5]         248  TARGET.Memory >= RequestMemory
[7]         248  TARGET.HasFileTransfer


821.000:  Run analysis summary ignoring user priority.  Of 248 machines,
      0 are rejected by your job's requirements
      7 reject your job because of their own requirements
      0 match and are already running your jobs
    241 match but are serving other users
      0 are able to run your job

The Negotiator and Match logs show no record of the job.  The Negotiator is configured with accounting groups.  I have configured the job submit description with a relevant accounting group, but I am wondering now if there is some conflict between the parallel universe job and the accounting groups.
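
To make the question concrete, a parallel universe submit description with an accounting group has roughly the shape sketched below. Every value (executable, machine count, group name, user name, memory request) is a placeholder chosen for illustration and should not be read as my real settings.

```
# Sketch only: all values below are hypothetical placeholders.
universe              = parallel
executable            = my_parallel_job.bat
machine_count         = 2

# Accounting group settings whose interaction with the
# dedicated scheduler is in question.
accounting_group      = group_example
accounting_group_user = someuser

request_memory        = 1024
queue
```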

It would be great to hear from anyone who has a suggestion about what might be wrong with my configuration, or who has experience running parallel universe jobs with accounting groups.

Thanks,
Mark