Hello, I am facing the following problem. When one MPI job is idle because its Requirements are not met, all the subsequent MPI jobs also remain idle, even though the Requirements of those idle jobs can be met.

Running 'condor_q -better-analyze 1871.0' on such an idle job 1871.0 displays:

...
The Requirements expression for job 1871.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           8  TARGET.SupportedMPIQueue is "ipno"
[9]          23  TARGET.HasFileTransfer

1871.000:  Job has not yet been considered by the matchmaker.

1871.000:  Run analysis summary ignoring user priority.  Of 23 machines,
     14 are rejected by your job's requirements
      0 reject your job because of their own requirements
      1 are exhausted partitionable slots
      0 match and are already running your jobs
      0 match but are serving other users
      8 are available to run your job

What I am trying to do with a test pool is the following:

1) I have 1 schedd, 1 CM, and 5 worker nodes (WN).

2) I have a variable SupportedMPIQueue added to STARTD_ATTRS on 4 WNs:

* on 2 WNs (4 + 4 cores) I have SupportedMPIQueue = "ipno":

  # condor_config_val -dump | grep SupportedMPIQueue
  STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
  SupportedMPIQueue = "ipno"

* on 2 other WNs (4 + 8 cores) I have SupportedMPIQueue = "ipnofast":

  # condor_config_val -dump | grep SupportedMPIQueue
  STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
  SupportedMPIQueue = "ipnofast"

* on 1 WN (2 cores), SupportedMPIQueue is not defined.

The aim of this configuration is to be able to select a group of WNs for an MPI job. In our existing Torque/MAUI cluster, I have 3 queues: ipno, ipnofast, and ipnofast2. Each queue points to a group of WNs having the same CPU speed/generation.
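To make the setup concrete, the per-WN local configuration boils down to something like the sketch below (the DedicatedScheduler hostname is only a placeholder, not my real schedd name):

```
# Sketch of the local config on an "ipno" WN
SupportedMPIQueue = "ipno"
# Publish the attribute (and the dedicated scheduler name) in the startd ClassAd
STARTD_ATTRS = SupportedMPIQueue, DedicatedScheduler
# Machines running parallel-universe jobs must point at the dedicated schedd,
# e.g. (placeholder hostname):
DedicatedScheduler = "DedicatedScheduler@schedd.example.org"
```

The "ipnofast" WNs are identical except that SupportedMPIQueue = "ipnofast".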
I would like to reproduce the same behavior by allowing the selection of a group of WNs in the .sub file, with for example:

  Requirements = (TARGET.SupportedMPIQueue =?= "ipnofast")

If I comment out the 'Requirements' line in the .sub file, I can use all the available MPI slots, and MPI jobs stay in the idle state only when there are no longer enough slots available.

Now, suppose that I use 'Requirements = (TARGET.SupportedMPIQueue =?= "ipnofast")' and "machine_count = 4". If I submit 5 jobs, 3 jobs will run and 2 jobs will be idle. This is normal, because we have 12 cores in total on the "ipnofast" WNs. With 2 jobs idle, if I then submit one job with 'Requirements = (TARGET.SupportedMPIQueue =?= "ipno")', this job remains idle until there is no longer an idle job waiting for the "ipnofast" WNs. The job requiring the "ipno" WNs should have run without waiting, because 8 cores were free.

My conclusion is that once there is an idle MPI job, all subsequently submitted MPI jobs will also remain idle, even though the requirements of the newly submitted jobs can be met. The new idle jobs are reported by condor_q -better-analyze as "not yet been considered by the matchmaker".

Is this the default behavior? Is it possible to do something about it? Any advice?

A simple test job is:

  universe = parallel
  executable = /bin/sleep
  arguments = 300
  machine_count = 4
  Requirements = (TARGET.SupportedMPIQueue =?= "ipno")
  queue

Thanks,

Christophe.

--
Christophe DIARRA
Institut de Physique Nucleaire
Service Informatique
15 Rue Georges Clemenceau
F91406 ORSAY Cedex
Tel: +33 (0)1 69 15 65 60
Mobile: +33 (0)6 31 26 23 69
Fax: +33 (0)1 69 15 64 70
E-mail: diarra@xxxxxxxxxxxxx