This note is to offer more support for these enhancements. Back in Oct 2009 I asked a question about nodal affinity when provisioning slots. This has become important for some research on our campus involving molecular dynamics. Other batch scheduling subsystems can schedule MPI-based processes onto cores (slots) on the same node to mitigate the cost of high inter-node latency, especially for clusters not fortunate enough to have InfiniBand interconnects. Here is a pointer to my original note:

https://www-auth.cs.wisc.edu/lists/condor-users/2009-October/msg00134.shtml

--Brandon

Erik Erlandson wrote:

Hi David,

I do not know how it will be prioritized relative to all the other development in the queue. It's a relatively significant change to the dedicated scheduler, so I know the UW team expects to do a thorough review and testing before approving it for inclusion. There are some other users who are interested in having this enhancement, so I will make sure it doesn't fall off the radar.

-Erik

On Tue, 2010-08-31 at 10:04 -0500, David J. Herzfeld wrote:

Hi Erik:

Thanks for the response. From the remarks in the ticket, this looks to be exactly what we want for #3! Is there any estimate on when this will get incorporated into the stable release? This is exciting.

David

On 08/31/2010 09:42 AM, Erik Erlandson wrote:

Regarding dynamic slots and the parallel universe: the dedicated scheduler (used by PU jobs) does not currently handle dynamic slots correctly. A patch to correct this has been submitted and is pending review:

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=986,0

-Erik

On Tue, 2010-08-31 at 08:56 -0500, David J. Herzfeld wrote:

Hi All:

We are currently working with a 1024-core cluster (8 cores per machine) using a pretty standard Condor config: each core shows up as a single slot, etc. Users are starting to run multi-process jobs on the cluster, leading to over-scheduling. One way to combat this problem is the "whole machine" configuration presented on the wiki at <https://condor-wiki.cs.wisc.edu/index.cgi/wiki?p=WholeMachineSlots>. However, most of our users don't require the full machine (combinations of 2, 3, 4, 5... cores). We could modify this config to supply slots for half a machine, etc.

So a couple of questions:

1) Does this seem like a job for dynamic slots, or should we modify the "whole machine" config?
2) If dynamic slots are the way to go, have they been shown to be stable in production environments?

3) Can we combine dynamic slot allocations with the parallel universe to provide PBS-like allocations? Something like:

machine_count = 4
request_cpus = 8

to match 4 machines with 8 CPUs apiece, similar to:

#PBS -l nodes=4:ppn=8

As always - thanks a lot!

David

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at: https://lists.cs.wisc.edu/archive/condor-users/

--
Brandon Leeds
Lehigh University
Sr. Computing Consultant, LTS High Performance Computing
Phone: (610) 758-4805
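For anyone finding this thread in the archives, here is a minimal sketch of the two pieces being discussed: a partitionable ("dynamic") slot configuration for the startd, and a parallel-universe submit description analogous to the PBS request above. The knob names are standard Condor configuration and submit-file keywords; the specific values and the wrapper-script name are illustrative assumptions, not a tested recipe - and note Erik's caveat that, at the time of this thread, the dedicated scheduler did not yet handle dynamic slots correctly.

```
# condor_config fragment (startd side): advertise each 8-core machine as a
# single partitionable slot instead of 8 static single-core slots. Smaller
# dynamic slots are then carved out of it to match each job's request.
NUM_SLOTS                 = 1
NUM_SLOTS_TYPE_1          = 1
SLOT_TYPE_1               = cpus=100%, mem=100%, disk=100%
SLOT_TYPE_1_PARTITIONABLE = TRUE

# Submit description: ask for 4 machines with 8 CPUs apiece,
# similar in spirit to "#PBS -l nodes=4:ppn=8".
universe      = parallel
executable    = mp_wrapper.sh       # hypothetical MPI launch wrapper
machine_count = 4
request_cpus  = 8
queue
```

The two fragments live in different files (the first in the pool's configuration, the second in the user's submit file); they are shown together here only to mirror the question being asked.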