
[HTCondor-users] Run Slurm as "guest" on a HTCondor pool?



Good morning,

Although the majority of EPs in our pool are managed by HTCondor (and there's
little wrong with that), we have traditionally set aside a small chunk to run
Slurm on.

The main rationale behind that was a rather bad experience with attempts to
get the Parallel Universe working in a fragmented dSlot environment; in
addition, users coming from standard HPC setups were simply more comfortable
with Slurm (having survived things like SGE...).
Currently, the split between the two schedulers, and the nodes they control,
is done manually - with the usual reaction times involved. In addition to the
necessary defragmentation (or just "draining") of the HTCondor nodes to be
reassigned - which may take some time - human action tends to be delayed by
weekends, nights, or simply lack of attention, and is limited by the attempt
to keep the pool somewhat structured (in terms of assigning full racks to
each scheduler).
Automation, I have the distinct gut feeling, would help a lot with this.

With topology not playing an important role (nodes are wired up with 1GbE
to a 10GbE-based backbone, and job sizes are rather small), any combination
of nodes would basically do - if it can be created dynamically and destroyed
when no longer used.

The goal is to
- analyze the jobs in the Slurm queue and accordingly
- start HTCondor jobs (with count and size big enough to cover them all,
  or at least the biggest of them) that "only" run *slurmd* to provide the
  corresponding Slurm resources (see the sketch after this list)
- "resume" those nodes to make them available to *slurmctld*
- wait until the job(s) has/have finished
- with some temporal margin (of the order of minutes/hours) terminate the
  (now idle) slurmd jobs and return the nodes to normal HTCondor operation
  ("drain" them or let them silently become "down"? either should do)

Yes, this bears a lot of resemblance to what the Parallel Universe attempted
to do for us (with the help of the "openmpiscript") - and it also reminds me
of what is done in terms of glide-in jobs (although those seem to mostly
cover the other side of the coin, running HTCondor EPs inside an HTCondor or
Slurm pilot job).

I'm wondering whether someone has thought about this before; my search of the
mailing list archives has so far not been successful in that respect. Since
this may well be my own fault (using the wrong search patterns), I'm asking
here directly now:

Has anyone seen this, done this, or maybe refrained from doing this (for
whatever reason)? Please share your findings!

Thanks,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Phone: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~