On Wednesday, 15 February, 2012 at 4:24 PM, Marco Mambelli wrote:
Hi all,
I'd like to configure my Condor pool with an additional job slot for
service jobs.
E.g. 4-core machines will have 4 job slots plus a "service slot" that
runs only service jobs (I can identify them with a classad), so
that these jobs do not have to wait in the queue.
Service jobs should be treated as requiring no resources and be allowed
to run in parallel with other jobs on the machine.
Any suggestions on how I should set this up?
Can I talk you out of it? :)
Condor is an awesome way to run user jobs. It's not a great way to run administrative tasks against loads of machines. The two prominent problems are:
1. You don't know what you hit and what you missed. Condor's collector database isn't static. Machines come and go. If you run your administrative job when half your machines are offline, you'll need some way to remember which machines you missed so that, when they come back online, they get caught up on administrative changes.
2. Your jobs don't run as administrator accounts. On Linux they don't run as root. And on Windows they don't run as an account in the Administrators group. At least, not without some finagling, and the changes can leave your systems open to some abuse.
Those are the big two reasons not to do administrative tasks through Condor. There are more, but those two seem big enough to me.
I recommend looking at a tool specifically intended for configuration management of your machines. We're big fans of Opscode's Chef platform (http://www.opscode.com/chef/) here at Cycle. It's proven to be a very scalable and robust configuration management and deployment tool.
If I can't talk you out of it, the quick gist of what you need to do is to create two slot types: one type gets basically no resources on your machine, and the other divides up whatever is left. Since every slot needs at least one CPU, you'll have to fake the number of CPUs in the box, because Condor won't let you assign more CPUs than it detects in the machine. So if you have a 4-CPU box, you'd tell Condor it has 5 (one for the service slot plus four for the regular slots):
NUM_CPUS = 5
SLOT_TYPE_1 = cpus=1, ram=1, swap=1%, disk=1%
SLOT_TYPE_2 = cpus=1, ram=auto, swap=auto, disk=auto
NUM_SLOTS_TYPE_1 = 1
NUM_SLOTS_TYPE_2 = 4
And now you need per-slot policies to control what runs where. For example:
START = ((SlotID == 1) && (TARGET.IsAdminJob =?= True)) || (SlotID != 1)
That would let only admin jobs onto slot 1, while any job (admin or not) can still run in slots 2-5.
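For the START expression above to match anything, the service jobs have to carry the IsAdminJob attribute in their job ad. A minimal sketch of a submit description that does this follows. The executable and file names are made up for illustration, and the requirements line is an optional extra (not part of the policy above) that keeps the job from also landing in a regular slot:

```
# Hypothetical submit description for a service job
universe     = vanilla
executable   = service_check.sh
output       = service_check.$(Cluster).$(Process).out
error        = service_check.$(Cluster).$(Process).err
log          = service_check.log

# Custom job attribute that the START expression tests for
+IsAdminJob  = True

# Optional: only match the dedicated service slot
requirements = (TARGET.SlotID == 1)

queue
```

The leading "+" is how condor_submit adds a custom attribute to the job ad. Regular user jobs need no changes, since a job without IsAdminJob simply fails the slot 1 half of the START expression.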
For details see: http://research.cs.wisc.edu/condor/manual/v7.6/3_13Setting_Up.html#sec:SMP-Divide
I have to admit, I'm not even sure how well Condor will deal if you carve off a tiny amount of disk and ram and swap for a slot like that and then tell it to auto divide up the remainder. Might work well, might not.
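If mixing a tiny fixed share with auto turns out to misbehave, one way to sidestep the question is to spell out every share explicitly. This is an untested sketch, and the 1% / 24% split is an arbitrary assumption; pick whatever shares suit your machines, as long as the total across all slots stays at or below 100%:

```
# Advertise one more CPU than the 4 physically present
NUM_CPUS = 5

# Service slot: one (fake) CPU and a token share of everything else
SLOT_TYPE_1 = cpus=1, ram=1%, swap=1%, disk=1%
NUM_SLOTS_TYPE_1 = 1

# Regular slots: 4 x 24% = 96%, plus 1% for the service slot = 97% total
SLOT_TYPE_2 = cpus=1, ram=24%, swap=24%, disk=24%
NUM_SLOTS_TYPE_2 = 4
```

After restarting the startd, condor_status should list five slots on the machine, and the Memory and Disk values in each slot ad will show how the division actually came out.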
Travel cautiously down this path.
Regards,
- Ian
---
Ian Chesal
Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools
http://www.cyclecomputing.com
http://www.cyclecloud.com
http://twitter.com/cyclecomputing
Thank you,
Marco
_______________________________________________
Condor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/