Re: [HTCondor-users] condor_ssh_to_job
- Date: Thu, 14 Aug 2014 11:56:52 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] condor_ssh_to_job
On 8/14/2014 6:42 AM, Keith Brown wrote:
> i understand you can run arbitrary code on HTc when using condor_submit.
> That wasn't my concern at all.
Ok...
> I wanted to
> avoid people submitting thousands of jobs AND then condor_ssh_job to the
> job and run more jobs taking up the slot indefinitely. When they
> condor_ssh_job they are circumventing fair scheduling.
If I understand you correctly you are saying, for instance, that a user
could submit a job requesting 1 cpu core, and then when the job starts,
they could ssh_to_job to that slot and start up 10 more processes, thus
using 11 cores when they were only fairly scheduled for 1 core. Does
that capture your concern? If so, condor_ssh_to_job has nothing to do
with this issue; after all, the user could simply submit a shell script
that starts up 11 instances of their program without ssh_to_job. At the
core of your concern is users using more resources than they were
allocated in the execution slot. HTCondor has a wealth of mechanisms
you can enable to address that concern. An overview of them can be
found in the HTCondor Week presentation at
http://research.cs.wisc.edu/htcondor/HTCondorWeek2013/presentations/ThainG_BoxingUsers.pdf
For instance, if you enable the cgroup (Linux container) support in
HTCondor, then if a user is allocated a slot with 1 cpu core and 1 GB of
RAM, that is all they will be able to use regardless of how many
processes they start up (via ssh_to_job or not). Even if they
ssh_to_job and start up 50 more processes, all 50 processes will
timeshare the one cpu core allocated to the slot that was scheduled for
them - there will be no impact on other users of the system. I
recommend using cgroup support in HTCondor if you are running on a
recent Linux distro (e.g. RedHat 6.5 or equivalent), and if you are
using an older Linux and cannot upgrade, look at HTCondor's CPU affinity
mechanism.
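
To make that concrete, here is a rough sketch of what the execute-node
configuration might look like. Treat it as a starting point under
assumptions rather than a tested recipe: the cgroup name "htcondor" is
only an example, the cgroup must already be set up on the machine, and
you should check the knob names and defaults against the manual for the
version you are running.

    # condor_config on the execute nodes (sketch, not a tested recipe)
    # Name of the parent cgroup under which each job gets its own cgroup.
    # "htcondor" is an example name; the cgroup must exist on the system.
    BASE_CGROUP = htcondor
    # Hold the job to the memory provisioned for its slot.
    CGROUP_MEMORY_LIMIT_POLICY = hard

    # Alternative for older Linux without cgroup support:
    # lock each job's processes to the cores assigned to its slot.
    ASSIGN_CPU_AFFINITY = True

With either approach, extra processes started inside the slot (from the
job itself or via ssh_to_job) compete only with each other for the
slot's allocation.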
As for taking up the slot indefinitely - as I stated in my post
yesterday in this thread, all processes, regardless of whether they are
launched by the job or via ssh_to_job, follow the administrator policy
for the slot. In other words, users can only take up a slot
indefinitely if your startd policy in the condor_config file allows them
to do so.
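
As an illustration only, a startd policy along the following lines would
evict whatever is running in a slot once the job has been there for more
than a day. The 24-hour figure and the MAX_JOB_SECONDS macro name are
made up for this example, and I am assuming the TotalJobRunTime machine
attribute is available in your version, so please verify against the
policy section of the manual before using it.

    # condor_config on the execute nodes (illustrative policy only)
    # Hypothetical helper macro: the longest a job may occupy a slot.
    MAX_JOB_SECONDS = (24 * 60 * 60)
    # Evict the job once it has run in the slot longer than the limit.
    PREEMPT = TotalJobRunTime > $(MAX_JOB_SECONDS)
    # Evict rather than suspend when PREEMPT becomes true.
    WANT_SUSPEND = False

Because the policy applies to the slot as a whole, anything the user
started through ssh_to_job goes away along with the job.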
If we are still talking past each other and/or I am failing to
understand your concern, feel free to send me a phone number to my
personal email address (tannenba@xxxxxxxxxxx) and I'll give you a call.
regards,
Todd