Re: [HTCondor-users] schedd state
From: "Fox, Kevin M" <Kevin.Fox@xxxxxxxx>
Date: 07/19/2016 08:16 PM
> So, I have used a lot of job schedulers in the past and in studying the
> Condor architecture a bit, found what seems to be a unique feature to
> Condor.
Hi Kevin. As it happens, we migrated from a fairly mature Grid Engine
environment to HTCondor, and I ran into this very issue. The default
config, with a separate queue for each submit machine, was a bit
perplexing at first and caused a certain amount of anxiety and distress
among the users. Since they were used to a rather broken "first-come,
first-served" SGE config, the presence or absence of other people's jobs
in the queue was very important to their completion-time estimates.
My resolution was to reconfigure HTCondor to use a primary central
scheduler via the SCHEDD_HOST config setting, so that all users on all
machines submit to a single queue. See below:
> So, some questions:
> * How do you know it is safe to shut down a schedd node without affecting
> a running job? Can you temporarily mark the schedd so that it stops
> accepting new jobs and drains? Does condor_q only show local jobs? If so,
> is just checking for running = 0 enough to tell if it's safe to shut down?
With a single scheduler this becomes moot, but yes, condor_q by default
in version 8.4 and earlier shows only the local jobs. You need to run
condor_q -global to see the queues from every submit machine, or run
condor_status -schedd to see the counts.
Running condor_status -schedd -af -long | sort will show you the full
list of scheduler ClassAd attributes. The trouble with determining
whether anything is still running is that there are several totals
involved: idle jobs, held jobs, local jobs, and flocked jobs, as well as
running jobs.
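To make that concrete, here is a minimal Python sketch of a drain check
built from those totals. It assumes the standard schedd ClassAd attribute
names listed in ATTRS (check them against condor_status -schedd -long on
your own pool) and that you feed it the output of
condor_status -schedd -af followed by those attribute names, which prints
one space-separated line per schedd. The function names and the choice of
which totals count as "active" are my own illustration, not an HTCondor
API.

```python
import subprocess

# Attributes requested from `condor_status -schedd -af ...`, in this order.
# Assumed standard schedd ClassAd names; verify them on your own pool.
ATTRS = [
    "Name",
    "TotalRunningJobs",
    "TotalIdleJobs",
    "TotalHeldJobs",
    "TotalLocalJobsRunning",
    "TotalSchedulerJobsRunning",
    "TotalFlockedJobs",
]

# Totals taken (by assumption) to mean a job is executing for this schedd.
ACTIVE = [
    "TotalRunningJobs",
    "TotalLocalJobsRunning",
    "TotalSchedulerJobsRunning",
    "TotalFlockedJobs",
]


def parse_af_line(line: str) -> dict:
    """Turn one space-separated -af output line into {attribute: value}."""
    fields = line.split()
    if len(fields) != len(ATTRS):
        raise ValueError(f"expected {len(ATTRS)} fields, got {len(fields)}: {line!r}")
    ad = {"Name": fields[0]}
    for attr, value in zip(ATTRS[1:], fields[1:]):
        # -af prints "undefined" for attributes missing from the ad; treat as 0.
        ad[attr] = 0 if value == "undefined" else int(value)
    return ad


def safe_to_shutdown(ad: dict, require_empty_queue: bool = False) -> bool:
    """True if no active totals are nonzero; optionally also no idle/held jobs."""
    if any(ad[attr] for attr in ACTIVE):
        return False
    if require_empty_queue:
        return ad["TotalIdleJobs"] == 0 and ad["TotalHeldJobs"] == 0
    return True


def query_schedds() -> list:
    """Run condor_status on this host and parse every schedd it reports."""
    out = subprocess.run(
        ["condor_status", "-schedd", "-af", *ATTRS],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_af_line(line) for line in out.splitlines() if line.strip()]
```

Idle and held jobs stay in the queue on disk, so whether they should block
a shutdown is a policy choice; that's why it's a flag rather than the
default.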
With ephemeral session nodes, you'll definitely want to switch to a
central scheduler. I'd also be very interested to learn more about your
approach to your session machines off-list.
> * If you want to reinstall the node but not lose the jobs, you have to
> maintain Condor's job state somehow. Is persisting /var/lib/condor/spool
> all you need to maintain this state, or are there other places on the
> file system that need to persist?
Yes, /var/lib/condor/spool is the location that counts. The queue is
represented by the job_queue.log file in that directory.
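As a rough illustration of persisting that state across a reinstall, here
is a minimal Python sketch that copies the whole spool directory to some
persistent location. It assumes the schedd has already been stopped, so
job_queue.log isn't being written mid-copy, and it copies the entire
directory rather than just job_queue.log because spooled job files can
live there too. The backup_spool name and the destination path are
hypothetical.

```python
import shutil
from pathlib import Path

# Spool location from the thread; the destination is up to you.
SPOOL = Path("/var/lib/condor/spool")


def backup_spool(spool: Path, dest: Path) -> Path:
    """Copy the spool directory (including job_queue.log) into dest.

    Assumes the schedd is stopped first so the queue log is consistent.
    Returns the path of the copied job_queue.log.
    """
    queue_log = spool / "job_queue.log"
    if not queue_log.is_file():
        raise FileNotFoundError(f"no job_queue.log in {spool}")
    # dirs_exist_ok requires Python 3.8+; it lets repeated backups overwrite.
    shutil.copytree(spool, dest, dirs_exist_ok=True)
    return dest / "job_queue.log"
```

Restoring is the same copy in reverse, again with the schedd stopped.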
> * For sites that want to scale the number of schedds and the number of
> login nodes differently, is that possible? Is there a remote schedd mode?
> I'm sure things like the syscall shadowing wouldn't work in such a mode,
> but we haven't had a need for that at our site.
You set the SCHEDD_HOST configuration variable to the hostname of the
machine running your scheduler, and condor_q, condor_submit, and the
other tools will then refer to that machine.
You can use config file conditionals or templating to set a different
host for different login machines, or you can use the
$_CONDOR_SCHEDD_HOST environment variable to set the proper scheduler on
a user-by-user basis.
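For the per-user route, a small Python sketch: HTCondor treats an
environment variable named _CONDOR_<MACRO> as an override for config
macro <MACRO>, so setting _CONDOR_SCHEDD_HOST in a tool's environment
points it at another scheduler. The helper names and the hostname used
below are hypothetical.

```python
import os
import subprocess


def env_with_schedd(schedd_host: str, base=None) -> dict:
    """Return a copy of the environment that points HTCondor tools at schedd_host.

    _CONDOR_SCHEDD_HOST overrides the SCHEDD_HOST config macro for any
    HTCondor tool started with this environment. The base mapping (or
    os.environ) is copied, never modified.
    """
    env = dict(os.environ if base is None else base)
    env["_CONDOR_SCHEDD_HOST"] = schedd_host
    return env


def run_condor_q(schedd_host: str) -> str:
    """Run condor_q against the given scheduler and return its output."""
    result = subprocess.run(
        ["condor_q"],
        env=env_with_schedd(schedd_host),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

In practice the same effect comes from exporting the variable in a user's
shell startup file; the Python wrapper just makes the mechanism explicit.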
Remember, though, that a user who submitted jobs to one queue might
become alarmed if they log in to a different machine that uses a
different scheduler and find their jobs "missing" from the condor_q
output.
Our largest environment has three schedulers. One is hardcoded in the
configuration file applied to all users on all machines in the pool, and
is the target of everyone's condor_submit and condor_q runs.
The second is used for a DRMAA-linked Python application, because an
older version of HTCondor had a problem with DRMAA: it didn't take remote
schedulers into account, so DRMAA could only delete jobs from the local
scheduler.
Finally, there's a third scheduler for a small team that occasionally
submits hundreds of thousands of small jobs at a time. In 8.0 and early
8.2, condor_q didn't even work with a queue that deep. Even once we had
fixed the bugs and timeouts, those queues were highly disruptive to
everyone else's ability to run a quick condor_q to check their job
status, so we quarantined the 100k+ job submissions to their own little
scheduler on a 7-year-old clunker machine by modifying the job-submission
script to use "condor_submit -name".
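A submission wrapper along those lines might look like the Python sketch
below. condor_submit -name <schedd> is the real option for targeting a
specific scheduler, but the BULK_SCHEDD name, the size threshold, and the
routing rule are hypothetical; our actual script isn't shown here. Note
that the value for -name is the schedd's Name attribute, which is often
but not always just the hostname.

```python
import subprocess

# Hypothetical name of the dedicated "bulk" schedd and routing cutoff.
BULK_SCHEDD = "bulk-sched.example.org"
BULK_THRESHOLD = 100_000


def submit_command(submit_file: str, expected_jobs: int) -> list:
    """Build the condor_submit argv, routing very large submissions aside."""
    cmd = ["condor_submit"]
    if expected_jobs >= BULK_THRESHOLD:
        # Send huge batches to the quarantined scheduler instead of the default.
        cmd += ["-name", BULK_SCHEDD]
    cmd.append(submit_file)
    return cmd


def submit(submit_file: str, expected_jobs: int) -> subprocess.CompletedProcess:
    """Run condor_submit with the routed argv; raises if it fails."""
    return subprocess.run(submit_command(submit_file, expected_jobs), check=True)
```

Keeping the routing in the wrapper means users don't have to remember
which scheduler to target, although they still need condor_q -name (or
-global) to see those jobs.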
With a remote schedd, the syscall shadowing is done on the machine
running the schedd, so as long as that machine has access to the
filesystem the job refers to, it'll work. But we're not using the
standard universe either, and fairly few people are in any case.
Here, in most cases, we just have the jobs use NFS from the exec nodes to
pull input files, in order to take advantage of the Linux buffer cache
with depth-first machine fill, and we have been migrating NFS-based
output delivery to HTCondor output transfers as time goes on.
There was a message from the CHTC team here a week or two ago on how you
can use HTCondor-C to handle file spooling to a remote scheduler; if
you're interested in that, check the list archives.
-Michael Pelletier.