
Re: [HTCondor-users] IS_OWNER not stopping jobs from running



Hi John,

Thanks for clarifying things about IS_OWNER. What was confusing me was the documentation about IS_OWNER on the Configuration for Execution Points page:

https://htcondor.readthedocs.io/en/latest/admin-manual/ep-policy-configuration.html

In the machine states section there's this sentence:

"Owner: The machine is being used by the machine owner, and/or is not available to run HTCondor jobs. When the machine first starts up, it begins in this state."

Which to me implied if an execution node is in the owner state then it should not run jobs. This was backed up by the diagram in the machine activities section which is very clear that jobs do not start if IS_OWNER is true.

But looking again to write this email I found in the state and activity transition section following the machine activities diagram:

"From HTCondor's point of view, there is little difference between the Owner and Unclaimed states. In both cases, the resource is not currently in use by the HTCondor system. However, if a job matches the resource's START expression, the resource is available to run a job, regardless of if it is in the Owner or Unclaimed state."

which is explicit that jobs can start in the owner state. So there's some confusion in the documentation.

Again, thanks for clearing this up.


Cheers,

Andrew

On 17/11/2025 17:12, John M Knoeller via HTCondor-users wrote:
IS_OWNER doesn't prevent jobs from starting.  START does that.  What IS_OWNER does is indicate that
the reason START is not evaluating to TRUE is that the owner of the machine is not allowing any jobs
to start right now.

The STARTD uses this distinction to give guidance to the AP about whether it will be able to re-use a slot to run another job when a job finishes.   It is also meant to be useful to pool monitoring systems, for pools that have machines that are not always available to HTCondor while they are running.

For a long time IS_OWNER was configured by default to refer to START.   It would be true when START was false when evaluated without a job classad.  The idea is that if START was always false, even with no job, it would indicate that the owner of the machine had disabled HTCondor for a period of time.

We changed the default for IS_OWNER when it became clear that this was confusing to administrators, as they had a tendency to write START expressions that would never evaluate to undefined (mostly by using =?= or =!= expressions).  So machines would appear to go briefly into Owner state at the end of each job, which led to the STARTD always telling the AP that it could not re-use a slot.
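
To make the old behaviour concrete, here is a minimal configuration sketch. The IS_OWNER line is the historical default quoted elsewhere in this thread; the two START variants and the job attribute name MyJobFlag are hypothetical, chosen only to contrast == with =?=.

```
# Historical default (no longer the default, per the explanation above).
IS_OWNER = ( START =?= False )

# Variant A (hypothetical): == against a job attribute. With no job ClassAd,
# TARGET.MyJobFlag is undefined, so START is undefined, IS_OWNER is False,
# and the slot does not look like Owner.
#START = ( TARGET.MyJobFlag == True )

# Variant B (hypothetical): =?= never yields undefined. With no job ClassAd,
# START is False, IS_OWNER is True, and the slot looks like Owner
# between jobs even though no owner is involved.
START = ( TARGET.MyJobFlag =?= True )
```

Variant B is the pattern that made slots flip briefly into Owner state at the end of each job.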

We now expect administrators who have machines that should actually go into Owner state to write an IS_OWNER expression for that purpose.  HTCondor will no longer try to guess by looking at whether START evaluates to false or to undefined.
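
A minimal sketch of what such an explicit policy might look like, assuming the owner's presence is judged from keyboard activity. The macro name OWNER_ACTIVE and the 15-minute threshold are illustrative assumptions, not a recommended policy; KeyboardIdle is the machine ClassAd attribute that counts seconds since the last keyboard or mouse activity.

```
# Hypothetical helper macro: true while the keyboard was used in the last 15 minutes.
OWNER_ACTIVE = ( KeyboardIdle < 15 * 60 )

# START is what actually keeps jobs from starting.
START = ( $(OWNER_ACTIVE) == False )

# IS_OWNER only reports why START is false: the owner is using the machine.
IS_OWNER = $(OWNER_ACTIVE)
```

Keeping both settings tied to one macro means the reported state and the actual start policy cannot drift apart.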

-tj

________________________________
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Steffen Grunewald <steffen.grunewald@xxxxxxxxxx>
Sent: Friday, November 14, 2025 9:10 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] IS_OWNER not stopping jobs from running

On Thu, 2025-11-13 at 10:33:46 +0100, Andrew Pickford wrote:
Hi All,

According to the documentation page for the Configuration for Execution
Points, setting IS_OWNER to true should stop an execute node from accepting
jobs. I'm not sure if I'm misreading the documentation here, or I've found a
bug or maybe the partitionable slots that are configured for the execute
node are not picking up the owner state correctly? Or something else. My
first question is am I correct about IS_OWNER, should that stop jobs from
starting?
I went back to my very old Condor configs (Debian Etch/Lenny/Squeeze) - don't
ask me for matching Condor versions though.
They tell me that IS_OWNER was used with its default setting in Etch and Lenny
times (before ~2010), but for Squeeze (2011 or 2012?) I recorded
#[default ] IS_OWNER      = ( START =?= False )
as a comment for completeness' sake, and I kept this (unset) in Wheezy times.

So this seems to indicate that the "IS_OWNER" attribute was derived from "START",
not the other way round - until around 2015 at least.

Around the release of Stretch (in 2017; not too precise) it seems that the
semantics changed - I find simultaneous settings of IS_OWNER = True *and*
START = False back then. If someone (the developers?) kept a VCS from 8--10
years ago, the exact background might be tracked down.

I have some extremely vague memory of "condor_drain" having been introduced
at some point, but it's already in 8.0.4 - the oldest manual I have kept
completely (my 7.6.4 one has been stripped down to the config section).
readthedocs doesn't have release notes that old (I found the 8.8 manual that
has the history back to 8.6).
Maybe someone else can look this up (i.e. what happened and why)?

Cheers,
 S

--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/

