
Re: [HTCondor-users] Env vars to make htcondor.Schedd() Just Work?



Hi all,

I'm still trying to figure this out, but I thought I'd post my progress so far so nobody wastes time on my old message. Setting more config values through env vars and a config file does change the behavior, but I still haven't found a way to make htcondor.Schedd() alone connect to the remote schedd instance correctly. My config now seems to let the collector return an address for the schedd, but that address is missing important info in its query parameters, I guess?

Anyway, here's where I'm at. If anyone has any advice, that'd be really helpful.

```
In [1]: import os

In [2]: os.environ["_CONDOR_CONDOR_HOST"]
Out[2]: 'htcondor-mini:9618'

In [3]: os.environ["CONDOR_HOST"]
Out[3]: 'htcondor-mini:9618'

In [4]: os.environ["_CONDOR_COLLECTOR_HOST"]
Out[4]: 'htcondor-mini:9618'

In [5]: os.environ["COLLECTOR_HOST"]
Out[5]: 'htcondor-mini:9618'

In [6]: os.environ["_CONDOR_SCHEDD_HOST"]
Out[6]: 'htcondor-mini:9618'

In [7]: os.environ["SCHEDD_HOST"]
Out[7]: 'htcondor-mini:9618'

In [8]: !cat $CONDOR_CONFIG
COLLECTOR_HOST=htcondor-mini:9618
CONDOR_HOST=htcondor-mini:9618
SCHEDD_HOST=htcondor-mini:9618

In [9]: import htcondor

In [10]: schedd = htcondor.Schedd()

In [11]: sub = htcondor.Submit({
  ...:   "executable": "/bin/sleep",
  ...:   "initialdir": "/tmp",
  ...:   "arguments": "30",
  ...:   "output": "/tmp/sleep.out",
  ...:   "error": "/tmp/sleep.err",
  ...:   "log": "/tmp/sleep.log",
  ...: })

In [12]: clid = schedd.submit(sub)
---------------------------------------------------------------------------
HTCondorIOError              Traceback (most recent call last)
Cell In[12], line 1
----> 1 clid = schedd.submit(sub)

File /usr/local/lib/python3.12/site-packages/htcondor/_lock.py:70, in add_lock.<locals>.wrapper(*args, **kwargs)
     67 try:
     68     acquired = LOCK.acquire()
---> 70     rv = func(*args, **kwargs)
     72     # if the function returned a context manager,
     73     # create a LockedContext to manage the lock
     74     is_cm = is_context_manager(rv)

HTCondorIOError: Failed to connect to schedd.

In [13]: collector = htcondor.Collector()

In [14]: schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)

In [15]: schedd_ad["MyAddress"]
Out[15]: '<172.18.0.5:9618?alias=htcondor-mini>'

In [16]: schedd = htcondor.Schedd(schedd_ad)

In [17]: clid = schedd.submit(sub)
---------------------------------------------------------------------------
HTCondorIOError              Traceback (most recent call last)
Cell In[17], line 1
----> 1 clid = schedd.submit(sub)

File /usr/local/lib/python3.12/site-packages/htcondor/_lock.py:70, in add_lock.<locals>.wrapper(*args, **kwargs)
     67 try:
     68     acquired = LOCK.acquire()
---> 70     rv = func(*args, **kwargs)
     72     # if the function returned a context manager,
     73     # create a LockedContext to manage the lock
     74     is_cm = is_context_manager(rv)

HTCondorIOError: Failed to connect to schedd.

In [18]: collector = htcondor.Collector(os.environ["COLLECTOR_HOST"])

In [19]: schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)

In [20]: schedd_ad["MyAddress"]
Out[20]: '<172.18.0.5:9618?addrs=172.18.0.5-9618&alias=71ad2f98e72a&noUDP&sock=schedd_21_fb94>'

In [21]: schedd = htcondor.Schedd(schedd_ad)

In [22]: clid = schedd.submit(sub)

In [23]: clid.cluster()
Out[23]: 6
```
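To see exactly which query parameters differ between the two MyAddress values above, here is a quick stdlib-only sketch. The helper name `sinful_params` is made up for illustration and is not part of the htcondor bindings; it just splits the `<host:port?key=value&flag>` string by hand. The address from the default Collector() carries only `alias`, while the one that works also carries `addrs`, `noUDP`, and `sock`. My guess, and it is only a guess, is that the missing `sock` entry is what keeps the connection from reaching the schedd.

```python
def sinful_params(address):
    """Split '<host:port?k=v&flag>' into (host_port, {key: value}).

    Hypothetical helper for illustration only; valueless flags such as
    'noUDP' are stored with an empty string as their value.
    """
    inner = address.strip().lstrip("<").rstrip(">")
    host_port, _, query = inner.partition("?")
    params = {}
    for item in query.split("&"):
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key] = value
    return host_port, params


# The two MyAddress values copied from the session above.
default_addr = "<172.18.0.5:9618?alias=htcondor-mini>"
explicit_addr = (
    "<172.18.0.5:9618?addrs=172.18.0.5-9618"
    "&alias=71ad2f98e72a&noUDP&sock=schedd_21_fb94>"
)

_, default_params = sinful_params(default_addr)
_, explicit_params = sinful_params(explicit_addr)

# Keys present only in the address that actually works.
missing = sorted(set(explicit_params) - set(default_params))
print(missing)  # ['addrs', 'noUDP', 'sock']
```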

Thanks in advance for any ideas,

-g

On Tue, Sep 16, 2025 at 7:40 PM Gavin Price <gaprice@xxxxxxx> wrote:
Hi all,

I'm setting up the htcondor Python client in a Docker container with the intent of submitting jobs to a remote cluster. I'm trying to figure out which environment variables I need to set so that I don't have to specify a schedd_ad for Schedd() (or a host for Collector(), which ideally wouldn't be needed either). I've set the CONDOR_HOST env var (see below), which is what I would expect to work, but in my hands it doesn't.

Below is an example of what I want to do vs what actually works:

* htcondor.Schedd() - doesn't work, this is what I want to work
* htcondor.Collector() -> schedd_ad - doesn't work
* htcondor.Collector(<host>) -> schedd_ad -> Schedd(schedd_ad) - works

What env vars do I need to set so the first line works?

Here's what I'm doing:

```
In [1]: import os

In [2]: os.environ["_CONDOR_CONDOR_HOST"]
Out[2]: 'htcondor-mini:9618'

In [3]: os.environ["CONDOR_HOST"]
Out[3]: 'htcondor-mini:9618'

In [4]: import htcondor
/usr/local/lib/python3.12/site-packages/htcondor/__init__.py:49: UserWarning: Neither the environment variable CONDOR_CONFIG, /etc/condor/, /usr/local/etc/, nor ~condor/ contain a condor_config source. Therefore, we are using a null condor_config.
 _warnings.warn(message)

### Schedd() only

In [5]: schedd = htcondor.Schedd()
---------------------------------------------------------------------------
HTCondorLocateError            Traceback (most recent call last)
Cell In[5], line 1
----> 1 schedd = htcondor.Schedd()

File /usr/local/lib/python3.12/site-packages/htcondor/_lock.py:70, in add_lock.<locals>.wrapper(*args, **kwargs)
     67 try:
     68     acquired = LOCK.acquire()
---> 70     rv = func(*args, **kwargs)
     72     # if the function returned a context manager,
     73     # create a LockedContext to manage the lock
     74     is_cm = is_context_manager(rv)

HTCondorLocateError: Unable to locate local daemon

### schedd_ad from Collector()

In [7]: collector = htcondor.Collector()

In [8]: schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)
---------------------------------------------------------------------------
HTCondorLocateError            Traceback (most recent call last)
Cell In[8], line 1
----> 1 schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)

File /usr/local/lib/python3.12/site-packages/htcondor/_lock.py:70, in add_lock.<locals>.wrapper(*args, **kwargs)
     67 try:
     68     acquired = LOCK.acquire()
---> 70     rv = func(*args, **kwargs)
     72     # if the function returned a context manager,
     73     # create a LockedContext to manage the lock
     74     is_cm = is_context_manager(rv)

HTCondorLocateError: Unable to locate local daemon

### Full setup

In [9]: collector = htcondor.Collector("htcondor-mini:9618")

In [10]: schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)

In [11]: schedd = htcondor.Schedd(schedd_ad)

In [12]: sub = htcondor.Submit({
  ...:   "executable": "/bin/sleep",
  ...:   "initialdir": "/tmp",
  ...:   "arguments": "30",
  ...:   "output": "/tmp/sleep.out",
  ...:   "error": "/tmp/sleep.err",
  ...:   "log": "/tmp/sleep.log",
  ...: })

In [13]: cluster_id = schedd.submit(sub)

In [15]: cluster_id.cluster()
Out[15]: 2
```
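As a stopgap, the one sequence that works in the session above (explicit collector host, then locate, then Schedd(schedd_ad)) can be wrapped in a small helper. This is only a sketch of the workaround, not an answer to the env var question. The function name `locate_remote_schedd` and its `htcondor_module` parameter are hypothetical; the parameter exists only so the logic can be exercised without a live pool. The htcondor calls themselves are exactly the ones used in the session.

```python
def locate_remote_schedd(collector_host, htcondor_module=None):
    """Return a Schedd handle located through an explicitly named collector.

    Hypothetical helper: it only repeats the Collector(host) -> locate ->
    Schedd(schedd_ad) steps from the session above.
    """
    if htcondor_module is None:
        # Import the real bindings lazily so the helper can be defined
        # (and tested with a stand-in) where htcondor is not installed.
        import htcondor as htcondor_module

    collector = htcondor_module.Collector(collector_host)
    schedd_ad = collector.locate(htcondor_module.DaemonTypes.Schedd)
    return htcondor_module.Schedd(schedd_ad)
```

With the real bindings installed, `schedd = locate_remote_schedd("htcondor-mini:9618")` followed by `schedd.submit(sub)` should behave like the "Full setup" steps above.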

Thanks in advance for any advice,

-g