
Re: [HTCondor-users] Submit job from host to htcondor/mini container using python bindings



Hi Todd and Cole,

Thanks for the responses. I'm going to consolidate here as there's some overlap.

> Just so I understand correctly, you are running a mini-condor in a container and trying to place jobs to the system from the host machine outside the container?
Right

> Is there a reason that you need to place jobs from outside the container?
The eventual goal here is to add HTCondor submission to an existing Python-based job-running service. That service will need to be able to submit to HTCondor remotely, and this is the very first stumbling, baby-horse-like step towards that goal.

> Where does the 99-NO_AUTH.config file come from? I.e. Did you find it online or did someone share it with you?
A combination of reading the docs myself and using AI to figure out why I was getting what appeared to be authentication failures when trying to run jobs. I wouldn't be surprised if it's whackadoo. I was just trying to shut off as much authentication as possible to get to a point where I could at least submit a job.

> Here is what I did to run your submit a sleep job test
The difference here is that I was running the Python code on the host, outside the container, so a no-argument Schedd() won't find the service.

> I suggest you try again just without customizing the configuration, the defaults should be fine.

Here's what happens without the 99-NO_AUTH.config file either in the container or in `htcondor.param`:

```
$ docker run -d -p 9618:9618 htcondor/mini:24.11.2-el9
ce4829266267392b96c7d5bdf6069701ab09301bb81836902787b04033e6ef34

Then in ipython on the host:
In [1]: import htcondor
/home/crushingismybusiness/github/kbase/cdm-task-service/.venv/lib/python3.12/site-packages/htcondor/__init__.py:49: UserWarning: Neither the environment variable CONDOR_CONFIG, /etc/condor/, /usr/local/etc/, nor ~condor/ contain a condor_config source. Therefore, we are using a null condor_config.
 _warnings.warn(message)

In [2]: collector = htcondor.Collector("172.17.0.2:9618")

In [3]: schedd_ad = collector.locate(htcondor.DaemonTypes.Schedd)

In [4]: schedd_ad["MyAddress"]
Out[4]: '<172.17.0.2:9618?addrs=172.17.0.2-9618&alias=ce4829266267&noUDP&sock=schedd_18_eccb>'

In [5]: schedd = htcondor.Schedd(schedd_ad)

In [6]: sub = htcondor.Submit({
   ...:     "executable": "/bin/sleep",
   ...:     "arguments": "30",
   ...:     "output": "/tmp/sleep.out",
   ...:     "error": "/tmp/sleep.err",
   ...:     "log": "/tmp/sleep.log",
   ...: })

In [7]: cluster_id = schedd.submit(sub)
The remote host ce4829266267 presented an untrusted CA certificate with the following fingerprint:
SHA-256: e8:24:0b:bd:b1:4e:9b:9f:5d:f8:04:3f:47:19:61:1b:a9:15:0c:a6:ad:16:b9:71:25:63:82:f9:1a:a2:c7:29
Subject: /O=condor/CN=ce4829266267
Would you like to trust this server for current and future communications?
Please type 'yes' or 'no':
yes
---------------------------------------------------------------------------
HTCondorIOError              Traceback (most recent call last)
Cell In[7], line 1
----> 1 cluster_id = schedd.submit(sub)

File ~/github/kbase/cdm-task-service/.venv/lib/python3.12/site-packages/htcondor/_lock.py:70, in add_lock.<locals>.wrapper(*args, **kwargs)
     67 try:
     68     acquired = LOCK.acquire()
---> 70     rv = func(*args, **kwargs)
     72     # if the function returned a context manager,
     73     # create a LockedContext to manage the lock
     74     is_cm = is_context_manager(rv)

HTCondorIOError: Failed to connect to schedd.

In SchedLog I see:
09/12/25 21:15:15 (pid:38) TransferQueueManager stats: active up=0/100 down=0/100; waiting up=0 down=0; wait time up=0s down=0s
09/12/25 21:15:15 (pid:38) TransferQueueManager upload 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
09/12/25 21:15:15 (pid:38) TransferQueueManager download 1m I/O load: 0 bytes/s  0.000 disk load  0.000 net load
09/12/25 21:19:42 (pid:38) DC_AUTHENTICATE: authentication of <172.17.0.1:43497> did not result in a valid mapped user name, which is required for this command (1112 QMGMT_WRITE_CMD), so aborting.
```

That's what I tried at first, which led to the 99-NO_AUTH file (and a bunch of other stuff I tried) to attempt to fix both the cert check and the DC_AUTHENTICATE issue.
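
For anyone reading along, here is a rough sketch of pulling the interesting fields out of the two strings in the session above: the `MyAddress` value returned by the collector and the DC_AUTHENTICATE line from SchedLog. It is plain Python and does not use the htcondor bindings. The helper names `parse_sinful` and `parse_auth_failure` are made up for illustration, and the parsing only assumes the shapes visible in this transcript (a bracketed `host:port?key=value&flag` address with an IPv4 host, and the exact wording of the log message), so treat it as a diagnostic aid rather than anything HTCondor provides.

```python
import re
from urllib.parse import parse_qs


def parse_sinful(address):
    """Split a '<host:port?key=value&flag>' address into host, port and params.

    Hypothetical helper: assumes an IPv4 host and the bracketed form seen above.
    """
    text = address.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"not a bracketed address: {address!r}")
    hostport, _, query = text[1:-1].partition("?")
    host, _, port = hostport.rpartition(":")
    # keep_blank_values=True keeps bare flags such as 'noUDP' (with value '')
    params = {k: v[0] for k, v in parse_qs(query, keep_blank_values=True).items()}
    return {"host": host, "port": int(port), "params": params}


# Matches the wording of the SchedLog line quoted above; other HTCondor
# versions may phrase the message differently.
AUTH_FAILURE = re.compile(
    r"DC_AUTHENTICATE: authentication of <(?P<peer>[^>]+)> did not result in "
    r"a valid mapped user name.*\((?P<code>\d+) (?P<command>\w+)\)"
)


def parse_auth_failure(line):
    """Return peer address, command code and command name, or None if no match."""
    match = AUTH_FAILURE.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    my_address = ("<172.17.0.2:9618?addrs=172.17.0.2-9618"
                  "&alias=ce4829266267&noUDP&sock=schedd_18_eccb>")
    log_line = ("09/12/25 21:19:42 (pid:38) DC_AUTHENTICATE: authentication of "
                "<172.17.0.1:43497> did not result in a valid mapped user name, "
                "which is required for this command (1112 QMGMT_WRITE_CMD), "
                "so aborting.")
    print(parse_sinful(my_address))
    print(parse_auth_failure(log_line))
```

Read this way, the two strings say that the schedd is advertised at the container's own address (172.17.0.2, port 9618) and that the connection from 172.17.0.1 did reach the schedd. The rejection happened at the user-mapping step for the write command, which suggests a mapping problem rather than a networking one.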

I *think* that covers all of your questions, but please let me know if I missed anything or if what I'm trying to do is unclear. Or if the approach I'm taking is completely off base, for that matter.

Thanks very much for the help so far and your time, much appreciated.

Gavin


On Fri, Sep 12, 2025 at 12:34 PM Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
On 9/12/2025 1:51 PM, Gavin Price wrote:

Hi all,

I started using HTCondor literally Tuesday so I'm an ultranoob. I'm trying to do something that I naively think should be pretty simple, i.e. run the htcondor/mini container and submit a job from the host using the python client.

Hi Gavin,

The htcondor/mini container is an all-in-one container that runs a full HTCondor system, including both the HTCondor Access Point service (which holds and manages submitted jobs and runs the schedd) and the Execution Point service (which runs jobs). All configuration defaults should be reasonable, and no config customization should be required.

Running this container is a great way to "kick the tires", learn how to submit jobs and workflows, and play with the HTCondor Python API bindings.

I think where you took a wrong turn was changing the default configuration by volume mounting config files (i.e. "99-NO_AUTH.config"). I suggest you try again without customizing the configuration; the defaults should be fine. Note that you cannot submit or run jobs as user "root"; the htcondor/mini container has a user "submituser" created for this purpose. Here is what I did to run your submit-a-sleep-job test:

C:\> docker run -d htcondor/mini
C:\> docker exec -it -u submituser quirky_shtern bash
[submituser@b7c643c8d107 /]$ python3
Python 3.9.21 (main, Jun 27 2025, 00:00:00)
[GCC 11.5.0 20240719 (Red Hat 11.5.0-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import htcondor
>>> schedd = htcondor.Schedd()
>>> sub = htcondor.Submit({
... "executable": "/bin/sleep",
... "arguments": "30",
... "output" : "/tmp/sleep.out",
... "error" : "/tmp/sleep.err",
... "log" : "/tmp/sleep.log",
... })
>>> cluster_id = schedd.submit(sub)


OR... are you trying to use the htcondor/mini container to submit jobs to a pre-existing Access Point (schedd) running someplace else, e.g. on a remote server? That is NOT what the htcondor/mini container is about or is configured to do; you probably want the htcondor/submit container. Also, if you wish to simply get an HTCondor Pool going across multiple servers or containers, you could use the "get_htcondor" tool as described in the Admin Quick Start Guide at https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html

Hope the above helps,
Todd