And I guess I left off some elements; one can also set a LOCAL_DIR
and a prefix for the glide-in using the random number:
export VERY_RNUM=$RANDOM
export _CONDOR_STARTD_RESOURCE_PREFIX=slot_${VERY_RNUM}_
export _CONDOR_LOCAL_DIR=/scratch.local/${USER}/${VERY_RNUM}
${_CONDOR_SBIN}/condor_master -f -n compute_condor_${VERY_RNUM}
Those should be the elements that keep the different condor_master
instances from interfering with one another.
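For reference, the pieces above can be combined into one wrapper script. This is only a sketch under some assumptions that are not part of the original message: the function name `setup_instance` is made up for illustration; `SLURM_JOB_ID` (set by SLURM inside a job) is used as the instance ID when available, since `$RANDOM` only ranges over 0..32767 and two glide-ins could in principle draw the same value; and the `log`, `spool` and `execute` subdirectories are created on the assumption that the configuration defines LOG, SPOOL and EXECUTE relative to LOCAL_DIR, which may not hold for every site.

```shell
#!/bin/bash
# Sketch: derive all per-instance settings from a single ID so that several
# condor_master processes on one node do not share a LOCAL_DIR or slot names.

# setup_instance ID BASE_DIR   (hypothetical helper, not an HTCondor command)
setup_instance() {
    local id="$1"
    local base="$2"

    # Any HTCondor configuration macro can be overridden through a
    # _CONDOR_<NAME> environment variable.
    export _CONDOR_STARTD_RESOURCE_PREFIX="slot_${id}_"
    export _CONDOR_LOCAL_DIR="${base}/${id}"

    # Assumption: LOG, SPOOL and EXECUTE live directly under LOCAL_DIR.
    mkdir -p "${_CONDOR_LOCAL_DIR}/log" \
             "${_CONDOR_LOCAL_DIR}/spool" \
             "${_CONDOR_LOCAL_DIR}/execute"
}

# Start the master only when _CONDOR_SBIN points at the HTCondor sbin directory.
if [[ -n "${_CONDOR_SBIN:-}" ]]; then
    # Prefer the SLURM job ID (unique per job); fall back to $RANDOM.
    id="${SLURM_JOB_ID:-$RANDOM}"
    setup_instance "$id" "/scratch.local/${USER}"
    exec "${_CONDOR_SBIN}/condor_master" -f -n "compute_condor_${id}"
fi
```

If the log directory is set explicitly in the configuration rather than derived from LOCAL_DIR, it would also need to be made unique per instance, for example by including the same ID in that path.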
Greg
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Daues, Gregory Edward <daues@xxxxxxxxxxxx>
Sent: Monday, November 6, 2023 5:59 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Multiple HTCondor workers on a single compute node
Hello,
I use the -n option with a random number in a bash script like
#!/bin/bash
export VERY_RNUM=$RANDOM
${_CONDOR_SBIN}/condor_master -f -n compute_condor_${VERY_RNUM}
but I imagine there could be other ways.
Greg
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Seung-Jin Sul <ssul@xxxxxxx>
Sent: Monday, November 6, 2023 5:21 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Multiple HTCondor workers on a single compute node
Hi,
We use SLURM as a glide-in backend and sometimes need to run multiple HTCondor worker services on the same node. This happens when we request only part of a compute node from SLURM, for example 1 CPU and 10 GB of memory.
When we try to start another instance of HTCondor on the same node, we see the following error:
```
11/06/23 14:49:54 lock_file returning ERROR, errno=11 (Resource temporarily unavailable)
11/06/23 14:49:54 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
11/06/23 14:49:54 ERROR "Can't get lock on "/clusterfs/jgi/scratch/dsi/aa/jaws/dori-dev/htcondor-log/n0099/log/InstanceLock"" at line 1691 in file /var/lib/condor/execute/slot1/dir_3620933/userdir/.tmpdnieob/BUILD/condor-10.2.2/src/condor_master.V6/master.cpp
```
How can we start multiple HTCondor worker services on a node? Any info on setting the port and on the lock file would be helpful.
Thank you!
Best,
Seung