Re: [HTCondor-users] Setting up HTCondor on a cluster



You can install and run HTCondor in any role under an ordinary user account, without root privileges. Download and extract an appropriate tarball from our website (https://research.cs.wisc.edu/htcondor/tarball/), run <extracted-dir>/bin/make-personal-from-tarball, source the condor.sh file, and run condor_master. To change how it behaves, edit the configuration files in local/config.d/.
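
As a rough illustration of the steps above, here is a minimal Python sketch that only builds the shell commands for an already extracted tarball. The function name is made up for this example, and the location of condor.sh (directly inside the extracted directory) is an assumption; adjust it if your tarball puts the file elsewhere.

```python
import shlex
from pathlib import PurePosixPath


def personal_condor_commands(extracted_dir: str) -> list[str]:
    """Return the shell commands for a user-level HTCondor start-up.

    Hypothetical helper: it only builds command strings, it runs nothing.
    """
    root = PurePosixPath(extracted_dir)
    setup_script = root / "bin" / "make-personal-from-tarball"
    # Assumption: condor.sh sits at the top of the extracted directory.
    env_script = root / "condor.sh"
    return [
        shlex.quote(str(setup_script)),
        # Sourcing and condor_master must share one shell so that the
        # environment set by condor.sh is visible to condor_master.
        f". {shlex.quote(str(env_script))} && condor_master",
    ]


if __name__ == "__main__":
    for command in personal_condor_commands("/opt/condor"):
        print(command)
```

The second command is kept as a single line on purpose: sourcing condor.sh in one shell and starting condor_master in another would lose the environment.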

HTCondor has several tricks to handle firewalls, NAT, and port restrictions.
HTCondor normally listens on port 9618 for incoming connections, but can be configured to use any port, either statically assigned or dynamic. The central manager's port should be static, but on machines that are only Access Points (APs) and/or Execute Points (EPs), the port can be dynamically assigned (and restricted to a range opened in a firewall).
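
To make the port-range idea concrete, here is a small Python sketch that renders a configuration fragment for a file under local/config.d/. It assumes the LOWPORT and HIGHPORT settings are what restrict dynamically assigned ports in your HTCondor version; the function name and the 20000-20100 range are invented for illustration, so check the manual for your release before relying on it.

```python
def dynamic_port_range_config(low_port: int, high_port: int) -> str:
    """Render a config fragment limiting dynamic ports to one range.

    Hypothetical helper. Ports below 1024 are rejected because an
    unprivileged user account cannot bind to them.
    """
    if not (1024 <= low_port <= high_port <= 65535):
        raise ValueError("expected 1024 <= low_port <= high_port <= 65535")
    return f"LOWPORT = {low_port}\nHIGHPORT = {high_port}\n"


if __name__ == "__main__":
    # Example range only; use whatever range your firewall actually opens.
    print(dynamic_port_range_config(20000, 20100), end="")
```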

HTCondor's communications pattern assumes bi-directional connectivity between central manager, APs, and EPs. The Condor Connection Broker (CCB) can handle some situations where TCP connections can only be established in one direction. A common situation is where EPs are on a private network with outbound connectivity via NAT. When an AP needs to connect to an EP in this case, the EP can be instructed to establish the connection to the AP (communicated via an existing TCP channel).
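
For an EP behind NAT, the CCB setup might look roughly like the fragment rendered by the Python sketch below. It assumes the pool's collector doubles as the CCB server and that CCB_ADDRESS is the setting that points an EP at it; the function name and the host cm.example.org are placeholders, not part of any real pool.

```python
def ccb_config_for_execute_point(collector_host: str) -> str:
    """Render a config fragment that sends an EP's inbound connections via CCB.

    Hypothetical helper. Assumes the collector also serves as the CCB server.
    """
    if not collector_host.strip():
        raise ValueError("collector_host must not be empty")
    return (
        f"COLLECTOR_HOST = {collector_host.strip()}\n"
        # The EP registers with this broker and later connects outward
        # when an AP asks to reach it.
        "CCB_ADDRESS = $(COLLECTOR_HOST)\n"
    )


if __name__ == "__main__":
    # Placeholder host name; 9618 is the default port mentioned above.
    print(ccb_config_for_execute_point("cm.example.org:9618"), end="")
```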

If these features aren't sufficient, then tunneling TCP connections via ssh may help. But this method is a poor fit if large job sandboxes are being moved or if many EPs connect through a single login node at a site.
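
As a sketch of what such a tunnel could look like, the Python helper below builds the argument list for an ssh local port forward. The function and host names are placeholders, and this covers only the forwarding of a single port; the HTCondor daemons would still need to be configured to use the forwarded address, which is not shown here.

```python
def ssh_local_forward_argv(
    login_host: str,
    target_host: str,
    target_port: int = 9618,
    local_port: int = 9618,
) -> list[str]:
    """Build an ssh command that forwards local_port to target_host:target_port.

    Hypothetical helper: it returns the argv list and does not start ssh.
    """
    for port in (target_port, local_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
    return [
        "ssh",
        "-N",  # no remote command, keep the session open for forwarding only
        "-L",  # local forward: local_port -> target_host:target_port
        f"{local_port}:{target_host}:{target_port}",
        login_host,
    ]


if __name__ == "__main__":
    # Placeholder hosts; pass the list to subprocess.run() to open the tunnel.
    print(" ".join(ssh_local_forward_argv("user@login.example.org", "cm.example.org")))
```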

 - Jaime

On Nov 12, 2024, at 5:51 AM, Kalsi, Yuvraj <ykalsi@xxxxxx> wrote:

I apologize for the confusion caused by the previous emails. To clarify, we are in the process of setting up HTCondor on a dedicated node where we do not have root access. Please note that these nodes are separate from the clusters running SLURM.

Could you kindly assist us with configuring HTCondor in this environment, specifically with SSH forwarding?

Thank you for your support.

On Mon, Nov 11, 2024 at 9:58 AM Kalsi, Yuvraj <ykalsi@xxxxxx> wrote:

Hi Jaime,

Thank you for the response.

To set up a glidein, we still need to install and configure HTCondor on the Compute Canada cluster. Currently, we only have user-level access to the Cedar supercomputer, which is part of Compute Canada and accessible solely via SSH. Without root privileges, we cannot run any sudo commands, which limits our ability to configure certain aspects of HTCondor.

Our goal is to configure Cedar to function as the central manager, executor, and submitter for HTCondor. If we can establish this setup successfully, we plan to expand it by adding more supercomputing clusters to the HTCondor pool. However, the lack of root access presents challenges in setting up Cedar as a fully operational central manager and connecting between machines. Additionally, Cedar's firewall and port restrictions further complicate connectivity and communication within the HTCondor pool.

We are exploring whether it might be possible to use SSH connections instead of HTCondor's standard protocol to connect machines. If SSH tunnelling could be used for inter-node communication, it might allow us to bypass some of Cedar's port and firewall limitations.

If some specific configurations or workarounds would allow us to run HTCondor effectively in this environment, especially with the use of SSH, any guidance would be greatly appreciated.

Best regards,
Yuvraj Kalsi


On Tue, Nov 5, 2024 at 6:05 PM Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
This sounds like a form of glidein, submitting SLURM jobs that run the HTCondor daemons of an EP (execution point) that temporarily join an HTCondor pool to run HTCondor jobs. Am I correct in that assumption?

 - Jaime

> On Nov 3, 2024, at 1:45 PM, Kalsi, Yuvraj via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> Hi,
>
> I work with DIAG labs at MUN, and we are trying to set up HTCondor on the Compute Canada cluster. SLURM is currently used as the meta scheduler on the Compute Canada cluster, and we would like to implement HTCondor on this cluster as well. We would appreciate any suggestions or guidance on setting up HTCondor in an environment where we do not have root access or dedicated port access. We have attempted a few methods, but our lack of root access and the restrictions on certain ports have posed challenges, as most of the commands require sudo privileges.
>
> Best
> Yuvraj Kalsi