Hello,
I am trying to run this Pegasus workflow for an experiment. To run the workflow, I was trying to create a multi-machine HTCondor pool using the instructions in the documentation here. When I work through the commands on the webpage and get to the point where I run condor_status on the submit node, I get the following error:
Error: communication error
SECMAN:2007:Failed to end classad message.
I am very new to HTCondor, so any advice to help me get my multi-machine pool running would be greatly appreciated.
I am creating this multi-machine pool using CloudLab. Each node is an m510 machine running Ubuntu 22.04.2 LTS. The machines are all connected to the same network, and each node has a hostname of the form node{num}. I made node0 the central manager, node1 the submit node, and node2/node3 the execute nodes. The commands I ran to create the multi-machine pool were:
$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --central-manager node0
$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --submit node0
$ curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD="$htcondor_password" /bin/bash -s -- --no-dry-run --execute node0
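The node-to-role mapping described above can be written out as a small shell sketch. It only prints the install command for each host and does not run anything. It assumes each command was run on the node matching its role, which the post implies but does not state; the function names role_flag_for and install_command_for are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a hostname to the get_htcondor role flag,
# following the layout in the post (node0 = central manager,
# node1 = submit, node2/node3 = execute).
role_flag_for() {
  case "$1" in
    node0) echo "--central-manager" ;;
    node1) echo "--submit" ;;
    node2|node3) echo "--execute" ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Print (do not execute) the install command for a given host.
# Every role is pointed at node0, the central manager.
install_command_for() {
  local flag
  flag="$(role_flag_for "$1")" || return 1
  printf '%s\n' "curl -fsSL https://get.htcondor.org | sudo GET_HTCONDOR_PASSWORD=\"\$htcondor_password\" /bin/bash -s -- --no-dry-run $flag node0"
}

# Show which command belongs to which node.
for host in node0 node1 node2 node3; do
  echo "# on $host:"
  install_command_for "$host"
done
```

The final argument is node0 in every case because each role is told where the central manager lives; only the role flag changes from node to node.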
Thanks,
Vijay