Dear list members,
This is my first attempt to use/set up HTCondor and my first post to the ML, so hello to you all :-)
We are trying to configure a minimal HTCondor "cluster" to get into the topic, but it seems that we have misunderstood something or made one or more mistakes.
If this has already been described and solved, I would be happy to be pointed to an ML thread or any other source. My own search did not turn up anything helpful.
We are using version 8.8.9.
You can find details of our setup below.
After installing HTCondor and creating a sample job, I get the following message when submitting it:
"ERROR: Can't find address of local schedd"
I then saw in Task Manager that condor_schedd is not running, neither on the machine FROM which I submit the job nor on the machine TO which I submit the job (central manager). In this context, does "submit jobs" in the manual mean "submit from a client PC to the central manager" OR "submit from the central manager to the pool, i.e. to the client(s) executing the job"? Or both? This matters because it determines which box needs to be checked during setup.
What could be the reason for this problem? Did I misunderstand something and therefore set it up incorrectly?
How can I solve this?
Thank you for your time and help!
Finn
####
Setup Details:
Intended (and testing) Setup:
- 1 scheduling server ("central manager" in the docs), currently a Windows 10 VM => "SCHEDULER"
- 3-4 desktop machines/laptops from which jobs will be submitted (test: 1 Win10 desktop) => "SUBMITTERS"
- 10-20 currently unused desktop machines (dedicated to HTCondor, will not be used by humans in parallel; test: 1 laptop) as worker bees which will receive jobs from the scheduler => "WORKERS"
After reading the docs, we set up the 3 machines using the Windows GUI installer according to the following settings:
- SCHEDULER: "Create a new HTCondor Pool"; Name of new pool: TEST; Submit jobs to HTCondor pool: Unchecked (because the docs say "Generally jobs should not be either submitted or run on the central manager machine"); "Do not run jobs on this machine".
- SUBMITTER: "Join existing HTCondor Pool"; Hostname of central manager: (hostname of SCHEDULER); Submit jobs to HTCondor pool: Checked; "Do not run jobs on this machine".
- WORKER: "Join existing HTCondor Pool"; Hostname of central manager: (hostname of SCHEDULER); Submit jobs to HTCondor pool: Unchecked; "Always run jobs and never suspend them".
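For reference, the three installer choices above would be expected to end up as roughly the following condor_config entries. This is only a sketch and not copied from our machines: the hostname scheduler.example.org is a placeholder, and the exact lines written by the Windows installer may differ. The relevant point is that condor_schedd is only started on a machine whose DAEMON_LIST contains SCHEDD.

```
# --- SCHEDULER (central manager): pool bookkeeping only ---
# scheduler.example.org is a placeholder hostname (assumption)
CONDOR_HOST = scheduler.example.org
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# --- SUBMITTER: needs a local condor_schedd to hold the job queue ---
CONDOR_HOST = scheduler.example.org
DAEMON_LIST = MASTER, SCHEDD

# --- WORKER: runs jobs via condor_startd ---
CONDOR_HOST = scheduler.example.org
DAEMON_LIST = MASTER, STARTD
```

The value actually in effect on a machine can be inspected with "condor_config_val DAEMON_LIST".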
I do not list the remaining setup config because I assume that it is irrelevant for the issue at hand.
Based on this setup, I created a submit description file "example1_submit.txt" (which calls Rscript.exe and passes it the path to an R script as an argument).
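To illustrate the kind of file meant here, a minimal submit description of this shape could look as follows. It is a hypothetical sketch, not the actual content of example1_submit.txt: the Rscript.exe path and the file names example1.R, example1.out, example1.err and example1.log are placeholders.

```
# Hypothetical submit description; all paths and file names are placeholders
universe                = vanilla

# Rscript.exe is assumed to be installed at this path on the worker,
# so the executable itself is not transferred
executable              = C:\R\bin\Rscript.exe
transfer_executable     = false

# The R script is passed as the argument and shipped along with the job
arguments               = example1.R
transfer_input_files    = example1.R
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

output                  = example1.out
error                   = example1.err
log                     = example1.log

queue
```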
On the submitter, I then called:
condor_submit example1_submit.txt
This, however, returns "ERROR: Can't find address of local schedd". condor_schedd.exe is not running on either the SCHEDULER or the SUBMITTER.
Finn Bastiansen | Effect Modelling and Statistics
RIFCON GmbH | Goldbeckstraße 13 | 69493 Hirschberg
T. +49 6201 84528-24 | Fax: +49 (0)6201 8452899
Finn.Bastiansen@xxxxxxxxx | www.rifcon.de