Re: [HTCondor-users] Failed to send REQUEST_CLAIM to startd
- Date: Thu, 26 Dec 2019 16:56:38 +0000
- From: Zach Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Failed to send REQUEST_CLAIM to startd
Hello,
In your configuration with CCB, it sounds like you have the StartDs behind a firewall. In that case the Collector and the SchedD must be running on "public" IPs, meaning the StartD can connect directly to the Collector and also to the SchedD. Is that correct?
If so, you want your settings for CCB:
CCB_ADDRESS = $(COLLECTOR_HOST)
PRIVATE_NETWORK_NAME = htcondor
to be configured on the StartD machines. But you do not want to set those on the SchedD machine. If two machines have the same PRIVATE_NETWORK_NAME, then they will try to connect directly without CCB. So perhaps that is something to double check.
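To make the rule above concrete, here is a small toy model in Python. It is not HTCondor code and does not read real configuration; the function name and the dictionaries are made up for illustration, and the only logic it encodes is the statement above that two machines sharing a PRIVATE_NETWORK_NAME try to connect directly.

```python
def connects_directly(conf_a, conf_b):
    """Toy model: True if both sides set the same PRIVATE_NETWORK_NAME.

    Hypothetical helper for illustration only; HTCondor's real decision
    logic lives in the daemons, not in a function like this.
    """
    name_a = conf_a.get("PRIVATE_NETWORK_NAME")
    name_b = conf_b.get("PRIVATE_NETWORK_NAME")
    return name_a is not None and name_a == name_b


# StartD side: the CCB settings from the message above.
startd = {"CCB_ADDRESS": "$(COLLECTOR_HOST)", "PRIVATE_NETWORK_NAME": "htcondor"}

# SchedD side as recommended: neither setting present.
schedd_recommended = {}

# SchedD side with the setting copied over by mistake.
schedd_copied = {"PRIVATE_NETWORK_NAME": "htcondor"}

print(connects_directly(startd, schedd_recommended))  # CCB would be used
print(connects_directly(startd, schedd_copied))       # direct connection attempted
```

Under this model, the second case is the one to double check: the SchedD would skip CCB and go straight to the StartD's private address.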
See also this section on "Troubleshooting CCB" to learn more about logging and what to look for:
https://htcondor.readthedocs.io/en/latest/admin-manual/networking.html#troubleshooting-ccb
Cheers,
-zach
Maybe I didn't point some setting at the SUBMIT node. How can I configure the schedd daemon? I want the schedd not to connect directly to the startd daemon, but to use the collector and CCB to submit jobs.
Could you please tell me whether that is possible?
Thanks in advance!
On Sat, 21 Dec 2019 at 00:05, Zach Miller <zmiller@xxxxxxxxxxx> wrote:
Ivan,
This line from your log seems to be the key:
condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).
The network would not allow the connection to happen to that IP and port. Could it be a firewall/iptables type of issue?
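One way to check this independently of HTCondor is to attempt the same TCP connection from the submit machine. The sketch below uses only Python's standard socket module; the function name probe_tcp and the 3-second timeout are arbitrary choices for illustration. On Linux, errno 111 is ECONNREFUSED, which is what the "refused" branch reports.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try a TCP connect and return a short description of the outcome."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success, or an errno value instead of
        # raising for errors reported by the underlying connect() call.
        code = sock.connect_ex((host, port))
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    # Anything else (timeout, unreachable network, ...) is reported by name.
    return "failed: " + errno.errorcode.get(code, str(code))


# Example call, using the address from the log line above:
#   print(probe_tcp("10.7.128.15", 49430))
```

Run it from the machine where the condor_schedd lives. A "refused" result reproduces the error in the log without HTCondor involved, which points at the network path or the listening port rather than at the job itself.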
Cheers,
-zach
On 12/20/19, 6:34 AM, "HTCondor-users on behalf of don_vanchos" <htcondor-users-bounces@xxxxxxxxxxx on behalf of hozblok@xxxxxxxxx> wrote:
Hello!
I'm trying to submit a simple vanilla job, but it does not start. Could you please explain this error to me?
condor_schedd[8501]: Finished negotiating for s_user in local pool: 1 matched, 1 rejected
condor_schedd[8501]: attempt to connect to <10.7.128.15:49430> failed: Connection refused (connect errno = 111).
condor_schedd[8501]: Failed to send REQUEST_CLAIM to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user: SECMAN:2003:TCP connection to startd slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user failed.
condor_schedd[8501]: Match record (slot1@w7-demo15 <10.7.128.15:49430?addrs=10.7.128.15-49430&alias=htcondor-remote> for s_user, 8.0) deleted
How can I investigate the cause of the problem?
Thanks in advance!
P.S. #condor_q -better-analyze -verbose -allusers
The Requirements expression for job 8.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           8  OpSys == "WINDOWS"
[1]           8  TARGET.Arch == "X86_64"
[3]           8  TARGET.Disk >= RequestDisk
[5]           8  TARGET.Memory >= RequestMemory
[8]           8  TARGET.HasFileTransfer

008.000: Job has not yet been considered by the matchmaker.

008.000: Run analysis summary ignoring user priority. Of 8 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      8 are able to run your job
--
Sincerely yours,
Ivan Ergunov mailto:hozblok@xxxxxxxxx