This has nothing to do with negotiation.
You managed to time out the connection to the Schedd before the submit transaction was allowed to complete.
The timeout for holding open a transaction to the schedd without making any forward progress is 20 seconds.
So the transaction failed and no jobs were submitted.
-tj
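A minimal sketch of one way to stay clear of the idle timeout described above, assuming the slow step does not need the transaction to be open: keep only the queue call inside the 'with' block and do anything slow after it has committed. The function name and the slow_work parameter are hypothetical and used only for illustration; schedd and sub are assumed to be the htcondor.Schedd and htcondor.Submit objects from the quoted message below.

```python
def submit_then_do_slow_work(schedd, sub, slow_work):
    """Queue the job inside a short transaction, then run slow_work.

    schedd and sub are assumed to behave like the htcondor.Schedd and
    htcondor.Submit objects in the quoted message. slow_work is a
    hypothetical callable standing in for whatever takes a long time.
    """
    # Keep the transaction body minimal: only the queue call lives here,
    # so the connection is not left idle while the transaction is open.
    with schedd.transaction() as txn:
        cluster_id = sub.queue(txn)

    # The 'with' block has exited, so the commit has already been attempted.
    # A long pause here happens outside the transaction.
    slow_work()
    return cluster_id
```

With the objects from the quoted message, this could be called as submit_then_do_slow_work(schedd, sub, lambda: time.sleep(30)), which moves the sleep out of the open transaction.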
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of don_vanchos
Sent: Wednesday, August 14, 2019 12:52 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] RuntimeError: Failed to commmit and disconnect from queue.
Hello,
I use Python with HTCondor, and when I submit the following task, everything works fine:
import classad
import htcondor

schedd = htcondor.Schedd()

sub = htcondor.Submit({
    "executable": "/bin/echo",
    "arguments": "hello_world",
    "universe": "vanilla",
    "should_transfer_files": "NO",
    "transfer_executable": "False",
    "output": "stdout.txt",
    "initialdir": "/tmpdir",
    "run_as_owner": "True",
    "+Owner": classad.quote("user"),
})
with schedd.transaction() as schedd_transaction:
    cluster_id = sub.queue(schedd_transaction)
But then I add another line inside the 'with' block (and put 'import time' at the beginning of the file), which gives the following:
with schedd.transaction() as schedd_transaction:
    cluster_id = sub.queue(schedd_transaction)
    time.sleep(30)
This version does not work. The error is:
    with schedd.transaction() as schedd_transaction:
        cluster_id = sub.queue(schedd_transaction)
>       time.sleep(30)
E   RuntimeError: Failed to commmit and disconnect from queue.
The question is: why does this error happen, and how does it relate to the NEGOTIATOR_INTERVAL setting? (A 30-second sleep triggers the error when the setting is at its default of 60, and time.sleep(1) triggers it when NEGOTIATOR_INTERVAL=5.)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a60 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4ac0 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_read(fd=4 schedd at <192.168.128.5:9618>,,size=5,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4ac0 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS:2) condor_read(): Socket closed when trying to read 5 bytes from schedd at <192.168.128.5:9618>
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS:2) IO: EOF reading packet header
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) Stream::get(int) failed to read padding
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4a40 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) condor_write() failed: send() 13 bytes to schedd at <192.168.128.5:9618> returned -1, timeout=0, errno=32 Broken pipe.
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) Buf::write(): condor_write() failed
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) condor_write(fd=4 schedd at <192.168.128.5:9618>,,size=13,timeout=0,flags=0,non_blocking=0)
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 resetting
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_DAEMONCORE:2) selector 0x7ffdfddc4b10 adding fd 4 (socket:[1904821])
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) condor_write() failed: send() 13 bytes to schedd at <192.168.128.5:9618> returned -1, timeout=0, errno=32 Broken pipe.
08/14/19 17:38:45 (fd:5) (pid:25536) (D_ALWAYS) Buf::write(): condor_write() failed
08/14/19 17:38:45 (fd:5) (pid:25536) (D_NETWORK) CLOSE TCP <192.168.128.2:43967> fd=4