
Re: [Condor-users] Condor 7.1.0 and condor_config_val



This is indeed a bug introduced in 7.0.0.  Previously, the client was 
sending extra junk at the end of the command, and the server ignored it.  
In 7.0.0, we caught a bunch of cases like this where the server was not 
correctly checking for and reading the end of the command.  Unfortunately, 
we did not notice that, for this specific command, the client side was 
not terminating the command correctly.  I'll fix this now, but it may be 
too late for 7.0.2, which is nearly released.
--Dan
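
To illustrate the kind of interaction described above, here is a toy sketch in Python. It is not Condor's code: the function names, the error text reuse, and the message layout (two NUL-terminated strings) are assumptions made only for illustration. A lenient handler ignores whatever follows the fields it expects, while a strict handler insists that nothing is left over, so a client that was always sloppy only starts failing once the server becomes strict.

```python
def read_fields(body: bytes, count: int):
    """Split off `count` NUL-terminated strings; return (fields, leftover)."""
    fields = []
    rest = body
    for _ in range(count):
        field, sep, rest = rest.partition(b"\x00")
        if not sep:
            raise ValueError("truncated message")
        fields.append(field.decode("ascii"))
    return fields, rest


def handle_lenient(body: bytes):
    """Hypothetical old behavior: trailing bytes are silently ignored."""
    fields, _ignored = read_fields(body, 2)
    return fields


def handle_strict(body: bytes):
    """Hypothetical new behavior: anything after the fields is an error."""
    fields, leftover = read_fields(body, 2)
    if leftover:
        raise ValueError("failed to read end of message")
    return fields


clean = b"max_jobs_running\x00MAX_JOBS_RUNNING=4000\x00"
sloppy = clean + b"\n"  # same message with one stray trailing byte

print(handle_lenient(sloppy))   # accepted despite the stray byte
print(handle_strict(clean))     # accepted
try:
    handle_strict(sloppy)
except ValueError as err:
    print("rejected:", err)
```

The point of the sketch is only that tightening the server's check is correct on its own, yet it exposes any client that never terminated its message properly.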

Daniel Forrest wrote:

I just installed Condor 7.1.0 and I am having a problem with
condor_config_val.  I am trying this command:

condor_config_val -schedd -set MAX_JOBS_RUNNING=4000

This worked fine with Condor 6.8.1.


The SchedLog shows this:

6/4 08:53:07 (fd:16) (pid:205850) DaemonCore received UNAUTHENTICATED command 60002.
6/4 08:53:07 (fd:16) (pid:205850) DaemonCore: Command received via TCP from host <192.168.0.149:44575>, access level ALLOW
6/4 08:53:07 (fd:16) (pid:205850) DaemonCore: received command 60002 (DC_CONFIG_PERSIST), calling handler (handle_config())
6/4 08:53:07 (fd:16) (pid:205850) Calling HandleReq <handle_config()> (0)
6/4 08:53:07 (fd:16) (pid:205850) Failed to read end of message from <192.168.0.149:44575>.
6/4 08:53:07 (fd:16) (pid:205850) handle_config: failed to read end of message
6/4 08:53:07 (fd:16) (pid:205850) Return from HandleReq <handle_config()>
6/4 08:53:07 (fd:16) (pid:205850) CLOSE <192.168.0.149:47089> fd=14


It doesn't look like "-debug" works with condor_config_val, but tracing
the client with strace gives me:

sendto(3, "\1\0\0\0000\0\0\0\0\0\0\352bmax_jobs_running\0MAX_JOBS_RUNNING=4000\0\n", 53, 0, NULL, 0) = 53
recvfrom(3, "", 5, 0, NULL, NULL)       = 0
write(4, "condor_read(): Socket closed when trying to read 5 bytes from <192.168.0.149:470"..., 84) = 84
write(4, "IO: EOF reading packet header\n", 30) = 30
write(4, "Stream::get(int) failed to read padding\n", 40) = 40
write(2, "Can\'t receive reply from schedd on condor.lmcg.wisc.edu <192.168.0.149:47089>\n", 78) = 78


The corresponding strace from the schedd looks like:

recvfrom(14, "\1\0\0\0", 4, MSG_PEEK, NULL, NULL) = 4
recvfrom(14, "\1\0\0\0000", 5, 0, NULL, NULL) = 5
recvfrom(14, "\0\0\0\0\0\0\352bmax_jobs_running\0MAX_JOBS_RUNNING=4000\0\n", 48, 0, NULL, NULL) = 48


So it looks like the schedd doesn't send a reply, but why?
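
For anyone reading the strace output above, the following Python sketch decodes the 53 bytes from the sendto() call. The layout it assumes (a 1-byte flag, a 4-byte big-endian length, an 8-byte big-endian command number, then NUL-terminated strings) is inferred only from the byte counts in the traces, where the schedd reads 5 bytes and then 48; it is not taken from Condor's wire-format documentation, and the field names are made up for illustration.

```python
import struct

# Bytes from the client's sendto() call, transcribed from strace's octal
# escapes ("\352" is 0xEA, and the "0" after the header zeros is 0x30).
packet = (
    b"\x01\x00\x00\x00\x30"                # assumed header: flag + length
    b"\x00\x00\x00\x00\x00\x00\xea\x62"    # "\0\0\0\0\0\0\352b"
    b"max_jobs_running\x00"
    b"MAX_JOBS_RUNNING=4000\x00"
    b"\n"
)


def decode(data: bytes) -> dict:
    """Decode one packet under the assumed layout described above."""
    flag, length = struct.unpack(">BI", data[:5])
    body = data[5:5 + length]
    (command,) = struct.unpack(">Q", body[:8])
    name, value, leftover = body[8:].split(b"\x00", 2)
    return {
        "flag": flag,
        "length": length,
        "command": command,
        "name": name.decode("ascii"),
        "value": value.decode("ascii"),
        "leftover": leftover,
    }


for key, val in decode(packet).items():
    print(f"{key}: {val!r}")
```

Under these assumptions the length field is 48, which matches the second recvfrom() on the schedd side, and the command number is 60002, which matches the DC_CONFIG_PERSIST entry in the SchedLog. One byte ("\n") remains after the two strings; whether that byte is related to the failed end-of-message read is not something the traces alone can confirm.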