
Re: [HTCondor-users] failed to fetch logs



This indicates that you don’t have the right to query some of the schedds in the -global list.

 

You need to talk to the administrator of the schedd yyy.yyy.yyy.yyy:9618. There may be more information in the SchedLog on that machine that indicates why you do not have READ access.
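
If the administrator wants a quick way to pull the relevant lines out of a large SchedLog, something like the following might help. It is only a sketch: it assumes that denied requests show up as lines containing the text "PERMISSION DENIED", which may differ with the HTCondor version and debug settings, and the function names find_denials and scan_file are made up for illustration. The log path is usually whatever `condor_config_val SCHEDD_LOG` reports on the schedd machine.

```python
import re
from typing import Iterable, List

# Assumption: the schedd logs refused requests on lines containing this text.
# Adjust the marker if your SchedLog words it differently.
DENIED_MARKER = "PERMISSION DENIED"


def find_denials(lines: Iterable[str], peer_ip: str = "") -> List[str]:
    """Return the lines that mention a denial.

    If peer_ip is given, keep only lines that also contain that exact
    address (so 192.0.2.1 does not match 192.0.2.10).
    """
    ip_pattern = None
    if peer_ip:
        # Digit boundaries on both sides prevent partial-address matches.
        ip_pattern = re.compile(r"(?<!\d)" + re.escape(peer_ip) + r"(?!\d)")

    hits = []
    for line in lines:
        if DENIED_MARKER not in line:
            continue
        if ip_pattern is not None and not ip_pattern.search(line):
            continue
        hits.append(line.rstrip("\n"))
    return hits


def scan_file(path: str, peer_ip: str = "") -> List[str]:
    """Read a log file and return the matching denial lines."""
    # errors="replace" keeps the scan going if the log has odd bytes.
    with open(path, errors="replace") as fh:
        return find_denials(fh, peer_ip)
```

Calling scan_file with the SchedLog path and the address of the machine that ran cond_q (the xxx.xxx.xxx.xxx side of the connection) should narrow the output to the refused queries from that client, if any were logged.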

 

-tj

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Weatherby,Gerard
Sent: Friday, October 7, 2022 1:08 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] failed to fetch logs

 

The command:

 

condor_q  -better-analyze  --allusers -global

is failing with multiple errors of the form:

 

-- Failed to fetch ads from: <ip address:9618?addrs=ip-address-9618&noUDP&sock=3417_5b34_3> : hostname

I’ve run it with debugging enabled, and just before the error message I see:

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) Considering address candidate yyy.yyy.yyy.yyy:9618.

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) Found compatible candidate yyy.yyy.yyy.yyy:9618.

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) CONNECT bound to <xxx.xxx.xxx.xxx:41041> fd=5 peer=<yyy.yyy.yyy.yyy:9618>

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) condor_write(fd=5 schedd at <yyy.yyy.yyy.yyy:9618>,,size=46,timeout=20,flags=0,non_blocking=0)

10/07/22 13:40:16 (fd:6) (pid:27989) (D_SECURITY) SECMAN: command 516 QUERY_JOB_ADS to schedd at <yyy.yyy.yyy.yyy:9618> from TCP port 41041 (blocking).

10/07/22 13:40:16 (fd:6) (pid:27989) (D_SECURITY) SECMAN:: default CLIENT methods: FS,KERBEROS,GSI,CLAIMTOBE

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) condor_write(fd=5 schedd at <yyy.yyy.yyy.yyy:9618>,,size=43,timeout=20,flags=0,non_blocking=0)

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) condor_read(fd=5 schedd at <yyy.yyy.yyy.yyy:9618>,,size=5,timeout=20,flags=0,non_blocking=0)

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) Stream::get(int) failed to read padding

10/07/22 13:40:16 (fd:6) (pid:27989) (D_NETWORK) CLOSE TCP <xxx.xxx.xxx.xxx:41041> fd=5

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) Destroying Daemon object:

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) Type: 3 (schedd), Name: (null), Addr: <yyy.yyy.yyy.yyy:9618?addrs=yyy.yyy.yyy.yyy-9618&noUDP&sock=3417_5b34_3>

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) FullHost: (null), Host: (null), Pool: (null), Port: 9618

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME) IsLocal: N, IdStr: schedd at <yyy.yyy.yyy.yyy:9618>, Error: (null)

10/07/22 13:40:16 (fd:5) (pid:27989) (D_HOSTNAME)  --- End of Daemon object info ---

Any suggestions on how to debug further?