Re: [Condor-users] After negotiator problems and restart condor_userprio reports wrong values
- Date: Fri, 22 Aug 2008 10:50:41 +0200
- From: Henning Fehrmann <henning.fehrmann@xxxxxxxxxx>
- Subject: Re: [Condor-users] After negotiator problems and restart condor_userprio reports wrong values
On Fri, Aug 22, 2008 at 09:49:20AM +0200, Carsten Aulbert wrote:
Hello,
>
> in a set-up with multiple condor negotiators we had a situation where
> the system had this problem:
>
> condor_userprio: "Can't find address for negotiator"
>
> We (carefully) restarted condor on that machine and everything looked
> fine at first glance except:
>
> (1) the user prio factors were all reset to default values
> (2) we lost a lot of our "history" on that node:
>
> Number of users: 17 1178 530122.60 4/06/2008 14:31 ???
>
> on the other node:
> Number of users: 24 1179 4369145.74 5/27/2008 23:36 ???
>
> Finally:
>
> on the "deranged" node we have this line in userprio:
>
> user@xxxxxxxxxxx 500.00 0.50 1000.00 0 -202356.46
> 6/26/2008 15:41 7/15/2008 15:43
The negotiator runs on another host (n2). There we get the right results for
condor_userprio.
Running strace on condor_userprio on this node (n1) shows this line:
connect(3, {sa_family=AF_INET, sin_port=htons(9618), sin_addr=inet_addr("IP_n2")}, 16) = -1 EINPROGRESS (Operation now in progress)
From one moment to the next, condor_userprio stopped working on n1.
Could the cause be a broken connection to n2? Unfortunately, we did not run strace while it was still working.
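
A note on the strace line: EINPROGRESS is the normal return value of connect() on a non-blocking socket, so on its own it does not show that the connection to n2 failed. The outcome only becomes visible once the socket is reported writable and SO_ERROR is read. A minimal sketch of that check, assuming a POSIX host with Python available on n1; the function name probe_tcp and the timeout value are made up for illustration and are not part of Condor:

```python
import errno
import select
import socket


def probe_tcp(host, port, timeout=5.0):
    """Return 0 if a TCP connect to (host, port) completes, else an errno value.

    Hypothetical helper, not a Condor tool. Pass an IP address to avoid a
    (blocking) name lookup inside connect_ex().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc  # immediate failure, e.g. ECONNREFUSED
        # EINPROGRESS: wait until the socket becomes writable or we time out.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return errno.ETIMEDOUT
        # SO_ERROR holds the final result of the pending connect (0 = success).
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

Called from n1 as probe_tcp("IP_n2", 9618), with IP_n2 replaced by the real address from the strace output, a result of 0 would mean the TCP connection itself completes; a non-zero value can be decoded with os.strerror(). Even a 0 only says that something accepts connections on that port, not that the negotiator answers condor_userprio queries.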
Cheers,
Henning Fehrmann