
Re: [Condor-users] STARTD_AD_REEVAL_EXPR message in NegotiatorLog





I recently discovered this bug too.  There is a fix that will be
released with Condor 7.0.5 and 7.1.3.

The consequence of the bug is that the CurMatches attribute is reset to
0 in some cases before a new update to the site classad is published.

--Dan

For 3 years Condor staff have been telling us that this was a feature
of Condor-G matchmaking and that it was supposed to work that way,
i.e. that CurMatches would reset to zero after every negotiation cycle,
would only go up during the negotiation cycle, and would never
actually show up in the machine classad.  What changed?

Steve Timm





Warren Smith wrote:

Hi, I'm working on doing some matchmaking with Condor-G (Condor 7.1.0).
I get errors in the NegotiatorLog such as:

9/2 10:34:01 ---------- Started Negotiation Cycle ----------
9/2 10:34:01 Phase 1:  Obtaining ads from collector ...
9/2 10:34:01   Getting all public ads ...
9/2 10:34:01   Sorting 38 ads ...
9/2 10:34:01 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
...
9/2 10:34:02 Can't evaluate STARTD_AD_REEVAL_EXPR target.UpdateSequenceNumber > my.UpdateSequenceNumber as a bool, treating as TRUE
9/2 10:34:02   Getting startd private ads ...
9/2 10:34:02 Got ads: 38 public and 0 private
9/2 10:34:02 Public ads include 1 submitter, 33 startd

This error doesn't seem to be affecting anything (classads are getting
updated), but I thought I'd double check since my web searching didn't
really turn up anything.
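
For readers unfamiliar with the message, the fallback it describes can be sketched roughly as follows. This is only an illustration in Python, not Condor's code: the function name reeval_wanted and the use of plain dicts for ads are assumptions made for the example. The only behaviour taken from the log is that an expression which cannot be evaluated as a bool is treated as TRUE. Why the expression fails to evaluate in this case is not established here.

```python
from typing import Any, Mapping


def reeval_wanted(my: Mapping[str, Any], target: Mapping[str, Any]) -> bool:
    """Hypothetical stand-in for evaluating
    target.UpdateSequenceNumber > my.UpdateSequenceNumber
    with the "treating as TRUE" fallback seen in the NegotiatorLog."""
    new_seq = target.get("UpdateSequenceNumber")
    old_seq = my.get("UpdateSequenceNumber")

    def is_number(value: Any) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)

    if not is_number(new_seq) or not is_number(old_seq):
        # The comparison has no boolean result (attribute missing or
        # not numeric); fall back to TRUE, as the log message reports.
        return True
    return new_seq > old_seq
```

With two ads that both carry a numeric UpdateSequenceNumber, the sketch returns the plain comparison; if either ad lacks the attribute, it returns True.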

I get the message for what looks like each of the classads I inserted
into Condor with condor_advertise. Here is an example classad:

lslogin2$ condor_status -l tacc.lonestar.serial
MyType = "Machine"
TargetType = "Job"
Requirements = (TARGET.JobUniverse == 9)
Rank = 0.000000
CurrentRank = 0.000000
WantAdRevaluate = TRUE
CurMatches = 0
Name = "tacc.lonestar.serial"
Machine = "gatekeeper.lonestar.tacc.teragrid.org"
StartdIpAddr = "<129.114.50.32>"
GridResource = "gt2 gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf"
State = "Unclaimed"
Activity = "Idle"
UpdateSequenceNumber = 1220367368
Arch = "X86_64"
OpSys = "LINUX"
LoadAvg = 0.865580
TotalMemory = 11840721
Memory = 1725537
Queue = "serial"
Priority = 0.030000
MaxWallTime = 720
MaxProcessors = 1
MyAddress = "<192.5.198.172:0>"
LastHeardFrom = 1220367369
UpdatesTotal = 1328
UpdatesSequenced = 0
UpdatesLost = 0
UpdatesHistory = "0x00000000000000000000000000000000"


I'm setting the UpdateSequenceNumber using Unix time(). I did try to
temporarily change this to be just the last 5 digits of the current time
and I got the same error.
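
A minimal sketch of how an ad like the one above might be generated with a time-based UpdateSequenceNumber. The helper name build_ad and its parameters are hypothetical and are not part of Condor; the attribute names are copied from the example ad. The resulting text would still have to be written to a file and published with condor_advertise, which is not shown.

```python
import time
from typing import Any, Dict, Optional


def format_value(value: Any) -> str:
    """Render a Python value in the style of the example ad."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "TRUE" if value else "FALSE"
    if isinstance(value, str):
        return '"%s"' % value
    return str(value)


def build_ad(name: str, extra: Dict[str, Any], now: Optional[float] = None) -> str:
    """Hypothetical helper: build the text of a resource ad whose
    UpdateSequenceNumber is the current Unix time."""
    seq = int(time.time() if now is None else now)
    attrs: Dict[str, Any] = {
        "MyType": "Machine",
        "TargetType": "Job",
        "Name": name,
        "WantAdRevaluate": True,
        "CurMatches": 0,
        "UpdateSequenceNumber": seq,
    }
    attrs.update(extra)  # site-specific attributes such as Queue
    return "\n".join("%s = %s" % (key, format_value(val)) for key, val in attrs.items())


if __name__ == "__main__":
    print(build_ad("tacc.lonestar.serial", {"Queue": "serial", "MaxWallTime": 720}))
```

Passing now explicitly makes the output reproducible for testing; leaving it out uses the current time, as described above.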

Thanks for the help,


Warren

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

