Mailing List Archives
Re: [Condor-users] strange condor_advertise behavior.
- Date: Fri, 14 Oct 2005 23:38:23 -0700 (PDT)
- From: Rod Walker <rwalker@xxxxxx>
- Subject: Re: [Condor-users] strange condor_advertise behavior.
Hi,
I had the same problem, and it seems that the vintage of the
condor_advertise binary is the cause. If you update it, the problem will
go away; alternatively, you can put MyAddress by hand in the classad you
advertise.
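Here is a minimal Python sketch of the by-hand workaround. It appends a `MyAddress` attribute to the ad file's text unless one is already there. You then run the same `condor_advertise` command as before. The `"<ip:port>"` form and the port of 0 are assumptions copied from the `MyAddress` value the collector showed for the ads it accepted. `add_my_address` and `patch_ad_file` are hypothetical helper names, not part of Condor.

```python
import re


def add_my_address(ad_text: str, ip: str, port: int = 0) -> str:
    """Return ad_text with a MyAddress line appended, unless one is present.

    Assumption: the address uses the "<ip:port>" form seen on the ads the
    collector accepted; port 0 mirrors that value.
    """
    # ClassAd attribute names are case-insensitive, so match MyAddress at the
    # start of any line regardless of case.
    if re.search(r"^\s*MyAddress\s*=", ad_text, re.IGNORECASE | re.MULTILINE):
        return ad_text
    # Make sure the new attribute starts on its own line.
    if ad_text and not ad_text.endswith("\n"):
        ad_text += "\n"
    return ad_text + f'MyAddress = "<{ip}:{port}>"\n'


def patch_ad_file(path: str, ip: str) -> None:
    """Rewrite the ad file in place with MyAddress added (hypothetical helper)."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(add_my_address(text, ip))
```

For example, `patch_ad_file("GCclassad.txt", "192.0.2.10")` (192.0.2.10 is a documentation placeholder, not the node's real address) would be followed by the unchanged `condor_advertise -pool fermigrid1.fnal.gov UPDATE_STARTD_AD GCclassad.txt`. Updating condor_advertise, the other option above, makes the patch unnecessary.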
Cheers,
Rod.
On Fri, 14 Oct 2005, Steven Timm wrote:
>
> I have a simple shell script (attached) to forward a classad from
> each of a number of clusters to a central collector/negotiator, where
> it is used for matchmaking with Condor-G.
>
> On the first two clusters I tried, it worked, and I can see their classads.
>
> It is executing the command
>
> condor_advertise -pool fermigrid1.fnal.gov UPDATE_STARTD_AD GCclassad.txt
>
> and the contents of GCclassad.txt look like this:
>
> MyType = "Machine"
> Name = "fnpcg.fnal.gov:2119/jobmanager-condor"
> gatekeeper_url = "fnpcg.fnal.gov:2119/jobmanager-condor"
> TargetType = "Job"
> Requirements = TRUE
> Rank = 0.000000
> CurrentRank = 0.000000
> WantAdRevaluate = TRUE
> CurMatches = 0
> UpdateSequenceNumber = 1129319101
> gluehostapplicationsoftwareruntimeenvironment = "VO-atlas-release-9.0.3 VO-atlas-lcg-release-0.0.2"
> glueceinfohostname = "fnal.gov"
> gluesubclustername = "fnal.gov"
> gluecestatestatus = "Production"
> gluecepolicymaxcputime = 2880
> gluecepolicymaxwallclocktime = 2880
> glueceaccesscontrolbaserule = "VO:*"
> GlueCEStateTotalCPUs = 27
> gluecestatefreecpus = 0
> GlueCEStateRunningJobs = 0
> GlueCEStateWaitingJobs = 0
> gluecestateestimatedresponsetime = 0
>
> So on the central collector/negotiator, condor_status looks like this:
>
>
> fngp-osg.fnal [?????????] [????] [????????] [???] [??] [Unknown]
> fnpcg.fnal.go [?????????] [????] [????????] [???] [??] [Unknown]
> vm1@fermigrid LINUX INTEL Unclaimed Idle 0.000 997 0+00:01:51
> vm2@fermigrid LINUX INTEL Unclaimed Idle 0.490 997 0+01:35:23
> vm3@fermigrid LINUX INTEL Unclaimed Idle 0.000 997 0+01:35:14
> vm4@fermigrid LINUX INTEL Unclaimed Idle 0.000 997 0+01:35:11
>
>                      Machines Owner Claimed Unclaimed Matched Preempting
>
>          INTEL/LINUX        4     0       0         4       0          0
>
>                Total        4     0       0         4       0          0
>
> (Omitted 2 malformed ads in computed attribute totals)
>
> -------------------------
>
> If I do the following:
>
> MyAddress = "<131.225.166.93:0>"
> LastHeardFrom = 1129319400
> UpdatesTotal = 4
> UpdatesSequenced = 0
> UpdatesLost = 0
> UpdatesHistory = "0x0000000000000000000000000000000
>
> I see that the two classads that the collector does accept have a
> MyAddress attribute appended to them, even though that attribute is
> not in the classad file.
>
> There is a third node on which I am trying to run the same script,
> but its classad does not show up in the collector. Instead I see:
>
> 10/13 09:44:00 Got IP = '(null)'
> 10/13 09:44:00 No IP address in classAd
> 10/13 09:44:00 Error: Invalid StartAd
> 10/13 09:44:00 Could not make hashkey --- ignoring ad
> 10/13 09:44:00 Received malformed ad from command (0). Ignoring.
>
>
> From that, I'm guessing that the Condor schedd on that node, which is
> an earlier version (6.7.6), is configured slightly differently and for
> whatever reason is not including the MyAddress field in the classad.
>
> Any idea what the magic configuration tweak is to make it include
> MyAddress in the classad? Thanks for any help.
>
> Steve Timm
>
>
--
Rod Walker +1 6042913051
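The collector log lines in the quoted message ("No IP address in classAd", "Could not make hashkey --- ignoring ad") suggest a check the forwarding script could run before calling `condor_advertise`. The Python sketch below assumes the collector needs `Name` and `MyAddress` to key a startd ad. That assumption comes from these log messages and the reply above, not from Condor's source. `parse_ad`, `missing_attributes` and `REQUIRED` are hypothetical names. The parser only understands one-line `Attr = value` assignments, not full ClassAd syntax.

```python
# Assumption: attributes the collector needs to key a startd ad.
REQUIRED = ("name", "myaddress")


def parse_ad(ad_text: str) -> dict:
    """Map lower-cased attribute names to raw value strings.

    Only simple one-line 'Attr = value' lines are understood; blank lines
    and lines without '=' (e.g. wrapped continuations) are skipped.
    """
    attrs = {}
    for line in ad_text.splitlines():
        # Split on the first '=' only, so values may contain '==' etc.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            continue
        # ClassAd attribute names are case-insensitive, so normalise them.
        attrs[name.strip().lower()] = value.strip()
    return attrs


def missing_attributes(ad_text: str) -> list:
    """Return the REQUIRED attributes that the ad text does not define."""
    attrs = parse_ad(ad_text)
    return [name for name in REQUIRED if name not in attrs]
```

A forwarding script could log a warning and skip the advertise step whenever `missing_attributes` returns a non-empty list. For the GCclassad.txt contents shown in the quoted message, it would return `["myaddress"]`.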