Re: [HTCondor-users] (Yes & No!) Try to set DedicatedScheduler = NO JOBS EVER onto WN :[
- Date: Thu, 16 Jan 2020 11:02:42 +0000
- From: Winnie Lacesso <Winnie.Lacesso@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] (Yes & No!) Try to set DedicatedScheduler = NO JOBS EVER onto WN :[
Good morning!
Thank you again a zillion times, Brian Lin! Bless you for your wise
help!
To refresh context:
On Mon, 13 Jan 2020, Brian Lin wrote:
> If you can identify jobs that are being passed through your CE, you can
> inspect their ClassAds [1] with `condor_q -l <JOB ID>` and compare those
> ClassAds with the ClassAds of your non-grid jobs to find an
> attribute/value pair that's unique to your grid jobs. For example, if
> Arc CE sets 'SourceCE = "lcgce01"' for all grid jobs, you could set the
> following on your worker nodes:
>
> START = SourceCE == "lcgce01"
So this attribute looks like it marks a "job from lcgce01":
NordugridQueue = "gridAMD"
On a WN, I compared the .job.ad file from a grid job with the .job.ad file
from a local-user job. The local user's .job.ad file does not have that
attribute!
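The comparison above can be sketched as a small shell helper that lists the attribute names present in one job ad file but missing from another. This is only an illustration: the function names and the file paths in the usage line are hypothetical, and it assumes each attribute sits on its own line in the form `Name = value`.

```shell
# Hypothetical helper (not part of HTCondor): print the attribute names
# found in a job ad file, one per line, sorted and de-duplicated.
ad_attr_names() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*=.*/\1/p' "$1" | sort -u
}

# Print attribute names that appear in the first ad but not in the second.
ad_only_attrs() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    ad_attr_names "$1" > "$tmp_a"
    ad_attr_names "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}
```

Usage would be something like `ad_only_attrs grid.job.ad local.job.ad` (placeholder file names); any attribute it prints is a candidate for telling grid jobs apart.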
Hope! I put this in /etc/condor/config.d/20_workernode.config on the
experimental WN:
START = NordugridQueue == "gridAMD"
(Is that syntax correct?)
and ran condor_reconfig.
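For comparison, here is a stricter spelling of the same expression. It is a sketch, not a confirmed fix: it assumes the intent is "start only jobs whose own ad carries NordugridQueue", and it relies on two standard ClassAd features, the TARGET. scope prefix and the =?= operator.

```
# /etc/condor/config.d/20_workernode.config (sketch only)
# TARGET. states explicitly that the attribute is looked up in the job ad.
# =?= evaluates to FALSE, never UNDEFINED, when the attribute is missing.
# Assumption: no later file in config.d redefines START.
START = TARGET.NordugridQueue =?= "gridAMD"
```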
Progress - grid jobs land+run on WN! Big Improvement over "no jobs, EVER"!
But then I submitted jobs as myself (= a local user) not via lcgce01,
& they *also* land+run on this experimental WN. DARN!
So it doesn't seem to work for keeping local-user jobs off the WN either! :[
I can't parse the htc00 /var/log/condor/*Log logs to understand where
the WN is supposed to be telling it "I only want jobs with blahblah set"
root@htc00> cd /var/log/condor
root@htc00> nice -n 19 grep -li NordugridQueue *Log
root@htc00> # nothing
Where might one find out what the WN is "advertising" to the MatchMaker?
(Assumption: it is logged somewhere! Could be wrong about that!)
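One place to look, as far as I understand it, is the collector rather than the logs: the startd publishes its machine ClassAd there, and `condor_status -long` prints it. A minimal sketch follows; the function name is hypothetical and the host argument is a placeholder that may need to be the WN's full hostname.

```shell
# Hypothetical wrapper: show the Start and Requirements expressions that a
# worker node currently advertises to the collector.
show_advertised_start() {
    condor_status -long "$1" | grep -i -E '^(Start|Requirements)[[:space:]]*='
}
```

For example, `show_advertised_start gridftp02`. On the WN itself, `condor_config_val -v START` reports the value in effect and where it was defined, which would show whether another config file overrides the one edited above.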
In MatchLog is a definite "Matched 2005139.0 phpwl@xxxxxxxxxxxxxx
snip snip ... slot1@xxxxxxxxxxxxxxxxxxxxxxxx"
gridftp02 (yeah, it was a gridftp server, repurposed as a WN) is the
experimental WN that's supposed to advertise
START = NordugridQueue == "gridAMD"
and my local-user job (this is confirmed) does NOT have that in its
condor_q -l output, or in the .job.ad file when it's running on the
experimental WN.
Note - my local user account doesn't even have an /etc/passwd entry on the
test WN (supposedly that's one somewhat drastic way to keep local-user jobs
from landing on WNs where they're not supposed to run).
It's a bit astonishing that the local-user job was allowed to start+run with
no entry in /etc/passwd!...
Any further tips/clues/advice most gratefully welcomed!