Re: [HTCondor-users] question on late materialization shadows
- Date: Tue, 8 Aug 2023 15:11:22 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] question on late materialization shadows
Hi Thomas,
There is nothing special about the Shadow for late materialization jobs. In fact, the Negotiator does not even make matches for jobs until after they are materialized.
Really, the only difference between regular submit and late materialization is that with regular submit, the jobs are materialized by condor_submit before they are submitted to the Schedd, while with late materialization, the jobs are materialized by the Schedd after submission. In either case, only jobs that have been materialized are considered for matchmaking. There are no virtual shadows.
-tj
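The distinction described above can be illustrated with a toy model. This is only a sketch in Python, not HTCondor code: the class `ToySchedd` and its methods are hypothetical names made up for illustration, and for simplicity the regular-submit case (where condor_submit does the materializing) is modeled as "no limit".

```python
class ToySchedd:
    """Toy model of job materialization; NOT HTCondor code."""

    def __init__(self, total_jobs, max_materialize=None):
        self.total_jobs = total_jobs
        # None models regular submit: every job exists from the start.
        self.limit = total_jobs if max_materialize is None else max_materialize
        self.next_proc = 0
        self.queue = []  # materialized jobs; only these can be matched
        self.materialize()

    def materialize(self):
        # Create jobs until the limit or the end of the cluster is reached.
        while len(self.queue) < self.limit and self.next_proc < self.total_jobs:
            self.queue.append(self.next_proc)
            self.next_proc += 1

    def job_left_queue(self, proc):
        # A job leaving the queue frees room for the next one.
        self.queue.remove(proc)
        self.materialize()

    def matchable_jobs(self):
        # Jobs that are not materialized yet never show up here.
        return list(self.queue)


if __name__ == "__main__":
    regular = ToySchedd(5)
    late = ToySchedd(5, max_materialize=2)
    print(regular.matchable_jobs())
    print(late.matchable_jobs())
    late.job_left_queue(0)
    print(late.matchable_jobs())
```

In this model a job that has not been materialized yet simply does not exist as far as matchmaking is concerned, so there is nothing a shadow could be started for.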
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Thomas Hartmann
Sent: Tuesday, August 8, 2023 5:59 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] question on late materialization shadows
Hi all,
I have a quick question on how shadows are instantiated for late
materialization jobs.
The thing is, at first I got confused by a user's jobs, for which I
noticed >100 shadow starts (and exits) in the access point's/scheduler's
{Sched,Shadow}Log. Initially, I interpreted log messages like [1] to mean
that the shadow was going to be brokered to the logged execution
point/worker but exited with `JOB_EXCEPTION` (without an actual job
being instantiated on the worker).
But according to the negotiator, none of these transient shadows had a
match, so no events were logged in the EPs' Start(er)Logs either. Since these
jobs were max_materialize jobs, I guess that the shadows were all
"virtual" shadows until the final matching succeeded and the shadow and
the job actually became real, right?
A corollary question would be whether one could somehow differentiate
between "virtual" shadows and "real" shadows from multiple job runs.
I.e., on the scheds we add a few additional (execution point) ads to the
jobs like [2] where the idea is to include benchmark performance info in
the job.
But since each "virtual" shadow without actual realization also
"inherits" the EP ads from the transient match, these "virtual"
shadow/worker details are piling up in the extended job ads [3].
Cheers,
Thomas
[1]
08/08/23 11:08:34 (pid:2305) match (slot2@xxxxxxxxxxxxxxxxx <131.169.164.78:35712?addrs=131.169.164.78-35712+[2001-638-700-10a0--1-44e]-35712&alias=batch1378.desy.de> for BIRD_atlas.lite.tadej) switching to job 19407249.2850
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 switching to job 19407249.2850.
08/08/23 11:08:34 (pid:2305) Starting add_shadow_birthdate(19407249.2850)
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 for job 19407249.2850 exited with status 4
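Counting shadow exits per job, as done by eye in the message above, can be sketched in Python. This is a minimal sketch: the regular expression is inferred only from the excerpt in [1], and the function name `count_shadow_exits` is hypothetical. Other HTCondor versions may word these log lines differently.

```python
import re
from collections import Counter

# Pattern inferred from the SchedLog excerpt in [1]; an assumption, not a spec.
EXIT_RE = re.compile(
    r"Shadow pid (\d+) for job (\d+\.\d+) exited with status (\d+)"
)


def count_shadow_exits(log_text):
    """Return a Counter keyed by (job_id, exit_status)."""
    # Collapse line wraps so that messages split across lines still match.
    flat = re.sub(r"\s+", " ", log_text)
    counts = Counter()
    for _pid, job_id, status in EXIT_RE.findall(flat):
        counts[(job_id, int(status))] += 1
    return counts


# Sample taken from [1], including the mail client's line wrapping.
SAMPLE = """\
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 switching to job
19407249.2850.
08/08/23 11:08:34 (pid:2305) Starting add_shadow_birthdate(19407249.2850)
08/08/23 11:08:34 (pid:2305) Shadow pid 3161435 for job 19407249.2850
exited with status 4
"""

if __name__ == "__main__":
    for (job_id, status), n in count_shadow_exits(SAMPLE).items():
        print(f"job {job_id}: {n} shadow exit(s) with status {status}")
```

Run over a whole SchedLog, the counter shows which jobs went through many shadow exits and with which status.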
[2]
JobMachineSpecAttrs = $(JobMachineSpecAttrs) HS06 HS06PerSlot HS06perWatt ApelScaledPerSlot ClusterAvgCoreHS06
SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) $(JobMachineSpecAttrs)
SUBMIT_ATTRS = $(SUBMIT_ATTRS) $(JobMachineSpecAttrs)
SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 5
[3]
MachineAttrHS06perWatt0 = 1.79
MachineAttrHS06perWatt1 = 3.7
MachineAttrHS06perWatt2 = 3.7
MachineAttrHS06perWatt3 = 3.7
MachineAttrHS06perWatt4 = 3.83
MachineAttrHS06perWatt5 = 3.7
MachineAttrHS06perWatt6 = 3.7
MachineAttrHS06perWatt7 = 3.83
MachineAttrHS06perWatt8 = 2.06
MachineAttrHS06perWatt9 = 3.7
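To see how many entries have piled up per attribute, the numbered `MachineAttr...` attributes in [3] can be grouped by name. The sketch below works on plain `name = value` text such as the listing above. The function name `machine_attr_history` is hypothetical. The split between attribute name and trailing index is a heuristic, so it would be ambiguous for an attribute whose own name ends in a digit.

```python
import re
from collections import defaultdict

# Lazy name, then the trailing digits right before "=" are taken as the index.
# Heuristic: a name that itself ends in digits would be split incorrectly.
ATTR_RE = re.compile(r"^MachineAttr(\w+?)(\d+)\s*=\s*(.+)$")


def machine_attr_history(ad_text):
    """Map attribute name -> list of values ordered by history index."""
    history = defaultdict(dict)
    for line in ad_text.splitlines():
        m = ATTR_RE.match(line.strip())
        if m:
            name, index, value = m.groups()
            history[name][int(index)] = value.strip()
    return {
        name: [values[i] for i in sorted(values)]
        for name, values in history.items()
    }


# First three entries from [3].
SAMPLE_AD = """\
MachineAttrHS06perWatt0 = 1.79
MachineAttrHS06perWatt1 = 3.7
MachineAttrHS06perWatt2 = 3.7
"""

if __name__ == "__main__":
    for name, values in machine_attr_history(SAMPLE_AD).items():
        print(f"{name}: {len(values)} entries -> {values}")
```

Values are kept as strings, since the sketch makes no assumption about the type of each attribute.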