We want to do similar condor_history scaling within LIGO, but the underlying issue is that RemoteWallClockTime is cumulative over all the machines a job executed on. How do you handle that? Is it an issue in your scheduling system at all?
One could hand-roll something using the functionality whereby job attributes (including machine attributes added to the job) are retained for up to N matches, but that's a bit hacky. We've had informal discussions with the HTCondor team about something more sophisticated, like Intel PCM, but nothing detailed.
It matters primarily for our ability to make accurate funding requests for modern hardware based upon data from a mix of contemporary and previous-generation hardware.
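To make the concern concrete: if a job ran on several machines, scaling its cumulative wall-clock time by only the last machine's factor can misstate the normalised time. The sketch below assumes per-match wall-clock segments and per-match scaling factors are available as input. That is an assumption: this thread does not establish that HTCondor records per-match wall time, and the job below is hypothetical.

```python
from typing import List, Tuple

# Each entry is (wall_seconds, scaling_factor) for one match of the same job.
Segments = List[Tuple[float, float]]


def normalised_per_match(segments: Segments) -> float:
    """Scale each match's wall time by that match's own machine factor."""
    return sum(seconds * scaling for seconds, scaling in segments)


def normalised_naive(segments: Segments) -> float:
    """Scale the cumulative wall time by the final match's factor only."""
    total = sum(seconds for seconds, _ in segments)
    last_scaling = segments[-1][1]
    return total * last_scaling


# Hypothetical job: 10 h on an older node (0.75), evicted, then 2 h on a newer node (1.25).
runs = [(36000.0, 0.75), (7200.0, 1.25)]
print(normalised_per_match(runs))  # 27000 + 9000 = 36000.0
print(normalised_naive(runs))      # 43200 * 1.25 = 54000.0
```

The gap between the two results is the error that a single end-of-job scaling factor would introduce for jobs that moved between hardware generations.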
Tom
On 1/21/19, 9:10 AM, "HTCondor-users on behalf of Stephen Jones" <htcondor-users-bounces@xxxxxxxxxxx on behalf of sjones@xxxxxxxxxxxxxxxx> wrote:
Hi Steven,
On 21/01/2019 14:37, Steven C Timm wrote:
>
> MachineRalScaling is not a default HTCondor attribute, it must be
> something that is being defined in GridPP clusters somehow.
>
It's defined for our APEL accounting. We attach it to "the job" via
SUBMIT_EXPRS on the head node.
MachineRalScaling = "$$([ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)])"
SUBMIT_EXPRS = $(SUBMIT_EXPRS) MachineRalScaling
Since the $$ syntax is used, expansion is delayed until the job lands on
the worker node. On the worker node, a local value (RalScaling) is
substituted in. This gives the power of the node, so we can have
heterogeneous worker nodes and the power expression comes out in the
job data. The MachineRalScaling value "emerges" in the condor_history data.
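As a rough model of what the $$([...]) expression does at match time (a simplified sketch, not HTCondor's ClassAd evaluator): the expression is evaluated against the matched machine's ad and falls back to 1.00 when the machine does not advertise RalScaling. The dictionaries below are hypothetical stand-ins for machine ClassAds.

```python
from typing import Any, Mapping


def expand_machine_ral_scaling(machine_ad: Mapping[str, Any]) -> float:
    """Mimic ifThenElse(isUndefined(RalScaling), 1.00, RalScaling)."""
    # ClassAd attribute names are case-insensitive, so compare lower-cased keys.
    lowered = {name.lower(): value for name, value in machine_ad.items()}
    value = lowered.get("ralscaling")
    # isUndefined(RalScaling): the machine ad has no such attribute.
    if value is None:
        return 1.00
    return float(value)


worker_scaled = {"Machine": "node01", "RalScaling": 1.036}  # hypothetical node
worker_unscaled = {"Machine": "node02"}                     # hypothetical node

print(expand_machine_ral_scaling(worker_scaled))    # 1.036
print(expand_machine_ral_scaling(worker_unscaled))  # 1.0
```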
It's then a simple matter to multiply the wall-clock time by
MachineRalScaling to "normalise" each job, i.e. to put all jobs on the same scale.
# condor_history -long 1233764.0 | grep MATCH_EXP
MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
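A minimal sketch of that normalisation step, assuming the input is the text printed by condor_history -long (one "Name = value" line per attribute, with a blank line between job ads) and that each job carries both RemoteWallClockTime and MATCH_EXP_MachineRalScaling. As the output above shows, the scaling factor is stored as a quoted string, so it is unquoted before conversion. The sample wall-clock value is made up for illustration.

```python
from typing import Dict, List


def parse_long_output(text: str) -> List[Dict[str, str]]:
    """Split condor_history -long text into one dict per job ad."""
    ads: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current job ad.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            current[name] = value
    if current:
        ads.append(current)
    return ads


def unquote(value: str) -> str:
    """Drop the surrounding double quotes from a ClassAd string literal."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1]
    return value


def normalised_wallclock(ad: Dict[str, str]) -> float:
    """RemoteWallClockTime multiplied by the job's recorded scaling factor."""
    wall = float(ad["RemoteWallClockTime"])
    scaling = float(unquote(ad["MATCH_EXP_MachineRalScaling"]))
    return wall * scaling


sample = """\
ClusterId = 1233764
ProcId = 0
RemoteWallClockTime = 1000.0
MATCH_EXP_MachineRalScaling = "1.036000000000000E+00"
"""

for ad in parse_long_output(sample):
    print(ad["ClusterId"], normalised_wallclock(ad))
```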
This has stopped happening in the newer HTCondor version I'm using. I was
wondering whether the behaviour has changed somehow. I got it working with
SYSTEM_JOB_MACHINE_ATTRS..., but I'd like to keep things the same. I
have a feeling that MATCH_EXP_* is an "undocumented" feature, since I
can't find it anywhere in the manuals.
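If older history records carry MATCH_EXP_MachineRalScaling and newer ones only carry the value recorded via SYSTEM_JOB_MACHINE_ATTRS, an accounting script could accept either. This is a sketch that assumes SYSTEM_JOB_MACHINE_ATTRS includes RalScaling, so that the most recent match's value appears in the job ad as MachineAttrRalScaling0. The 1.0 default mirrors the ifThenElse fallback in the original expression, and the example ads are hypothetical.

```python
from typing import Mapping, Optional


def _as_float(raw: Optional[str]) -> Optional[float]:
    """Convert a ClassAd value (possibly a quoted string) to float, or None."""
    if raw is None:
        return None
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        raw = raw[1:-1]
    try:
        return float(raw)
    except ValueError:
        # For example, the literal `undefined`.
        return None


def scaling_factor(ad: Mapping[str, str]) -> float:
    """Prefer MATCH_EXP_MachineRalScaling, then MachineAttrRalScaling0, else 1.0."""
    for name in ("MATCH_EXP_MachineRalScaling", "MachineAttrRalScaling0"):
        value = _as_float(ad.get(name))
        if value is not None:
            return value
    return 1.0


old_style = {"MATCH_EXP_MachineRalScaling": '"1.036000000000000E+00"'}
new_style = {"MachineAttrRalScaling0": "1.036"}
neither = {"RemoteWallClockTime": "1000.0"}

print(scaling_factor(old_style), scaling_factor(new_style), scaling_factor(neither))
# 1.036 1.036 1.0
```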
Cheers,
Ste
--
Steve Jones sjones@xxxxxxxxxxxxxxxx
Grid System Administrator office: 220
High Energy Physics Division tel (int): 43396
Oliver Lodge Laboratory tel (ext): +44 (0)151 794 3396
University of Liverpool http://www.liv.ac.uk/physics/hep/
_______________________________________________
HTCondor-users mailing list
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/