
Re: [HTCondor-users] Keeping track of RemoteHosts for restarted or preempted jobs



Hi Carles,

Try setting this on the schedd, e.g.:

SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 10

SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH

The integer number of run attempts to store in the job ClassAd when recording the values of machine attributes listed in SYSTEM_JOB_MACHINE_ATTRS. The default is 1. The history length may also be extended on a per-job basis by using the submit file command job_machine_attrs_history_length. The larger of the system and per-job history lengths will be used. A history length of 0 disables recording of machine attributes.
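To make the "larger of the two" rule concrete, here is a minimal shell sketch. The function name effective_history_length is made up for illustration and is not part of HTCondor; it only mirrors the rule quoted above.

```shell
# Hypothetical helper (not an HTCondor command): returns the history length
# that would apply to a job, following the rule quoted from the manual.
#   $1 = SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH (system-wide setting)
#   $2 = job_machine_attrs_history_length from the submit file (optional)
effective_history_length() {
    system_len=$1
    job_len=${2:-0}   # assume 0 when the job does not set its own length
    # The larger of the system and per-job values is used.
    if [ "$system_len" -ge "$job_len" ]; then
        echo "$system_len"
    else
        echo "$job_len"
    fi
}
```

A result of 0 corresponds to recording being disabled. With the setting above (10) and a job asking for 3, the function reports 10.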


Also maybe interesting: 

SYSTEM_JOB_MACHINE_ATTRS = " ... " 

If you want to use it in a START expression, e.g. to avoid starting on the same machine twice:

STARTD_ATTRS = JobMachineAttrs < ...> 

set_Requirements = Base2Requirements && Target.Machine =!= MachineAttrMachine0 && Target.Machine =!= MachineAttrMachine1 <.. > 
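Once the history is being recorded, the hosts can be read back from the job ad as MachineAttrMachine0 (most recent run), MachineAttrMachine1 (the run before that), and so on. Below is a minimal sketch of a shell helper that turns long-form ClassAd output into a comma-separated host list. The function name host_history is made up, and the sketch assumes that Machine is among the attributes listed in SYSTEM_JOB_MACHINE_ATTRS.

```shell
# Hypothetical helper: print the recorded execution hosts of one job,
# most recent first, as a comma-separated list.
# Input (stdin): long-form ClassAd text, one "Attr = value" per line,
# as printed by condor_q -long or condor_history -long.
host_history() {
    # Keep only the MachineAttrMachine<N> lines ...
    grep -E '^MachineAttrMachine[0-9]+ = ' |
        # ... rewrite each as "<N> <host>", dropping the surrounding quotes ...
        sed -E 's/^MachineAttrMachine([0-9]+) = "?([^"]*)"?$/\1 \2/' |
        # ... order by run index (0 = most recent run) ...
        sort -n |
        # ... drop the index and join the hosts with commas.
        cut -d' ' -f2- |
        paste -sd, -
}
```

Usage, assuming the HTCondor tools are in PATH and <job id> is the job of interest: condor_history -long <job id> | host_history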

Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Carles Acosta" <cacosta@xxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 10 July 2025 14:34:34
Subject: [HTCondor-users] Keeping track of RemoteHosts for restarted or preempted jobs

Dear all,

On our site, the jobs can be preempted or restarted several times for various reasons. When a job finishes, the only host information we can retrieve is from the LastRemoteHost attribute. We have no record of the other execution nodes where the job has previously run.

We're looking for a way to keep track of the full list of hosts on which a job has been running.

We've been playing with condor_chirp to implement a custom ExecutionHostHistory attribute. The idea is to append the current host to a history variable in the job wrapper. Something like this:

# Host history
host=$(hostname)

previous_history=$(/usr/libexec/condor/condor_chirp get_job_attr ExecutionHostHistory)

if [[ $previous_history != "UNDEFINED" ]]; then
    new_history="${previous_history},${host}"
else
    new_history="${host}"
fi

/usr/libexec/condor/condor_chirp set_job_attr ExecutionHostHistory "\"${new_history}\""

This works correctly on the first run, and we can see ExecutionHostHistory set with the hostname value. However, when the job is restarted again, the attribute appears as undefined.

Has anyone tried to do something similar? Or maybe there is already a variable with this information that I haven't found?

Thank you very much in advance.

Best regards,

Carles

--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10
Avís - Aviso - Legal Notice:  http://legal.ifae.es
