You should be aware that SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH will grow your job_queue.log file fairly quickly if jobs are repeatedly trying to start and failing.
The EPOCH history file or the job's LOG file is a better way to get a record of where the job has run.
I would recommend that you use SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH only if you intend to reference the job attributes it creates in job policy expressions like Requirements. The history length should be no larger than what those expressions need.
Thank you very much! I'll try playing with SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH :)
Hi Carles,
Try this on the schedd:
e.g.
SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH = 10
SYSTEM_JOB_MACHINE_ATTRS_HISTORY_LENGTH
The integer number of run attempts to store in the job ClassAd when recording the values of machine attributes listed in SYSTEM_JOB_MACHINE_ATTRS. The default is 1. The history length may also be extended on a per-job basis by using the submit file command job_machine_attrs_history_length. The larger of the system and per-job history lengths will be used. A history length of 0 disables recording of machine attributes.
Also maybe interesting:
SYSTEM_JOB_MACHINE_ATTRS = " ... "
If you want to use it in a START expression, e.g. to avoid starting on the same machine twice:
STARTD_ATTRS = JobMachineAttrs < ...>
set_Requirements = Base2Requirements && Target.Machine =!= MachineAttrMachine0 && Target.Machine =!= MachineAttrMachine1 <.. >
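The clause above grows by one term per remembered run attempt, so it can be generated instead of typed by hand. The following is only a sketch: avoid_previous_hosts is a hypothetical helper, not an HTCondor tool, the history length of 3 is an arbitrary example, and it assumes Machine is among the attributes listed in SYSTEM_JOB_MACHINE_ATTRS so that MachineAttrMachine0, MachineAttrMachine1, and so on are recorded in the job ad.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of HTCondor): build a clause that rejects
# the machines recorded in MachineAttrMachine0 .. MachineAttrMachine<N-1>.
avoid_previous_hosts() {
    local n=$1 expr="" i
    for ((i = 0; i < n; i++)); do
        # Join clauses with && after the first one.
        if [[ -n $expr ]]; then
            expr+=" && "
        fi
        expr+="(TARGET.Machine =!= MachineAttrMachine${i})"
    done
    printf '%s\n' "$expr"
}

# Example: a history length of 3 (arbitrary value for illustration).
avoid_previous_hosts 3
```

The printed clause would then be appended by hand to whatever base requirements are in use. The number passed in should match the configured history length, since attributes beyond that length are never recorded.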
Best
christoph
--
Christoph Beyer
DESY Hamburg
IT-Department
Notkestr. 85
Building 02b, Room 009
22607 Hamburg
phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
From: "Carles Acosta" <cacosta@xxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Thursday, 10 July 2025 14:34:34
Subject: [HTCondor-users] Keeping track of RemoteHosts for restarted or preempted jobs
Dear all,
On our site, the jobs can be preempted or restarted several times for various reasons. When a job finishes, the only host information we
can retrieve is from the LastRemoteHost attribute. We have no record of the other execution nodes where the job has previously run.
We're looking for a way to keep track of the full list of hosts on which a job has been running.
We’ve been playing with condor_chirp to implement a custom ExecutionHostHistory attribute. The idea is to append the current host to a
history variable on the job wrapper. Something like this:
# Host history
host=$(hostname)
previous_history=$(/usr/libexec/condor/condor_chirp get_job_attr ExecutionHostHistory)
if [[ $previous_history != "UNDEFINED" ]]; then
    new_history="${previous_history},${host}"
else
    new_history="${host}"
fi
/usr/libexec/condor/condor_chirp set_job_attr ExecutionHostHistory "\"${new_history}\""
This works correctly on the first run, and we can see ExecutionHostHistory set with the hostname value. However, when the job is restarted
again, the attribute appears as undefined.
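One thing that may be worth checking, as an assumption that is not confirmed anywhere in this thread: if condor_chirp get_job_attr returns a string value still wrapped in its double quotes, the wrapper above would nest quotes on the second run and produce an expression that does not parse as a string. A minimal sketch of a quote-stripping step follows; append_host is a hypothetical helper and the node names are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a host to a history value as returned by
# condor_chirp get_job_attr.
# ASSUMPTION: a string value comes back wrapped in double quotes, and an
# unset attribute comes back as UNDEFINED.
append_host() {
    local previous=$1 host=$2
    # Strip one leading and one trailing double quote, if present.
    previous=${previous#\"}
    previous=${previous%\"}
    if [[ -z $previous || $previous == "UNDEFINED" ]]; then
        printf '%s\n' "$host"
    else
        printf '%s\n' "${previous},${host}"
    fi
}

append_host 'UNDEFINED' 'node01'   # prints node01
append_host '"node01"' 'node02'    # prints node01,node02
```

In the wrapper, the result would replace new_history before the set_job_attr call, which already adds the outer quotes. Whether this explains the undefined value would need to be verified by printing the raw get_job_attr output on the second run.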
Has anyone tried to do something similar? Or maybe there is already a variable with this information that I haven't found?
Thank you very much in advance.
Best regards,
Carles
--
Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10