The integer number of run attempts to store in the job ClassAd when recording the values of machine attributes listed in SYSTEM_JOB_MACHINE_ATTRS. The default is 1. The history length may also be extended on a per-job basis by using the submit file command job_machine_attrs_history_length. The larger of the system and per-job history lengths will be used. A history length of 0 disables recording of machine attributes.
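As a minimal sketch of the per-job side of this, a submit file could contain the fragment below. The value 5 is an arbitrary illustration, and Machine is just one attribute a job might choose to record.

```
# Record the Machine attribute of each execution slot the job runs on
job_machine_attrs = Machine
# Keep the values from up to 5 run attempts (illustrative value)
job_machine_attrs_history_length = 5
```

With these settings the job ClassAd should carry attributes such as MachineAttrMachine0 for the most recent attempt, MachineAttrMachine1 for the one before it, and so on.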
Dear all,
On our site, the jobs can be preempted or restarted several times for various reasons. When a job finishes, the only host information we can retrieve is from the LastRemoteHost attribute. We have no record of the other execution nodes where the job has previously run.
We're looking for a way to keep track of the full list of hosts on which a job has been running.
We've been playing with condor_chirp to implement a custom ExecutionHostHistory attribute. The idea is to append the current host to a history variable in the job wrapper. Something like this:
# Host history
host=$(hostname)
previous_history=$(/usr/libexec/condor/condor_chirp get_job_attr ExecutionHostHistory)
if [[ $previous_history != "UNDEFINED" ]]; then
    new_history="${previous_history},${host}"
else
    new_history="${host}"
fi
/usr/libexec/condor/condor_chirp set_job_attr ExecutionHostHistory "\"${new_history}\""
This works correctly on the first run, and we can see ExecutionHostHistory set with the hostname value. However, when the job is restarted again, the attribute appears as undefined.
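One possible cause, offered as an assumption rather than a confirmed diagnosis: condor_chirp get_job_attr prints the value as a ClassAd expression, so a string comes back with its surrounding double quotes. Appending to it and quoting again would then produce an invalid expression on the second run. The sketch below strips those quotes before appending; append_host is a hypothetical helper name, not part of HTCondor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the next ExecutionHostHistory value.
#   $1 = raw output of "condor_chirp get_job_attr ExecutionHostHistory"
#        (assumed to be UNDEFINED, empty, or a double-quoted string)
#   $2 = name of the current execution host
append_host() {
    local previous="$1" host="$2" q='"'
    # Strip one leading and one trailing double quote, if present.
    previous="${previous#"$q"}"
    previous="${previous%"$q"}"
    if [[ -z "$previous" || "$previous" == "UNDEFINED" ]]; then
        printf '%s\n' "$host"
    else
        printf '%s,%s\n' "$previous" "$host"
    fi
}
```

In the wrapper, new_history would then be set from append_host, passing it the condor_chirp get_job_attr output and $(hostname), before the existing set_job_attr call.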
Has anyone tried to do something similar? Or maybe there is already a variable with this information that I haven't found?
Thank you very much in advance.
Best regards,
Carles