Dear all,
On our site, jobs can be preempted or restarted several times for various reasons. When a job finishes, the only host information we can retrieve comes from the LastRemoteHost attribute, so we have no record of the other execution nodes where the job previously ran.
We're looking for a way to keep track of the full list of hosts on which a job has run.
We've been playing with condor_chirp to implement a custom ExecutionHostHistory attribute. The idea is to append the current host to a history variable in the job wrapper. Something like this:
# Host history
host=$(hostname)
previous_history=$(/usr/libexec/condor/condor_chirp get_job_attr ExecutionHostHistory)
if [[ $previous_history != "UNDEFINED" ]]; then
  new_history="${previous_history},${host}"
else
  new_history="${host}"
fi
/usr/libexec/condor/condor_chirp set_job_attr ExecutionHostHistory "\"${new_history}\""
This works correctly on the first run, and we can see ExecutionHostHistory set to the hostname. However, when the job is restarted, the attribute appears as undefined.
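One thing we have not verified is whether get_job_attr returns string values with their surrounding double quotes. If it does, the second run would wrap an already-quoted value in quotes again and produce an invalid expression. The following is only a sketch under that assumption: append_host is a hypothetical helper name, not part of condor_chirp, and it strips one pair of quotes before appending.

```shell
# Hypothetical helper: build the new history string from the raw value
# returned by get_job_attr and the current host name.
# Assumption (unverified): string attributes come back wrapped in double quotes.
append_host() {
  ah_previous="$1"
  ah_host="$2"
  # Strip one leading and one trailing double quote, if present.
  ah_previous="${ah_previous#\"}"
  ah_previous="${ah_previous%\"}"
  if [ -z "$ah_previous" ] || [ "$ah_previous" = "UNDEFINED" ]; then
    # No usable history yet: start the list with the current host.
    printf '%s\n' "$ah_host"
  else
    # Append the current host to the existing comma-separated list.
    printf '%s\n' "${ah_previous},${ah_host}"
  fi
}

# Intended use in the wrapper (needs condor_chirp, so left commented out):
# previous_history=$(/usr/libexec/condor/condor_chirp get_job_attr ExecutionHostHistory)
# new_history=$(append_host "$previous_history" "$(hostname)")
# /usr/libexec/condor/condor_chirp set_job_attr ExecutionHostHistory "\"${new_history}\""
```

Keeping the string handling in a separate function makes it possible to check it without a running job.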
Has anyone tried to do something similar? Or maybe there is already a variable with this information that I haven't found?
Thank you very much in advance.
Best regards,
Carles