Good morning,
Two weeks ago, while I was on vacation, one of our scheduler nodes died
horribly, but it can probably be repaired.
I presume that all jobs that had been submitted are still known to the
schedd and would therefore be restarted as soon as the machine comes
back up. In the meantime, however, users may have submitted identical
copies from another scheduler node, and the old copies would overwrite
their output data once they start running.
Is there a simple way to prevent this from happening?
(Finding out which jobs were still in the queue would require starting
the schedd, which would trigger a fresh negotiation for all of them.
Catch-22?)