Hi folks,
Is there any way, as a Condor administrator, to track the machines where jobs have trouble running and the application's return code is not 0 (zero), as in the example below?
005 (14007.000.000) 06/20 16:48:33 Job terminated.
(1) Normal termination (return value -1073741502)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
8381818 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
58672720 - Total Bytes Received By Job
...
The log reports a normal termination, but the return value is non-zero.
What I want is for Condor to send an email to condor_admin whenever a job terminates in this condition. Is there any way to do that?
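To make the condition concrete, here is a rough sketch in Python of the check I have in mind. It only assumes the log layout shown in the excerpt above (a "005" event header followed by a "Normal termination (return value N)" line, with "..." closing each event). The function name find_nonzero_exits is just something I made up for illustration, and it does not send any mail or identify the machine, since that information is not in the excerpt.

```python
import re

# Header of a job-terminated event, e.g.
# "005 (14007.000.000) 06/20 16:48:33 Job terminated."
EVENT_RE = re.compile(r"^005 \((\d+)\.(\d+)\.\d+\)")
# Return-value line, e.g. "(1) Normal termination (return value -1073741502)"
RETVAL_RE = re.compile(r"Normal termination \(return value (-?\d+)\)")


def find_nonzero_exits(log_text):
    """Return (job_id, return_value) pairs for jobs that exited non-zero.

    Hypothetical helper: it relies only on the log layout in the excerpt.
    """
    failures = []
    current_job = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        header = EVENT_RE.match(line)
        if header:
            # Build "cluster.proc", e.g. "14007.0"
            current_job = f"{int(header.group(1))}.{int(header.group(2))}"
            continue
        if line == "...":
            # "..." closes the current event
            current_job = None
            continue
        found = RETVAL_RE.search(line)
        if found and current_job is not None:
            value = int(found.group(1))
            if value != 0:
                failures.append((current_job, value))
    return failures
```

Run against the excerpt above, this would flag job 14007.0 with return value -1073741502. What I am missing is a way to have Condor itself do this check and notify condor_admin.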
Thanks,
Klaus