Is there a way to get a little more information about Condor jobs, such
as where they ran and exactly what happened, other than having a
separate log file for each job, e.g.
Log = log_$(PROCESS).log
in the submit file?
The issue arises when we submit 1000+ jobs and need to know which ones
failed and where they executed. We can of course get the failures via
the return codes and error output, but it would be helpful to know
exactly where each job executed. All we have at the moment is:
001 (021.000.000) 09/29 09:58:54 Job executing on host:
<xxx.xxx.xxx.xxx:1104>
And while this is useful, it would be more helpful to have the execute
node included in the termination event itself:
005 (021.000.000) 09/29 09:58:55 Job terminated.
(0) Abnormal termination (signal 53)
(0) No core file
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
0 - Run Bytes Sent By Job
384684 - Run Bytes Received By Job
0 - Total Bytes Sent By Job
384684 - Total Bytes Received By Job
.
That would be more useful than just the job ID. For example, what about:
005 (021.000.000) 09/29 09:58:55 Job terminated (after executing on
node xxx.xxx.xxx.xxx)
This probably seems trivial, but if anyone can suggest other methods
I’d be more than happy to hear them.
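
One workaround I can think of, as a rough sketch only: since the 001 and
005 events carry the same job ID, a small script could pair each
termination with the host from that job's most recent "executing" event.
The function name hosts_for_terminated_jobs is made up for illustration,
and the parsing assumes the log looks like the excerpts above (a
three-digit event code, the job ID in parentheses, and the host inside
angle brackets, possibly wrapped onto the next line).

```python
import re

# Header line of a user-log event, as seen in the excerpts above, e.g.
# "005 (021.000.000) 09/29 09:58:55 Job terminated."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+\.\d+\.\d+)\) (\S+ \S+) (.*)$")
# Host address as written in the 001 event: "<address:port>"
HOST_RE = re.compile(r"<([^>]+)>")


def hosts_for_terminated_jobs(log_text):
    """Map each terminated job ID to its last execute host and the
    first detail line of its 005 (terminated) event."""
    last_host = {}  # job ID -> host from the most recent 001 event
    results = {}    # job ID -> {"host": ..., "detail": ...}
    code = job = None
    for line in log_text.splitlines():
        header = EVENT_RE.match(line)
        if header:
            code, job, _when, rest = header.groups()
            if code == "001":
                found = HOST_RE.search(rest)
                last_host[job] = found.group(1) if found else None
            elif code == "005":
                results[job] = {"host": last_host.get(job), "detail": None}
            continue
        # Continuation lines belong to the most recent event header.
        if code == "001" and last_host.get(job) is None:
            found = HOST_RE.search(line)  # host wrapped onto its own line
            if found:
                last_host[job] = found.group(1)
        elif (code == "005" and results[job]["detail"] is None
              and line.strip() not in ("", "...")):
            results[job]["detail"] = line.strip()
    return results
```

Calling hosts_for_terminated_jobs(open("log_0.log").read()) on each log
file would then give the host next to the "Abnormal termination" line,
but this still means post-processing the logs rather than having the
node in the event itself.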
Kind Regards,
Shaun