Hello,
I have some ATLAS jobs that are failing. I have looked in the log files.
Take, for example, job number 93742.0. The job itself exited with status 0, but its condor_shadow exited with status 115. What exactly does this status mean?
Below are some extracts from the log output:
[root@gridarcce01 log]# grep -RH '93742' arc/arex-jobs* | more
arc/arex-jobs.log-20210211:2021-02-10 23:45:00 Finished - job id: 6PwKDm5cYTynOUEdEnzo691oABFKDmABFKDmzcfXDmDBFKDmDTZXHm, unix user: 41000:1307, name: "arc_pilot", owner: "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1", lrms: condor, queue: grid, lrmsid: 93742.gridarcce01
[root@gridarcce01 log]# grep -RH '93742' condor/EventLog | more
condor/EventLog: 937428 - ResidentSetSize of job (KB)
condor/EventLog:006 (24968.000.000) 12/18 10:32:49 Image size of job updated: 937424
condor/EventLog:006 (26125.000.000) 12/19 11:22:07 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:32:57 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:37:57 Image size of job updated: 937424
condor/EventLog: 937424 - ResidentSetSize of job (KB)
condor/EventLog: 937420 - ResidentSetSize of job (KB)
condor/EventLog:006 (71776.000.000) 01/21 00:35:38 Image size of job updated: 937428
condor/EventLog:006 (73442.000.000) 01/22 02:29:37 Image size of job updated: 937428
condor/EventLog: 937428 - ResidentSetSize of job (KB)
condor/EventLog:006 (78058.000.000) 01/26 02:56:24 Image size of job updated: 937428
condor/EventLog:000 (93742.000.000) 02/09 04:12:28 Job submitted from host: <193.55.252.153:9618?addrs=193.55.252.153-9618&noUDP&sock=3115801_e73c_4>
condor/EventLog:001 (93742.000.000) 02/09 19:03:03 Job executing on host: <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3>
condor/EventLog:006 (93742.000.000) 02/09 19:03:11 Image size of job updated: 2304
condor/EventLog:006 (93742.000.000) 02/09 19:08:11 Image size of job updated: 67160
condor/EventLog:006 (93742.000.000) 02/09 19:13:12 Image size of job updated: 110340
condor/EventLog:006 (93742.000.000) 02/09 19:18:13 Image size of job updated: 1410420
condor/EventLog:006 (93742.000.000) 02/09 19:23:13 Image size of job updated: 1887892
condor/EventLog:006 (93742.000.000) 02/09 19:33:15 Image size of job updated: 1887892
condor/EventLog:005 (93742.000.000) 02/10 23:38:21 Job terminated.
condor/ShadowLog.old:02/10/21 11:43:04 (93742.0) (3863434): Time to redelegate short-lived proxy to starter.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): File transfer completed successfully.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): Job 93742.0 terminated: exited with status 0
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): WriteUserLog checking for event log rotation, but no lock
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): **** condor_shadow (condor_SHADOW) pid 3863434 EXITING WITH STATUS 115
[root@gridarcce01 log]# grep -RH '93742' condor/SchedLog | more
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Shadow pid 3863434 for job 93742.0 exited with status 115
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3> for group_ATLAS.atlasprd_score.atlasprd, 93742.0) deleted
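A side note on the searches above: a plain grep for '93742' also matches unrelated values such as the image size 937424, which is why the EventLog output contains lines from other jobs. A minimal sketch of a stricter filter follows. It assumes the job ID always appears as 93742 followed by a dot and not preceded by another digit, as in the extracts above; the function name job_lines is only illustrative.

```python
import re


def job_lines(lines, cluster_id):
    """Return only the lines that mention cluster_id as a whole job ID.

    Assumption: the ID is written as "<cluster>." (for example
    "(93742.000.000)", "(93742.0)" or "lrmsid: 93742.gridarcce01")
    and is never directly preceded by another digit.
    """
    # (?<!\d) rejects longer numbers that merely end in the cluster ID;
    # the trailing \. rejects longer numbers that merely start with it.
    pattern = re.compile(r"(?<!\d)" + re.escape(str(cluster_id)) + r"\.")
    return [line for line in lines if pattern.search(line)]
```

Fed the grep output line by line, this would keep the "(93742.000.000)" and "(93742.0)" entries and drop the "Image size of job updated: 937424" lines.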
Any ideas are welcome.
Thanks
Jean-Claude
------------------------------------------------------------------------
Jean-Claude Chevaleyre < Jean-Claude.Chevaleyre(at)clermont.in2p3.fr >
Laboratoire de Physique Clermont
Campus Universitaire des Cézeaux
4 Avenue Blaise Pascal
TSA 60026
CS 60026
63178 Aubière Cedex
Tel : 04 73 40 73 60
-------------------------------------------------------------------------