[HTCondor-users] SIGQUIT / debugging
- Date: Tue, 19 Feb 2013 13:34:55 +0000
- From: "Shrum, Donald C" <DCShrum@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] SIGQUIT / debugging
I periodically see jobs that fail with a SIGQUIT.
In the scheduler log:
SchedLog:02/18/13 19:47:35 (pid:25985) match (slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11) switching to job 5911.734
SchedLog:02/18/13 19:47:35 (pid:25985) Started shadow for job 5911.734 on slot3@xxxxxxxxxxxxxxxxxx <10.178.6.101:54726> for nmg11, (shadow pid = 14851)
SchedLog:02/18/13 19:47:37 (pid:25985) Negotiating for owner: nmg11@xxxxxxxxx
SchedLog:02/18/13 19:47:37 (pid:25985) Finished negotiating for nmg11 in local pool: 0 matched, 1 rejected
On the processing node (slot3@xxxxxxxxxxxxxxxxxx in this case) I see:
02/18/13 19:47:36 Create_Process succeeded, pid=5788
02/18/13 21:10:27 Process exited, pid=5788, status=0
02/18/13 21:10:27 Got SIGQUIT. Performing fast shutdown.
02/18/13 21:10:27 ShutdownFast all jobs.
02/18/13 21:10:27 **** condor_starter (condor_STARTER) pid 5785 EXITING WITH STATUS 0
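To check the order of events in the starter log excerpt above, a small script can pull out the job's exit line and the SIGQUIT line and compare their positions. This is only a sketch: the function name `summarize` and the regular expression are illustrative and not part of HTCondor, and the sample text is just the five lines quoted above.

```python
import re

# Sample lines copied from the starter log excerpt above.
STARTER_LOG = """\
02/18/13 19:47:36 Create_Process succeeded, pid=5788
02/18/13 21:10:27 Process exited, pid=5788, status=0
02/18/13 21:10:27 Got SIGQUIT. Performing fast shutdown.
02/18/13 21:10:27 ShutdownFast all jobs.
02/18/13 21:10:27 **** condor_starter (condor_STARTER) pid 5785 EXITING WITH STATUS 0
"""

# Hypothetical pattern matching the "Process exited" line format shown above.
EXIT_RE = re.compile(r"Process exited, pid=(\d+), status=(\d+)")


def summarize(log_text):
    """Return (job_pid, exit_status, exited_before_sigquit) for one starter run."""
    job_pid = status = None
    exit_line = sigquit_line = None
    for n, line in enumerate(log_text.splitlines()):
        m = EXIT_RE.search(line)
        if m and exit_line is None:
            # First "Process exited" line: record pid, status, and position.
            job_pid, status = int(m.group(1)), int(m.group(2))
            exit_line = n
        elif "Got SIGQUIT" in line and sigquit_line is None:
            # First SIGQUIT line: record its position only.
            sigquit_line = n
    exited_first = (
        exit_line is not None
        and sigquit_line is not None
        and exit_line < sigquit_line
    )
    return job_pid, status, exited_first


print(summarize(STARTER_LOG))  # → (5788, 0, True)
```

For this excerpt the script reports that the job process (pid 5788) was logged as exited with status 0 before the SIGQUIT line appears, which is worth weighing against the assumption that the job crashed.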
I'm inclined to think the job crashed or failed and that the SIGQUIT was sent to condor as a result of the crash. Is there something else going on that I should debug? Google has not been much help thus far :)
Thanks,
Don
FSU Research Computing Center