
RE: [Condor-users] Strange schedd crash (exit status 44)



Hmm. Well, we're running on Windows. The driving script is a Perl script wrapped in a .bat file. It's not that the jobs are dying. That doesn't bother me. That's our problem. It's that the shadow dies and then takes the schedd process down with it. That shouldn't happen.
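
For anyone who wants to check their own SchedLog for the same pattern, here is a minimal sketch. It assumes the raw log file, with each entry on a single line (not the mail-wrapped excerpts quoted below), and it only matches the two message strings that appear in those excerpts. The function name summarize_schedlog is made up for this example and is not part of Condor.

```python
import re
from collections import Counter

# Message texts copied from the SchedLog excerpts quoted in this thread.
STARTED = re.compile(r"Started shadow for job (\d+\.\d+)")
EXCEPTION = "Shadow exited with job exception code"


def summarize_schedlog(lines):
    """Count shadow starts per job and shadow exception exits.

    Hypothetical helper for illustration only; it assumes one log
    entry per line, as in an unwrapped SchedLog file.
    """
    starts = Counter()
    exceptions = 0
    last_started = None
    for line in lines:
        match = STARTED.search(line)
        if match:
            last_started = match.group(1)
            starts[last_started] += 1
        elif EXCEPTION in line:
            exceptions += 1
    return {
        "starts": dict(starts),
        "exceptions": exceptions,
        "last_started": last_started,
    }


if __name__ == "__main__":
    sample = [
        '11/24 14:34:48 Started shadow for job 15.0 on "<137.57.176.30:4411>", (shadow pid = 2188)',
        "11/24 14:34:48 ERROR: Shadow exited with job exception code!",
        '11/24 14:34:50 Started shadow for job 15.1 on "<137.57.176.30:4411>", (shadow pid = 2796)',
        "11/24 14:34:50 ERROR: Shadow exited with job exception code!",
        '11/24 14:34:52 Started shadow for job 15.0 on "<137.57.176.30:4411>", (shadow pid = 712)',
    ]
    print(summarize_schedlog(sample))
```

The last_started value is the interesting one here: in each of the dumps below, the final line before the schedd exits is a "Started shadow" entry.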

Ian 

> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx 
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Alain EMPAIN
> Sent: November 24, 2004 3:50 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] Strange schedd crash (exit status 44)
> 
> My euro cent:
> 
> My jobs were canceled just because I forgot to add the #! 
> line as the first line of my bash script
> 
> #!/bin/bash
> 
> like we use #!/usr/bin/perl
> 
> 
> 	Cheers,
> 
> 	Alain
> -----------------------
> 
> 	Hoping this is as simple as that
> 
> Ian Chesal wrote:
> > Okay. Something is definitely wrong here. Shadows are dying and they're 
> > taking out the schedd with them. That's not good. Can't anyone offer any 
> > insight?
> > 
> > Thanks!
> > Ian
> > 
> > ----
> > 
> > This is an automated email from the Condor system on machine 
> > "TTC-MDEHKORD.altera.priv.altera.com".  Do not reply.
> > 
> > "d:\abc\condor/bin/condor_schedd.exe" on 
> > "TTC-MDEHKORD.altera.priv.altera.com" exited with status 44.
> > 
> > Condor will automatically restart this process in 10 seconds.
> > 
> > *** Last 100 line(s) of file SchedLog:
> > 11/24 14:31:04 ERROR: Shadow exited with job exception code!
> > 11/24 14:31:06 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 2260)
> > 11/24 14:31:06 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4226>
> > 11/24 14:31:06 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:31:06 ERROR: Shadow exited with job exception code!
> > 11/24 14:31:08 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 2484)
> > 11/24 14:31:08 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4230>
> > 11/24 14:31:08 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:31:08 ERROR: Shadow exited with job exception code!
> > 11/24 14:31:08 Match for cluster 15 has had 5 shadow exceptions, 
> > relinquishing.
> > 11/24 14:31:08 Sent RELEASE_CLAIM to startd on <137.57.176.30:4411>
> > 11/24 14:31:08 Match record (<137.57.176.30:4411>, 15, 0) deleted
> > 11/24 14:31:10 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 176)
> > 11/24 14:31:10 Sent ad to 1 collectors for mdehkord@xxxxxxxxxx
> > 11/24 14:31:10 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4238>
> > 11/24 14:31:10 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:31:10 ERROR: Shadow exited with job exception code!
> > 11/24 14:31:10 Match for cluster 15 has had 5 shadow exceptions, 
> > relinquishing.
> > 11/24 14:31:10 Sent RELEASE_CLAIM to startd on <137.57.176.30:4411>
> > 11/24 14:31:10 Match record (<137.57.176.30:4411>, 15, 1) deleted
> > 11/24 14:31:11 DaemonCore: Command received via TCP from host 
> > <137.57.176.30:2376>
> > 11/24 14:31:11 DaemonCore: received command 443 (VACATE_SERVICE), 
> > calling handler (vacate_service)
> > 11/24 14:31:12 Got VACATE_SERVICE from <137.57.176.30:2376>
> > 11/24 14:31:12 DaemonCore: Command received via TCP from host 
> > <137.57.176.30:2377>
> > 11/24 14:31:12 DaemonCore: received command 443 (VACATE_SERVICE), 
> > calling handler (vacate_service)
> > 11/24 14:31:12 Got VACATE_SERVICE from <137.57.176.30:2377>
> > 11/24 14:32:42 Activity on stashed negotiator socket
> > 11/24 14:32:42 Negotiating for owner: mdehkord@xxxxxxxxxx
> > 11/24 14:32:42 Checking consistency running and runnable jobs
> > 11/24 14:32:42 Tables are consistent
> > 11/24 14:32:42 Out of jobs - 2 jobs matched, 0 jobs idle, 
> flock level 
> > = 0
> > 11/24 14:32:42 Sent ad to 1 collectors for mdehkord@xxxxxxxxxx
> > 11/24 14:32:46 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 840)
> > 11/24 14:32:47 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4250>
> > 11/24 14:32:47 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:47 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:48 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3388)
> > 11/24 14:32:49 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4254>
> > 11/24 14:32:49 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:49 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:50 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 2024)
> > 11/24 14:32:51 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4258>
> > 11/24 14:32:51 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:51 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:52 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3784)
> > 11/24 14:32:53 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4262>
> > 11/24 14:32:53 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:53 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:54 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3364)
> > 11/24 14:32:55 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4266>
> > 11/24 14:32:55 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:55 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:56 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3028)
> > 11/24 14:32:57 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4270>
> > 11/24 14:32:57 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:57 ERROR: Shadow exited with job exception code!
> > 11/24 14:32:58 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3224)
> > 11/24 14:32:59 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4274>
> > 11/24 14:32:59 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:32:59 ERROR: Shadow exited with job exception code!
> > 11/24 14:33:00 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 908)
> > 11/24 14:33:01 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4278>
> > 11/24 14:33:01 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:33:01 ERROR: Shadow exited with job exception code!
> > 11/24 14:33:02 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 756)
> > 11/24 14:33:03 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4282>
> > 11/24 14:33:03 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:33:03 ERROR: Shadow exited with job exception code!
> > 11/24 14:33:03 Match for cluster 15 has had 5 shadow exceptions, 
> > relinquishing.
> > 11/24 14:33:03 Sent RELEASE_CLAIM to startd on <137.57.176.30:4411>
> > 11/24 14:33:03 Match record (<137.57.176.30:4411>, 15, 0) deleted
> > 11/24 14:33:03 DaemonCore: Command received via TCP from host 
> > <137.57.176.30:2392>
> > 11/24 14:33:03 DaemonCore: received command 443 (VACATE_SERVICE), 
> > calling handler (vacate_service)
> > 11/24 14:33:03 Got VACATE_SERVICE from <137.57.176.30:2392>
> > 11/24 14:33:04 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 3928)
> > 11/24 14:33:04 Sent ad to 1 collectors for mdehkord@xxxxxxxxxx
> > 11/24 14:33:05 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4290>
> > 11/24 14:33:05 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:33:05 ERROR: Shadow exited with job exception code!
> > 11/24 14:33:05 Match for cluster 15 has had 5 shadow exceptions, 
> > relinquishing.
> > 11/24 14:33:05 Sent RELEASE_CLAIM to startd on <137.57.176.30:4411>
> > 11/24 14:33:05 Match record (<137.57.176.30:4411>, 15, 1) deleted
> > 11/24 14:33:05 DaemonCore: Command received via TCP from host 
> > <137.57.176.30:2393>
> > 11/24 14:33:05 DaemonCore: received command 443 (VACATE_SERVICE), 
> > calling handler (vacate_service)
> > 11/24 14:33:05 Got VACATE_SERVICE from <137.57.176.30:2393>
> > 11/24 14:34:42 Activity on stashed negotiator socket
> > 11/24 14:34:42 Negotiating for owner: mdehkord@xxxxxxxxxx
> > 11/24 14:34:43 Checking consistency running and runnable jobs
> > 11/24 14:34:43 Tables are consistent
> > 11/24 14:34:43 Out of jobs - 2 jobs matched, 0 jobs idle, 
> flock level 
> > = 0
> > 11/24 14:34:43 Sent ad to 1 collectors for mdehkord@xxxxxxxxxx
> > 11/24 14:34:48 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 2188)
> > 11/24 14:34:48 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4305>
> > 11/24 14:34:48 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:34:48 ERROR: Shadow exited with job exception code!
> > 11/24 14:34:50 Started shadow for job 15.1 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 2796)
> > 11/24 14:34:50 DaemonCore: Command received via UDP from host 
> > <137.57.142.168:4309>
> > 11/24 14:34:50 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> > calling handler (HandleProcessExitCommand())
> > 
> > 11/24 14:34:50 ERROR: Shadow exited with job exception code!
> > 11/24 14:34:52 Started shadow for job 15.0 on 
> "<137.57.176.30:4411>", 
> > (shadow pid = 712)
> > *** End of file SchedLog
> > 
> > 
> >>-----Original Message-----
> >>From: condor-users-bounces@xxxxxxxxxxx 
> >>[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> >>Sent: November 24, 2004 12:10 PM
> >>To: Condor-Users Mail List
> >>Subject: RE: [Condor-users] Strange schedd crash (exit status 44)
> >>
> >>We got the same crash again with schedd on Windows. This is the 6.7.2 
> >>branch. Is there something in the output that might tip us off to a 
> >>problem? It looks like it's dying trying to fork a condor_shadow for 
> >>a new job in both cases.
> >>
> >>Thanks!
> >>Ian
> >>
> >>----
> >>This is an automated email from the Condor system on machine 
> >>"TTC-GQUAN3.altera.priv.altera.com".  Do not reply.
> >>
> >>"d:\abc\condor/bin/condor_schedd.exe" on 
> >>"TTC-GQUAN3.altera.priv.altera.com" exited with status 44.
> >>Condor will automatically restart this process in 10 seconds.
> >>
> >>*** Last 100 line(s) of file SchedLog:
> >>11/24 09:14:42 attempt to add pre-existing match 
> >>"<137.57.176.183:4197>#1099203124#1706" ignored
> >>11/24 09:14:42 attempt to add pre-existing match 
> >>"<137.57.176.179:2712>#1099202607#1606" ignored
> >>11/24 09:14:42 Sent RELEASE_CLAIM to startd on <137.57.176.180:1047>
> >>11/24 09:14:42 Match record (<137.57.176.180:1047>, 20, 234) deleted
> >>11/24 09:14:49 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1319>
> >>11/24 09:14:49 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:14:52 Started shadow for job 20.232 on 
> >>"<137.57.176.180:1047>", (shadow pid = 2152)
> >>11/24 09:14:52 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:14:53 DaemonCore: Command received via TCP from host 
> >><137.57.176.180:1877>
> >>11/24 09:14:53 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>calling handler (vacate_service)
> >>11/24 09:14:53 Got VACATE_SERVICE from <137.57.176.180:1877>
> >>11/24 09:14:53 Sent RELEASE_CLAIM to startd on <137.57.176.180:1047>
> >>11/24 09:14:53 Match record (<137.57.176.180:1047>, 20, 232) deleted
> >>11/24 09:14:53 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1331>
> >>11/24 09:14:53 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:14:54 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1332>
> >>11/24 09:14:54 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:14:54 Scheduler::Relinquish - mrec is NULL, can't 
> relinquish
> >>11/24 09:14:54 Null parameter --- match not deleted
> >>11/24 09:14:56 Started shadow for job 20.233 on 
> >>"<137.57.176.180:1047>", (shadow pid = 2720)
> >>11/24 09:14:58 Started shadow for job 20.234 on 
> >>"<137.57.176.180:1047>", (shadow pid = 2100)
> >>11/24 09:14:58 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:16:41 Response problem from startd.
> >>11/24 09:16:41 Sent RELEASE_CLAIM to startd on <137.57.176.183:4197>
> >>11/24 09:16:41 Match record (<137.57.176.183:4197>, 20, 235) deleted
> >>11/24 09:16:42 Activity on stashed negotiator socket
> >>11/24 09:16:42 Negotiating for owner: gquan@xxxxxxxxxx
> >>11/24 09:16:42 Checking consistency running and runnable jobs
> >>11/24 09:16:42 Tables are consistent
> >>11/24 09:16:43 Out of servers - 0 jobs matched, 36 jobs idle,
> >>1 jobs rejected
> >>11/24 09:16:43 Response problem from startd.
> >>11/24 09:16:43 Sent RELEASE_CLAIM to startd on <137.57.176.179:2712>
> >>11/24 09:16:43 Match record (<137.57.176.179:2712>, 20, 236) deleted
> >>11/24 09:17:28 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:18:43 Activity on stashed negotiator socket
> >>11/24 09:18:43 Negotiating for owner: gquan@xxxxxxxxxx
> >>11/24 09:18:43 Checking consistency running and runnable jobs
> >>11/24 09:18:43 Tables are consistent
> >>11/24 09:18:43 Out of servers - 0 jobs matched, 36 jobs idle,
> >>1 jobs rejected
> >>11/24 09:19:43 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1395>
> >>11/24 09:19:43 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:19:43 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1398>
> >>11/24 09:19:43 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:19:45 Started shadow for job 20.232 on 
> >>"<137.57.176.180:1047>", (shadow pid = 2672)
> >>11/24 09:19:47 Started shadow for job 20.235 on 
> >>"<137.57.176.180:1047>", (shadow pid = 1448)
> >>11/24 09:19:47 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:20:43 Activity on stashed negotiator socket
> >>11/24 09:20:43 Negotiating for owner: gquan@xxxxxxxxxx
> >>11/24 09:20:43 Checking consistency running and runnable jobs
> >>11/24 09:20:43 Tables are consistent
> >>11/24 09:20:44 Out of servers - 4 jobs matched, 30 jobs idle,
> >>1 jobs rejected
> >>11/24 09:20:58 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1427>
> >>11/24 09:20:58 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:21:01 Started shadow for job 20.236 on 
> >>"<137.57.176.183:4197>", (shadow pid = 2108)
> >>11/24 09:21:01 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:21:07 DaemonCore: Command received via TCP from host 
> >><137.57.176.183:2328>
> >>11/24 09:21:07 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>calling handler (vacate_service)
> >>11/24 09:21:07 Got VACATE_SERVICE from <137.57.176.183:2328>
> >>11/24 09:21:07 Sent RELEASE_CLAIM to startd on <137.57.176.183:4197>
> >>11/24 09:21:07 Match record (<137.57.176.183:4197>, 20, 236) deleted
> >>11/24 09:21:07 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1440>
> >>11/24 09:21:07 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:21:07 Scheduler::Relinquish - mrec is NULL, can't 
> relinquish
> >>11/24 09:21:07 Null parameter --- match not deleted
> >>11/24 09:21:10 Started shadow for job 20.238 on 
> >>"<137.57.176.183:4197>", (shadow pid = 2772)
> >>11/24 09:21:10 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:22:34 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1462>
> >>11/24 09:22:34 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:22:37 Started shadow for job 20.236 on 
> >>"<137.57.176.179:2712>", (shadow pid = 2292)
> >>11/24 09:22:37 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:22:42 DaemonCore: Command received via TCP from host 
> >><137.57.176.179:2089>
> >>11/24 09:22:42 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>calling handler (vacate_service)
> >>11/24 09:22:42 Got VACATE_SERVICE from <137.57.176.179:2089>
> >>11/24 09:22:42 Sent RELEASE_CLAIM to startd on <137.57.176.179:2712>
> >>11/24 09:22:42 Match record (<137.57.176.179:2712>, 20, 236) deleted
> >>11/24 09:22:43 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1473>
> >>11/24 09:22:43 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:22:43 Scheduler::Relinquish - mrec is NULL, can't 
> relinquish
> >>11/24 09:22:43 Null parameter --- match not deleted
> >>11/24 09:22:44 Activity on stashed negotiator socket
> >>11/24 09:22:44 Negotiating for owner: gquan@xxxxxxxxxx
> >>11/24 09:22:44 Checking consistency running and runnable jobs
> >>11/24 09:22:45 Tables are consistent
> >>11/24 09:22:45 Out of servers - 3 jobs matched, 29 jobs idle,
> >>1 jobs rejected
> >>11/24 09:22:45 attempt to add pre-existing match 
> >>"<137.57.176.180:1047>#1100637096#502" ignored
> >>11/24 09:22:45 attempt to add pre-existing match 
> >>"<137.57.176.180:1047>#1100637096#501" ignored
> >>11/24 09:22:45 attempt to add pre-existing match 
> >>"<137.57.176.179:2712>#1099202607#1607" ignored
> >>11/24 09:22:45 Started shadow for job 20.239 on 
> >>"<137.57.176.179:2712>", (shadow pid = 1144)
> >>11/24 09:22:45 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>11/24 09:24:36 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1505>
> >>11/24 09:24:36 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:24:37 DaemonCore: Command received via UDP from host 
> >><137.57.142.51:1508>
> >>11/24 09:24:37 DaemonCore: received command 60001 (DC_PROCESSEXIT), 
> >>calling handler (HandleProcessExitCommand())
> >>
> >>11/24 09:24:39 DaemonCore: Command received via TCP from host 
> >><137.57.176.180:2306>
> >>11/24 09:24:39 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>calling handler (vacate_service)
> >>11/24 09:24:39 Got VACATE_SERVICE from <137.57.176.180:2306>
> >>11/24 09:24:39 Sent RELEASE_CLAIM to startd on <137.57.176.180:1047>
> >>11/24 09:24:39 Match record (<137.57.176.180:1047>, 20, 236) deleted
> >>11/24 09:24:39 match or classad for job 20.236 was deleted - not 
> >>forking a shadow
> >>11/24 09:24:39 Started shadow for job 20.237 on 
> >>"<137.57.176.180:1047>", (shadow pid = 3912)
> >>*** End of file SchedLog
> >>
> >>
> >>
> >>-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> >>Questions about this message or Condor in general?
> >>Email address of the local Condor administrator: 
> >>swttcabca@xxxxxxxxxx The Official Condor Homepage is 
> >>http://www.cs.wisc.edu/condor
> >>
> >>
> >> 
> >>
> >>
> >>>-----Original Message-----
> >>>From: condor-users-bounces@xxxxxxxxxxx 
> >>>[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
> >>>Sent: November 23, 2004 2:45 PM
> >>>To: Condor-Users Mail List
> >>>Subject: [Condor-users] Strange schedd crash (exit status 44)
> >>>
> >>>I get a schedd crash from this user's machine every time he queues up
> >>>100 or more jobs. What does exit status 44 indicate?
> >>>
> >>>Thanks!
> >>>Ian
> >>>
> >>>-----Original Message-----
> >>>From: SYSTEM@xxxxxxxxxx [mailto:SYSTEM@xxxxxxxxxx]
> >>>Sent: November 23, 2004 2:32 PM
> >>>To: SW TOR Batch System Admins
> >>>Subject: [Condor] Problem
> >>>
> >>>This is an automated email from the Condor system on machine 
> >>>"TTC-GQUAN3.altera.priv.altera.com".  Do not reply.
> >>>
> >>>"d:\abc\condor/bin/condor_schedd.exe" on 
> >>>"TTC-GQUAN3.altera.priv.altera.com" exited with status 44.
> >>>Condor will automatically restart this process in 10 seconds.
> >>>
> >>>*** Last 100 line(s) of file SchedLog:
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.180:1047>#1100637096#282" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.182:1151>#1099422886#1224" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.182:1151>#1099422886#1223" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.183:4197>#1099203124#1580" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.183:4197>#1099203124#1579" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.185:1407>#1099202749#1981" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.185:1407>#1099202749#1982" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.177:1213>#1100703290#277" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.186:2147>#1099203682#1256" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.177:1213>#1100703290#276" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.186:2147>#1099203682#1257" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.178:3591>#1099202664#1406" ignored
> >>>11/23 14:28:58 attempt to add pre-existing match 
> >>>"<137.57.176.179:2712>#1099202607#1468" ignored
> >>>11/23 14:28:59 attempt to add pre-existing match 
> >>>"<137.57.176.179:2712>#1099202607#1467" ignored
> >>>11/23 14:29:31 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4119>
> >>>11/23 14:29:31 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:29:36 Started shadow for job 19.130 on 
> >>>"<137.57.176.179:2712>", (shadow pid = 472)
> >>>11/23 14:29:36 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>>11/23 14:29:36 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:29:36 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:29:36 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:29:36 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:29:40 DaemonCore: Command received via TCP from host 
> >>><137.57.176.179:4906>
> >>>11/23 14:29:40 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>>calling handler (vacate_service)
> >>>11/23 14:29:40 Got VACATE_SERVICE from <137.57.176.179:4906>
> >>>11/23 14:29:40 Sent RELEASE_CLAIM to startd on 
> <137.57.176.179:2712>
> >>>11/23 14:29:40 Match record (<137.57.176.179:2712>, 19, 
> 130) deleted
> >>>11/23 14:29:40 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4133>
> >>>11/23 14:29:40 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:29:40 Scheduler::Relinquish - mrec is NULL, can't relinquish
> >>>11/23 14:29:40 Null parameter --- match not deleted
> >>>11/23 14:29:44 Started shadow for job 19.159 on 
> >>>"<137.57.176.179:2712>", (shadow pid = 2972)
> >>>11/23 14:29:44 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>>11/23 14:29:45 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:29:45 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:29:45 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:29:45 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:02 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4146>
> >>>11/23 14:30:02 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:30:05 condor_read(): recv() returned -1, errno = 10054, 
> >>>assuming failure.
> >>>11/23 14:30:05 Response problem from startd.
> >>>11/23 14:30:05 Sent RELEASE_CLAIM to startd on 
> <137.57.176.182:1151>
> >>>11/23 14:30:05 Match record (<137.57.176.182:1151>, 19, 
> 129) deleted
> >>>11/23 14:30:07 Started shadow for job 19.130 on 
> >>>"<137.57.176.182:1151>", (shadow pid = 1036)
> >>>11/23 14:30:07 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>>11/23 14:30:07 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:08 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:08 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:08 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:13 DaemonCore: Command received via TCP from host 
> >>><137.57.176.182:4778>
> >>>11/23 14:30:13 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>>calling handler (vacate_service)
> >>>11/23 14:30:13 Got VACATE_SERVICE from <137.57.176.182:4778>
> >>>11/23 14:30:13 Sent RELEASE_CLAIM to startd on 
> <137.57.176.182:1151>
> >>>11/23 14:30:13 Match record (<137.57.176.182:1151>, 19, 
> 130) deleted
> >>>11/23 14:30:13 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4176>
> >>>11/23 14:30:13 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:30:13 Scheduler::Relinquish - mrec is NULL, can't relinquish
> >>>11/23 14:30:13 Null parameter --- match not deleted
> >>>11/23 14:30:17 Started shadow for job 19.133 on 
> >>>"<137.57.176.182:1151>", (shadow pid = 2300)
> >>>11/23 14:30:17 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>>11/23 14:30:17 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:17 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:17 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:17 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:42 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4190>
> >>>11/23 14:30:42 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:30:45 Started shadow for job 19.130 on 
> >>>"<137.57.176.180:1047>", (shadow pid = 3624)
> >>>11/23 14:30:45 Sent ad to 1 collectors for gquan@xxxxxxxxxx
> >>>11/23 14:30:45 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:46 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:46 timed out requesting claim from 
> <137.57.176.180:1047>
> >>>11/23 14:30:46 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:52 DaemonCore: Command received via TCP from host 
> >>><137.57.176.180:3514>
> >>>11/23 14:30:52 DaemonCore: received command 443 (VACATE_SERVICE), 
> >>>calling handler (vacate_service)
> >>>11/23 14:30:52 Got VACATE_SERVICE from <137.57.176.180:3514>
> >>>11/23 14:30:52 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:52 Match record (<137.57.176.180:1047>, 19, 
> 130) deleted
> >>>11/23 14:30:52 DaemonCore: Command received via UDP from host 
> >>><137.57.142.51:4204>
> >>>11/23 14:30:52 DaemonCore: received command 60001 
> (DC_PROCESSEXIT), 
> >>>calling handler (HandleProcessExitCommand())
> >>>
> >>>11/23 14:30:52 Scheduler::Relinquish - mrec is NULL, can't relinquish
> >>>11/23 14:30:52 Null parameter --- match not deleted
> >>>11/23 14:30:55 Response problem from startd.
> >>>11/23 14:30:55 Sent RELEASE_CLAIM to startd on 
> <137.57.176.180:1047>
> >>>11/23 14:30:55 Match record (<137.57.176.180:1047>, 19, 
> 131) deleted
> >>>11/23 14:30:56 Response problem from startd.
> >>>11/23 14:30:56 Sent RELEASE_CLAIM to startd on 
> <137.57.176.185:1407>
> >>>11/23 14:30:56 Match record (<137.57.176.185:1407>, 19, 
> 151) deleted
> >>>11/23 14:30:56 Response problem from startd.
> >>>11/23 14:30:56 Sent RELEASE_CLAIM to startd on 
> <137.57.176.183:4197>
> >>>11/23 14:30:56 Match record (<137.57.176.183:4197>, 19, 
> 147) deleted
> >>>11/23 14:30:56 Response problem from startd.
> >>>11/23 14:30:56 Sent RELEASE_CLAIM to startd on 
> <137.57.176.183:4197>
> >>>11/23 14:30:56 Match record (<137.57.176.183:4197>, 19, 
> 149) deleted
> >>>11/23 14:30:56 Response problem from startd.
> >>>11/23 14:30:56 Sent RELEASE_CLAIM to startd on 
> <137.57.176.185:1407>
> >>>11/23 14:30:56 Match record (<137.57.176.185:1407>, 19, 
> 150) deleted
> >>>11/23 14:30:57 Response problem from startd.
> >>>11/23 14:30:57 Sent RELEASE_CLAIM to startd on 
> <137.57.176.186:2147>
> >>>11/23 14:30:57 Match record (<137.57.176.186:2147>, 19, 
> 155) deleted
> >>>11/23 14:30:57 Started shadow for job 19.130 on 
> >>>"<137.57.176.180:1047>", (shadow pid = 2692)
> >>>*** End of file SchedLog
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>_______________________________________________
> >>>Condor-users mailing list
> >>>Condor-users@xxxxxxxxxxx
> >>>http://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >>>
> >>
> >>
> > 
> > 
> > 
> > 
> 
> --
> ------------------------------------------------------------
> Dr Alain EMPAIN  <alain.empain@xxxxxxxxx> <alain@xxxxxxxxxx>
>        Bioinformatics, Molecular Genetics,
>        Fac. Med. Vet., University of Liège, Belgium
>        Bd de Colonster, B43   B-4000 Liège (Sart-Tilman)
> WORK: +32 4 366 3821  FAX: +32 4 366 4122
> HOME: rue des Martyrs,7  B- 4550 Nandrin
>    +32 85 51 23 41  GSM: +32 497 70 17 64
> --------------------------------------------------------------
> -----------------
> [ Creative Commons ]
> Do not confuse 'Piracy' with 'Knowledge sharing':
> Circulating knowledge is at the very heart of 
> creative and inventive activity. Scientific 
> knowledge is built on centuries of creative sharing.
> 'Du bon usage de la piraterie'  F. Latrive (PDF) 
> http://www.freescape.eu.org/piraterie/complet.html
> --------------------------------------------------------------
> -----------------
>