Subject: [Condor-users] Local PVM process dies with status 0x6e00
Hi everyone,
I am having a problem that I can't figure out how to solve. When
using the PVM universe, the shadow process on the submitting machine
always dies with status 0x6e00 before the task starts executing on the
matched machine. As a result, the matched machine never executes the
task. I have searched through the archive but couldn't find any
answers. Attached are my log files:
6/14 08:06:42 (pid:4883) IO: Failed to read packet header
6/14 08:06:42 (pid:4883) DaemonCore: Command received via UDP from host <192.168.1.102:36529>
6/14 08:06:42 (pid:4883) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
6/14 08:06:42 (pid:4883) Sent ad to central manager for condor@xxxxxxxxxxx
6/14 08:06:42 (pid:4883) Sent ad to 1 collectors for condor@xxxxxxxxxxx
6/14 08:06:42 (pid:4883) Called reschedule_negotiator()
6/14 08:06:42 (pid:4883) Activity on stashed negotiator socket
6/14 08:06:42 (pid:4883) Negotiating for owner: condor@xxxxxxxxxxx
6/14 08:06:42 (pid:4883) Checking consistency running and runnable jobs
6/14 08:06:42 (pid:4883) Tables are consistent
6/14 08:06:42 (pid:4883) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
6/14 08:06:44 (pid:4883) About to Create_Process(
/home/condor/condor-install/sbin/condor_shadow.pvm, condor_shadow.pvm
<192.168.1.102:33041>, ... )
6/14 08:06:44 (pid:4883) In parent, shadow pid = 19812
6/14 08:06:44 (pid:4883) Starting add_shadow_birthdate(28.0)
6/14 08:06:44 (pid:4883) shadow_fd = 12
6/14 08:06:44 (pid:4883) Sending job 28.0 to shadow pid 19812
6/14 08:06:44 (pid:4883) First Line: 28 0 1
6/14 08:06:44 (pid:4883) sending <192.168.1.103:32973> <192.168.1.103:32973>#1150182634#38 0 wolf3
6/14 08:06:45 (pid:4883) IO: Failed to read packet header
6/14 08:06:45 (pid:4883) IO: Failed to read packet header
6/14 08:06:45 (pid:4883) IO: Failed to read packet header
6/14 08:06:45 (pid:4883) IO: Failed to read packet header
6/14 08:06:45 (pid:4883) IO: Failed to read packet header
6/14 08:06:45 (pid:4883) DaemonCore: Command received via TCP from host <192.168.1.103:33082>
6/14 08:06:45 (pid:4883) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
6/14 08:06:45 (pid:4883) Got VACATE_SERVICE from <192.168.1.103:33082>
6/14 08:06:45 (pid:4883) Sent RELEASE_CLAIM to startd on <192.168.1.103:32973>
6/14 08:06:45 (pid:4883) Match record (<192.168.1.103:32973>, 28, 0) deleted
6/14 08:06:45 (pid:4883) Shadow pid 19812 for job 28.0 exited with status 100
6/14 08:06:47 (pid:4883) Sent owner (0 jobs) ad to 1 collectors
6/14 08:06:48 (pid:4883) IO: Failed to read packet header
6/14 09:05:28 (pid:4883) Cleaning job queue...
It seems that at some point the shadow process dies, and as a result the matched machine is released before any execution takes place.
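For reference, assuming the 0x6e00 in the subject is a raw Unix wait() status word (exit code in the high byte, as on Linux), it would decode to exit code 110 (0x6e). The schedd log above instead reports "exited with status 100", which as a raw word would be 0x6400. A minimal Python sketch of that decoding, using the standard os.WIFEXITED/os.WEXITSTATUS helpers (the function name decode_wait_status is hypothetical, for illustration only):

```python
import os

def decode_wait_status(status: int) -> str:
    """Describe a raw wait()-style status word (Unix only)."""
    if os.WIFEXITED(status):
        # Normal exit: the exit code lives in bits 8-15 of the status word.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Killed by a signal: the signal number lives in the low 7 bits.
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or unknown"

print(decode_wait_status(0x6e00))    # exited with code 110
print(decode_wait_status(100 << 8))  # exited with code 100
```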
Please take a look and help.