All -

I have a simulation that starts running on Condor and reads in all the data files just fine. But once it starts executing, it issues a "SIGQUIT" and Condor thinks it's done. If I run it by hand on cf02012 (one of our Condor boxes), it runs just fine. There is nothing in the error file or log file to indicate that something's wrong.

tec232 is our "submit only" machine; cf02012 is the box the job tries to run on. We just upgraded to RHEL 4.0 and Condor 6.6.10. We've been running Condor for almost a year now, with great results, and similar sims have run just fine. There is just something quirky
about this one. Any ideas would be a tremendous help.

Thanks,
Jim

tec232 ShadowLog:

8/18 11:18:58 (884.0) (20447): Request to run on <172.31.2.12:33355> was ACCEPTED
8/18 11:19:06 (884.0) (20447): DaemonCore: PERMISSION DENIED to unknown user from host <172.31.2.12:33478> for command 71000 (SHADOW_UPDATEINFO)
8/18 11:39:06 (884.0) (20447): DaemonCore: PERMISSION DENIED to unknown user from host <172.31.2.12:33478> for command 71000 (SHADOW_UPDATEINFO)
8/18 11:55:25 (884.0) (20447): Job 884.0 terminated: exited with status 0
8/18 11:55:25 (884.0) (20447): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100

tec232 SchedLog:

8/18 11:18:57 Started shadow for job 884.0 on "<172.31.2.12:33355>", (shadow pid = 20447)
8/18 11:18:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:23:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:28:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:33:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:38:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:43:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:48:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:53:59 Sent ad to central manager for solis@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/18 11:55:25 Shadow pid 20447 for job 884.0 exited with status 100

cf02012 StartLog:

8/18 11:55:23 Failed to obtain keyboard or mouse idle information.
8/18 11:55:23 Assuming the keyboard and mouse to be infinitely idle.
8/18 11:55:25 DaemonCore: Command received via TCP from host <192.56.136.232:45119>
8/18 11:55:25 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
8/18 11:55:25 vm1: Called deactivate_claim_forcibly()
8/18 11:55:25 Starter pid 7409 exited with status 0

cf02012 StarterLog.vm1:

8/18 11:18:58 Starting a VANILLA universe job with ID: 884.0
8/18 11:18:58 IWD: /home/wbs/studies/pmma/scenario/CS20/runs/PMMA-Production
8/18 11:18:58 Error file: /home/wbs/studies/pmma/scenario/CS20/output/cs20-bc2.uav/error.cs20-bc2
8/18 11:18:58 About to exec /home/wbs/studies/pmma/scenario/CS20/runs/PMMA-Production/cmd.uav cs20-bc2.uav 01
8/18 11:18:58 Create_Process succeeded, pid=7412
8/18 11:55:25 Process exited, pid=7412, status=0
8/18 11:55:25 Got SIGQUIT. Performing fast shutdown.
8/18 11:55:25 ShutdownFast all jobs.
8/18 11:55:25 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0