Michael Thomas wrote:
> Once again I started seeing high loads on my gatekeeper due to a large
> number of globus-job-manager processes.
> [...]

After moving all of the user home directories from an NFS mount to a local disk, this no longer seems to be a problem.

However, I'm seeing some other odd behaviour that doesn't make sense to me. A number of jobs coming through the OSG managed fork queue seem to get disconnected from the actual process. If I look up the PID for the condor queue id, the process isn't running anymore. When I look at the condor_q -l output for the job, none of the files named by RemoteSpoolDir, UserLog, Out, and Err exist. Yet condor_q says that the job is still in the Running state.

I also see the same symptoms from the occasional grid-monitor job that doesn't exit after an hour (it is still running after 24 hours).

Why would condor think the job is still running when the process is dead?

--Mike
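The two checks described above (is the PID still alive, and do the files listed in the job's `condor_q -l` output exist) can be scripted. The following is a minimal sketch, not a Condor tool: it assumes a simplified `Name = value` line format for the `condor_q -l` text, assumes POSIX `kill(pid, 0)` semantics, and resolves relative paths against the current directory rather than the job's working directory. The helper names `pid_alive`, `parse_classad_long`, and `missing_files` are hypothetical.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence/permission
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but is owned by another user
    return True


def parse_classad_long(text: str) -> dict:
    """Parse simplified `condor_q -l` style output: one `Name = value` per line.

    Assumption: values are single-line; surrounding double quotes are stripped.
    """
    attrs = {}
    for line in text.splitlines():
        if " = " not in line:
            continue
        name, value = line.split(" = ", 1)
        attrs[name.strip()] = value.strip().strip('"')
    return attrs


def missing_files(attrs: dict, keys=("RemoteSpoolDir", "UserLog", "Out", "Err")) -> list:
    """Return the attribute names whose paths are present in the ad but missing on disk.

    Relative paths are checked against the current directory (a simplification).
    """
    return [k for k in keys if k in attrs and not os.path.exists(attrs[k])]
```

To use it, save the output of `condor_q -l <job id>` to a file, pass its contents to `parse_classad_long`, and compare `missing_files(...)` and `pid_alive(...)` against what condor_q reports for the job's state.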