PS: condor_q -global -debug prints the queue from the 1st submit host right away, then hangs for 20+ seconds, then prints

06/19/13 11:52:18 condor_read(): timeout reading 5 bytes from schedd at <my.ip:31286>.
06/19/13 11:52:18 IO: Failed to read packet header
06/19/13 11:52:18 SECMAN: no classad from server, failing

-- Failed to fetch ads from: <my.ip:31286> : 2nd.submit.host
SECMAN:2007:Failed to end classad message.

There is a bunch of jobs that got condor_rm'ed but are stuck someplace: ps -AF shows a lot of

condor_scheduniv_exec.238602.0 -f -l . -Lockfile moldag3.dag.lock -AutoRescue 1 -DoRescueFrom 0 -Dag moldag3.dag -Suppress_notification -CsdVersion $CondorVersion: 8.0.0 May 29 2013 BuildID: 133173 $ -Dagman /usr/bin/condor_dagman -Update_submit

If I stop condor they go away and come back when I start condor. condor_rm returns the same "SECMAN:2007:Failed to end classad message."

How do I clean this up?

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu