Hi everyone,

I have, on one 64-core CentOS 6 / condor-8.6.13-1.el6.x86_64 worker machine:

> # ps -AF | grep condor
> condor 3920 1 0 14180 3744 18 Mar14 ? 00:00:31 condor_master -pidfile /var/run/condor/condor_master.pid
> root 4096 3920 0 8202 10592 9 Mar14 ? 03:36:19 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 501
> condor 4097 3920 0 14149 3792 9 Mar14 ? 00:01:06 condor_shared_port -f
> condor 4105 3920 0 16260 12624 8 Mar14 ? 05:40:03 condor_startd -f
> condor 4130 3920 0 19245 5568 14 Mar14 ? 00:01:06 condor_schedd -f
> condor 651153 4105 0 16517 4188 15 Apr03 ? 00:00:07 condor_starter -f -a slot1_2 exocet.bmrb.wisc.edu
> bbee 651156 651153 0 16515 1840 32 Apr03 ? 00:00:00 condor_starter -f -a slot1_2 exocet.bmrb.wisc.edu
> bbee 651157 651156 0 21411 2280 0 Apr03 ? 00:00:33 /usr/libexec/condor/curl_plugin http://proxy.chtc.wisc.edu/SQUID/bmrb/3.8/combined.tgz.enc /var/lib/condor/execute/dir_651153/combined.tgz.enc
...
> condor 661937 4105 0 16517 4188 0 Apr04 ? 00:00:06 condor_starter -f -a slot1_64 exocet.bmrb.wisc.edu
> bbee 661940 661937 0 16515 1844 16 Apr04 ? 00:00:00 condor_starter -f -a slot1_64 exocet.bmrb.wisc.edu
> bbee 661941 661940 0 21411 2276 42 Apr04 ? 00:00:36 /usr/libexec/condor/curl_plugin http://proxy.chtc.wisc.edu/SQUID/bmrb/3.8/combined.tgz.enc /var/lib/condor/execute/dir_661937/combined.tgz.enc

i.e. all 64 slots have been hanging for a couple of weeks, each waiting for a file.

First question: is there an easy way to see what state condor thinks a job is in, based on its PID? In this case a lookup by execute host would work too, since they're all stuck.

Second question: is there a way to set a timeout on curl_plugin transfers, as distinct from the overall periodic_remove? Also, is this plugin-specific? There is no FILE_TRANSFER_QUEUE_AGE or anything else we've changed for these jobs, and 2 weeks is way more than the default.

TIA
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu