[HTCondor-users] condor_rm problems
- Date: Fri, 31 Jan 2014 09:33:48 +0100
- From: Pek Daniel <pekdaniel@xxxxxxxxx>
- Subject: [HTCondor-users] condor_rm problems
Hi,
I had about 600,000 jobs in a single schedd queue, and I tried to
delete them all with condor_rm -all.
It marked all the jobs for removal, but after a while I received the
following automated email:
This is an automated email from the Condor system
on machine "btbeater001.xxx.xx". Do not reply.
"/usr/sbin/condor_schedd" on "btbeater001.xxx.xx" was killed because
it was no longer responding.
Condor will automatically restart this process in 10 seconds.
*** Last 20 line(s) of file /var/log/condor/SchedLog:
01/31/14 08:10:10 (pid:3082203) Sent ad to 1 collectors for xxx@xxxxxx
01/31/14 08:12:31 (pid:3082203) Can't find address for startd btbeater001.xxx.xx
01/31/14 08:12:31 (pid:3082203) Can't find address for negotiator
01/31/14 08:12:31 (pid:3082203) Failed to send RESCHEDULE to unknown daemon:
01/31/14 08:18:41 (pid:3082203) TransferQueueManager stats: active
up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
01/31/14 08:18:41 (pid:3082203) TransferQueueManager upload 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
01/31/14 08:18:41 (pid:3082203) TransferQueueManager download 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
01/31/14 08:18:41 (pid:3082203) Sent ad to central manager for xxx@xxxxxx
01/31/14 08:18:41 (pid:3082203) Sent ad to 1 collectors for xxx@xxxxxx
01/31/14 08:20:57 (pid:3082203) Can't find address for startd btbeater001.xxx.xx
01/31/14 08:20:57 (pid:3082203) Can't find address for negotiator
01/31/14 08:20:57 (pid:3082203) Failed to send RESCHEDULE to unknown daemon:
01/31/14 08:36:12 (pid:3082203) TransferQueueManager stats: active
up=0/10 down=0/10; waiting up=0 down=0; wait time up=0s down=0s
01/31/14 08:36:12 (pid:3082203) TransferQueueManager upload 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
01/31/14 08:36:12 (pid:3082203) TransferQueueManager download 1m I/O
load: 0 bytes/s 0.000 disk load 0.000 net load
01/31/14 08:36:12 (pid:3082203) Sent ad to central manager for xxx@xxxxxx
01/31/14 08:36:12 (pid:3082203) Sent ad to 1 collectors for xxx@xxxxxx
01/31/14 08:43:01 (pid:3082203) Can't find address for startd btbeater001.xxx.xx
01/31/14 08:43:01 (pid:3082203) Can't find address for negotiator
01/31/14 08:43:01 (pid:3082203) Failed to send RESCHEDULE to unknown daemon:
*** End of file SchedLog
By the way, is there a rule of thumb for estimating how many jobs a
single schedd can safely handle? For example, given the peak number of
queued jobs in the system and the available hardware, how can I
calculate the number of schedds necessary to serve that many jobs
reliably?
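To make the question concrete, the calculation I have in mind would look roughly like the sketch below. The function name and the per-schedd capacity are my own placeholders, not anything taken from HTCondor; the capacity value is exactly the unknown I am asking about, and it would presumably have to be measured on the actual hardware.

```python
def schedds_needed(peak_queued_jobs: int, jobs_per_schedd: int) -> int:
    """Estimate how many schedds are needed for a given peak queue size.

    jobs_per_schedd is a hypothetical, hardware-dependent capacity:
    the number of queued jobs one schedd can handle reliably.
    """
    if peak_queued_jobs < 0:
        raise ValueError("peak_queued_jobs must be non-negative")
    if jobs_per_schedd <= 0:
        raise ValueError("jobs_per_schedd must be positive")
    # Integer ceiling division: round up so no jobs are left unserved.
    return (peak_queued_jobs + jobs_per_schedd - 1) // jobs_per_schedd


if __name__ == "__main__":
    # 100_000 is an illustrative placeholder, not a recommended limit.
    print(schedds_needed(600_000, 100_000))
```

With the placeholder capacity of 100,000 jobs per schedd, a peak of 600,000 queued jobs would call for 6 schedds; the real answer depends entirely on what a realistic capacity figure is.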
Thanks,
Daniel