Cheers,
Tim
On Tue, 2010-06-29 at 11:57 +0100, Alan wrote:
Sounds like a similar issue reported here:
http://www.escience.cam.ac.uk/projects/camgrid/upgrade.html
Alan
On Tue, Jun 29, 2010 at 10:47, Diana Lousa <dlousa@xxxxxxxxxxx> wrote:
Hello,
We have installed Condor version 7.4.2 on a cluster composed
of machines running Fedora and Ubuntu 10.04. Our installation
is in shared directories, and we have different binaries for
Fedora and Ubuntu
(condor-7.4.2-linux-x86-rhel3-dynamic and
condor-7.4.2-linux-x86-debian50-dynamic, respectively). We
also have the home dir of condor and the configuration files
in a shared directory. The local dir of our central
manager/dedicated sched is in a local directory, and for all
the other machines it is in a shared directory. We have been
experiencing some serious problems:
1. The condor_submit command hangs:
Sometimes when I submit jobs, condor_submit gets stuck.
Although the job enters the queue, the command doesn't return
and I have to kill it with Ctrl+C.
2. Jobs return to Idle state and can't be removed:
One of our users has jobs that return to the Idle state after
they terminate or die. He then tries to remove these jobs from
the queue, but that action causes Condor to misbehave: condor_q
stops responding and shows the message:
-- Failed to fetch ads from: <192.168.127.3:39790> : zyon.itqb.unl.pt
and then all the jobs die.
It is worth pointing out that everything works fine when we
use an older version of Condor (6.8.4) on our central
manager/dedicated sched. However, we only have Fedora binaries
for this version, which means that we cannot run it on a
machine with Ubuntu (due to library incompatibilities), and
our goal is to have a machine with Ubuntu 10.04 as central
manager/dedicated sched.
Can anyone help?