Subject: [Condor-users] Error with executing simple job via Condor
I wrote a simple Python executable to submit with Condor. The jobs are
submitted and change to the Running state for a second or so, but then go
back to Idle. If I wait, they are resubmitted and change to Running again,
but they never actually run. The log files for the jobs in the queue never
have any content. The ShadowLog shows errno = 10054 (WSAECONNRESET, i.e. the
connection was forcibly closed by the remote host). All of our machines,
including the central manager, run Windows XP. As you can tell from the log,
we are using NTSSPI and SSL. When I run condor_status, everything looks fine
with regard to cores/slots and claimed and unclaimed machines. I am not
seeing any errors in the MasterLog, and as far as I can tell everything
looks OK.
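Since the original script isn't shown here, one way to separate a job problem from a pool problem is to submit a trivial job. The sketch below is a hypothetical stand-in (not the script described above): it only prints a few facts about the execute machine and exits 0, so if it also flips between Running and Idle, the cause is most likely in the daemons' network/security setup rather than in the job.

```python
"""Hypothetical minimal test job for a Condor pool.

It writes only to stdout and exits with status 0, so any failure to run
points at the pool configuration rather than at the job itself.
"""
import os
import platform
import socket
import sys


def describe_host():
    """Collect a few facts about the machine the job landed on."""
    return {
        "hostname": socket.gethostname(),
        "python": platform.python_version(),
        "cwd": os.getcwd(),
    }


def main():
    info = describe_host()
    for key in sorted(info):
        # %-formatting keeps this usable on older Python interpreters too.
        print("%s: %s" % (key, info[key]))
    sys.stdout.flush()
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

If this job's output file shows up with the execute host's name, the job side is fine and the SSL/NTSSPI settings are the next thing to check.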
Does anyone have any ideas about what might be causing this? We first set up
Condor without SSL and did not have any issues; now we are working on a more
secure configuration, which is likely what is causing the problem. This
might not be related, but our CM was also connected through a 100 Mb/s
switch, while the rest of our network is 1 Gb/s. The CM was not working,
and we still cannot see two machines that are on this 100 Mb/s router.
However, once we moved the CM off the 100 Mb/s router, we were able to see
all machines in our pool (we are currently testing and working out the
configuration, so we only have about 6 machines in the pool).
Thank you,
Mike
When I run the following command I get:
condor_q -analyze 88
088.009:  Run analysis summary.  Of 10 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      6 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
        Last successful match: Wed Apr 28 07:39:24 2010
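When comparing runs while changing the security settings, it can help to pull the counts out of the run analysis summary automatically. The following is a minimal sketch, assuming the line format pasted above (other Condor versions may word or indent these lines differently); `parse_analyze_summary` is a hypothetical helper, not part of Condor.

```python
import re

# Sample text copied from the condor_q -analyze output in this post.
SAMPLE = """\
088.009:  Run analysis summary.  Of 10 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      6 match but are serving users with a better priority in the pool
      4 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
        Last successful match: Wed Apr 28 07:39:24 2010
"""

# "Of N machines" appears once, on the header line.
TOTAL_RE = re.compile(r"Of (\d+) machines")
# Each reason line starts with a bare count followed by its description.
COUNT_RE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_analyze_summary(text):
    """Return (total_machines, {reason: count}) from a run analysis summary."""
    total_match = TOTAL_RE.search(text)
    total = int(total_match.group(1)) if total_match else None
    counts = {}
    for line in text.splitlines():
        m = COUNT_RE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return total, counts


if __name__ == "__main__":
    total, counts = parse_analyze_summary(SAMPLE)
    print("machines considered:", total)
    # Show only the reasons that actually apply to this job.
    for reason, n in counts.items():
        if n:
            print("%d %s" % (n, reason))
```

The header line (`088.009: ...`) and the "Last successful match" line do not start with a bare count, so the regular expression skips them and only the seven reason lines are collected.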