I've installed condor-7.2.3 on Windows Server 2008 HPC Edition 64-bit, on a dual-core, dual-processor system (the Condor pool contains only this single system).
The submitted job stays in the idle state and never moves to the running state.
E:\condor723\mpi-test>condor_q -analyze
-- Submitter: master.hpc.com : <10.129.150.44:49193> : master.hpc.com
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
001.000: Run analysis summary. Of 4 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Thu May 14 13:15:38 2009
1 jobs; 1 idle, 0 running, 0 held
E:\condor723\mpi-test>type log_ansys
000 (001.000.000) 05/14 13:10:38 Job submitted from host: <10.129.150.44:49193>
...
022 (001.000.000) 05/14 13:10:39 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to slot1@xxxxxxxxxxxxxx <10.129.150.44:49194>
...
024 (001.000.000) 05/14 13:10:39 Job reconnection failed
Job not found at execution machine
Can not reconnect to slot1@xxxxxxxxxxxxxx, rescheduling job
...
022 (001.000.000) 05/14 13:15:39 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to slot1@xxxxxxxxxxxxxx <10.129.150.44:49194>
...
024 (001.000.000) 05/14 13:15:39 Job reconnection failed
Job not found at execution machine
Can not reconnect to slot1@xxxxxxxxxxxxxx, rescheduling job
...
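To make the repeating pattern in the log easier to see, here is a minimal sketch in Python that counts the event header lines by code. It is my own throwaway helper, not part of Condor: the names `EVENT_RE`, `summarize_events` and `SAMPLE` are hypothetical, and the regular expression only assumes the header layout visible in the log above (three-digit code, job id in parentheses, date, time, message).

```python
import re
from collections import Counter

# Assumed header layout, taken from the log above, e.g.:
#   022 (001.000.000) 05/14 13:10:39 Job disconnected, attempting to reconnect
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d\d/\d\d) (\d\d:\d\d:\d\d) (.*)$"
)

def summarize_events(log_text):
    """Count event header lines, keyed by (event code, message)."""
    counts = Counter()
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:  # detail lines and "..." separators do not match and are skipped
            counts[(m.group(1), m.group(7))] += 1
    return counts

# Abbreviated copy of the log shown above (detail lines partly omitted).
SAMPLE = """\
000 (001.000.000) 05/14 13:10:38 Job submitted from host: <10.129.150.44:49193>
...
022 (001.000.000) 05/14 13:10:39 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
...
024 (001.000.000) 05/14 13:10:39 Job reconnection failed
Job not found at execution machine
...
022 (001.000.000) 05/14 13:15:39 Job disconnected, attempting to reconnect
...
024 (001.000.000) 05/14 13:15:39 Job reconnection failed
...
"""

if __name__ == "__main__":
    for (code, text), n in sorted(summarize_events(SAMPLE).items()):
        print(code, n, text)
```

On this log it shows one submit event followed by pairs of 022/024 events (disconnect, then reconnection failed), with no execute event in between.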
Is Condor tested on 64-bit Windows systems?
On Sat, May 2, 2009 at 11:50 AM, Sangamesh B
<forum.san@xxxxxxxxx> wrote:
Dear all,
Condor-7.0.5 is installed with the central manager on Windows XP 32-bit (a single-core machine) and the execute machine on Windows Server 2008 64-bit HPC Edition (dual core, dual processor = 4 cores in total). The job is submitted from the master node and should run on the HPC Server 2008 machine, but it fails with the following error:
E:\condor705\con-mpi-test\sleep-test1>type log
000 (080.000.000) 05/02 11:29:40 Job submitted from host: <10.129.150.82:1043>
...
022 (080.000.000) 05/02 11:29:55 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to slot1@xxxxxxxxxxxxxxx <10.129.150.44:56466>
...
024 (080.000.000) 05/02 11:30:00 Job reconnection failed
Job not found at execution machine
Can not reconnect to slot1@xxxxxxxxxxxxxxx, rescheduling job
...
022 (080.000.000) 05/02 11:34:46 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to slot1@xxxxxxxxxxxxxxx <10.129.150.44:56466>
...
024 (080.000.000) 05/02 11:34:46 Job reconnection failed
Job not found at execution machine
Can not reconnect to slot1@xxxxxxxxxxxxxxx, rescheduling job
...
E:\condor705\con-mpi-test\sleep-test1>
-- Submitter: support-2 : <10.129.150.82:1043> : support-2
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
080.000: Run analysis summary. Of 5 machines,
1 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Sat May 02 11:34:41 2009
1 jobs; 1 idle, 0 running, 0 held
E:\condor705\con-mpi-test\sleep-test1>
Any hint as to why it's not able to connect?
The same job does work on the other 32-bit XP systems.
Thanks in advance.