Hello,

I installed Condor 6.8.6 a few weeks ago, so I am still new to Condor. We are running it on a small pool of 6 machines. One of them is the central manager, submit machine, and scheduler (it also acts as the dedicated scheduler), and it is an execute machine as well. The rest of the pool are execute machines, configured as dedicated resources. The execute machines are 4-core machines (2 x dual-core CPUs). We are experiencing 2 problems with parallel job submissions.

1) I submit job1, which requires 4 CPUs on, say, node1. After some time it starts running. Then I submit job2, which again requires 4 CPUs on node1. This one stays idle because no more CPUs are available on node1. Finally, I submit job3 to node2. The strange thing is that job3 stays idle until job2 starts running. Since node2 is free, I do not see a reason why job3 should stay idle and wait for job2. It looks like the job queue for the parallel universe is processed in strict FIFO order. Is this normal behavior for the parallel universe, or am I missing something?

Note: In the vanilla universe, job management works as expected: job3 is executed right after submission.

2) After a parallel universe job is submitted to the queue, it stays idle for some time. Sometimes it starts within tens of seconds, sometimes only after a few minutes. We usually run condor_reschedule, which helps to get the job started (at least we think it helps). Vanilla universe jobs are executed right after they are submitted (assuming there are free CPUs to run them). Is this normal behavior for the parallel universe, or is it just due to our Condor configuration? If it is the configuration, how can I change it?

If you need any configuration files, log files, or anything else, just tell me and I will send them.

Thanks in advance for any help or suggestions.

Cheers,
Martin