
Re: [Condor-users] MPI Jobs strange problem



Hi,

Greg Thain wrote:

Chaitanya V. Hazarey wrote:

Hi all,


and when I submit 100 jobs, 1 seems to run and the rest sit idle.


I assume you have enough machines to run these jobs? Is it always the same execute machines that are selected to run on? After the first one completes, does the subsequent one start?

-greg


Yes, the machines are unclaimed. I have set REQUEST_CLAIM_TIMEOUT = 10 to make sure that the machines get freed. The problem happens with jobs that need multiple machines, but also with jobs that require a single machine; it happens to every job I submit. The strangest thing is that this problem occurred all of a sudden, without any apparent reason. The jobs were executing perfectly before, but now, no matter what, they do not.
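For reference, a minimal sketch of how that setting would appear in a Condor configuration file. The file name in the comment is only an assumption, and the description of the macro reflects my understanding of it (a timeout in seconds used by the condor_schedd while waiting for a claim to be granted):

```
# Local Condor configuration (file name assumed, e.g. condor_config.local)
# Time, in seconds, that the condor_schedd waits for a condor_startd
# to grant a claim request before giving up on it.
REQUEST_CLAIM_TIMEOUT = 10
```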


The other thing is that all 9 machines are both execute and submit machines, with 1 Condor Master. All of them are in the TESTING_MODE configuration so that they can run jobs all the time.


Chaitanya V. Hazarey



----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Lab C-212
School of Technology and Computer Science Email : cvh@xxxxxxxxxxxxxxxx
Tata Institute of Fundamental Research Phone : 022 - 22782550
Colaba, Homi Bhabha Road Mobile : 9869360938
Mumbai, Maharashtra, 400005
----------------------------------------------------------------------------------------------------------------------------------------------------------------------