HTCondor Project List Archives




[Condor-devel] Jobs lying idle when machines are available.



Hi all,
  While experimenting with Condor on our cluster, we encountered the
following problem.

After we submit a batch of jobs, Condor schedules a few of them, but for
the others condor_q -analyze reports:

100 match but reject the job for unknown reasons.

There are idle machines on the cluster that are not running anything, but
Condor does not schedule jobs on them. All the jobs are similar in nature,
and so are the machines, so if one job matches a machine, the others
should too.
This happens roughly every 2nd or 3rd time we submit a batch of jobs to
the cluster. At other times Condor does run all the jobs in the queue.

After taking a look at the Negotiator logs, I saw this:

Attempting to use cached MatchList: Failed (MatchList length: 0,
Autocluster: 0, Schedd Name: ****, Schedd Address: *****)
11/8 22:40:43       Rejected jobid schedd_name schedd_ip : no match found

The above is for the jobs that condor_q -analyze reports as "match but
reject the job for unknown reasons".
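
To see how many jobs are affected, the rejections in the Negotiator log
could be tallied with something like the following. This is only a rough
sketch: the line shape is assumed from the single excerpt above, and
tally_rejections is a made-up helper name, not part of Condor.

```python
import re
from collections import Counter

# Assumed line shape, taken from the one log excerpt quoted above:
#   "<date> <time>  Rejected <jobid> <schedd_name> <schedd_ip> : <reason>"
# The separator is a colon with whitespace on both sides, so a colon
# inside an address field is not mistaken for it.
REJECT_RE = re.compile(r"Rejected\s+(\S+)\s+.*?\s:\s+(.+?)\s*$")


def tally_rejections(lines):
    """Count rejection reasons and collect the job ids seen for each."""
    counts = Counter()
    jobs = {}
    for line in lines:
        match = REJECT_RE.search(line)
        if match is None:
            continue  # not a rejection line
        job_id, reason = match.group(1), match.group(2)
        counts[reason] += 1
        jobs.setdefault(reason, []).append(job_id)
    return counts, jobs
```

The function takes any iterable of lines, so it can be fed an open log
file directly; where the Negotiator log lives depends on the local
configuration.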

Could this be due to some misconfiguration?
Is there any way to debug why this is happening? Are there tools to find
out why a job is not matching a specific startd? It would be great if
someone could point me to some debugging tools for this.

Thanks 
Mahadev