
Re: [HTCondor-users] Some jobs of batch stays in idle state for longer time



Hi Vikrant.
I'm glad that I'm not the only one.

If you turn on debug logging, you will see that the negotiator keeps a cached job list, which is probably not accurate. The submitter refreshes that job list when you hold and release jobs or submit a new job.
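A minimal sketch of turning that debug logging on. It assumes that "debug" here means raising the negotiator and schedd log levels with the standard NEGOTIATOR_DEBUG and SCHEDD_DEBUG knobs, and that the pool reads extra files from a config.d directory; the /etc/condor/config.d default, the file name and the function name are assumptions for illustration:

```shell
#!/bin/sh
# Sketch only: raise matchmaking-related logging to D_FULLDEBUG.
# Assumption: the pool reads extra config files from a config.d directory;
# /etc/condor/config.d is a common default for package installs.
# The file name 99-debug-matchmaking.conf is hypothetical; the 99- prefix
# makes it likely to be read after other files in that directory.
enable_full_debug() {
  confdir="${1:-/etc/condor/config.d}"
  cat > "$confdir/99-debug-matchmaking.conf" <<'EOF'
NEGOTIATOR_DEBUG = D_FULLDEBUG
SCHEDD_DEBUG = D_FULLDEBUG
EOF
}

# Example: enable_full_debug && condor_reconfig
```

After writing the file, condor_reconfig makes the daemons re-read their configuration; the extra detail then shows up in NegotiatorLog and SchedLog.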

We should help the condor team to find the reason. 

I have not seen this recently. 

Thanks 
David 



From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
Sent: Monday, November 7, 2022, 18:03
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Some jobs of batch stays in idle state for longer time

Hi Thomas, 

Yes, we did. It showed available cores, though we didn't try it against a particular machine.

I suspect that leftover jobs from a batch sometimes take a long time to get matched. One hacky workaround works most of the time: holding and then releasing the idle jobs makes them match quickly.
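A minimal sketch of that hold/release workaround for a single batch, using the -constraint form of condor_hold and condor_release. The cluster id is a placeholder, and the run/requeue_idle_jobs helpers are hypothetical; with DRY_RUN=1 they only print the commands:

```shell
#!/bin/sh
# Sketch of the hold/release workaround for one cluster.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = cluster id of the batch whose leftover jobs are stuck idle
requeue_idle_jobs() {
  cluster="$1"
  # JobStatus 1 = Idle: hold only the jobs of this cluster that are still waiting
  run condor_hold -constraint "ClusterId == $cluster && JobStatus == 1"
  # JobStatus 5 = Held, HoldReasonCode 1 = put on hold by a user via condor_hold
  run condor_release -constraint "ClusterId == $cluster && JobStatus == 5 && HoldReasonCode == 1"
}

# Example: requeue_idle_jobs 1234
```

The release constraint also matches any job in that cluster that was already held with condor_hold for some other reason, so narrow it further if that matters.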

Thanks & Regards,
Vikrant Aggarwal


On Mon, Nov 7, 2022 at 8:25 PM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
Hi Vikrant,

have you checked for more matching details with

   > condor_q -better-analyze job.id

That should give a bit more detail.

Conversely, with -reverse-analyze/-better-analyze:reverse you can
compare a machine/slot against a job's requirements.
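A small sketch that runs both analyses for one stuck job. The job id and slot name are placeholders, the helper names are hypothetical, and the -machine option (which restricts the analysis to one machine or slot name) is worth confirming with condor_q -help on your version:

```shell
#!/bin/sh
# Sketch: look at a stuck idle job from both sides of the match.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1 = job id (cluster.proc); $2 = optional slot name, e.g. slot1@node01 (hypothetical)
analyze_idle_job() {
  job="$1"
  slot="$2"
  # Job side: which clauses of the job's Requirements reject the slots in the pool
  run condor_q -better-analyze "$job"
  if [ -n "$slot" ]; then
    # Machine side: check this one slot's requirements against the job
    run condor_q -better-analyze:reverse -machine "$slot" "$job"
  fi
}

# Example: analyze_idle_job 1234.0 slot1@node01
```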

Cheers,
   Thomas

On 07/11/2022 15.08, Vikrant Aggarwal wrote:
> Hello Experts,
>
> We have seen issues where some jobs of a batch stay in idle status. We are using scheduler-level splitting.
>
> Let's say a batch of 300 jobs is submitted and enough cores are available to match roughly 280-290 jobs. The negotiator matches 280 jobs, and the remaining 20 stay idle even though cores are still available in the cluster. Yet if we submit a new batch, it gets scheduled immediately.
>
> Is HTCondor also taking into account the time a job has spent in the queue before scheduling it? Maybe it considers new jobs more quickly?
>
> The time jobs spend in the queue sometimes goes up to 45 minutes despite cores being available in the cluster.
>
> The master logs only show "no match found" for the 20 idle jobs, yet resources are found for new jobs.
>
>
> Thanks & Regards,
> Vikrant Aggarwal
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/