Hi Group,
My company has reached a point where we need a queue management system such as Condor, Flow Tracer, or LSF.
I have convinced most of the developers in the VLSI group that Condor is the right application for them.
After implementing Condor in the Research Department cluster, my users were truly satisfied with the results.
However, trying to implement this in the VLSI cluster didn't go so well.
I would like to share some of my problems with you, so maybe someone has an idea how to achieve what I need:
1. The cluster includes 20 machines with 24 cores each, for a total of 480 cores.
2. Each machine has 24GB of RAM.
3. All machines are connected to a NetApp File Server over NFS.
4. All machines are running RHES 6.0 and belong to the same UID domain.
Now,
My users would like the cluster to manage their jobs as follows:
They would like to have two kinds of jobs:
1. Jobs that run right away when submitted
2. Jobs that run only in certain scenarios (more below)
However, all jobs depend on a FlexLM license (Matlab, Synopsys VCS, etc.).
So say I have 100 licenses of Matlab, and I want to share them in a specific way based on the type of job. I would have the following:
1. When a user submits a job (that can be divided into 400 jobs), I would like to limit the number of his parallel jobs, so that he will not take all the licenses and will leave some for other users.
Setting concurrency limits is not a good option here, since a limit is a global definition and not per user. It is true that it can limit the number of parallel jobs (once the limit is reached), but it cannot prevent a single user from taking all the available licenses.
I can divide the concurrency limits into concurrency limits_A, concurrency limits_B, concurrency limits_C, etc. and split the licenses among them, but this will prevent the system from using all the available licenses: concurrency limits_A may have reached its limit while concurrency limits_B is still free.
2. I want some jobs to run only if the FlexLM server has a minimum of 10 free licenses not in use. This will ensure that real-time jobs start as soon as they are submitted, because a free license is available for them. I don't know how to achieve this.
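To make the two requirements above concrete, here is a minimal Python sketch of the admission rule I have in mind. It is not Condor configuration and does not talk to FlexLM. The names `LicensePool`, `may_start`, `per_user_cap` and `reserve` are made up for illustration, and I am assuming the per-user cap applies to every job while the 10-license reserve applies only to the second kind of job.

```python
from dataclasses import dataclass


@dataclass
class LicensePool:
    """Hypothetical snapshot of one FlexLM feature (e.g. Matlab)."""
    total: int       # licenses the server offers, e.g. 100
    in_use: int = 0  # licenses currently checked out

    @property
    def free(self) -> int:
        return self.total - self.in_use


def may_start(pool: LicensePool, user_running: int, job_kind: str,
              per_user_cap: int, reserve: int = 10) -> bool:
    """Decide whether one more job of this user may start.

    job_kind is "realtime" (run right away) or "deferred"
    (run only while at least `reserve` licenses are free).
    """
    if pool.free <= 0:
        return False                 # nothing to check out at all
    if user_running >= per_user_cap:
        return False                 # per-user limit, not a global one
    if job_kind == "deferred":
        return pool.free >= reserve  # keep headroom for real-time jobs
    return True                      # real-time: any free license will do


pool = LicensePool(total=100, in_use=91)        # 9 licenses free
print(may_start(pool, 5, "realtime", 20))       # real-time job may start
print(may_start(pool, 5, "deferred", 20))       # deferred job must wait
```

The point of the sketch is that both checks look at the single shared pool of 100 licenses, so no license is stranded in a partition the way it would be with limits_A, limits_B and so on.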
Maybe someone can help here...
Thanks
Sassy