
Re: [HTCondor-users] htcondor-ce BDII



Hi Stephen,

Laurence Field (CC'ed) is one of the authors of the BDII provider, and I suppose I'm the maintainer of HTCondor-CE as a whole.

Feel free to submit a PR (the file can be found here) and we can discuss the specifics there.

- Brian

On 2/14/19 10:53 AM, Stephen Jones wrote:
Hi Brian,

Thanks for the clarification.

Would you please read the snippet of text below? It describes a problem with the way CPU counts are calculated in the current version (3.2) of that BDII code. I've had to fix it.

Can you tell me who maintains that BDII; perhaps they want to include this?

Cheers,

Ste

--- WHAT I WROTE ABOUT THE BDII ---


     Fix the BDII provider to use real slots provided, not just
     detected CPUS

The BDII provider, /var/lib/bdii/gip/provider/htcondor-ce-provider, that comes with the system is a bit naive. It assumes that if a node has, say, 16 possible hyperthreads (cores * 2, what have you), then it must have 16 job slots on it. This is not so. Some of our 16-hyperthread nodes can run at most 12 jobs, due to memory.

To fix this in a set-up like ours, with partitionable slots, this patch is used (see below). What does this mean? It means I'm counting up only the slots with ID = 1. This tells me the allocation, which is the right logic for this job.

To put in the patch, around line 129 of /var/lib/bdii/gip/provider/htcondor-ce-provider, change this constraint 'State=!="Owner"' to 'SlotTypeID == 1'. And, from line 129 to line 145, change all "DetectedCpus" to "TotalSlotCpus". You will then get the right counts in the BDII output.
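
As a rough illustration of the counting logic above (a sketch, not the provider's actual code): given one line per partitionable slot holding the machine name and its TotalSlotCpus, the CPU total to advertise is just the sum of the second column. The helper name sum_slot_cpus is hypothetical, and the condor_status invocation shown in the comment assumes a pool where every partitionable slot has SlotTypeID == 1.

```shell
# Hypothetical helper: sum the second column (TotalSlotCpus) of
# "Machine TotalSlotCpus" lines read from stdin.
sum_slot_cpus() {
    awk '{ total += $2 } END { print total + 0 }'
}

# On a live pool this could be fed from condor_status (assumption: every
# partitionable slot has SlotTypeID == 1, as in the set-up described above):
#   condor_status -constraint 'SlotTypeID == 1' -af Machine TotalSlotCpus | sum_slot_cpus
```

For instance, two nodes that each offer 12 slot CPUs would sum to 24, regardless of how many hyperthreads were detected on them.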





On 14/02/2019 16:21, Brian Lin wrote:
Stephen,

Sorry, I forgot to address that earlier! The version of HTCondor-CE in
the 8.8.x and 8.9.x HTCondor repositories will include the BDII package.

- Brian

On 2/14/19 10:18 AM, Stephen Jones wrote:
Hi Brian,

The trouble is that we might be going to have quite a few sites taking
on HTCondor-CE in the UK, since the APEL accounting is now ready.

https://twikiai07.cern.ch/twiki/bin/view/LCG/HtCondorCeAccounting

And CREAM will be gone in under two years.

For the time being, we will want that htcondor-ce-bdii package.

Will these 8.8.x and 8.9.x repositories have the full suite, including
the bdii, or will we still need to make it from the source code?

Cheers,

Ste


On 14/02/2019 15:32, Brian Lin wrote:
Hi Stephen,

We don't ship htcondor-ce-bdii in the OSG repositories since BDII isn't
used in the OSG. However, we do ship HTCondor-CE in the HTCondor
repositories but there have been some build inefficiencies that caused a
hiccup in getting the latest versions into our yum repositories. We
should have HTCondor-CE 3.2.1 available in the 8.8.x and 8.9.x
repositories either today or tomorrow!

- Brian

On 2/14/19 8:09 AM, Stephen Jones wrote:
Hi,

I'm using HTCondor-CE here at our T2 grid site. I did not find
htcondor-ce-bdii-3.2.0-1.el7.centos.noarch.rpm in the OSG repo, e.g.

http://repo.opensciencegrid.org/osg/3.2/el7/development/x86_64/

Hence I had to make it from scratch (please see below.)

This took a lot of time. I'm sure there must be an easier way to get
hold of htcondor-ce-bdii-3.2.0-1.el7.centos.noarch.rpm

Does anybody know how this product is managed?

Cheers,

Ste

------------------------------------------------


     Making the HTCondor-CE RPMs (to get the BDII rpm)

Ordinarily, one might expect the htcondor-ce-bdii rpm to be in the
OSG release repositories, for example:

http://repo.opensciencegrid.org/osg/3.2/el7/development/x86_64/

Although the htcondor-ce components are there, the bdii component is
not. So since the HTCondor-CE BDII rpm is not available, I made the
whole system, from source. Here are my notes for doing so. BTW: It
might be worth checking at OSG, CERN or elsewhere to see if the BDII
RPM is now part of the release. If so, you can get it directly from
one of those places. Anyway, this is the rundown for making your own.

On a CentOS 7.n software development system, make a user
called rpmuser and install the rpm, boost-devel and cmake packages.

Clone the htcondor-ce git repo.

# mkdir dev
# cd dev
# git clone https://github.com/opensciencegrid/htcondor-ce

Find the commit tag associated with 3.2 (Note: It is
a9c1104febcbaf20a8380284d9d1213eb504afa5)

# cd htcondor-ce/
# git log > /tmp/log
# vi /tmp/log

Prepare the source material:

# cd ..
# mv htcondor-ce/ htcondor-ce-3.2.0
# tar -cvf htcondor-ce-3.2.0.tar htcondor-ce-3.2.0/
# gzip htcondor-ce-3.2.0.tar
# cp htcondor-ce-3.2.0.tar.gz ~/rpmbuild/SOURCES/
# cp htcondor-ce-3.2.0/rpm/htcondor-ce.spec ~/rpmbuild/SPECS/
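
As a possible alternative to the mv/tar/gzip steps above, the tarball could be cut straight from the commit found earlier with git archive, so that the tree being packaged is exactly that commit. This is only a sketch: the function name make_source_tarball is hypothetical, and it assumes the spec file expects a tarball named htcondor-ce-<version>.tar.gz with a top-level directory of htcondor-ce-<version>/, as in the steps above.

```shell
# Hypothetical helper: write htcondor-ce-<version>.tar.gz for one commit.
# Arguments: repo directory, commit (or tag), version, output directory.
# The output directory must be an absolute path, because the archive is
# written from inside the repository directory.
make_source_tarball() {
    repo_dir=$1
    commit=$2
    version=$3
    out_dir=$4
    (
        cd "$repo_dir" || exit 1
        git archive --format=tar.gz \
            --prefix="htcondor-ce-${version}/" \
            -o "${out_dir}/htcondor-ce-${version}.tar.gz" \
            "$commit"
    )
}

# Example call (paths are assumptions based on the notes above):
#   make_source_tarball ~/dev/htcondor-ce a9c1104febcbaf20a8380284d9d1213eb504afa5 3.2.0 ~/rpmbuild/SOURCES
```

The spec file would still need to be copied into ~/rpmbuild/SPECS/ as shown above.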

Make the RPMs

# cd ~/rpmbuild/SPECS/
# rpmbuild -ba htcondor-ce.spec

The rpms will wind up in ~/rpmbuild/RPMS/noarch/htcondor-ce-*

# ls -rt  ~/rpmbuild/RPMS/noarch/htcondor-ce-*
htcondor-ce-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-bdii-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-view-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-condor-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-pbs-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-lsf-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-sge-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-slurm-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-bosco-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-client-3.2.0-1.el7.centos.noarch.rpm
htcondor-ce-collector-3.2.0-1.el7.centos.noarch.rpm

Put these in the local repo.

Special note: whenever you put an rpm into the local repo, you have to
run this command in that directory.

# createrepo .