Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Backfill on an OpenStack system

Date: Mon, 04 Oct 2021 07:20:42 -0700 (PDT)
From: Marcus Ebert <mebert@xxxxxxx>
Subject: Re: [HTCondor-users] Backfill on an OpenStack system

Hi Matt,

Sorry, I only now came across this thread.

Let me add one more way of doing what you want to those already suggested by others:

At the University of Victoria, we developed Cloudscheduler for exactly the reason you mentioned. It watches an HTCondor instance and, if there are jobs queued, determines the jobs' resource requests and then starts VMs with enough resources on a cloud. When there are no more jobs, the VMs are terminated. In this system, jobs see a normal batch system, and the worker nodes are started on demand by Cloudscheduler.

Opportunistic usage is also possible. When VMs are started outside of Cloudscheduler and the total core usage in the cloud project goes above a configured limit, Cloudscheduler automatically retires its VMs (it lets running jobs finish but doesn't allow HTCondor to start new ones) and terminates each of those VMs once no more jobs are running on it. Cloudscheduler is fully accessible via a web interface as well as a CLI.
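To make the provisioning behaviour described above concrete, here is a minimal Python sketch of one decision cycle: start VMs for idle jobs, retire (drain) VMs when total project core usage exceeds a limit, and terminate VMs that have no jobs left. This is not Cloudscheduler's code or API. The names `Job`, `VM`, `Flavor`, `smallest_fitting_flavor` and `plan_cycle` are hypothetical, it assumes one job per VM, and it only returns a list of planned actions rather than calling HTCondor or a cloud API.

```python
from dataclasses import dataclass

# Hypothetical data model; not Cloudscheduler's actual classes.
@dataclass
class Job:
    cpus: int
    memory_mb: int

@dataclass
class Flavor:
    name: str
    cpus: int
    memory_mb: int

@dataclass
class VM:
    name: str
    cpus: int
    memory_mb: int
    running_jobs: int = 0
    retiring: bool = False  # draining: finish current jobs, accept no new ones

def smallest_fitting_flavor(job, flavors):
    """Pick the smallest flavor that satisfies the job's resource request."""
    fits = [f for f in flavors if f.cpus >= job.cpus and f.memory_mb >= job.memory_mb]
    return min(fits, key=lambda f: (f.cpus, f.memory_mb)) if fits else None

def plan_cycle(idle_jobs, vms, flavors, external_cores, core_limit):
    """Return a list of (action, name) tuples for one scheduling cycle.

    Assumes one job per VM. external_cores counts cores used by VMs
    started outside the scheduler in the same cloud project.
    """
    actions = []

    # 1. Terminate VMs with no running jobs if they are retiring,
    #    or if the queue is empty.
    keep = []
    for vm in vms:
        if vm.running_jobs == 0 and (vm.retiring or not idle_jobs):
            actions.append(("terminate", vm.name))
        else:
            keep.append(vm)

    used = external_cores + sum(vm.cpus for vm in keep)

    # 2. If over the core limit, retire VMs until the cores they will
    #    release (once drained) bring usage back under the limit.
    releasing = sum(vm.cpus for vm in keep if vm.retiring)
    excess = used - releasing - core_limit
    for vm in keep:
        if excess <= 0:
            break
        if not vm.retiring:
            vm.retiring = True
            actions.append(("retire", vm.name))
            excess -= vm.cpus

    # 3. Start VMs for idle jobs not already covered by free VMs,
    #    as long as the new VM keeps the project under the limit.
    free_vms = sum(1 for vm in keep if vm.running_jobs == 0 and not vm.retiring)
    for job in idle_jobs[free_vms:]:
        flavor = smallest_fitting_flavor(job, flavors)
        if flavor is None or used + flavor.cpus > core_limit:
            continue
        actions.append(("start", flavor.name))
        used += flavor.cpus

    return actions

if __name__ == "__main__":
    flavors = [Flavor("c2", 2, 4096), Flavor("c8", 8, 32768)]
    print(plan_cycle([Job(1, 2000), Job(4, 16000)], [], flavors, 0, 16))
    # -> [('start', 'c2'), ('start', 'c8')]
```

Retiring rather than killing VMs mirrors the "let jobs finish" behaviour in the description: cores are only freed once a draining VM reaches zero running jobs and is terminated on a later cycle.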

For reference:
https://link.springer.com/epdf/10.1007/s41781-020-0036-1

(It is fully supported and we are still developing new features, so some information in the publication may have changed since then.) It's open source, and we would be happy to help anyone set up their own instance:

https://github.com/hep-gc/cloudscheduler

We have run this successfully for many years on OpenStack systems around the world, as well as on commercial clouds such as Amazon. The jobs we run are mostly for physics experiments such as ATLAS, Belle II, and DUNE. We also run our Cloudscheduler instance as a service for others, who then only need to provide their own HTCondor instance if they want one.


Cheers,
  Marcus

On Sat, 4 Sep 2021, West, Matthew wrote:

Hi All,

Here at Exeter, IT is setting up an OpenStack system to support researchers who want DRAM-heavy, bespoke, workstation-like environments. Because I don't expect the system to be fully occupied by active users 24/7, I am wondering what the optimal way is to set up an HTCondor pool on it to run jobs as backfill. Would this be similar to how you would do it for any other spare resources: have a VM start up on a node and announce itself to the collector daemon as an available worker if the machine's idle conditions are met?

It reminds me of the methods for expanding one's resources into commercial cloud servers, but I am not sure which tools are useful in this case.

Cheers,
Matt
