Hi Steve,
In Pegasus we deal with this by adding a parameter to +remote_cerequirements and parsing it on the remote site in slurm_local_submit_attributes.sh.
I have attached a .sub file, and this is the link to Pegasus' slurm_local_submit_attributes.sh: https://github.com/pegasus-isi/pegasus/blob/master/share/pegasus/htcondor/glite/slurm_local_submit_attributes.sh
In the +remote_cerequirements section of the .sub file there is a parameter called "EXTRA_ARGUMENTS" that carries multiple Slurm batch options. If you look at Pegasus' slurm_local_submit_attributes.sh, the section parsing it is towards the end of the script.
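For illustration, the idea behind that parsing can be sketched as follows. This is a simplified sketch, not the actual Pegasus script: it only shows the EXTRA_ARGUMENTS handling, and the exact output format of the real script may differ.

```shell
#!/bin/bash
# Sketch (assumption, not the real Pegasus script): a glite
# *_local_submit_attributes.sh receives job attributes such as
# EXTRA_ARGUMENTS as environment variables, and anything it prints
# is prepended as directives to the generated Slurm submit script.
if [ -n "$EXTRA_ARGUMENTS" ]; then
    # Forward the whole option string as one #SBATCH line.
    echo "#SBATCH $EXTRA_ARGUMENTS"
fi
```

With EXTRA_ARGUMENTS set as in the attached .sub file, this would emit a line like `#SBATCH --qos realtime --constraint=haswell ... --exclusive` into the final Slurm submit file.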
You can check out the Pegasus online docs for more information: https://pegasus.isi.edu/documentation/glite.php
Regards,
George
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Steven C Timm <timm@xxxxxxxx>
Sent: Tuesday, June 11, 2019 10:06:26 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Using BLAH to submit/monitor/handle jobs to different slurm clusters (Cross-Cluster Operations)

We (Fermilab) have been doing this in an ad-hoc way for a while at NERSC by installing multiple different bosco clients in different subdirectories on the NERSC side and passing an extra argument to tell bosco to use a different subdirectory. This is how we kept Cori and Edison separate, and also how we differentiate between the "KNL" and "Haswell" nodes of Cori.
We would much prefer a less hacky way to do things. The most general approach would be the ability to push an arbitrary line of Slurm batch commands into the final Slurm submission file. Bosco already has some features to pass through some of the Slurm parameters (particularly memory and node count), but we haven't had time to test them yet.
Steve Timm
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Jaime Frey <jfrey@xxxxxxxxxxx>
Sent: Tuesday, June 11, 2019 11:52:22 AM
To: HTCondor-Users Mail List
Subject: Re: [HTCondor-users] Using BLAH to submit/monitor/handle jobs to different slurm clusters (Cross-Cluster Operations)
It looks like we can add support for multiple Slurm clusters fairly easily. We are beginning work on this to be included in an upcoming release. If anyone on this list is interested in this feature, let us know.
Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project
######################################################################
# PEGASUS WMS GENERATED SUBMIT FILE
# DAG : namd_wf, Index = 0, Count = 1
# SUBMIT FILE NAME : namd2_ID0000001.sub
######################################################################
stream_error = false
stream_output = false
environment = ""
+remote_cerequirements = JOBNAME=="namd2ID0000001" && PASSENV==1 && CORES=="32" && WALLTIME=="00:20:00" && PROJECT=="m2187" && PRIORITY==20 && EXTRA_ARGUMENTS=="--qos realtime --constraint=haswell --licenses=SCRATCH --mail-type=ALL --mail-user=georgpap@xxxxxxx --exclusive"
+remote_environment = "PEGASUS_HOME=/global/common/software/m2187/pegasus/pegasus-4.9.2dev CONDOR_JOBID=$(cluster).$(process) PEGASUS_WF_UUID=35eb9d14-6f86-416a-a78d-4ac8e2b04750 PEGASUS_WF_LABEL=namd_wf PEGASUS_DAG_JOB_ID=namd2_ID0000001 PEGASUS_SITE=cori PEGASUS_RUNTIME=1200 PEGASUS_CORES=32 PEGASUS_PROJECT=m2187"
copy_to_spool = false
error = /home/georgpap/GitHub/papajim/sns-namd-shifter-example/submit/georgpap/pegasus/namd_wf/run0002//00/00/namd2_ID0000001.err
executable = /home/georgpap/GitHub/papajim/sns-namd-shifter-example/submit/georgpap/pegasus/namd_wf/run0002/00/00/namd2_ID0000001.sh
grid_resource = batch slurm papajim@xxxxxxxxxxxxxxxx
log = /home/georgpap/GitHub/papajim/sns-namd-shifter-example/submit/georgpap/pegasus/namd_wf/run0002/namd_wf-0.log
notification = NEVER
output = /home/georgpap/GitHub/papajim/sns-namd-shifter-example/submit/georgpap/pegasus/namd_wf/run0002//00/00/namd2_ID0000001.out
periodic_release = False
periodic_remove = (JobStatus == 5) && ((CurrentTime - EnteredCurrentStatus) > 1800)
priority = 20
should_transfer_files = YES
submit_event_user_notes = pool:cori
transfer_executable = true
transfer_input_files = /home/georgpap/GitHub/papajim/sns-namd-shifter-example/submit/georgpap/pegasus/namd_wf/run0002/00/00/stage_in_remote_cori_0_0.meta,/home/georgpap/Software/Pegasus/pegasus-4.9.2dev/share/pegasus/sh/pegasus-lite-common.sh
universe = grid
when_to_transfer_output = ON_EXIT
+pegasus_generator = "Pegasus"
+pegasus_root_wf_uuid = "35eb9d14-6f86-416a-a78d-4ac8e2b04750"
+pegasus_wf_uuid = "35eb9d14-6f86-416a-a78d-4ac8e2b04750"
+pegasus_version = "4.9.2dev"
+pegasus_wf_name = "namd_wf-0"
+pegasus_wf_app = "namd_example"
+pegasus_wf_time = "20190611T111232-0700"
+pegasus_wf_xformation = "namd2"
+pegasus_wf_dax_job_id = "ID0000001"
+pegasus_wf_dag_job_id = "namd2_ID0000001"
+pegasus_job_class = 1
+pegasus_site = "cori"
+pegasus_job_runtime = 1200
+pegasus_cores = 32
+pegasus_cluster_size = 1
queue
######################################################################
# END OF SUBMIT FILE
######################################################################