OK, will try all of this outside of a glideinwms factory, make sure I have it reproducible, and then will probably have more questions.
Steve Timm
From: Carl Edquist <edquist@xxxxxxxxxxx>
Sent: Tuesday, March 3, 2020 3:17 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Steven C Timm <timm@xxxxxxxx>; Brian Lin <blin@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor - Slurm integration

So, Jaime has brought to my attention that apparently for "Queue" the
preferred way to specify it in the submit file is "batch_queue", which
condor recognizes and translates to the "BatchQueue" attribute in the
classad for remote batch systems.

As for the other attributes, you can specify them with a leading "+", like
"+NodeNumber = 8", and the leading "remote_" is not needed for any of
these.

Carl

On Tue, 3 Mar 2020, Brian Lin wrote:

> Hi Steve,
>
> Since this is for a CE, you'll want to use `set_remote_queue` or
> `eval_set_remote_queue` in your job router configuration. Carl's going
> to double-check that prefixing `remote_` is applicable to the other
> attributes in question.
>
> As for the remote CE requirements, HTCondor-CE 4 with HTCondor 8.8 has a
> simpler format
> (https://urldefense.proofpoint.com/v2/url?u=https-3A__htcondor-2Dce.readthedocs.io_en_latest_releases_-23400&d=DwIDbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=79F1C8FyHDRNsCc6-8ExM85S63M2bOnPcUBxUnjvyqI&s=5zORLCX9K3kXRkR6MLaLdUwf5rfU2XEhcboKvLpje1M&e= ), so you
> could set something like the following in your job route:
>
> set_Container = "cmssw/cms:rhel7";
> set_default_CERequirements = "Container";
>
> And use the $Container variable in your slurm_local_submit_attributes.sh.
> It's not substituted at submit time per se but rather at the time that
> Bosco/BLAHP generates the submit file.
>
> Reviewing your local submit attributes further, you can simplify some of
> those lines (this is assuming the need for the "remote_" prefix):
>
> echo "#SBATCH --account=m2612"
> --> 'set_remote_BatchProject = "m2612";' in your job route
>
> echo "#SBATCH -N 1"
> --> this is hardcoded so you can eliminate this line
>
> echo "#SBATCH -t 48:00:00"
> --> 'set_remote_BatchRuntime = 2880;' in your job route
>
> Let us know if you have any additional questions!
>
> - Brian
>
> On 3/2/20 4:19 PM, Carl Edquist wrote:
>> Hi Steve,
>>
>>> I am now using htcondor 8.9.5 and the newest bosco/blahp on the remote end
>>> (bosco 1.3.0).
>>
>> Ok, as far as I can tell the only significant addition to slurm_submit.sh
>> between condor 8.8.4 and 8.9.5 was the ability to specify a job cluster,
>> which translates to a line with "#SBATCH -M $cluster_name". I don't see
>> that any of the parameters have gone away though.
>>
>>> I tried all 5 of the parameters Carl has got here and none of them made it
>>> through into the slurm job that got submitted..
>>
>> On the condor side, I think you may need to prefix those attribute names
>> with "+remote_", if I understand correctly what I see in the manual here:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__htcondor.readthedocs.io_en_stable_grid-2Dcomputing_grid-2Duniverse.html-23htcondor-2Dc-2Djob-2Dsubmission&d=DwIDbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=79F1C8FyHDRNsCc6-8ExM85S63M2bOnPcUBxUnjvyqI&s=XSxe0D17J726qWsWtxN4YWBBSXc1CFE5EdrulYY6CQ4&e=
>>
>>> Brian also pointed out that in 8.9 and the newer versions of htcondor-ce
>>> there is a variable substitution feature via the
>>> set_default_remotece_requirements.
>>
>>> and then modified my condor submit file to have
>>>
>>> set_default_remote_cerequirements = strcat(Container == cmssw/cms:rhel7)
>>
>> So, a couple details that catch my attention are,
>>
>> - you mention "set_default_remotece_requirements" -- maybe just a typo in
>>   the email; it's "remote_cerequirements" not "remotece_requirements"
>>
>> - and, from my read of the "Setting batch system directives" section in the
>>   manual that you linked, "set_default_remote_cerequirements" goes in the
>>   "JOB_ROUTER_ENTRIES" configuration (defined in
>>   /etc/condor-ce/config.d/02-ce-*.conf and
>>   /etc/condor-ce/config.d/99-local.conf), but note that the attribute itself
>>   is called "default_remote_cerequirements" (without the "set_" prefix). So,
>>   I'm thinking putting "set_default_remote_cerequirements" in the submit file
>>   itself might not do the right thing.
>>
>> Brian, can you confirm about whether set_default_remote_cerequirements or
>> default_remote_cerequirements can be used in a submit file?
>>
>> Thanks,
>> Carl
>>
>> On Mon, 24 Feb 2020, Steven Timm wrote:
>>
>>> I am just looking at this again now.
>>>
>>> "Queue" is a reserved word in the condor submit language so it can't
>>> possibly be used to also specify the remote queue, can it? (I got an error
>>> when I tried).
>>>
>>> I am now using htcondor 8.9.5 and the newest bosco/blahp on the remote end
>>> (bosco 1.3.0).
>>>
>>> I tried all 5 of the parameters Carl has got here and none of them made it
>>> through into the slurm job that got submitted.. I am still investigating
>>> as to why that was.
>>>
>>> Brian also pointed out that in 8.9 and the newer versions of htcondor-ce
>>> there is a variable substitution feature via the
>>> set_default_remotece_requirements.
>>>
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__htcondor-2Dce.readthedocs.io_en_latest_batch-2Dsystem-2Dintegration_-23setting-2Dbatch-2Dsystem-2Ddirectives&d=DwIDbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=79F1C8FyHDRNsCc6-8ExM85S63M2bOnPcUBxUnjvyqI&s=03OVceO_mDuBVxvmTSFkXB_jWr4fV4MxikCTePBMBwI&e=
>>>
>>> Below is what our slurm_local_submit_attributes.sh looks like at NERSC
>>> right now. All of those attributes can and do sometimes change.
>>>
>>> echo "#SBATCH --account=m2612"
>>> #echo "#SBATCH --reservation=xrootd_debug"
>>> echo "#SBATCH -N 1"
>>> echo "#SBATCH -q regular"
>>> echo "#SBATCH -C knl,cache,quad"
>>> echo "#SBATCH --image=cmssw/cms:rhel7"
>>> echo "#SBATCH -L cscratch1,cvmfs"
>>> echo "#SBATCH --module=cvmfs"
>>> echo "#SBATCH --volume=\"/global/cscratch1/sd/uscms/node_cache:/tmp:perNodeCache=size=680G\""
>>> echo "#SBATCH -t 48:00:00"
>>>
>>> So do I understand correctly that if
>>> I modified my script to be
>>>
>>> echo "#SBATCH --image=$Container"
>>>
>>> and then modified my condor submit file to have
>>>
>>> set_default_remote_cerequirements = strcat(Container == cmssw/cms:rhel7)
>>>
>>> that the Container variable would be substituted in at submit time?
>>>
>>> If not, then how does it work?
>>>
>>> Steve Timm
>>>
>>> On Mon, 30 Sep 2019, Carl Edquist wrote:
>>>
>>>> Hi Asvija,
>>>>
>>>> Brian asked me to look into this - sorry for the delay getting back to
>>>> you.
>>>>
>>>> The mappings I find based on the condor 8.8.4 version of slurm_submit.sh
>>>> are:
>>>>
>>>> "BatchProject" ->
>>>>     #SBATCH -A $bls_opt_project
>>>>
>>>> "BatchRuntime" ->
>>>>     #SBATCH -t $((bls_opt_runtime / 60))
>>>>
>>>> "RequestMemory" ->
>>>>     #SBATCH --mem=${bls_opt_req_mem}
>>>>
>>>> "Queue" ->
>>>>     #SBATCH -p $bls_opt_queue
>>>>
>>>> "NodeNumber" ->
>>>>     #SBATCH -N $bls_opt_mpinodes
>>>>
>>>> Carl
>>>>
>>>> On Thu, 5 Sep 2019, Asvija B wrote:
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> Condor version is 8.8.4
>>>>>
>>>>> Thanks and regards,
>>>>>
>>>>> Asvija
>>>>>
>>>>> On 9/5/2019 2:33 AM, Brian Lin wrote:
>>>>>> Hi Asvija,
>>>>>>
>>>>>> Unfortunately, there isn't much in terms of documentation but I could
>>>>>> give you a mapping if you give me the version of HTCondor you're
>>>>>> running.
>>>>>>
>>>>>> Thanks,
>>>>>> Brian
>>>>>>
>>>>>> On 8/19/19 12:12 AM, Asvija B wrote:
>>>>>>> Thanks a lot Brian... I am able to see the +remote_NodeNumber getting
>>>>>>> translated properly.
>>>>>>>
>>>>>>> Can you also please indicate the corresponding directives for other
>>>>>>> SLURM related attributes as well (like --nodes, ntasks etc.)
>>>>>>>
>>>>>>> It would be great if you can point me to some documentation related to
>>>>>>> this info..
>>>>>>>
>>>>>>> Additionally, the slurm_submit.sh file from BLAH's github directory (
>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_prelz_BLAH_blob_master_src_scripts_slurm-5Fsubmit.sh&d=DwIFbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=VCj3itsHHqD4WL7jaj14STI_RiA3yPFQuYkHOeb9zfM&s=uSoCpZIHSkJbWZvxQFc38hmbXxpxB11Zcgi6nOZorLs&e=
>>>>>>> ) has additional capabilities of GPU support and MIC support. Do we
>>>>>>> have any documentation which points to the corresponding Condor
>>>>>>> directives for these ?
>>>>>>>
>>>>>>> Thanks again for the information.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Asvija
>>>>>>>
>>>>>>> On 8/16/2019 8:53 PM, Brian Lin wrote:
>>>>>>>> Hi Asvija,
>>>>>>>>
>>>>>>>> You'll want to specify '+remote_NodeNumber' in your original grid job
>>>>>>>> submit file. However, you should note that the Slurm directives we set
>>>>>>>> will be changing in future releases of HTCondor 8.9 to the following:
>>>>>>>>
>>>>>>>> "#SBATCH --nodes=1"
>>>>>>>> "#SBATCH --ntasks=1"
>>>>>>>> "#SBATCH --cpus-per-task=$bls_opt_mpinodes"
>>>>>>>>
>>>>>>>> - Brian
>>>>>>>>
>>>>>>>> On 8/13/19 12:32 AM, Asvija B wrote:
>>>>>>>>> Dear Condor users,
>>>>>>>>>
>>>>>>>>> We are planning to use HT-Condor for submitting jobs to some of our
>>>>>>>>> SLURM managed clusters. As I dug into the documentation, I
>>>>>>>>> understood that HT-Condor uses BLAH GAHP for supporting job submission
>>>>>>>>> to SLURM.
>>>>>>>>>
>>>>>>>>> We are interested in submitting MPI jobs to SLURM through HT-Condor.
>>>>>>>>> In this regard, I am unable to look at the configuration parameters in
>>>>>>>>> the condor submission script for indicating MPI related information
>>>>>>>>> (for eg. number of nodes etc.)
>>>>>>>>>
>>>>>>>>> I have seen the script file
>>>>>>>>> $CONDOR_HOME/libexec/glite/bin/slurm_submit.sh . It does include
>>>>>>>>> statements with $bls_opt_mpinodes which translate to "SBATCH -N "
>>>>>>>>> directives. However I am not clear about the equivalent condor
>>>>>>>>> directives that will result in the proper SLURM directives. Hence it
>>>>>>>>> would be great if any of the SLURM users can comment on this.
>>>>>>>>>
>>>>>>>>> Thanks and regards,
>>>>>>>>>
>>>>>>>>> Asvija B
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> [ C-DAC is on Social-Media too. Kindly follow us at:
>>>>>>>>> Facebook: https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_CDACINDIA&d=DwIFbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=VCj3itsHHqD4WL7jaj14STI_RiA3yPFQuYkHOeb9zfM&s=uvVH3LcThEuGbesE0n2o3_BwAhhAFvrhFuoGZIVbviw&e=
>>>>>>>>> & Twitter: @cdacindia ]
>>>>>>>>>
>>>>>>>>> This e-mail is for the sole use of the intended recipient(s) and may
>>>>>>>>> contain confidential and privileged information. If you are not the
>>>>>>>>> intended recipient, please contact the sender by reply e-mail and destroy
>>>>>>>>> all copies and the original message. Any unauthorized review, use,
>>>>>>>>> disclosure, dissemination, forwarding, printing or copying of this
>>>>>>>>> is strictly prohibited and appropriate legal action will be taken.
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> HTCondor-users mailing list
>>>>>>>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>>>>>>>> subject: Unsubscribe
>>>>>>>>> You can also unsubscribe by visiting
>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.cs.wisc.edu_mailman_listinfo_htcondor-2Dusers&d=DwIFbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=VCj3itsHHqD4WL7jaj14STI_RiA3yPFQuYkHOeb9zfM&s=WBQKEaMHUAFVqImfbLGU1P8F_wjAZQRDNkKVZSRfaVU&e=
>>>>>>>>> The archives can be found at:
>>>>>>>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.cs.wisc.edu_archive_htcondor-2Dusers_&d=DwIFbA&c=gRgGjJ3BkIsb5y6s49QqsA&r=10BCTK25QMgkMYibLRbpYg&m=VCj3itsHHqD4WL7jaj14STI_RiA3yPFQuYkHOeb9zfM&s=sMGjIfjYSKnCI3pGrWIMpuctjLWtvfAv5yg6eFUthJ0&e=
>>>
>>> ------------------------------------------------------------------
>>> Steven C. Timm, Ph.D  (630) 840-8525
>>> timm@xxxxxxxx  http://home.fnal.gov/~timm/
>>> Office: Feynman Computing Center 243
>>> Fermilab Scientific Computing Division,
>>> Scientific Computing Facilities Quadrant.,
>>> Experimental Computing Facilities Dept.,
>>> Grid and Cloud Operations Group
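
For reference, the attribute mapping that Carl listed above for the condor 8.8.4 version of slurm_submit.sh can be sketched in Python roughly as follows. This is only an illustration of the mapping as quoted in this thread, not the BLAHP code itself: the function name sbatch_directives and the dict-based input are hypothetical, and the real script reads $bls_opt_* shell variables instead. Other versions of slurm_submit.sh may emit different directives.

```python
def sbatch_directives(attrs):
    """Build #SBATCH lines from job attributes, following the mapping
    quoted in this thread for the 8.8.4 slurm_submit.sh.

    `attrs` is a hypothetical plain dict standing in for the job attributes;
    it is an assumption made for this sketch only.
    """
    lines = []
    if "BatchProject" in attrs:
        lines.append(f"#SBATCH -A {attrs['BatchProject']}")
    if "BatchRuntime" in attrs:
        # The quoted mapping uses $((bls_opt_runtime / 60)), i.e. integer
        # division of the attribute value by 60.
        lines.append(f"#SBATCH -t {int(attrs['BatchRuntime']) // 60}")
    if "RequestMemory" in attrs:
        lines.append(f"#SBATCH --mem={attrs['RequestMemory']}")
    if "Queue" in attrs:
        lines.append(f"#SBATCH -p {attrs['Queue']}")
    if "NodeNumber" in attrs:
        lines.append(f"#SBATCH -N {attrs['NodeNumber']}")
    return lines


if __name__ == "__main__":
    # Values taken from the NERSC example in this thread: account m2612,
    # one node, and a 48 hour limit expressed as 172800 (48 * 3600).
    example = {"BatchProject": "m2612", "BatchRuntime": 172800, "NodeNumber": 1}
    for line in sbatch_directives(example):
        print(line)
```

One thing this makes visible: if BatchRuntime really is divided by 60 as in the quoted mapping, then a 48 hour limit would need a value of 172800 to come out as "-t 2880", while a value of 2880 would come out as "-t 48". That is worth checking against the slurm_submit.sh actually installed on the remote end.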