
[HTCondor-users] Q: BLAH configuration for non-shared submission to Slurm?



I'm trying to use HTCondor to submit jobs to our Scarf HPC. At
present, this uses Platform LSF, and (following initial work by Andrew
Lahiff) I've managed to get this to work, at least to some extent.
However, Scarf is replacing Platform LSF with Slurm, and I'm having
trouble getting submission to work with Slurm in the case where the
jobscript is in a directory that is not shared with the worker
nodes. (I am submitting from a custom Scarf node that has Condor
installed. Ultimately, jobs will be submitted to this node from an
HTCondor node that is external to Scarf, so sharing won't be an
option.)


The problem seems to be that the jobscript generated by BLAH's
slurm_submit.sh assumes that the original jobscript has been copied to
a (unique) filename in a sandbox directory, but that copy never
happens. The lsf_submit.sh script generates BSUB directives that (I
think) instruct LSF to perform the initial copy, but I see no
equivalent in slurm_submit.sh; a sketch of what I mean follows.
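
For illustration only (the filenames and paths here are made up, and
this isn't the literal output of lsf_submit.sh), the staging mechanism
I'm referring to looks something like:

    # An LSF directive of this form asks LSF to copy a file from the
    # submission host to the execution host before the job starts:
    #BSUB -f "/home/btm/.blah_sandbox/bl_job_123.sh > bl_job_123.sh"

    # As far as I can tell, sbatch has no comparable file-staging
    # directive, so a Slurm wrapper that runs the sandboxed copy, e.g.
    sh /home/btm/.blah_sandbox/bl_job_123.sh
    # fails when nothing else has put the file there.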


None of this is reflected in the files created by HTCondor: the log
file implies that the job ran OK (but consumed no resources), and the
output and error files are always empty. Only by modifying the blah
scripts to log somewhere other than /dev/null (and to save copies of
the generated jobscripts) was I able to get more information about
what was going wrong! The kind of change I made is sketched below.
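
The changes were along these lines (paraphrased from memory; the
names here are illustrative, this is not a patch):

    # Where the scripts throw diagnostics away, e.g.
    #     some_command 2> /dev/null
    # redirect to a log file instead:
    some_command 2>> /tmp/blah_debug.log

    # ...and keep a copy of each generated wrapper before submission:
    cp "$generated_jobscript" /tmp/blah_saved_jobscripts/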


batch_gahp.config has many options for defining which directories are
shared, and for overriding default locations for sandboxes etc. I have
tried numerous permutations of these, to no avail; the kind of thing
I've been trying is shown below.
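
For reference, the permutations look roughly like this (values are
examples, not a known-good configuration, and I may well be missing
the right knob entirely):

    # batch_gahp.config -- illustrative values only
    supported_lrms=lsf,slurm
    slurm_binpath=/usr/bin
    blah_shared_directories=/home    # paths assumed shared with the WNs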


Is there a better guide to configuration than the comments in
batch_gahp.config? What special considerations are required for Slurm?


Thanks,
  Brian