Re: [HTCondor-users] Condor/SGE cluster
> Hi,
Hi Lukas.
> okay, I've found the error.
>
> I had to add a line to /usr/libexec/condor/glite/bin/sge_submit.sh which
> includes the location of "qsub" to PATH.
This *should* in principle have been taken care of by executing SGE's
'settings.sh'. As I mentioned in my earlier post, you can make sure
it is found by setting sge_rootpath and sge_cellname appropriately in
batch_gahp.config:
if [ -z "$sge_rootpath" ]; then sge_rootpath="/usr/local/sge/pro"; fi
if [ -r "$sge_rootpath/${sge_cellname:-default}/common/settings.sh" ]
then
. $sge_rootpath/${sge_cellname:-default}/common/settings.sh
fi
Or is settings.sh not setting the path?
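One way to answer that question is to source settings.sh in a subshell and see whether qsub becomes visible. The following is only a minimal sketch: `check_sge_env` is a hypothetical helper name, not part of the glite scripts, and the default root simply mirrors the snippet above.

```shell
# Hypothetical helper: source SGE's settings.sh and report where qsub is found.
# $1 = SGE root (default mirrors the snippet above), $2 = cell name.
# The body runs in a subshell, so the caller's environment is left untouched.
check_sge_env() (
    root="${1:-/usr/local/sge/pro}"
    cell="${2:-default}"
    settings="$root/$cell/common/settings.sh"

    if [ ! -r "$settings" ]; then
        echo "settings.sh not readable: $settings" >&2
        exit 2
    fi

    . "$settings"

    if qsub_path=$(command -v qsub); then
        echo "qsub found at: $qsub_path"
    else
        echo "qsub not on PATH after sourcing $settings" >&2
        exit 1
    fi
)
```

Running `check_sge_env` with the values of sge_rootpath and sge_cellname used on the submit host should show whether the PATH problem lies in settings.sh itself or in the file not being found.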
> By the way, there is some pointless code in this script:
>
> jobID=`qsub $bls_tmp_file 2> /dev/null | perl -ne 'print $1 if /^Your job (\d+)/;'` # actual submission
> retcode=$?
> if [ "$retcode" != "0" -o -z "$jobID" ] ; then
> rm -f $bls_tmp_file
> exit 1
> fi
Agree, thanks. This is now fixed in the upstream code.
> And for readability reasons you could use awk '{ print $3 }' instead of perl
> -ne 'print $1 if /^Your job (\d+) /;'.
This depends on what else qsub can write to stdout and, having no direct
SGE experience, I'd be cautious about changing it.
> Furthermore, it would be nice if this script would generate some error
> messages or an error log.
I am having trouble contacting the authors of the SGE scripts. The main
question is whether they had a reason to redirect the STDERR of the qsub
command to /dev/null. Since STDERR is recorded in the logs, it could have
provided valuable information. Could I ask you to test removing the
"2> /dev/null" and see whether you hit any side effects? This may be useful
for other users in the future.
Thank you for sharing your findings.
Francesco Prelz
INFN Milano