
[HTCondor-users] condor-ce: troubleshooting and jobRouter



Hello,

I'm practicing with HTCondor-CE and need some help, as I'm not very fluent at troubleshooting or configuration.
Test pilot jobs submitted by a CMS factory are failing a validation
shell script when running on the execute node.
Apparently, the reason is that no environment variables are passed to the job:

Environment = ""

I verified that the shell script succeeds if I submit it from the condor-ce itself with the following line added to the submit file:

environment = "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"

However, if I submit the same job from an external machine, again no
environment is passed to the job on the exec node.
That seems to suggest that some parameters are stripped somewhere along
the way. I think the JobRouter is where such submission
parameters might be altered, but I'm not sure at all, and some simpler
misconfiguration could explain this problem.
A couple of questions:

1) For jobs I submit there are logfiles such as /var/log/condor-ce/GridmanagerLog.dteam039
containing a line such as:

09/17/18 15:08:10 (D_ALWAYS:2) [4098033] GAHP[4098037] <- 'CONDOR_JOB_SUBMIT [SNIP] Environment\ =\ "PATH=/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin"; [SNIP]
where I can see the submit file content.
However, there is no similar file for the cms user: /var/log/condor-ce/GridmanagerLog.pilcms017 does not exist.
Is there a way to compare the job parameters "before" and "after" the routing?
2) Does someone have a few examples of job routing configuration for a
WLCG-like HTCondor-CE?
Currently I'm looking at
https://opensciencegrid.org/docs/compute-element/job-router-recipes/ .
If the examples there are mostly adequate for a non-OSG CE, I can go on
and refer to those.
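
To illustrate the kind of "before" and "after" comparison I mean in question 1, here is a minimal sketch. It assumes the two job ClassAds have already been dumped to plain text files in the long "Attribute = value" form (for instance with something like condor_ce_q -l for the incoming job and condor_q -l for the routed one; that is my assumption, not something I have verified to be the right procedure). The function names and file arguments are only illustrative.

```python
import sys


def parse_classad_dump(text):
    """Parse long-form 'Attr = value' lines into a dict of raw strings."""
    attrs = {}
    for line in text.splitlines():
        # Split on the first ' = ' only, so values may contain '=' themselves.
        name, sep, value = line.partition(" = ")
        if sep and name.strip():
            attrs[name.strip()] = value.strip()
    return attrs


def diff_classads(before, after):
    """Return (removed, added, changed) between two attribute dicts."""
    removed = {k: before[k] for k in before.keys() - after.keys()}
    added = {k: after[k] for k in after.keys() - before.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return removed, added, changed


# Usage (hypothetical file names): python diff_ads.py before.txt after.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        before_ad = parse_classad_dump(f.read())
    with open(sys.argv[2]) as f:
        after_ad = parse_classad_dump(f.read())
    removed, added, changed = diff_classads(before_ad, after_ad)
    for name in sorted(removed):
        print(f"- {name} = {removed[name]}")
    for name in sorted(added):
        print(f"+ {name} = {added[name]}")
    for name in sorted(changed):
        old, new = changed[name]
        print(f"~ {name}: {old} -> {new}")
```

With this, an Environment attribute that is set on submission but empty after routing would show up as a "~" line, which is exactly the symptom I am trying to track down.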
Thanks for any help, bye

Stefano