Hello all, an update on this:
I replicated the non-working rules on a condor-ce with little
load (it serves only one VO), and there they work as expected.
This confirms that the rule syntax is correct.
Then I noticed that on the other CEs there were several
non-routed jobs from a VO that recently started using token
credentials and whose jobrouter rule was not yet token-aware.
After fixing that rule, the pending jobs were routed and my rule
also started working.
For a while only. This morning I found several non-routed jobs
(Qdate --> midnight, routing rule correct, i.e. those jobs
SHOULD have been routed).
I manually removed those stuck jobs, and the next fresh ones
were routed flawlessly. The route I'm adding, however, still
does not work.
Questions:
- is there a maximum length for the list of active routes in
JOB_ROUTER_ROUTE_NAMES?
- is there a "cache effect", so that fixing an error in a
JOB_ROUTER_ROUTE_<name> entry does not take effect until
<some_cache> expires?
- is there a (short) timeout for a scitoken job to be routed,
after which it has no further chance of being routed?
- if I rename an existing route, does that help with "caching"
problems? (spoiler: no, I just verified that).
Stefano
On 28/03/23 23:55, Stefano Dal Pra wrote:
Hi Todd, thanks for the advice;
yes, I issued condor_ce_reconfig. The suggested command says
[root@ce07-htc ~]# condor_ce_history -l 3250138.0 |
condor_ce_job_router_info -match-jobs -ignore-prior-routing
-jobads -
Matching jobs against routes to find candidate jobs.
And the same for other test jobs in the queue:
[root@ce03-htc ~]# condor_ce_q 6384655.0 -l |
condor_ce_job_router_info -match-jobs -ignore-prior-routing
-jobads -
Matching jobs against routes to find candidate jobs.
Since the REQUIREMENTS expression evaluates to True, my guess is
that a routing is attempted but fails, possibly because
of some residual problem with that specific token issuer. In
fact, there are token-only jobs flowing regularly; for example
these ones from atlas:
[root@ce06-htc ~]# cccv JOB_ROUTER_ROUTE_atlas_sam
REQUIREMENTS (x509UserProxyVoName =?= "atlas" &&
x509UserProxyFirstFQAN =?=
"/atlas/Role=lcgadmin/Capability=NULL") || (AuthTokenIssuer =?=
"https://atlas-auth.web.cern.ch/"
&& AuthTokenSubject =?=
"5c5d2a4d-9177-3efa-912f-1b4e5c9fb660")
UNIVERSE VANILLA
SET Requirements (TARGET.t1_allow_sam =?= true) &&
(!StringListMember("gpfs_atlas",t1_GPFS_CHECK ?: "",":"))
[root@ce06-htc ~]# condor_ce_q -cons 'x509userproxyvoname =?=
undefined && AuthTokenSubject ==
"5c5d2a4d-9177-3efa-912f-1b4e5c9fb660"' -af:j owner jobstatus
routedtojobid qdate 'formattime(qdate)'
5394875.0 atlassgm006 1 8837395.0 1680039294 Tue Mar 28 23:34:54
2023
The above is a "token only job". However this other one remains
idle:
[sdalpra@ui-htc CE5]$ export
_condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS ;
condor_submit -pool ce06-htc.cr.cnaf.infn.it:9619 -remote
ce06-htc.cr.cnaf.infn.it -append '+WantRoute = "herd_cloud"'
ce_scitok308.sub
Submitting job(s).
1 job(s) submitted to cluster 5394871.
[root@ce06-htc ~]# cccv JOB_ROUTER_ROUTE_herdcloud
REQUIREMENTS (AuthTokenIssuer =?= "https://iam-herd.cloud.cnaf.infn.it/"
&& AuthTokenSubject =?=
"6f925657-f9aa-4cb6-b264-a3b1ee78df57")
UNIVERSE VANILLA
SET Requirements (TARGET.t1_group =?= "herd_cloud")
SET RequestMemory 400
SET MaxJobs 35
SET MaxIdleJobs 12
[root@ce06-htc ~]# condor_ce_q 5394871.0 -af:j owner
routedtojobid '(AuthTokenIssuer =?= "https://iam-herd.cloud.cnaf.infn.it/"
&& AuthTokenSubject =?=
"6f925657-f9aa-4cb6-b264-a3b1ee78df57")'
5394871.0 herd006 undefined true
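A check like the one above (routedtojobid undefined while the route requirement evaluates to true) can be scripted to spot stuck jobs in bulk. What follows is only a minimal sketch, not an HTCondor tool: the function name unrouted_matches is hypothetical, it assumes the four-column order printed by the condor_ce_q -af:j call above, and the second sample line is made up for illustration.

```python
def unrouted_matches(lines):
    """Return job ids whose requirement is true but RoutedToJobId is undefined.

    Each line is assumed to hold four whitespace-separated columns:
    <jobid> <owner> <routedtojobid> <requirement result>.
    """
    stuck = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        job_id, _owner, routed_to, requirement = fields
        if routed_to == "undefined" and requirement.lower() == "true":
            stuck.append(job_id)
    return stuck


sample = [
    "5394871.0 herd006 undefined true",      # line taken from the output above
    "5394875.0 atlassgm006 8837395.0 true",  # illustrative line, not real output
]
print(unrouted_matches(sample))
```

Feeding it the real condor_ce_q output (for example through a pipe and sys.stdin) would list the jobs that match a route but were never routed.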
[root@ce06-htc ~]# condor_ce_q 5394871.0 -l |
condor_ce_job_router_info -match-jobs -ignore-prior-routing
-jobads -
Matching jobs against routes to find candidate jobs.
Stefano

On 28/03/23 21:36, Todd Tannenbaum wrote:
On 3/28/2023 5:42 AM, Stefano Dal Pra wrote:
When using (only) x509 and no token, the job is mapped (by
argus) to dteam026.
StringListMember should work the same with dteam007 or
dteam026; however, it only matches with dteam026 (i.e. GSI)
and not with dteam007.
I normally check for issuer and subject in the jobrouter;
I tried StringListMember to restrict the check to Owner only.
Hi Stefano -
After changing the route to try StringListMember, did you
remember to issue a "condor_ce_reconfig" command?
For job 3250138.0 below, it sure looks like the owner mapping
from the token worked fine... perhaps this command will give
a clue:
root@host # condor_ce_history -l
3250138.0 | condor_ce_job_router_info -match-jobs -ignore-prior-routing -jobads -
Also see the CE Manual for troubleshooting tips when a job
does not route:
https://htcondor.com/htcondor-ce/v4/troubleshooting/troubleshooting/#jobs-stay-idle-on-the-ce
Hope the above helps, let us know how it goes, feel free to
ask for more help if you continue to be stuck.
regards,
Todd
Adding a detail on the submit files used for GSI and SCITOKENS
#submit file for GSI
[sdalpra@ui-htc CE5]$ cat ce_gsi308.sub
universe = vanilla
use_x509userproxy = true
+Owner = undefined
[...]
[sdalpra@ui-htc CE5]$ cat ce_scitok308.sub
universe = vanilla
use_scitokens = true
+Owner = undefined
Stefano
On 28/03/23 11:56, Thomas Hartmann wrote:
Hi Stefano,
what does your token mapping look like?
Just a suspicion, but maybe the token subject is mapped to
a different user than the X509-mapped user, so that the requirement
  REQUIREMENTS StringListMember(Owner, "dteam007|dteam026|cmssgm017","|")
does not get triggered?
Cheers,
  Thomas
On 27/03/2023 22.50, Stefano Dal Pra wrote:
Hello to all,
htcondor-ce-5.1.6 + condor-9.0.17 here.
I'm having problems with HTCondor-CE not routing jobs
submitted with an IAM token [1]. The same routing rule ([2]
or [3]) that works with GSI does not work with tokens.
More notes in [4].
USING GSI
#This works
[sdalpra@ui-htc CE5]$ export
_condor_SEC_CLIENT_AUTHENTICATION_METHODS=GSI ;
condor_submit -pool ce07-htc.cr.cnaf.infn.it:9619
-remote ce07-htc.cr.cnaf.infn.it ce_gsi308.sub
Submitting job(s).
1 job(s) submitted to cluster 3250129.
#the job is routed and submitted to condor; note the
local user (dteam026), that is mapped by argus
[root@ce07-htc ~]# condor_ce_q 3250129. -af:j owner
routedtojobid
3250129.0 dteam026 4991835.0
USING SCITOKENS
#This does not work
[sdalpra@ui-htc CE5]$ export
_condor_SEC_CLIENT_AUTHENTICATION_METHODS=SCITOKENS ;
condor_submit -pool ce07-htc.cr.cnaf.infn.it:9619
-remote ce07-htc.cr.cnaf.infn.it ce_scitok308.sub
Submitting job(s).
1 job(s) submitted to cluster 3250138.
#the job is never routed. Note that the REQUIREMENTS
expression evaluates to true.
[root@ce07-htc ~]# condor_ce_q 3250138. -af:j owner
routedtojobid 'StringListMember(Owner,
"dteam007|dteam026|cmssgm017","|")'
3250138.0 dteam007 undefined true
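To make explicit what the StringListMember check above is expected to do, here is a small Python approximation of its semantics. It is only a sketch under stated assumptions: string_list_member is a hypothetical stand-in, not HTCondor code, and it assumes a case-sensitive comparison after splitting the list on any of the delimiter characters and trimming surrounding whitespace.

```python
import re


def string_list_member(item, string_list, delimiters=" ,"):
    """Approximate ClassAd StringListMember(item, list, delimiters).

    Assumption: the list is split on any single delimiter character,
    empty entries are ignored, and the comparison is case-sensitive.
    """
    pattern = "[" + re.escape(delimiters) + "]"
    members = [m.strip() for m in re.split(pattern, string_list)]
    return item in [m for m in members if m]


owners = "dteam007|dteam026|cmssgm017"
print(string_list_member("dteam007", owners, "|"))  # token-mapped owner
print(string_list_member("dteam026", owners, "|"))  # GSI-mapped owner
```

Under these assumptions both owners are members of the list, which agrees with the condor_ce_q evaluation above returning true for the token job.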
[1] The token being used
[sdalpra@ui-htc CE5]$ cat $BEARER_TOKEN_FILE|jwt.py -v
{
  "alg": "RS256",
  "kid": "rsa1"
}
{
  "sub": "9662c0b5-31a1-4478-963e-bdf3783232ed",
  "iss": "https://wlcg.cloud.cnaf.infn.it/",
  "wlcg.groups": [
    "/wlcg",
    "/wlcg/pilots",
    "/wlcg/xfers"
  ],
  "wlcg.ver": "1.0",
  "jti": "4270f069-81d9-48fb-88ef-817a83b98c6a",
  "exp": 1679943559,
  "iat": 1679939959,
  "client_id": "ad852b22-e517-44a4-99e8-7c0660f878a1",
  "scope": "openid compute.create profile compute.read storage.read:/ compute.modify eduperson_entitlement wlcg storage.create:/ offline_access compute.cancel eduperson_scoped_affiliation storage.modify:/ email wlcg.groups",
  "nbf": 1679939959,
  "aud": "https://wlcg.cern.ch/jwt/v1/any"
}
exp: Mon Mar 27 20:59:19 2023
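For readers without the jwt.py helper used above (a local script, not shown in this thread), the issuer, subject and expiry of a token can be inspected with the Python standard library alone. This is a minimal sketch: the function names are hypothetical, the signature is NOT verified, and the demo token is rebuilt from the claims printed above with a placeholder signature.

```python
import base64
import json
from datetime import datetime, timezone


def decode_segment(segment):
    """Decode one base64url-encoded JSON segment of a JWT."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def token_claims(token):
    """Return (header, payload) of a JWT without checking its signature."""
    header_b64, payload_b64, _signature = token.split(".")
    return decode_segment(header_b64), decode_segment(payload_b64)


def _encode(obj):
    # Helper used only to build the demo token below.
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


demo_token = ".".join([
    _encode({"alg": "RS256", "kid": "rsa1"}),
    _encode({
        "sub": "9662c0b5-31a1-4478-963e-bdf3783232ed",
        "iss": "https://wlcg.cloud.cnaf.infn.it/",
        "exp": 1679943559,
        "iat": 1679939959,
    }),
    "placeholder-signature",  # not a real signature
])

header, claims = token_claims(demo_token)
expiry = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
print(claims["iss"], claims["sub"])
print("expires (UTC):", expiry.isoformat())
print("lifetime (s):", claims["exp"] - claims["iat"])
```

The iss and sub values are the ones a route would compare against AuthTokenIssuer and AuthTokenSubject, and exp minus iat gives the token lifetime.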
[2],[3] Jobrouter rules
JOB_ROUTER_ROUTE_routestsci @=jrt
   REQUIREMENTS StringListMember(Owner, "dteam007|dteam026|cmssgm017","|")
   UNIVERSE VANILLA
   SET Requirements (TARGET.t1_group =?= "myfancygroup")
   SET RequestMemory 400
   SET MaxJobs 5
   SET MaxIdleJobs 10
@jrt
JOB_ROUTER_ROUTE_routestgsi @=jrt
   REQUIREMENTS (x509UserProxyVOName == "dteam") || (AuthTokenIssuer =?= "https://wlcg.cloud.cnaf.infn.it/" && AuthTokenSubject =?= "9662c0b5-31a1-4478-963e-bdf3783232ed")
   UNIVERSE VANILLA
   SET Requirements (TARGET.t1_group =?= "testgroup")
@jrt
JOB_ROUTER_ROUTE_NAMES = routestsci routestgsi $(JOB_ROUTER_ROUTE_NAMES)
[4] Notes
- the scitoken is at least "partially" valid, as the mapping to
the local user succeeds.
- the REQUIREMENTS expression matches the condor-ce job, i.e.
     condor_ce_q <jobid> -af 'StringListMember(Owner, "dteam007|dteam026|cmssgm017","|")'
  returns True.
- These rules used to work, as far as I know. More
complex REQUIREMENTS expressions were successfully used
with tokens.
- I checked rule [2] against a condor-ce at another site,
where a colleague agreed to test it; the result is the
same: using GSI the job is routed, using SCITOKENS it is
not.
- I find nothing useful in the condor-ce logs:
[root@ce07-htc ~]# grep 3250492. /var/log/condor-ce/*Log
/var/log/condor-ce/AuditLog:03/27/23 21:54:54 (cid:18395186) (D_AUDIT)
Submitting new job 3250492.0
/var/log/condor-ce/AuditLog:03/27/23 21:54:54 (cid:18395188) (D_AUDIT)
Transferring files for jobs 3250492.0
/var/log/condor-ce/SchedLog:03/27/23 21:54:55 (D_ALWAYS)
Job 3250492.0 released from hold: Data files spooled
Also at maximum verbosity nothing is found in the
JobRouterLog.
I'm out of ideas now. Any hint to find out what's wrong?
Thanks
Stefano
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685