
Re: [HTCondor-users] We can not make JobRouter work



In the new syntax, the way to set requirements for a route is to use REQUIREMENTS without an = after it. The same goes for NAME and UNIVERSE, although NAME will default to the config knob suffix.

Effects that are applied after the route has been matched, such as EditJobInPlace, are still set using =, and most of them can be expressions. But NAME, UNIVERSE and REQUIREMENTS are transform commands. Requirements = is treated as a temporary variable, so it does not affect the matching of the route to a job.

If you run

    condor_ce_job_router_info

you should see that your routes don't have any route REQUIREMENTS, and that the NAME of the route is not what you specified by Name =, since the default route name is derived from the knob name.

So your routes should be

JOB_ROUTER_ROUTE_T1_DE_KIT @=rtkit
   NAME Overflow:T1_DE_KIT
   REQUIREMENTS  (DESIRED_SITES=="T1_DE_KIT")
   EditJobInPlace = True
   SET NEW_SITES "T2_DE_DESY"
   SET HasBeenOverflowRouted True
@rtkit
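
Assuming the other routes in your [1] follow the same pattern, the same change would give something like the following. This is only a sketch built from the values in your config; I have not run it against your pool.

# Same fix for each remaining route: NAME and REQUIREMENTS written as
# transform commands (no =), everything else unchanged from [1].
JOB_ROUTER_ROUTE_T1_ES_PIC @=rtpic
   NAME Overflow:T1_ES_PIC
   REQUIREMENTS  (DESIRED_SITES=="T1_ES_PIC")
   EditJobInPlace = True
   SET NEW_SITES "T2_ES_CIEMAT"
   SET HasBeenOverflowRouted True
@rtpic

JOB_ROUTER_ROUTE_T1_FR_CCIN2P3 @=rtin2p3
   NAME Overflow:T1_FR_CCIN2P3
   REQUIREMENTS  (DESIRED_SITES=="T1_FR_CCIN2P3")
   EditJobInPlace = True
   SET NEW_SITES "T2_FR_GRIF,T2_FR_IPHC"
   SET HasBeenOverflowRouted True
@rtin2p3

JOB_ROUTER_ROUTE_T1_IT_CNAF @=rtcnaf
   NAME Overflow:T1_IT_CNAF
   REQUIREMENTS  (DESIRED_SITES=="T1_IT_CNAF")
   EditJobInPlace = True
   SET NEW_SITES "T2_IT_Pisa,T2_IT_Rome"
   SET HasBeenOverflowRouted True
@rtcnaf

JOB_ROUTER_ROUTE_T1_UK_RAL @=rtral
   NAME Overflow:T1_UK_RAL
   REQUIREMENTS  (DESIRED_SITES=="T1_UK_RAL")
   EditJobInPlace = True
   SET NEW_SITES "T2_UK_London_IC,T2_UK_SGrid_RALPP"
   SET HasBeenOverflowRouted True
@rtral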

Sorry about the confusion.   An editing pass to improve the clarity of the manual incorrectly added the =  after REQUIREMENTS in the documentation.  

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Stefano Belforte <stefano.belforte@xxxxxxx>
Sent: Tuesday, November 19, 2024 3:16 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] We can not make JobRouter work
 
Dear experts,

we (CMS-CRAB) have used JobRouter for years to edit queued jobs in the CMS
global pool to redirect them from busy sites to sites with possibly
available slots, something that we call "overflow". Details of why and
what happens to those jobs when they run do not matter; I mention it
simply to introduce the name.

Alas, we have been using the deprecated JOB_ROUTER_ENTRIES_CMD macro,
where a custom script of ours was called to create routes on the fly. So
last month we rewrote our stuff using a static routing table in the
configuration file, reproducing the same desiderata. But once put in
production, things started to go nuts and we had to disable it.

Unfortunately we have no experience with the new configuration, and
nobody else in CMS or glideinWms is using it.

We are running $CondorVersion: 23.9.6 2024-08-08 BuildID: 748275
PackageID: 23.9.6-1 GitSHA: dfdd9eaa $
and followed the example in
https://htcondor.readthedocs.io/en/latest/grid-computing/job-router.html
defining a list of mutually exclusive routes as in [1]

But once the daemon starts the first route is matched to all possible
jobs, even if the Requirements = (DESIRED_SITES=="sitename") is not
satisfied.

Some things work as expected i.e. the edit in place, attribute setting,
and the route listed first in JOB_ROUTER_ROUTE_NAMES is clearly the one
used first. But that first route is also applied to jobs where
DESIRED_SITES is a different string.

Basically all my idle jobs get the same value of NEW_SITES and I can
change that by changing the order of routes in JOB_ROUTER_ROUTE_NAMES [2].

Can you spot what we are doing wrong here?

I also noted that while jobs are routed, the RoutedBy attribute is not
set (ref
https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#JOB_ROUTER_SOURCE_JOB_CONSTRAINT
and
https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#JOB_ROUTER_NAME
)

Let us know if there's any more information which I can send.

Thanks!!!

Stefano


[1]

[root@vocms059 config.d]# pwd
/etc/condor/config.d
[root@vocms059 config.d]# cat 90_jobrouter.config
# Configuration file for the JobRouter
#
JOB_ROUTER_NAME = OverflowRouter

JOB_ROUTER_SOURCE_JOB_CONSTRAINT = ((JobUniverse==5) && (jobstatus==1))

# Static route names for each T1
JOB_ROUTER_ROUTE_NAMES =  T1_DE_KIT T1_IT_CNAF T1_UK_RAL T1_ES_PIC T1_FR_CCIN2P3

JOB_ROUTER_ROUTE_T1_DE_KIT @=rtkit
   Name = "Overflow:T1_DE_KIT"
   EditJobInPlace = True
   Requirements = (DESIRED_SITES=="T1_DE_KIT")
   SET NEW_SITES "T2_DE_DESY"
   SET HasBeenOverflowRouted True
@rtkit

JOB_ROUTER_ROUTE_T1_ES_PIC @=rtpic
   Name = "Overflow:T1_ES_PIC"
   EditJobInPlace = True
   Requirements = (DESIRED_SITES=="T1_ES_PIC")
   SET NEW_SITES "T2_ES_CIEMAT"
   SET HasBeenOverflowRouted True
@rtpic

JOB_ROUTER_ROUTE_T1_FR_CCIN2P3 @=rtin2p3
   Name = "Overflow:T1_FR_CCIN2P3"
   EditJobInPlace = True
   Requirements = (DESIRED_SITES=="T1_FR_CCIN2P3")
   SET NEW_SITES "T2_FR_GRIF,T2_FR_IPHC"
   SET HasBeenOverflowRouted True
@rtin2p3

JOB_ROUTER_ROUTE_T1_IT_CNAF @=rtcnaf
   Name = "Overflow:T1_IT_CNAF"
   EditJobInPlace = True
   Requirements = (DESIRED_SITES=="T1_IT_CNAF")
   SET NEW_SITES "T2_IT_Pisa,T2_IT_Rome"
   SET HasBeenOverflowRouted True
@rtcnaf

JOB_ROUTER_ROUTE_T1_UK_RAL @=rtral
   Name = "Overflow:T1_UK_RAL"
   EditJobInPlace = True
   Requirements = (DESIRED_SITES=="T1_UK_RAL")
   SET NEW_SITES "T2_UK_London_IC,T2_UK_SGrid_RALPP"
   SET HasBeenOverflowRouted True
@rtral

# How often to poll the job queue to route jobs
JOB_ROUTER_POLLING_PERIOD = 5*60

# Start the Job Router
DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER
[root@vocms059 config.d]#

[2]

belforte@vocms059/HTCondor> condor_q -con HasBeenOverflowRouted -af:h jobstatus desired_sites  new_sites
jobstatus desired_sites             new_sites
1         T1_DE_KIT                 T2_DE_DESY
1         T1_ES_PIC                 T2_DE_DESY
1         T1_FR_CCIN2P3             T2_DE_DESY
1         T1_IT_CNAF                T2_DE_DESY
1         T1_UK_RAL                 T2_DE_DESY
1         T1_US_FNAL                T2_DE_DESY
1         T2_CH_CERN_HLT            T2_DE_DESY
1         T2_CH_CERN_P5             T2_DE_DESY
1         T2_IN_TIFR                T2_DE_DESY
1         T2_IT_Rome                T2_DE_DESY
1         T2_LB_HPC4L               T2_DE_DESY
1         T2_PK_NCP                 T2_DE_DESY
1         T2_TR_METU                T2_DE_DESY
1         T2_UK_SGrid_Bristol       T2_DE_DESY
1         T3_BG_UNI_SOFIA           T2_DE_DESY
1         T3_IN_TIFRCloud           T2_DE_DESY
1         T3_IT_Opportunistic_dodas T2_DE_DESY
1         T3_MX_Cinvestav           T2_DE_DESY
1         T3_TW_TIDC                T2_DE_DESY
1         T3_US_FNALLPC             T2_DE_DESY
1         T3_US_Ookami              T2_DE_DESY
1         T3_US_Test                T2_DE_DESY
1         T3_US_UMD                 T2_DE_DESY
1         T1_IT_CNAF                T2_DE_DESY
1         T1_IT_CNAF                T2_DE_DESY
1         T1_ES_PIC                 T2_DE_DESY
1         T1_FR_CCIN2P3             T2_DE_DESY
1         T1_UK_RAL                 T2_DE_DESY
1         T1_UK_RAL                 T2_DE_DESY
belforte@vocms059/HTCondor>

