Thanks Zach, I'll give your suggestion a try. So many of my scripts/programs include kludges as workarounds for certain situations! I'd be surprised if any piece of code doesn't have a hack in it somewhere.

Cheers
Greg

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Zach Miller

Hi Greg,

We don't currently have anything like _ALLOW or _DENY for NETWORK_INTERFACE. And actually, I think your solution of enforcing this at the Central Manager is the better solution, as it prevents people from potentially running their own personal HTCondor with their own configuration on the laptop (through the VPN).

If HTCondor is sitting idle on the laptop, I don't believe it would be using a lot of resources, but it would still be attempting to send updates every five minutes, so you probably don't want it to be running at all. My best suggestion there is to put something like this in condor_config:

    MASTER.DAEMON_SHUTDOWN = (regexp("xxx\.yyy\.zzz\.", MyAddress))

It's a bit of a hack maybe, but seems to work in a quick test.

Hope that helps!

Cheers,
-zach
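As a rough illustration of what the DAEMON_SHUTDOWN expression above tests, here is a minimal Python sketch. It assumes that regexp() behaves like an unanchored regular-expression search over the MyAddress string, and that MyAddress looks like the "<ip:port?...>" addresses that appear in HTCondor logs. The subnet 10.20.30.* and the example addresses are hypothetical stand-ins for the redacted xxx.yyy.zzz.* range, not values from this thread.

```python
import re

# Hypothetical VPN subnet standing in for the redacted xxx.yyy.zzz.* range.
VPN_PATTERN = re.compile(r"10\.20\.30\.")


def should_shut_down(my_address: str) -> bool:
    """Mimic the shutdown expression: True when the address contains the VPN prefix."""
    # search() is unanchored, so the prefix may match anywhere in the string.
    return VPN_PATTERN.search(my_address) is not None


# Made-up addresses in the "<ip:port?...>" style.
vpn_addr = "<10.20.30.17:9618?addrs=10.20.30.17-9618&noUDP&sock=master_1234>"
office_addr = "<10.20.99.17:9618?addrs=10.20.99.17-9618&noUDP&sock=master_1234>"

print(should_shut_down(vpn_addr))     # address on the VPN subnet
print(should_shut_down(office_addr))  # address on an office subnet
```

One thing the sketch makes visible: because the search is unanchored, an address such as 110.20.30.5 would also match the pattern. If that matters for your subnets, anchoring the pattern on the leading "<" of the address may help, though that is worth checking in a quick test of your own.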
-----Original Message-----

Hi All

Just wondering if there is some way of having the equivalent of allow and deny for NETWORK_INTERFACE.

Our situation: Due to covid-19 and the major increase in staff working from home, our organisation has mandated that the default machine allocated to new workers is now a laptop, rather than a desktop. They have also rolled out a desktop-to-laptop replacement program for existing workers. Everyone has had, or will have, a desk/chair/webcam/dock/keyboard/mouse/wireless headphones delivered to their house.

Previously we have only included desktops in our HTCondor pools. Our HTCondor deployment script now includes laptops. This presents some issues we need to deal with, as we do NOT want laptops at home as part of the pools.

This can work mostly OK by using:

    NETWORK_INTERFACE = xxx.yyy.*

where xxx.yyy.* is the internal IP subnet space of our organisation. If a laptop at home boots up it will initially have a "home" IP of 192.168.something, or maybe 10.0.something, so HTCondor will not even start. Once this laptop connects to work via VPN, that's still OK. However, if HTCondor is then somehow started it will be quite happy to join the pool, as it will have an IP xxx.yyy.zzz.*, where xxx.yyy.zzz.* is the specific subnet for VPN connections.

So ideally it would be nice to be able to do something like the following on the execute nodes:

    NETWORK_INTERFACE_ALLOW = xxx.yyy.*
    NETWORK_INTERFACE_DENY = xxx.yyy.zzz.*

We currently kludge around this by having the Central Manager Collector deny VPN IPs:

    ALLOW_READ = xxx.yyy.*
    ALLOW_WRITE = xxx.yyy.*
    DENY_READ = xxx.yyy.zzz.*
    DENY_WRITE = xxx.yyy.zzz.*

so that laptops with a VPN IP are not "seen" in the pool, even though they are running the HTCondor service.

Thanks.
Cheers
Greg

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Todd Tannenbaum
Sent: Thursday, 11 February 2021 11:49 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>; Jean-Claude CHEVALEYRE <jean-claude.chevaleyre@xxxxxxxxxxxxxxxxx>
Cc: Jean-Claude CHEVALEYRE <chevaleyre@xxxxxxxxxxxxxxxxx>
Subject: Re: [HTCondor-users] Job finished with status 115

On 2/11/2021 3:34 AM, Jean-Claude CHEVALEYRE wrote:

Hello,

I have some Atlas jobs that are failing. I have looked in the log files. I can see, for example, job number 93742.0. This job finished with a status 115. What exactly does this status mean?

Hi Jean-Claude,

Looking at your investigation below (thank you for including this), I think the confusion here is that the job did not exit with a status 115. The condor_shadow process (a component of the HTCondor service) exited with a status 115, but that is not the job process.

To see the exit status for a job, you could look in the EventLog or use the condor_history command. Below I see that you grepped the event log and there is a Job Terminate event for job 93742.0... the exit status for that job will appear in the next line. In other words, events in the event log are multi-line, and thus your grep did not show it.

Alternatively, you can use the "condor_history" command. This command is similar to condor_q, but allows you to see attributes about jobs that have left the queue (due to completion or removal). From
your submit machine enter the following to see the exitcode:

    condor_history 93742.0 -limit 1 -af exitcode

Or to see all attributes about this completed job do:

    condor_history 93742.0 -limit 1 -l

See the condor_history manual page (man condor_history) for more options, and documentation about most of the available job attributes can be found in the Manual appendix here:

Hope the above helps,
Todd

Below are some extracts of the log output:

[root@gridarcce01 log]# grep -RH '93742' arc/arex-jobs* | more
arc/arex-jobs.log-20210211:2021-02-10 23:45:00 Finished - job id: 6PwKDm5cYTynOUEdEnzo691oABFKDmABFKDmzcfXDmDBFKDmDTZXHm, unix user: 41000:1307, name: "arc_pilot", owner: "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN =atlpilo1/CN=614260/CN=Robot: ATLAS Pilot1", lrms: condor, queue: grid, lrmsid: 93742.gridarcce01

[root@gridarcce01 log]# grep -RH '93742' condor/EventLog | more
condor/EventLog: 937428 - ResidentSetSize of job (KB)
condor/EventLog:006 (24968.000.000) 12/18 10:32:49 Image size of job updated: 937424
condor/EventLog:006 (26125.000.000) 12/19 11:22:07 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:32:57 Image size of job updated: 937424
condor/EventLog:006 (26254.000.000) 12/19 16:37:57 Image size of job updated: 937424
condor/EventLog: 937424 - ResidentSetSize of job (KB)
condor/EventLog: 937420 - ResidentSetSize of job (KB)
condor/EventLog:006 (71776.000.000) 01/21 00:35:38 Image size of job updated: 937428
condor/EventLog:006 (73442.000.000) 01/22 02:29:37 Image size of job updated: 937428
condor/EventLog: 937428 - ResidentSetSize of job (KB)
condor/EventLog:006 (78058.000.000) 01/26 02:56:24 Image size of job updated: 937428
condor/EventLog:000 (93742.000.000) 02/09 04:12:28 Job submitted from host: <193.55.252.153:9618?addrs=193.55.252.153-9618&noUDP&sock=3115801_e73c_4>
condor/EventLog:001 (93742.000.000) 02/09 19:03:03 Job executing on host: <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3>
condor/EventLog:006 (93742.000.000) 02/09 19:03:11 Image size of job updated: 2304
condor/EventLog:006 (93742.000.000) 02/09 19:08:11 Image size of job updated: 67160
condor/EventLog:006 (93742.000.000) 02/09 19:13:12 Image size of job updated: 110340
condor/EventLog:006 (93742.000.000) 02/09 19:18:13 Image size of job updated: 1410420
condor/EventLog:006 (93742.000.000) 02/09 19:23:13 Image size of job updated: 1887892
condor/EventLog:006 (93742.000.000) 02/09 19:33:15 Image size of job updated: 1887892
condor/EventLog:005 (93742.000.000) 02/10 23:38:21 Job terminated.
condor/ShadowLog.old:02/10/21 11:43:04 (93742.0) (3863434): Time to redelegate short-lived proxy to starter.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): File transfer completed successfully.
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): Job 93742.0 terminated: exited with status 0
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): WriteUserLog checking for event log rotation, but no lock
condor/ShadowLog.old:02/10/21 23:38:21 (93742.0) (3863434): **** condor_shadow (condor_SHADOW) pid 3863434 EXITING WITH STATUS 115

[root@gridarcce01 log]# grep -RH '93742' condor/SchedLog | more
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Shadow pid 3863434 for job 93742.0 exited with status 115
condor/SchedLog:02/10/21 23:38:21 (pid:3115849) Match record (slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <193.55.252.169:9618?addrs=193.55.252.169-9618&noUDP&sock=2279_c86d_3> for group_ATLAS.atlasprd_score.atlasprd, 93742.0) deleted

Any ideas are welcome.

Thanks
Jean-Claude

------------------------------------------------------------------------
Jean-Claude Chevaleyre < Jean-Claude.Chevaleyre(at)clermont.in2p3.fr >
Laboratoire de Physique Clermont
Campus Universitaire des Cézeaux
4 Avenue Blaise Pascal
TSA 60026
CS 60026
63178 Aubière Cedex
Tel : 04 73 40 73 60
-------------------------------------------------------------------------
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/