
[HTCondor-users] self-contained grid access point/submitter to HTCondorCE



Hi all,

we are looking into how best to set up a somewhat self-contained grid access point, i.e., a node from which users can submit their jobs to a CondorCE and retrieve their job outputs 'easily'.

We prepared a node (aiming for a containerized environment for users) with master, collector, and schedd daemons running locally, plus a gridmanager/GAHP daemon(?). The idea is that one could submit grid jobs to the local schedd, which relays them via the GAHP to a CondorCE as the remote access point.

Unfortunately, test jobs [1] do not progress beyond the local queue. The jobs are picked up by the grid helper [2]; however, the GAHP helper only ever sees the remote AP as down [3] :-/ (In fact, I have not seen any IPv4 or IPv6 traffic towards the CE with tcpdump, nor the submit node or its IPs in the CE logs.)

Maybe there is a puzzle piece missing?

In the longer run, would Sci/WLCG token submission for users work with the GAHP helpers? I.e., instead of the x509* settings, would a user export BEARER_TOKEN_FILE or include it in the submit file?

Cheers,
  Thomas

[1]
universe = grid
grid_resource = condor grid-htcondorce0.desy.de localhost
use_x509userproxy = true
X509UserProxy=$ENV(X509_USER_PROXY)
executable = x.sh
output = stdout
error = stderr
log = logs
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
+remote_jobuniverse = 5
+remote_requirements = True
+remote_ShouldTransferFiles = "YES"
+remote_WhenToTransferOutput = "ON_EXIT"
queue

[2]
   CGroup: /system.slice/condor.service
           ├─12562 /usr/sbin/condor_master -f
           ├─12603 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 25411
           ├─12605 condor_shared_port
           ├─12606 condor_collector
           ├─12607 condor_schedd
           ├─12638 condor_gridmanager -f -C (Owner=?="grid"&&JobUniverse==9) -o grid -S /tmp/condor_g_scratch.0x55f349f65100.12607
           ├─12643 /usr/sbin/condor_c-gahp -f -s grid-htcondorce0.desy.de -P localhost
           ├─12645 /usr/sbin/condor_c-gahp_worker_thread -f -s grid-htcondorce0.desy.de -P localhost
           └─12646 /usr/sbin/condor_c-gahp_worker_thread -f -s grid-htcondorce0.desy.de -P localhost

[3]
01/26/23 16:35:15 [12638] Found job 9.0 --- inserting
01/26/23 16:35:15 [12638] Found job 8.0 --- inserting
01/26/23 16:35:15 [12638] Found job 11.0 --- inserting
01/26/23 16:35:15 [12638] Found job 7.0 --- inserting
01/26/23 16:35:15 [12638] Found job 10.0 --- inserting
01/26/23 16:35:15 [12638] (9.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (8.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (11.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (7.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (10.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:38:25 [12638] resource grid-htcondorce0.desy.de is still down

