Hi all,
we are looking into how best to set up a somewhat self-contained grid access point, i.e., a node from which users can submit their jobs to a CondorCE and retrieve their job outputs 'easily'.
We prepared a node (aiming for a containerized environment for users) with master, collector and scheduler daemons running locally, plus a gridmanager/GAHP daemon(?). The idea is that one could submit grid jobs to the local schedd, which relays them via the GAHP to a CondorCE acting as the remote access point.
Unfortunately, test jobs [1] do not progress beyond the local queue. The jobs are picked up by the grid helper [2]; however, the GAHP helper always sees the remote AP as down [3] :-/ (in fact, I have not seen any IPv4 or IPv6 traffic towards the CE with tcpdump, nor do the submit node or its IPs show up in the CE logs)
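To rule out plain connectivity problems independently of the GAHP, a check along these lines could be run from the submit node. This is only a minimal sketch: the helper name `can_reach` is made up for illustration, and the port has to be supplied by hand (9619 is, as far as I know, the usual HTCondor-CE default, but that is an assumption and not confirmed for this CE).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests TCP reachability
    and says nothing about authentication or the HTCondor protocol itself.
    """
    try:
        # create_connection resolves the name and tries IPv4 and IPv6 addresses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `can_reach("grid-htcondorce0.desy.de", 9619)` would try the CE under the assumed default port; a `False` result would point at routing, firewall or DNS rather than at the gridmanager.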
Maybe there is a puzzle piece missing?
In the longer run, would Sci/WLCG token submission for users work with the GAHP helpers? I.e., instead of x509*, would a user export/include BEARER_TOKEN_FILE in the submit file?
Cheers,
Thomas
[1]
universe = grid
grid_resource = condor grid-htcondorce0.desy.de localhost
use_x509userproxy = true
X509UserProxy=$ENV(X509_USER_PROXY)
executable = x.sh
output = stdout
error = stderr
log = logs
ShouldTransferFiles = YES
WhenToTransferOutput = ON_EXIT
+remote_jobuniverse = 5
+remote_requirements = True
+remote_ShouldTransferFiles = "YES"
+remote_WhenToTransferOutput = "ON_EXIT"
queue
[2]
CGroup: /system.slice/condor.service
├─12562 /usr/sbin/condor_master -f
├─12603 condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 25411
├─12605 condor_shared_port
├─12606 condor_collector
├─12607 condor_schedd
├─12638 condor_gridmanager -f -C (Owner=?="grid"&&JobUniverse==9) -o grid -S /tmp/condor_g_scratch.0x55f349f65100.12607
├─12643 /usr/sbin/condor_c-gahp -f -s grid-htcondorce0.desy.de -P localhost
├─12645 /usr/sbin/condor_c-gahp_worker_thread -f -s grid-htcondorce0.desy.de -P localhost
└─12646 /usr/sbin/condor_c-gahp_worker_thread -f -s grid-htcondorce0.desy.de -P localhost
[3]
01/26/23 16:35:15 [12638] Found job 9.0 --- inserting
01/26/23 16:35:15 [12638] Found job 8.0 --- inserting
01/26/23 16:35:15 [12638] Found job 11.0 --- inserting
01/26/23 16:35:15 [12638] Found job 7.0 --- inserting
01/26/23 16:35:15 [12638] Found job 10.0 --- inserting
01/26/23 16:35:15 [12638] (9.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (8.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (11.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (7.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:35:15 [12638] (10.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/26/23 16:38:25 [12638] resource grid-htcondorce0.desy.de is still down