Hello,
We have been trying, without success, to get Singularity/Apptainer to run under HTCondor.
On startup, all of our execute nodes show the following message:
# condor_status -l slot1@node006 | grep -i singu
HasSingularity = false
SingularityOfflineReason = "Both SIF and Sandbox tests on startup failed"
SingularityVersion = "apptainer version 1.4.0-1.el9"
StarterAbilityList = "HasVM,HasTDP,HasSshd,HasReconnect,HasJobDeferral,HasSingularity,HasFileTransfer,HasJICLocalStdin,HasJICLocalConfig,HasPerFileEncryption,HasJobTransferPlugins,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasFileTransferPluginMethods"
We are running a cluster with HTCondor version:
$CondorVersion: 24.5.1 2025-02-28 BuildID: 789686 PackageID: 24.5.1-1 GitSHA: 3d98fee1 $
$CondorPlatform: x86_64_AlmaLinux9 $
and apptainer version 1.4.0-1.el9. Please note also that /usr/bin/singularity is a symlink to /usr/bin/apptainer.
More information from the StarterLog is shown below:
# cat /var/log/condor/StarterLog.testing
05/09/25 16:15:03 (pid:925) my_popenv: Failed to exec /usr/bin/java, errno=2 (No such file or directory)
05/09/25 16:15:03 (pid:925) JavaDetect: failed to execute /usr/bin/java -classpath /usr/share/condor:. CondorJavaInfo old
05/09/25 16:15:03 (pid:925) DockerAPI::detect() failed to detect the Docker version; assuming absent.
05/09/25 16:15:03 (pid:925) Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'.
05/09/25 16:15:03 (pid:925) [singularity version] apptainer version 1.4.0-1.el9
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec -C /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) Singularity exec failed, trying again without pid namespaces
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns' did not exit successfully (code 65280); stderr is :
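For anyone reading the log: "code 65280" looks like a raw wait() status rather than the program's own exit code. Assuming that is what HTCondor prints here, 65280 = 255 × 256 decodes to a normal exit with status 255. In other words, each test command exited 255 instead of the 37 the test expects, which suggests apptainer itself failed before /exit_37 ever ran. A minimal POSIX shell sketch of that decoding is below; `decode_wait_status` is a hypothetical helper written for this post, not an HTCondor or apptainer tool.

```shell
#!/bin/sh
# Decode a raw wait() status, such as the "code 65280" in the StarterLog.
# decode_wait_status is a hypothetical helper, not an HTCondor tool.
# Simplified: stopped processes are not handled.
decode_wait_status() {
  status=$1
  if [ $(( status & 127 )) -eq 0 ]; then
    # Low 7 bits zero: normal exit; the exit code is in bits 8-15.
    echo "exited with code $(( (status >> 8) & 255 ))"
  else
    # Otherwise the low 7 bits hold the terminating signal number.
    echo "killed by signal $(( status & 127 ))"
  fi
}

decode_wait_status 65280   # prints: exited with code 255
```

By the same arithmetic, a successful /exit_37 run would correspond to a raw status of 9472 (37 × 256).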
A few things jump out:
- This line looks odd: "Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'." /usr/bin/singularity appears twice.
- Testing the SIF and sandbox interactively appears to work:
[root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
[root@node006 ~]# echo $?
37
[root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
[root@node006 ~]# echo $?
37
- Running the last test from the log (/get_user_ns) interactively also appears to work:
[root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
4026531837
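To compare like with like, here is a rough sketch of a script that reruns the same SIF and sandbox tests and reports whether each one exits with the expected status 37. The helper `check_exit37` is hypothetical, and the `env -i` variant is only our guess at approximating the stripped-down environment a daemon might run with; it is not necessarily how HTCondor invokes the test.

```shell
#!/bin/sh
# Rerun the HTCondor startup tests by hand and check for exit status 37.
# check_exit37 is a hypothetical helper written for this post.
check_exit37() {
  # Output is discarded; rerun the command without redirection to see stderr.
  if "$@" >/dev/null 2>&1; then
    rc=0
  else
    rc=$?
  fi
  if [ "$rc" -eq 37 ]; then
    echo "PASS"
  else
    echo "FAIL (exit $rc)"
  fi
}

SING=/usr/bin/singularity
TESTDIR=/usr/libexec/condor

# Same arguments as the StarterLog lines above.
check_exit37 "$SING" exec --contain --ipc --cleanenv "$TESTDIR/singularity_test_sandbox" /exit_37
check_exit37 "$SING" exec --contain --ipc --cleanenv "$TESTDIR/exit_37.sif" /exit_37

# Assumption: an empty environment roughly mimics a daemon-launched process.
check_exit37 env -i "$SING" exec --contain --ipc --cleanenv "$TESTDIR/exit_37.sif" /exit_37
```

If the plain runs pass but the `env -i` run fails, that would point toward an environment difference rather than a broken apptainer install, though it would not prove it.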
Any ideas how we can get this issue resolved so we can run Apptainer containers?
Kind Regards,
Glen
==========================================
System Architect
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================