Hello,

We are trying, unsuccessfully, to get Singularity/Apptainer to run on HTCondor. On startup, all of our execute nodes show the following:

# condor_status -l slot1@node006 | grep -i singu
HasSingularity = false
SingularityOfflineReason = "Both SIF and Sandbox tests on startup failed"
SingularityVersion = "apptainer version 1.4.0-1.el9"
StarterAbilityList = "HasVM,HasTDP,HasSshd,HasReconnect,HasJobDeferral,HasSingularity,HasFileTransfer,HasJICLocalStdin,HasJICLocalConfig,HasPerFileEncryption,HasJobTransferPlugins,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasFileTransferPluginMethods"

We are running a cluster with HTCondor version:

$CondorVersion: 24.5.1 2025-02-28 BuildID: 789686 PackageID: 24.5.1-1 GitSHA: 3d98fee1 $
$CondorPlatform: x86_64_AlmaLinux9 $

and apptainer version 1.4.0-1.el9. Please also note that /usr/bin/singularity is a symlink to /usr/bin/apptainer.

More information from the StarterLog is shown below:

# cat /var/log/condor/StarterLog.testing
05/09/25 16:15:03 (pid:925) my_popenv: Failed to exec /usr/bin/java, errno=2 (No such file or directory)
05/09/25 16:15:03 (pid:925) JavaDetect: failed to execute /usr/bin/java -classpath /usr/share/condor:. CondorJavaInfo old
05/09/25 16:15:03 (pid:925) DockerAPI::detect() failed to detect the Docker version; assuming absent.
05/09/25 16:15:03 (pid:925) Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'.
05/09/25 16:15:03 (pid:925) [singularity version] apptainer version 1.4.0-1.el9
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec -C /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) Singularity exec failed, trying again without pid namespaces
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns' did not exit successfully (code 65280); stderr is :
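
For reference, the "code 65280" in these log lines looks like a raw wait() status rather than a plain exit code (an assumption about how the starter reports it). Decoded the way a shell would, 65280 is exit status 255 (65280 = 255 × 256), not the 37 that /exit_37 is built to return, which is consistent with apptainer itself failing before the test program ever ran. A minimal Python sketch of the decoding (Unix only; the helper name is hypothetical):

```python
import os


def exit_code_from_wait_status(status: int) -> int:
    """Decode a raw wait() status into a shell-style exit code.

    Hypothetical helper for illustration: returns the exit status for a
    normal exit, or the negated signal number if the process was killed.
    """
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    raise ValueError(f"unrecognised wait status: {status}")


print(exit_code_from_wait_status(65280))    # -> 255 (what the starter saw)
print(exit_code_from_wait_status(37 << 8))  # -> 37 (what a passing test returns)
```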

A few things jump out:

- This line looks odd: "Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'." It seems that /usr/bin/singularity occurs twice.

- Testing the SIF and sandbox images interactively appears to work:

  [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37
  WARNING: passwd file doesn't exist in container, not updating
  WARNING: group file doesn't exist in container, not updating
  WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
  WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
  [root@node006 ~]# echo $?
  37
  [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37
  WARNING: passwd file doesn't exist in container, not updating
  WARNING: group file doesn't exist in container, not updating
  [root@node006 ~]# echo $?
  37

- Running the last command from the log (the /get_user_ns test) interactively also appears to work:

  [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns
  WARNING: passwd file doesn't exist in container, not updating
  WARNING: group file doesn't exist in container, not updating
  WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
  WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
  4026531837
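
One detail in the last transcript: assuming /get_user_ns prints the inode number of the process's user namespace (as its name suggests; this is not confirmed here), 4026531837 is 0xEFFFFFFD, the fixed inode the Linux kernel assigns to the initial (host) user namespace (PROC_USER_INIT_INO). If so, the interactive root run did not enter a new user namespace. A small Python sketch of that check (the helper name is hypothetical):

```python
# Fixed inode number of the initial (host) user namespace in Linux,
# PROC_USER_INIT_INO in include/linux/proc_ns.h.
PROC_USER_INIT_INO = 0xEFFFFFFD


def is_initial_user_ns(inode: int) -> bool:
    """Return True if a user-namespace inode is the host's initial namespace.

    Hypothetical helper for illustration only.
    """
    return inode == PROC_USER_INIT_INO


# Value printed by /get_user_ns in the interactive run above.
observed = 4026531837
print(hex(observed), is_initial_user_ns(observed))  # -> 0xeffffffd True
```

On a Linux host, os.stat("/proc/self/ns/user").st_ino gives the same kind of number for the current process, so it can be compared with what the container reports.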

Any ideas how we can resolve this so that we can run Apptainer containers?

Kind Regards,
Glen
==========================================
System Architect
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================