[HTCondor-users] HasSingularity Fails, Returns "false"



Hello,

We are trying unsuccessfully to get Singularity/Apptainer to run on HTCondor.

On startup, all of our execute nodes show the following message:

# condor_status -l slot1@node006 | grep -i singu
HasSingularity = false
SingularityOfflineReason = "Both SIF and Sandbox tests on startup failed"
SingularityVersion = "apptainer version 1.4.0-1.el9"
StarterAbilityList = "HasVM,HasTDP,HasSshd,HasReconnect,HasJobDeferral,HasSingularity,HasFileTransfer,HasJICLocalStdin,HasJICLocalConfig,HasPerFileEncryption,HasJobTransferPlugins,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasFileTransferPluginMethods"


We are running a cluster with HTCondor version:
$CondorVersion: 24.5.1 2025-02-28 BuildID: 789686 PackageID: 24.5.1-1 GitSHA: 3d98fee1 $
$CondorPlatform: x86_64_AlmaLinux9 $

and apptainer version 1.4.0-1.el9. Please note also that /usr/bin/singularity is symlinked to /usr/bin/apptainer.

More information from the StarterLog is shown below:
# cat /var/log/condor/StarterLog.testing
05/09/25 16:15:03 (pid:925) my_popenv: Failed to exec /usr/bin/java, errno=2 (No such file or directory)
05/09/25 16:15:03 (pid:925) JavaDetect: failed to execute /usr/bin/java -classpath /usr/share/condor:. CondorJavaInfo old
05/09/25 16:15:03 (pid:925) DockerAPI::detect() failed to detect the Docker version; assuming absent.
05/09/25 16:15:03 (pid:925) Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'.
05/09/25 16:15:03 (pid:925) [singularity version] apptainer version 1.4.0-1.el9
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec -C /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) Singularity exec failed, trying again without pid namespaces
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns' did not exit successfully (code 65280); stderr is :
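
For reference, the "code 65280" in these messages appears to be a raw wait status rather than a plain exit code. Assuming the usual Linux encoding (exit code in bits 8-15, terminating signal in the low 7 bits), it decodes to exit code 255 with no signal, whereas the interactive runs of /exit_37 below return 37. A minimal sketch of that decoding (the variable names are just for illustration):

```shell
# Decode a raw wait status, assuming the Linux encoding:
# exit code in bits 8-15, terminating signal in the low 7 bits.
status=65280                              # value reported in the StarterLog
exit_code=$(( (status >> 8) & 0xff ))     # high byte holds the exit code
signal=$(( status & 0x7f ))               # low 7 bits hold the signal, 0 if none
echo "exit_code=$exit_code signal=$signal"
```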

A few things jump out:
  1. This line looks odd: "Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'." Note that /usr/bin/singularity appears twice.
  2. Testing the SIF and sandbox interactively appears to work:
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
    [root@node006 ~]# echo $?
    37
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    [root@node006 ~]# echo $?
    37
  3. Running the last line interactively appears to work:
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
    4026531837

Any ideas how we can get this issue resolved so we can run Apptainer containers?


Kind Regards,
Glen

==========================================
Glen MacLachlan, PhD
System Architect
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================