
Re: [HTCondor-users] HasSingularity Fails, Returns "false"



On 5/9/2025 12:43 PM, Glen MacLachlan via HTCondor-users wrote:
Just a quick update -- the issue was /etc/resolv.conf being unreadable to the condor user. Once the permissions were corrected, HasSingularity returned true. I'm including this in case someone else gets hit with the same issue.

Thanks for the update, Glen!

Yes, I believe /etc/resolv.conf normally has mode 644, i.e., read permission for all users. I'm not sure how or why your system ended up with different permissions.

regards,
Todd





Issue:
[root@node006 condor]# su - condor -s /bin/bash -c '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37'
INFO:    Cleanup error: while unmounting /var/lib/apptainer/mnt/session/final directory: no such file or directory, while unmounting /var/lib/apptainer/mnt/session/rootfs directory: no such file or directory
FATAL:   container creation failed: open /etc/resolv.conf: permission denied
[root@node006 condor]# echo $?
255
[root@node006 condor]# ls -ltrh /etc/resolv.conf
-rw-------. 1 root root 632 May  9 17:26 /etc/resolv.conf
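
The reproduction above is the step that exposed the problem: a probe that succeeds as root fails once it runs as the condor user. A minimal sketch of a wrapper for repeating that check with the other probe commands from the StarterLog. `probe_as_user` is a hypothetical helper name, and the `su` invocation is copied from the transcript above.

```shell
#!/bin/sh
# probe_as_user USER COMMAND
# Hypothetical helper: run COMMAND as USER via su (same invocation as in the
# transcript above) and print the exit code. The command's own output,
# including any FATAL line, stays visible.
probe_as_user() {
    user="$1"
    cmd="$2"
    su - "$user" -s /bin/bash -c "$cmd"
    echo "exit code: $?"
}

# Usage (as root on the execute node):
#   probe_as_user condor '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37'
```

In this thread, exit code 37 meant the probe ran inside the container, while 255 came with a FATAL line from Apptainer itself.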

Resolution:
Update the permissions on /etc/resolv.conf to 644 so that the condor user can read it.
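
A minimal sketch of that fix as a check-then-change step, in case it needs to be applied across several execute nodes. `ensure_mode_644` is a hypothetical helper name, and `stat -c` assumes GNU coreutils, as shipped on AlmaLinux 9.

```shell
#!/bin/sh
# ensure_mode_644 FILE
# Hypothetical helper: set FILE to mode 644 if it has any other mode, then
# print the resulting octal mode.
ensure_mode_644() {
    file="$1"
    # stat -c '%a' prints the octal permission bits (GNU coreutils).
    mode=$(stat -c '%a' "$file") || return 1
    if [ "$mode" != "644" ]; then
        chmod 644 "$file" || return 1
    fi
    stat -c '%a' "$file"
}

# Usage (as root on the execute node):
#   ensure_mode_644 /etc/resolv.conf
```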

[root@node006 ~]# condor_status -l slot1@node006 | grep -i sing
HasSingularity = true
SingularityUserNamespaces = true
SingularityVersion = "apptainer version 1.4.0-1.el9"
StarterAbilityList = "HasVM,HasSIF,HasTDP,HasSshd,HasContainer,HasDockerURL,HasReconnect,HasJobDeferral,HasSingularity,HasFileTransfer,HasSandboxImage,HasJICLocalStdin,HasPidNamespaces,HasJICLocalConfig,HasPerFileEncryption,HasJobTransferPlugins,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasFileTransferPluginMethods"
[root@node006 ~]# su - condor -s /bin/bash -c '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37'
WARNING: passwd file doesn't exist in container, not updating
WARNING: group file doesn't exist in container, not updating
[root@node006 ~]# echo $?
37

This issue is resolved. Thanks!


Kind Regards,
Glen

==========================================
Glen MacLachlan, PhD
System Architect 
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================






On Fri, May 9, 2025 at 1:08 PM Glen MacLachlan <maclach@xxxxxxx> wrote:
Hello, 

We are trying unsuccessfully to get Singularity/Apptainer to run on HTCondor. 

On startup, all of our execute nodes show the following message:

# condor_status -l slot1@node006 | grep -i singu
HasSingularity = false
SingularityOfflineReason = "Both SIF and Sandbox tests on startup failed"
SingularityVersion = "apptainer version 1.4.0-1.el9"
StarterAbilityList = "HasVM,HasTDP,HasSshd,HasReconnect,HasJobDeferral,HasSingularity,HasFileTransfer,HasJICLocalStdin,HasJICLocalConfig,HasPerFileEncryption,HasJobTransferPlugins,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasFileTransferPluginMethods"
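
When checking more than one slot, the same attributes can be filtered by a small function instead of being read off the grep output by eye. A sketch that reads `condor_status -l` output on stdin. `report_singularity` is a hypothetical helper name, and the `Attr = value` spacing is assumed from the output above.

```shell
#!/bin/sh
# report_singularity
# Hypothetical helper: read a slot ClassAd (condor_status -l output) on stdin,
# report whether HasSingularity is true, and show the offline reason if not.
report_singularity() {
    ad=$(cat)
    # Anchor the match so the HasSingularity entry inside StarterAbilityList
    # is not mistaken for the attribute itself.
    if printf '%s\n' "$ad" | grep -q '^HasSingularity = true$'; then
        echo "HasSingularity: true"
        return 0
    fi
    echo "HasSingularity: not true"
    # Print the reason attribute, if the ad carries one.
    printf '%s\n' "$ad" | grep '^SingularityOfflineReason = '
    return 1
}

# Usage:
#   condor_status -l slot1@node006 | report_singularity
```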


We are running a cluster with HTCondor version:
$CondorVersion: 24.5.1 2025-02-28 BuildID: 789686 PackageID: 24.5.1-1 GitSHA: 3d98fee1 $
$CondorPlatform: x86_64_AlmaLinux9 $

and apptainer version 1.4.0-1.el9. Please note also that /usr/bin/singularity is symlinked to /usr/bin/apptainer. 

More information from the StarterLog is shown below:
# cat /var/log/condor/StarterLog.testing
05/09/25 16:15:03 (pid:925) my_popenv: Failed to exec /usr/bin/java, errno=2 (No such file or directory)
05/09/25 16:15:03 (pid:925) JavaDetect: failed to execute /usr/bin/java -classpath /usr/share/condor:. CondorJavaInfo old
05/09/25 16:15:03 (pid:925) DockerAPI::detect() failed to detect the Docker version; assuming absent.
05/09/25 16:15:03 (pid:925) Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'.
05/09/25 16:15:03 (pid:925) [singularity version] apptainer version 1.4.0-1.el9
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec -C /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) Singularity exec failed, trying again without pid namespaces
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37' did not exit successfully (code 65280); stderr is :
05/09/25 16:15:03 (pid:925) '/usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns' did not exit successfully (code 65280); stderr is :
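
The `code 65280` in these log lines looks like a raw wait status rather than a plain exit code. That is an inference from the value, not something the log states: 65280 is 0xFF00, and on Linux the exit code sits in bits 8 to 15 of the wait status, which gives 255. A sketch of the decoding, with `decode_wait_status` as a hypothetical helper name:

```shell
#!/bin/sh
# decode_wait_status STATUS
# Hypothetical helper: split a raw Linux wait status into either an exit code
# (bits 8-15) or a terminating signal number (low 7 bits).
decode_wait_status() {
    status="$1"
    if [ $((status & 127)) -eq 0 ]; then
        echo "exited with code $(( (status >> 8) & 255 ))"
    else
        echo "killed by signal $((status & 127))"
    fi
}

decode_wait_status 65280   # prints "exited with code 255"
```

Under that reading, each probe exited with code 255, the same value `echo $?` shows for the failed run as the condor user.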

A few things jump out: 
  1. This line looks odd, "Attempting to run: '/usr/bin/singularity /usr/bin/singularity --version'." It seems that /usr/bin/singularity occurs twice. 
  2. Testing the SIF and sandbox interactively appears to work:
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /exit_37
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
    [root@node006 ~]# echo $?
    37
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/exit_37.sif /exit_37
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    [root@node006 ~]# echo $?
    37
  3. Running the last line interactively appears to work:
    [root@node006 ~]# /usr/bin/singularity exec --contain --ipc --cleanenv /usr/libexec/condor/singularity_test_sandbox /get_user_ns
    WARNING: passwd file doesn't exist in container, not updating
    WARNING: group file doesn't exist in container, not updating
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/tmp [tmp]: /tmp doesn't exist in container
    WARNING: Skipping mount /var/lib/apptainer/mnt/session/var/tmp [tmp]: /var/tmp doesn't exist in container
    4026531837

Any ideas how we can get this issue resolved so we can run Apptainer containers?


Kind Regards,
Glen







-- 
Todd Tannenbaum <tannenba@xxxxxxxxxxx>  University of Wisconsin-Madison
Center for High Throughput Computing    Department of Computer Sciences
Calendar: https://tinyurl.com/yd55mtgd  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                   Madison, WI 53706-1685