Can you send the portion of the GridmanagerLog.hbaig file in the HTCondor log directory from around the time one of these jobs went into held status?
The "ImportError: No module named site" is suspicious, and odd that it's not printed when you run remote_gahp on the command line.
The RESOURCE USAGE POLICY banner could also be the cause. Such banners are usually suppressed when ssh is given a command to run, and the output of remote_gahp is interpreted by the HTCondor gridmanager daemon, which isn't expecting the banner.
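One rough way to see what extra output arrives ahead of the GAHP greeting is to print everything that precedes the `$GahpVersion:` line. The sketch below assumes that greeting is the first line the gridmanager wants to see; the function name lines_before_gahp_version is made up for illustration and is not part of HTCondor or Bosco.

```shell
# Hypothetical helper (not part of HTCondor/Bosco): read command output on
# stdin and print every line that appears before the "$GahpVersion:" greeting.
# Anything printed is extra output (banner, agent messages, warnings).
lines_before_gahp_version() {
    # Stop at the first line starting with a literal "$GahpVersion:";
    # print every line seen before that point.
    awk '/^\$GahpVersion:/ { exit } { print }'
}
```

It could be fed from the same command used for testing, for example `<sbin>/remote_gahp <user>@<hostname> batch_gahp 2>&1 | lines_before_gahp_version`, with the placeholders filled in for your site. Merging stderr with `2>&1` is a choice made here so nothing is hidden; the gridmanager may treat the two streams differently.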
- Jaime
Hi Again,
I also tried to monitor the status of the submitted jobs. The results are given below and might help you figure out what is going on:
$ condor_q -hold
-- Schedd: <hostname> : <127.0.0.1:11000?... @ 01/07/21 18:01:30
 ID      OWNER   HELD_SINCE  HOLD_REASON
46.0     hbaig   1/6 13:34   Failed to start GAHP: Agent pid 3832\nImportError: No module named site\nAgent pid 3832 killed\n
Thanks for any help.
regards
Hasan
Hello,
Thanks for the response. I ran the command you suggested and got the following output:
Agent pid 14621
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! RESOURCE USAGE POLICY !!!
!!! Uploading and/or processing of PHI !!!
!!! or other protected data in the HPC !!!
!!! environment is prohibited. !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
/home/FCAM/hbaig/bosco/glite/bin/batch_gahp.symlink: /home/FCAM/hbaig/bosco/glite/bin/../lib/condor/libglobus_common.so.0: no version information available (required by /home/FCAM/hbaig/bosco/glite/bin/batch_gahp.symlink)
$GahpVersion: 1.8.0 Mar 31 2008 INFN\ blahpd\ (poly,new_esc_format) $
I am able to connect to the remote server where bosco is installed and don't understand how it could be an SSH issue.
Sorry for asking naive questions, but I am a complete beginner and do not understand how to proceed. Thanks for your help and responses.
regards
Hasan
I am working on a web-based tool which takes jobs from a user and submits them to bosco resources (compute nodes). I am using a bosco version (condor 8.8.12) on Linux CentOS 7. The web interface allows a user to add a bosco pool, which the user can then use to submit jobs.
However, when I try to submit a job, it fails. I tried to test the pool as well by using the following command:
bosco_cluster --test
It gives me the following GAHP error:
This is probably an ssh failure (network, authentication, or authorization). Bosco runs the following command to access the remote cluster submit host:
<sbin>/remote_gahp <user>@<hostname> batch_gahp
You can run it on the command line to get more details about what's going wrong. remote_gahp is a bash script, so you can dig in further, if necessary.
- Jaime
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to
htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/