Me again… I removed USE_PID_NAMESPACES from the startd machine, and this time the job was removed correctly. Is there any way I could find out what's wrong with PID namespaces?
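One low-tech starting point, offered as a rough sketch and not as an HTCondor tool: scan whatever log files are at hand for the shadow's complaint quoted further down in this thread, to see when and for which slot it fires. The helper name `find_pid_ns_errors` and the slot-extracting pattern are assumptions made for illustration; only the message text itself comes from the logs in this thread.

```python
import re

# Message text copied from the shadow log quoted in this thread.
PID_NS_ERROR = "Starter configured to use PID NAMESPACES"

# Assumed pattern: the slot name sits between "Error from" and the next ": ".
SLOT_RE = re.compile(r"Error from\s+(\S+?):\s")


def find_pid_ns_errors(lines):
    """Return (line_number, slot) for each line carrying the PID namespace error.

    `lines` is any iterable of strings, for example an open log file.
    `slot` is None when no "Error from <slot>:" prefix is on the same line.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if PID_NS_ERROR in line:
            match = SLOT_RE.search(line)
            hits.append((number, match.group(1) if match else None))
    return hits
```

To use it, open a log file and pass the file object directly, e.g. `find_pid_ns_errors(open(path, errors="replace"))`, where `path` is whichever shadow, starter or job log is being inspected (log locations depend on the local configuration). This only locates the failures; it does not explain why condor_pid_ns_init did not run properly.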
The program /usr/libexec/condor/condor_pid_ns_init is not very verbose when launched manually, but it seems to work (although without any documentation for this specific program, it's hard to say):

[root@dev7242 condor]# echo $$
28720
[root@dev7242 condor]# /usr/libexec/condor/condor_pid_ns_init /bin/bash
[root@dev7242 condor]# echo $$
12295

Thanks

From: HTCondor-users
[mailto:htcondor-users-bounces@xxxxxxxxxxx] On behalf of SCHAER Frederic

I tried running condor as root and still get the same PID namespace issue, but maybe that's not the real cause…

06/27/14 11:42:52 (112.0) (10538): ERROR "Error from slot1@xxxxxxxxxxxxxxxxxxxxxxx: Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly" at line 558 in file /slots/01/dir_36628/userdir/src/condor_shadow.V6.1/pseudo_ops.cpp
06/27/14 11:42:53 Result of reading /etc/issue: Scientific Linux release 6.5 (Carbon)

The file is there, though:

[root@dev7242 condor]# ll /usr/libexec/condor/condor_pid_ns_init
-rwxr-xr-x 1 root root 8576 Jun 20 18:27 /usr/libexec/condor/condor_pid_ns_init

From: HTCondor-users
[mailto:htcondor-users-bounces@xxxxxxxxxxx]
On behalf of SCHAER Frederic

Hi,

Attached is the condor-generated job, as I submitted things through an ARC CE. I am running condor-8.2.0-254849.x86_64.
The ARC CE xRSL is:

& (executable="testarc.sh")
(inputFiles=("testarc.sh" ""))
(stdout="stdout.txt")
(stderr="stderr.txt")
(count=1)
(memory=100)
(gmlog=".arc")

And testarc.sh just does a few things like “env” and “mount”, followed by a very long sleep… I wanted to test job killing (memory, walltime…).

Since I had to find the condor-generated log, I also found this in the logs:

...
007 (106.000.000) 06/25 16:42:03 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxxxxxxx: Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
...
001 (106.000.000) 06/25 16:45:53 Job executing on host: <192.54.207.242:60981>
...
007 (106.000.000) 06/25 16:50:53 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxxxxxxx: Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly
0 - Run Bytes Sent By Job
19085 - Run Bytes Received By Job
...
001 (106.000.000) 06/25 16:52:53 Job executing on host: <192.54.207.242:60981>
...
007 (106.000.000) 06/25 16:57:53 Shadow exception!
Error from slot1@xxxxxxxxxxxxxxxxxxxxxxx: Starter configured to use PID NAMESPACES, but libexec/condor_pid_ns_init did not run properly
0 - Run Bytes Sent By Job
19085 - Run Bytes Received By Job

This goes on for a very long time, until, I guess, the job/sleep ends.

I have “USE_PID_NAMESPACES = true” in the startd config.d directory.

I configured condor to run as the condor user and not as root, since I read that it just drops privileges anyway (and running as root prevents the benchmark from succeeding at startup). The CONDOR_IDS variable is correctly set to the condor uid/gid, but I realize the condor UID on the startd machine differs from the one on the scheduler and collector machines: might that be an issue?

Regards

From: HTCondor-users
[mailto:htcondor-users-bounces@xxxxxxxxxxx]
On behalf of Greg Thain

On 06/26/2014 05:20 AM, SCHAER Frederic wrote:
|