Hi again!

I have uninstalled condor and installed it again (version condor 7.4.3 x86_64-LINUX_DEBIAN50) on the Ubuntu 4.4.3 platform, and I get the same problem once again. The directory /var/run/condor does not exist (it is not created), and therefore I get the error message:

    11/03 09:55:44 error opening watchdog pipe /var/run/condor/procd_pipe.STARTD.watchdog: No such file or directory (2)

Furthermore, the STARTD daemon is not started automatically after rebooting; I have to start it manually.
This happens even though STARTD is included in the DAEMON_LIST variable:

    root@noc-desktop:~# condor_config_val -v DAEMON_LIST
    DAEMON_LIST: MASTER, STARTD
      Defined in '/etc/condor/condor_config.local', line 33.

Has anyone used the condor 7.4.3 x86_64-LINUX_DEBIAN50 version on the Ubuntu platform before? How did it work? Should I install another condor version instead? As I mentioned before, I have installed condor 7.4.3 x86_64-LINUX_DEBIAN50 on both Debian 2.6.32 and Debian 2.6.26-25lenny1, and it is working fine.

Thanks,
/Sónia

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Sónia Liléo

Hello,

I have installed condor 7.4.3 x86_64-LINUX_DEBIAN50 on Ubuntu. However, I have noticed that there is no /var/run/condor directory, and therefore procd_pipe.STARTD.watchdog is not created. Why is the /var/run/condor directory missing?

I have tried to uninstall condor in order to install it again, but this is not possible since the directory /var/run/condor does not exist:

    FATAL: Required directory /var/run/condor does not exist, or is not a directory.

I have tried to create this directory. Then the files procd_pipe.STARTD and procd_pipe.STARTD.watchdog are created, but there is no condor.pid. The StartLog
registers the following:

    11/02 20:09:50 mkfifo of /var/run/condor/procd_pipe.STARTD.2320.0 error: Permission denied (13)
    11/02 20:09:50 failed to initialize named pipe at /var/run/condor/procd_pipe.STARTD.2320.0
    11/02 20:09:50 LocalClient: error initializing NamedPipeReader
    11/02 20:09:50 ProcFamilyClient: failed to start connection with ProcD
    11/02 20:09:50 register_subfamily: ProcD communication error
    11/02 20:09:50 Create_Process: error registering family for pid 2926
    11/02 20:09:50 Create_Process(/usr/sbin/condor_starter): child failed because it failed to register itself with the ProcD
    11/02 20:09:50 slot1: ERROR: exec_starter failed!
    11/02 20:09:50 slot1: ERROR: exec_starter returned 0

What should I do? Which condor version
should be installed on the Ubuntu platform? I have installed condor 7.4.3 x86_64-LINUX_DEBIAN50 on both Debian 2.6.32 and Debian 2.6.26-25lenny1, and it is working fine.

Regards,
Sónia

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Sónia Liléo

Hi!

What does the following error message mean?

    11/01 20:16:17 error opening watchdog pipe /var/run/condor/procd_pipe.STARTD.watchdog: No such file or directory (2)

StartLog, execute node:

    11/01 20:16:16 slot1: Request accepted.
    11/01 20:16:16 slot1: Remote owner is o2f_sonlil@xxxxxxxxxxxxxxxxxxxx
    11/01 20:16:16 slot1: State change: claiming protocol successful
    11/01 20:16:16 slot1: Changing state: Matched -> Claimed
    11/01 20:16:16 slot1: Started ClaimLease timer (17) w/ 1800 second lease duration
    11/01 20:16:17 slot1: Got activate_claim request from shadow (<10.110.44.78:55118>)
    11/01 20:16:17 slot1: Read request ad and starter from shadow.
    11/01 20:16:17 Swap space: 917496
    11/01 20:16:17 13367628 kbytes available for "/var/lib/condor/execute"
    11/01 20:16:17 slot1: Total execute space: 13362508
    11/01 20:16:17 13367628 kbytes available for "/var/lib/condor/execute"
    11/01 20:16:17 slot2: Total execute space: 13362508
    11/01 20:16:17 slot1: Remote job ID is 116.0
    11/01 20:16:17 slot1: Remote global job ID is o2f-sth-lap-016.un.dr.dgcsystems.net#116.0#1288638373
    11/01 20:16:17 slot1: JobLeaseDuration defined in job ClassAd: 1200
    11/01 20:16:17 slot1: Resetting ClaimLease timer (17) with new duration
    11/01 20:16:17 slot1: Sending Machine Ad to Starter
    11/01 20:16:17 slot1: About to Create_Process "condor_starter -f -a slot1 o2f-sth-lap-014.un.dr.dgcsystems.net"
    11/01 20:16:17 Create_Process: using fast clone() to create child process.
    11/01 20:16:17 error opening watchdog pipe /var/run/condor/procd_pipe.STARTD.watchdog: No such file or directory (2)
    11/01 20:16:17 ProcFamilyClient: error initializing LocalClient
    11/01 20:16:17 ProcFamilyProxy: error initializing ProcFamilyClient
    11/01 20:16:17 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.cpp
    11/01 20:16:17 CronMgr: 0 jobs alive
    11/01 20:16:17 slot1: Canceled ClaimLease timer (17)
    11/01 20:16:17 slot1: Changing state and activity: Claimed/Idle -> Preempting/Killing
    11/01 20:16:17 Entered vacate_client <10.110.44.79:53584> o2f-sth-lap-014.un.dr.dgcsystems.net...
    11/01 20:16:17 slot1: State change: No preempting claim, returning to owner
    11/01 20:16:17 slot1: Changing state and activity: Preempting/Killing -> Owner/Idle
    11/01 20:16:17 slot1: State change: IS_OWNER is false
    11/01 20:16:17 slot1: Changing state: Owner -> Unclaimed
    11/01 20:16:17 startd exiting because of fatal exception.

StarterLog, execute node:

    11/01 20:20:30 Reading from /proc/cpuinfo
    11/01 20:20:30 Found: Physical-IDs:False; Core-IDs:False
    11/01 20:20:30 Using processor count: 2 processors, 2 CPUs, 0 HTs
    11/01 20:20:30 Reading condor configuration from '/etc/condor/condor_config'

If I run condor_restart -startd at
this execute node, I get "Can't connect to local startd".

Is this "Found: Physical-IDs:False; Core-IDs:False" a problem?

Regards,
Sónia

Sónia Liléo
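The three failures reported in this thread (the missing /var/run/condor directory, the FATAL "does not exist, or is not a directory" check, and mkfifo failing with Permission denied (13)) can be checked outside Condor. What follows is a minimal Python sketch, not part of Condor: the helper name check_pipe_dir and the probe file name are hypothetical, and it assumes it is run under the same account that creates the procd pipes.

```python
import errno
import os


def check_pipe_dir(path):
    """Hypothetical helper (not part of Condor): check whether `path` can hold
    named pipes. Returns a (status, errno) tuple mirroring the thread's errors."""
    if not os.path.exists(path):
        # Corresponds to "No such file or directory (2)" in the StartLog.
        return ("missing", errno.ENOENT)
    if not os.path.isdir(path):
        # Corresponds to the FATAL "does not exist, or is not a directory" message.
        return ("not a directory", errno.ENOTDIR)
    # The procd pipes are FIFOs, so try to create (and then remove) a throwaway one.
    probe = os.path.join(path, f"procd_pipe.PROBE.{os.getpid()}")
    try:
        os.mkfifo(probe)
    except OSError as exc:
        # E.g. errno 13 (Permission denied), as in the StartLog mkfifo error.
        return ("mkfifo failed", exc.errno)
    os.unlink(probe)
    return ("ok", 0)


if __name__ == "__main__":
    # Run as the account that creates the pipes; running as root hides errno 13.
    print(check_pipe_dir("/var/run/condor"))
```

Running it as root will usually succeed even when the daemon's account cannot write to the directory, so a result of ("ok", 0) under root does not rule out the Permission denied (13) case.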
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/