Re: [Condor-users] Jobs Still not returning any output
- Date: Tue, 25 Oct 2005 01:05:03 +0100
- From: "Chris Miles" <chrismiles@xxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Jobs Still not returning any output
OK, I've sorted out my matching problem.
Here is the directory listing after the jobs ran:
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 error_0.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 error_1.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 error_2.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 error_3.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 error_4.out
-rw-r--r-- 1 condor users 239 2005-10-24 23:17 hello.sub
-rwxr-xr-x 1 condor users 10457 2005-10-11 17:04 helloworld
-rw-r--r-- 1 condor users 4450 2005-10-25 00:53 log.out
-rw-r--r-- 1 condor users 137 2005-10-11 17:03 Main.cpp
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 output_0.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 output_1.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 output_2.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 output_3.out
-rw-r--r-- 1 condor users 0 2005-10-25 00:53 output_4.out
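Every output_N.out and error_N.out file in the listing is 0 bytes, even though the user log, log.out, has 4450 bytes in it. When many jobs are involved, a small script can flag the empty files. This is a minimal sketch of a hypothetical helper (not a Condor tool), assuming the output_*.out / error_*.out naming used in this submit directory:

```python
from pathlib import Path
import sys


def find_empty_outputs(job_dir, patterns=("output_*.out", "error_*.out")):
    """Return the sorted names of zero-byte files in job_dir matching patterns."""
    directory = Path(job_dir)
    empty = set()
    for pattern in patterns:
        for path in directory.glob(pattern):
            # A zero-byte stdout/stderr file means nothing was written to it
            # (or nothing was transferred back to the submit machine).
            if path.is_file() and path.stat().st_size == 0:
                empty.add(path.name)
    return sorted(empty)


if __name__ == "__main__":
    # Default to the current directory, e.g. ~/jobs/helloworld.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name in find_empty_outputs(target):
        print(f"empty: {name}")
```

Run from the submit directory, it would list all ten .out files above; log.out is not matched by the default patterns.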
Here is the StarterLog from the only X86_64 node currently in the pool:
10/25 01:51:33 ** condor_starter (CONDOR_STARTER) STARTING UP
10/25 01:51:33 ** /home/condor/release/sbin/condor_starter
10/25 01:51:33 ** $CondorVersion: 6.7.10 Aug 3 2005 $
10/25 01:51:33 ** $CondorPlatform: I386-LINUX_RH9 $
10/25 01:51:33 ** PID = 13889
10/25 01:51:33 ******************************************************
10/25 01:51:33 Using config file: /home/condor/condor_config
10/25 01:51:33 Using local config files: /home/condor/release/etc/node1.local
10/25 01:51:33 DaemonCore: Command Socket at <192.168.1.101:36023>
10/25 01:51:33 Done setting resource limits
10/25 01:51:33 Communicating with shadow <192.168.1.1:60161>
10/25 01:51:33 Submitting machine is "mgmnt.cluster.int"
10/25 01:51:33 File transfer completed successfully.
10/25 01:51:34 Starting a VANILLA universe job with ID: 7.4
10/25 01:51:34 IWD: /home/condor/hosts/node1/execute/dir_13889
10/25 01:51:34 Output file: /home/condor/hosts/node1/execute/dir_13889/output_4.out
10/25 01:51:34 Error file: /home/condor/hosts/node1/execute/dir_13889/error_4.out
10/25 01:51:34 About to exec /home/condor/hosts/node1/execute/dir_13889/condor_exec.exe
10/25 01:51:34 Create_Process succeeded, pid=13891
10/25 01:51:34 Process exited, pid=13891, status=0
10/25 01:51:34 Got SIGQUIT. Performing fast shutdown.
10/25 01:51:34 ShutdownFast all jobs.
10/25 01:51:34 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
ShadowLog from the submitting machine (central manager):
10/25 00:53:43 ******************************************************
10/25 00:53:43 ** condor_shadow (CONDOR_SHADOW) STARTING UP
10/25 00:53:43 ** /home/condor/release/sbin/condor_shadow
10/25 00:53:43 ** $CondorVersion: 6.7.10 Aug 3 2005 $
10/25 00:53:43 ** $CondorPlatform: I386-LINUX_RH9 $
10/25 00:53:43 ** PID = 17843
10/25 00:53:43 ******************************************************
10/25 00:53:43 Using config file: /home/condor/etc/condor_config
10/25 00:53:43 Using local config files: /home/condor/release/etc/thebeast.local
10/25 00:53:43 DaemonCore: Command Socket at <192.168.1.1:60161>
10/25 00:53:43 Initializing a VANILLA shadow for job 7.4
10/25 00:53:43 (7.4) (17843): Request to run on <192.168.1.101:35998> was ACCEPTED
10/25 00:53:44 (7.4) (17843): Job 7.4 terminated: exited with status 0
10/25 00:53:44 (7.4) (17843): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 100
(There is a time skew: node1's clock runs roughly 58 minutes ahead of thebeast's.)
condor@thebeast:~/jobs/helloworld> date
Tue Oct 25 00:58:45 BST 2005
condor@thebeast:~/jobs/helloworld> ssh node1
Last login: Tue Oct 25 01:47:15 2005 from mgmnt.cluster.int
condor@node1:~> date
Tue Oct 25 01:56:41 BST 2005
condor@node1:~>
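The offset can be computed from the two `date` outputs. Because the second `date` ran a few seconds after the first (the ssh login happened in between), the difference is an upper bound on the true skew. It agrees with the gap between the StarterLog (01:51:33) and ShadowLog (00:53:43) timestamps for job 7.4. This is a minimal sketch; it assumes both machines report the same zone (BST here) and simply drops that token:

```python
from datetime import datetime


def parse_date_output(text):
    """Parse `date` output such as 'Tue Oct 25 00:58:45 BST 2005'.

    The timezone token is dropped: both hosts report BST, and strptime's
    %Z support for names like BST is platform-dependent.
    """
    weekday, month, day, clock, _tz, year = text.split()
    return datetime.strptime(f"{weekday} {month} {day} {clock} {year}",
                             "%a %b %d %H:%M:%S %Y")


def skew_seconds(submit_date, execute_date):
    """Seconds by which the execute node's clock reads ahead of the submit host.

    The commands ran a few seconds apart, so this overestimates the offset.
    """
    delta = parse_date_output(execute_date) - parse_date_output(submit_date)
    return delta.total_seconds()


if __name__ == "__main__":
    skew = skew_seconds("Tue Oct 25 00:58:45 BST 2005",
                        "Tue Oct 25 01:56:41 BST 2005")
    minutes, seconds = divmod(int(skew), 60)
    # prints: node1 is at most 57 min 56 s ahead
    print(f"node1 is at most {minutes} min {seconds} s ahead")
```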
ScheddLog from the submitting machine (central manager):
10/25 00:53:16 (pid:16840) DaemonCore: Command received via UDP from host <192.168.1.1:37278>
10/25 00:53:16 (pid:16840) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
10/25 00:53:16 (pid:16840) Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:16 (pid:16840) Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:16 (pid:16840) Called reschedule_negotiator()
10/25 00:53:28 (pid:16840) DaemonCore: Command received via TCP from host <192.168.1.1:60128>
10/25 00:53:28 (pid:16840) DaemonCore: received command 416 (NEGOTIATE), calling handler (negotiate)
10/25 00:53:28 (pid:16840) Negotiating for owner: condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:28 (pid:16840) Checking consistency running and runnable jobs
10/25 00:53:28 (pid:16840) Tables are consistent
10/25 00:53:28 (pid:16840) Out of servers - 1 jobs matched, 4 jobs idle, 1 jobs rejected
10/25 00:53:30 (pid:16840) Starting add_shadow_birthdate(7.0)
10/25 00:53:30 (pid:16840) Started shadow for job 7.0 on "<192.168.1.101:35998>", (shadow pid = 17811)
10/25 00:53:30 (pid:16840) Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:30 (pid:16840) Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:31 (pid:16840) Shadow pid 17811 for job 7.0 exited with status 100
10/25 00:53:33 (pid:16840) Starting add_shadow_birthdate(7.1)
10/25 00:53:33 (pid:16840) Started shadow for job 7.1 on "<192.168.1.101:35998>", (shadow pid = 17821)
10/25 00:53:34 (pid:16840) Shadow pid 17821 for job 7.1 exited with status 100
10/25 00:53:35 (pid:16840) Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:35 (pid:16840) Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:36 (pid:16840) Starting add_shadow_birthdate(7.2)
10/25 00:53:36 (pid:16840) Started shadow for job 7.2 on "<192.168.1.101:35998>", (shadow pid = 17826)
10/25 00:53:38 (pid:16840) Shadow pid 17826 for job 7.2 exited with status 100
10/25 00:53:40 (pid:16840) Sent ad to central manager for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:40 (pid:16840) Sent ad to 1 collectors for condor@xxxxxxxxxxxxxxxxxxxx
10/25 00:53:40 (pid:16840) Starting add_shadow_birthdate(7.3)
10/25 00:53:40 (pid:16840) Started shadow for job 7.3 on "<192.168.1.101:35998>", (shadow pid = 17836)
10/25 00:53:41 (pid:16840) Shadow pid 17836 for job 7.3 exited with status 100
10/25 00:53:43 (pid:16840) Starting add_shadow_birthdate(7.4)
10/25 00:53:43 (pid:16840) Started shadow for job 7.4 on "<192.168.1.101:35998>", (shadow pid = 17843)
10/25 00:53:44 (pid:16840) Shadow pid 17843 for job 7.4 exited with status 100
10/25 00:53:44 (pid:16840) match (<192.168.1.101:35998>#1130201383#2) out of jobs (cluster id 7); relinquishing
10/25 00:53:44 (pid:16840) Sent RELEASE_CLAIM to startd on <192.168.1.101:35998>
10/25 00:53:44 (pid:16840) Match record (<192.168.1.101:35998>, 7, -1) deleted
10/25 00:53:45 (pid:16840) DaemonCore: Command received via TCP from host <192.168.1.101:36027>
10/25 00:53:45 (pid:16840) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
10/25 00:53:45 (pid:16840) Got VACATE_SERVICE from <192.168.1.101:36027>
10/25 00:53:45 (pid:16840) Sent owner (0 jobs) ad to 1 collectors
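All five procs of cluster 7 (7.0 to 7.4) get a shadow that exits within one to two seconds, each with status 100, the same status the ShadowLog for 7.4 reports. To summarize this across a longer ScheddLog, a regex pass is enough. This is a sketch of a hypothetical helper; its pattern is inferred only from the lines quoted in this message, so other Condor versions may word them differently:

```python
import re
import sys

# Pattern inferred only from the ScheddLog lines quoted in this message, e.g.
#   10/25 00:53:31 (pid:16840) Shadow pid 17811 for job 7.0 exited with status 100
SHADOW_EXIT = re.compile(
    r"Shadow pid (?P<shadow_pid>\d+) for job (?P<job>\d+\.\d+) "
    r"exited with status (?P<status>\d+)"
)


def shadow_exits(log_text):
    """Map each job ID (cluster.proc) to the exit status its shadow reported."""
    # Mail clients wrap long log lines, so collapse every run of whitespace
    # (including newlines) to a single space before matching.
    flat = " ".join(log_text.split())
    return {m.group("job"): int(m.group("status")) for m in SHADOW_EXIT.finditer(flat)}


def job_sort_key(item):
    """Sort '7.2' before '7.10' by comparing (cluster, proc) numerically."""
    cluster, proc = item[0].split(".")
    return int(cluster), int(proc)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python shadow_exits.py ScheddLog [more logs...]
    for log_path in sys.argv[1:]:
        with open(log_path) as handle:
            exits = shadow_exits(handle.read())
        for job, status in sorted(exits.items(), key=job_sort_key):
            print(f"{log_path}: job {job} shadow exited with status {status}")
```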
----- Original Message -----
From: "Erik Paulson" <epaulson@xxxxxxxxxxx>
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Sent: Monday, October 24, 2005 6:41 PM
Subject: Re: [Condor-users] Jobs Still not returning any output
On Mon, Oct 24, 2005 at 06:36:01PM +0100, Chris Miles wrote:
> StartLog from execute node
Actually, it's the StarterLog that we need to look at, not
the StartLog. We also need to see one from around the
time a job was running; the log below never had a job run on
that machine during the 5 minutes it covers.