[Condor-users] Problem with test job - possibly with output file
- Date: Mon, 15 Oct 2007 11:11:03 +0300
- From: Kostas Georgakopoulos <kgeorga@xxxxxx>
- Subject: [Condor-users] Problem with test job - possibly with output file
Hi,
I am new to Condor and am trying to test job submission from an LCG
site (Globus-based middleware)
to a Condor pool. Here is the "vanilla" test job:
#!/bin/bash
/bin/hostname
/bin/date
The job seems to fail. Here is the output from the starter log on the
execute machine:
10/15 10:50:49 Submitting machine is "globus-lcg.it.uom.gr"
10/15 10:50:49 File transfer completed successfully.
10/15 10:50:50 Starting a VANILLA universe job with ID: 82.0
10/15 10:50:50 IWD: /home/condor/execute/dir_24729
10/15 10:50:50 Output file: /home/condor/execute/dir_24729/_condor_stdout
10/15 10:50:50 Error file: /home/condor/execute/dir_24729/_condor_stderr
10/15 10:50:50 About to exec /home/condor/execute/dir_24729/condor_exec.exe
UI=000003:NS=0000000003:WM=000016:BH=0000000000:JSS=000012:LM=000018:LRMS=000000:APP=000000
10/15 10:50:50 Create_Process succeeded, pid=24733
10/15 10:50:50 Process exited, pid=24733, status=1
10/15 10:50:50 Got SIGQUIT. Performing fast shutdown.
10/15 10:50:50 ShutdownFast all jobs.
10/15 10:50:51 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
Here is the output of the schedd log on the submitting machine:
10/15 10:50:49 (pid:23454) Starting add_shadow_birthdate(82.0)
10/15 10:50:49 (pid:23454) Started shadow for job 82.0 on "<195.251.209.23:55245>", (shadow pid = 12092)
10/15 10:50:50 (pid:23454) Sent ad to central manager for dteam015@xxxxxxxxxxxxxxxxxxxx
10/15 10:50:50 (pid:23454) Sent ad to 1 collectors for dteam015@xxxxxxxxxxxxxxxxxxxx
10/15 10:50:51 (pid:23454) Shadow pid 12092 for job 82.0 exited with status 100
10/15 10:50:51 (pid:23454) match (<195.251.209.23:55245>#1191836703#123#...) out of jobs (cluster id 82); relinquishing
10/15 10:50:51 (pid:23454) Sent RELEASE_CLAIM to startd on <195.251.209.23:55245>
10/15 10:50:51 (pid:23454) Match record (<195.251.209.23:55245>, 82, -1) deleted
10/15 10:50:51 (pid:23454) statfs(/home/dteam015/gram_scratch_7BFyjXkbCS) failed: 13/Permission denied
10/15 10:50:51 (pid:23454) DaemonCore: Command received via TCP from host <195.251.209.23:52869>
10/15 10:50:51 (pid:23454) DaemonCore: received command 443 (VACATE_SERVICE), calling handler (vacate_service)
10/15 10:50:51 (pid:23454) Got VACATE_SERVICE from <195.251.209.23:52869>
What does this "statfs(...) failed: 13/Permission denied" message mean?
Has anyone seen something similar?
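For reference, errno 13 is EACCES, and statfs() returns it when search
(execute) permission is denied on one of the directories leading to the
path. A rough sketch for finding which directory that is follows. The
function name first_blocked_component is made up for illustration, the
check only reflects the permissions of whichever user runs it, and it
assumes an absolute path.

```shell
#!/bin/bash
# Hypothetical helper (not part of Condor): print the first directory on the
# way to "$1" that the current user cannot search, i.e. that lacks execute
# permission for this user.
# Returns 0 if every existing component is searchable, 1 if one is blocked,
# and 2 if the argument is not an absolute path.
first_blocked_component() {
    local rest part current=""
    case $1 in
        /*) ;;
        *) return 2 ;;
    esac
    rest=${1#/}
    while [ -n "$rest" ]; do
        # Take the next component and drop it from the remainder.
        part=${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *) rest="" ;;
        esac
        [ -z "$part" ] && continue
        current="$current/$part"
        # A component that does not exist cannot be checked; stop here.
        [ -e "$current" ] || return 0
        if [ -d "$current" ] && [ ! -x "$current" ]; then
            printf '%s\n' "$current"
            return 1
        fi
    done
    return 0
}
```

Calling it with the gram_scratch path from the log, while logged in as the
account the schedd daemon runs under (which account that is depends on the
installation), would show whether a directory such as the user's home is
closed to that account.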
Watching condor_status, I can see that a machine is matched for the
test job, but its state goes from 'Unclaimed' to 'Claimed' and back to
'Unclaimed'. Nothing happens beyond that, and the activity is always
'Idle'.
I submit the job from the User Interface of the LCG site with
'edg-job-submit', and the job never finishes.
When I run the job with condor_submit from the LCG Computing Element, it
runs fine. Furthermore, when I run
'/bin/date' from the User Interface with 'globus-job-run', it runs OK.
Can anyone assist me with this?
Thanks in advance.
--
********************************************
Kostas Georgakopoulos - MSc,
Systems and Network Administrator
E-mail : kgeorga@xxxxxx
Office Tel. : +30 2310 887973
Department Of Applied Informatics,
University Of Macedonia,
Egnatias 156, Thessaloniki, Greece
********************************************