[Condor-users] Jobs are remaining idle forever
- Date: Wed, 17 Sep 2008 09:05:15 -0500
- From: "Wingard, Jeffrey" <jwingard@xxxxxxxxx>
- Subject: [Condor-users] Jobs are remaining idle forever
Jobs are remaining idle for long periods of time. Can anyone point me to the problem?

Here is the output from condor_q -better-analyze:
-- Submitter: dmvx.vxnet : <192.168.2.101:32843> : dmvx.vxnet
---
422.000:  Run analysis summary.  Of 22 machines,
     11 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
     11 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
  Last successful match: Wed Sep 17 08:54:12 2008
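
The counts in the summary above can be sanity-checked with a short Python sketch. It assumes the summary text keeps the layout shown here; `parse_summary` is a hypothetical helper written for this illustration, not a Condor tool.

```python
import re

# Summary lines copied from the condor_q -better-analyze output above.
SUMMARY = """\
Run analysis summary.  Of 22 machines,
     11 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
     11 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
"""


def parse_summary(text):
    """Return (total machines, {category description: count})."""
    total = int(re.search(r"Of (\d+) machines", text).group(1))
    counts = {}
    for line in text.splitlines():
        # Category lines start with a number followed by a description.
        m = re.match(r"\s*(\d+)\s+(.+)", line)
        if m:
            counts[m.group(2).strip()] = int(m.group(1))
    return total, counts


if __name__ == "__main__":
    total, counts = parse_summary(SUMMARY)
    print("machines accounted for:", sum(counts.values()), "of", total)
    for reason, n in counts.items():
        if n:
            print(f"  {n:3d} {reason}")
```

All 22 machines fall into two categories: 11 fail the job's requirements, and 11 match but reject the job for unknown reasons.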

The Requirements expression for your job is:

( ( target.eclipse_available > 0 ) && ( target.LoadAvg < 3.000000000000000E-01 ) ) &&
( target.Arch == "X86_64" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

    Condition                                   Machines Matched    Suggestion
    ---------                                   ----------------    ----------
1   ( target.eclipse_available > 0 )            16
2   ( target.Arch == "X86_64" )                 16
3   ( target.OpSys == "LINUX" )                 16
4   ( target.LoadAvg < 3.000000000000000E-01 )  17
5   ( target.Disk >= 225000 )                   22
6   ( ( 1024 * target.Memory ) >= 0 )           22
7   ( target.HasFileTransfer )                  22

This is the log file:

000 (422.000.000) 09/16 18:19:43 Job submitted from host: <192.168.2.101:32843>
...

This is from the SchedLog:

9/17 08:49:15 (pid:22237) Checking consistency running and runnable jobs
9/17 08:49:15 (pid:22237) Tables are consistent
9/17 08:49:15 (pid:22237) Rebuilt prioritized runnable job list in 0.000s.
9/17 08:49:15 (pid:22237) Starting add_shadow_birthdate(422.0)
9/17 08:49:15 (pid:22237) Started shadow for job 422.0 on "<192.168.2.104:40736>", (shadow pid = 6252)
9/17 08:49:15 (pid:22237) Shadow pid 6251 for job 424.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
9/17 08:49:15 (pid:22237) Match for cluster 424 has had 5 shadow exceptions, relinquishing.
9/17 08:49:15 (pid:22237) Sent RELEASE_CLAIM to startd at <192.168.2.103:40431>
9/17 08:49:15 (pid:22237) Match record (<192.168.2.103:40431>, 424, 0) deleted
9/17 08:49:15 (pid:22237) Got VACATE_SERVICE from <192.168.2.103:47482>
9/17 08:49:15 (pid:22237) Shadow pid 6252 for job 422.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
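
To see which jobs are losing their shadows, the SchedLog lines above can be tallied with a short Python sketch. It assumes the "Shadow pid ... for job ... exited with status ..." wording shown in this excerpt; `shadow_exits` is a hypothetical helper written for this illustration, not part of Condor.

```python
import re
from collections import Counter

# Shadow-related lines copied from the SchedLog excerpt above.
SCHED_LOG = """\
9/17 08:49:15 (pid:22237) Started shadow for job 422.0 on "<192.168.2.104:40736>", (shadow pid = 6252)
9/17 08:49:15 (pid:22237) Shadow pid 6251 for job 424.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
9/17 08:49:15 (pid:22237) Match for cluster 424 has had 5 shadow exceptions, relinquishing.
9/17 08:49:15 (pid:22237) Shadow pid 6252 for job 422.0 exited with status 1
9/17 08:49:15 (pid:22237) ERROR: Shadow exited with unknown value 1!
"""

# Pattern based only on the log wording shown in this excerpt.
EXIT_RE = re.compile(r"Shadow pid (\d+) for job (\d+\.\d+) exited with status (\d+)")


def shadow_exits(log_text):
    """Return a Counter keyed by (job id, exit status)."""
    exits = Counter()
    for line in log_text.splitlines():
        m = EXIT_RE.search(line)
        if m:
            _pid, job, status = m.groups()
            exits[(job, int(status))] += 1
    return exits


if __name__ == "__main__":
    for (job, status), n in sorted(shadow_exits(SCHED_LOG).items()):
        print(f"job {job}: {n} shadow exit(s) with status {status}")
```

In this excerpt, the shadows for jobs 422.0 and 424.0 each exit once with status 1. Run over the full SchedLog, the same tally would show whether the failures are limited to particular jobs.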

Finally, there is nothing in the Shadow Log.

Thanks
Jeff