That was the first thing I tried ... we've been using it like that
forever on our current farm at our central location. The reason is
that we have a tonne of short jobs and only a very few large jobs.
So, if there are competing jobs, with PREEMPT on, short jobs take
precedence. Right?
Regardless ... these tests, including the one whose log I sent
previously, involve only one job being submitted to a farm of three
machines. It's getting preempted when nothing else is reported by
condor_q -global. The farm hasn't been deployed to artists yet.
condor_q -analyze says it was removed for an unknown reason.
Mark.
On Thu, Sep 17, 2009 at 6:14 AM, David Watrous
<dwatrous@xxxxxxxxxxxxxxxxxx> wrote:
Mark,
Check your PREEMPT expression on the workstation. It is evaluating to True
and causing the job to terminate.
Hope this helps,
Dave
--
===================================
David Watrous
main: 888.292.5320
Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools
http://www.cyclecomputing.com
http://www.cyclecloud.com
On Sep 17, 2009, at 12:24 AM, Mark Tigges wrote:
We have Condor (7.0.5) running just fine at our own studio. I'm
trying to set it up remotely in Shanghai, and everything appears to
be running all right: simple hello-world batch files all work great.
As soon as I try a bigger job, rendering an image for a few minutes,
the jobs get scheduled, start, then drop right back to idle. Wait 4
minutes and the cycle repeats itself. I've been reading manuals for
hours, googling, and tearing my hair out. Here's the starter log from
the machine running the job.
9/17 12:06:09 match_info called
9/17 12:06:09 Received match <10.88.70.102:64805>#1253158085#15#...
9/17 12:06:09 State change: match notification protocol successful
9/17 12:06:09 Changing state: Unclaimed -> Matched
9/17 12:06:10 Request accepted.
9/17 12:06:10 Remote owner is yhong@***********
9/17 12:06:10 State change: claiming protocol successful
9/17 12:06:10 Changing state: Matched -> Claimed
9/17 12:06:14 Got activate_claim request from shadow (<10.88.70.26:4063>)
9/17 12:06:14 Remote job ID is 75.0
9/17 12:06:14 Got universe "VANILLA" (5) from request classad
9/17 12:06:14 State change: claim-activation protocol successful
9/17 12:06:14 Changing activity: Idle -> Busy
9/17 12:06:19 State change: PREEMPT is TRUE
9/17 12:06:19 Changing activity: Busy -> Retiring
9/17 12:06:19 State change: claim retirement ended/expired
9/17 12:06:19 State change: WANT_VACATE is FALSE
9/17 12:06:19 Changing state and activity: Claimed/Retiring -> Preempting/Killing
9/17 12:06:20 Got KILL_FRGN_JOB while in Preempting state, ignoring.
9/17 12:06:20 Got RELEASE_CLAIM while in Preempting state, ignoring.
9/17 12:06:20 Starter pid 3524 exited with status 0
9/17 12:06:20 State change: starter exited
9/17 12:06:20 State change: No preempting claim, returning to owner
9/17 12:06:20 Changing state and activity: Preempting/Killing -> Owner/Idle
9/17 12:06:20 State change: IS_OWNER is false
9/17 12:06:20 Changing state: Owner -> Unclaimed
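For anyone reading this log later, the useful number is how long the job was Busy before PREEMPT fired. Below is a minimal Python sketch that pulls that interval out of log text like the above. It is only an illustration: the function name and the regular expression are made up here, it assumes every entry starts with "M/D HH:MM:SS ", it matches the two message strings exactly as they appear in this log, and it does not handle a run that crosses midnight.

```python
import re

# Excerpt of the log lines quoted above (only the ones relevant to timing).
LOG = """\
9/17 12:06:14 Changing activity: Idle -> Busy
9/17 12:06:19 State change: PREEMPT is TRUE
9/17 12:06:19 Changing activity: Busy -> Retiring
9/17 12:06:19 State change: WANT_VACATE is FALSE
9/17 12:06:20 Starter pid 3524 exited with status 0
"""

# Assumed entry layout: "M/D HH:MM:SS message".
LINE_RE = re.compile(r"^\d+/\d+ (\d+):(\d+):(\d+) (.*)$")


def seconds_busy_before_preempt(log_text):
    """Return seconds between 'Idle -> Busy' and the next 'PREEMPT is TRUE'.

    Returns None if that pair of entries is not found. Hypothetical helper,
    not part of Condor.
    """
    busy_at = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and wrapped continuation lines
        hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
        now = hours * 3600 + minutes * 60 + seconds  # seconds since midnight
        message = match.group(4)
        if message == "Changing activity: Idle -> Busy":
            busy_at = now
        elif message == "State change: PREEMPT is TRUE" and busy_at is not None:
            return now - busy_at
    return None


if __name__ == "__main__":
    print(seconds_busy_before_preempt(LOG))
```

On the excerpt above this reports 5 seconds, i.e. the job was Busy from 12:06:14 until PREEMPT evaluated to TRUE at 12:06:19.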
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/