Re: [Condor-users] Jobs being shutdown immediately.
- Date: Thu, 17 Sep 2009 07:29:11 -0700
- From: Mark Tigges <mtigges@xxxxxxxxx>
- Subject: Re: [Condor-users] Jobs being shutdown immediately.
Hmmmmm, I just checked our slaves here locally, and they're set to
False explicitly. I wasn't as careful as I thought I was.
Ok, thanks.
Mark.
On Thu, Sep 17, 2009 at 7:17 AM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
>
> The PREEMPT expression has nothing to do with preemption of one job by
> another. It is for kicking a job off of a machine because of the
> machine policy (e.g. because the machine is needed for some other purpose).
>
> Run the following command to see your PREEMPT expression on the execute
> machine where you are having the problem:
>
> condor_config_val -v PREEMPT
>
> --Dan
>
> Mark Tigges wrote:
>> That was the first thing I tried ... we've been using it like that
>> forever on our current farm at our central location. The reason is
>> that we have a tonne of short jobs and only a very few large jobs.
>> So, if there are competing jobs, with PREEMPT on, short jobs take
>> precedence. Right?
>>
>> Regardless ... these tests, including the log I previously sent, are
>> with only one job being submitted to a farm of three machines. It's
>> getting preempted when nothing else is reported by condor_q -global.
>> The farm hasn't been deployed to artists yet. condor_q -analyze says
>> removed for an unknown reason.
>>
>> Mark.
>>
>> On Thu, Sep 17, 2009 at 6:14 AM, David Watrous
>> <dwatrous@xxxxxxxxxxxxxxxxxx> wrote:
>>
>>> Mark,
>>> Check your PREEMPT expression on the workstation. It is evaluating to True
>>> and causing the job to terminate.
>>> Hope this helps,
>>> Dave
>>> --
>>> ===================================
>>> David Watrous
>>> main: 888.292.5320
>>> Cycle Computing, LLC
>>> Leader in Condor Grid Solutions
>>> Enterprise Condor Support and Management Tools
>>> http://www.cyclecomputing.com
>>> http://www.cyclecloud.com
>>> On Sep 17, 2009, at 12:24 AM, Mark Tigges wrote:
>>>
>>> We have condor (7.0.5) running just fine at our own studio. I'm trying
>>> to set it up remotely in Shanghai, and everything is running alright.
>>> If I try simple hello world batch files, all works great.
>>>
>>> As soon as I try a bigger job, rendering an image for a few minutes,
>>> jobs get scheduled, start, then go right back down into idle. Wait 4
>>> minutes and the cycle repeats itself. I've been reading manuals for
>>> hours, googling, and tearing my hair out. Here's the starter log from
>>> the machine running the job.
>>>
>>> 9/17 12:06:09 match_info called
>>> 9/17 12:06:09 Received match <10.88.70.102:64805>#1253158085#15#...
>>> 9/17 12:06:09 State change: match notification protocol successful
>>> 9/17 12:06:09 Changing state: Unclaimed -> Matched
>>> 9/17 12:06:10 Request accepted.
>>> 9/17 12:06:10 Remote owner is yhong@***********
>>> 9/17 12:06:10 State change: claiming protocol successful
>>> 9/17 12:06:10 Changing state: Matched -> Claimed
>>> 9/17 12:06:14 Got activate_claim request from shadow (<10.88.70.26:4063>)
>>> 9/17 12:06:14 Remote job ID is 75.0
>>> 9/17 12:06:14 Got universe "VANILLA" (5) from request classad
>>> 9/17 12:06:14 State change: claim-activation protocol successful
>>> 9/17 12:06:14 Changing activity: Idle -> Busy
>>> 9/17 12:06:19 State change: PREEMPT is TRUE
>>> 9/17 12:06:19 Changing activity: Busy -> Retiring
>>> 9/17 12:06:19 State change: claim retirement ended/expired
>>> 9/17 12:06:19 State change: WANT_VACATE is FALSE
>>> 9/17 12:06:19 Changing state and activity: Claimed/Retiring -> Preempting/Killing
>>> 9/17 12:06:20 Got KILL_FRGN_JOB while in Preempting state, ignoring.
>>> 9/17 12:06:20 Got RELEASE_CLAIM while in Preempting state, ignoring.
>>> 9/17 12:06:20 Starter pid 3524 exited with status 0
>>> 9/17 12:06:20 State change: starter exited
>>> 9/17 12:06:20 State change: No preempting claim, returning to owner
>>> 9/17 12:06:20 Changing state and activity: Preempting/Killing -> Owner/Idle
>>> 9/17 12:06:20 State change: IS_OWNER is false
>>> 9/17 12:06:20 Changing state: Owner -> Unclaimed
>>> _______________________________________________
>>> Condor-users mailing list
>>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/condor-users/
>>>
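For anyone hitting the same symptom: the log quoted in this thread names the policy expression behind each transition on lines of the form "State change: <EXPR> is <VALUE>". Below is a minimal sketch of pulling those lines out of such a log, assuming the line layout shown in this thread. The function name policy_triggers and the regular expression are illustrative only and are not part of Condor; the sample lines are copied from the log above.

```python
import re

# Sample lines copied from the log quoted in this thread.
SAMPLE_LOG = """\
9/17 12:06:10 Changing state: Matched -> Claimed
9/17 12:06:14 Changing activity: Idle -> Busy
9/17 12:06:19 State change: PREEMPT is TRUE
9/17 12:06:19 Changing activity: Busy -> Retiring
9/17 12:06:19 State change: WANT_VACATE is FALSE
9/17 12:06:20 State change: IS_OWNER is false
"""

# Assumed layout: "<date> <time> State change: <EXPR> is <VALUE>".
# Lines such as "State change: starter exited" do not match and are skipped.
TRIGGER = re.compile(
    r"^(?P<stamp>\d+/\d+ \d+:\d+:\d+) State change: "
    r"(?P<expr>[A-Z_]+) is (?P<value>\w+)$"
)


def policy_triggers(log_text):
    """Return (timestamp, expression, value) for each policy-driven state change."""
    found = []
    for line in log_text.splitlines():
        # Drop leading quote markers ("> ", ">>> ") left by mail clients.
        line = line.lstrip("> ").strip()
        m = TRIGGER.match(line)
        if m:
            found.append((m["stamp"], m["expr"], m["value"].upper()))
    return found


if __name__ == "__main__":
    for stamp, expr, value in policy_triggers(SAMPLE_LOG):
        print(f"{stamp}  {expr} = {value}")
```

A PREEMPT entry appearing a few seconds after "Idle -> Busy", as in the log above, points at the machine policy rather than at a competing job, which is what condor_config_val -v PREEMPT on the execute machine then confirms.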