-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Dan Bradley
Sent: October 27, 2004 6:16 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Adjusting machine RANK classad expr based
on total queue time for a job
Ian,
Could you specify which of the jobs in your various tests are
being run by different users, if any? One potential point of
confusion is that, by design, the Condor negotiator does not
micromanage what the schedd does with a claim. Once the
schedd gets a claim on behalf of a user, it will continue to
run jobs on that claim until the claim is taken away or the
user runs out of jobs. The negotiator doesn't tell the
schedd which job to run next on the claim.
You can force renegotiation of claims after every job if you want.
Something like the following policy will do this:
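# Let a running job keep running (retiring) for up to 1000000 s (~11.6 days)
MaxJobRetirementTime = 1000000
# Don't suspend jobs; rely on PREEMPT (with retirement) instead
WANT_SUSPEND = FALSE
# Always preempt the claim, so it is given up once the current job retires
PREEMPT = TRUE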
--Dan
Ian Chesal wrote:
It looks like it was my use of condor_rm that messed up my
predictability. I continued the experiment, but this time I made sure
the running 44.1 process finished normally instead of being
prematurely terminated by condor_rm.
I had two queued jobs with their EnteredCurrentStatus times:
44.2 1098912677
44.3 1098910808
I expected 44.2 to rank lower than 44.3 by ~31 (the timestamps differ
by 1869 seconds, about 31 minutes), so 44.3 should be the next job
picked up.
And this was the case. My rank expression worked this time.
Excellent.
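A quick Python check of the "~31" figure: the two EnteredCurrentStatus values differ by 1869 seconds, and with the /60 RANK the rank gap between 44.3 and 44.2 comes out as 31 or 32 depending on the exact CurrentTime. This is a sketch that assumes ClassAd integer division truncates (the same as Python's // for these non-negative values).

```python
# EnteredCurrentStatus values quoted above (Unix timestamps).
t_44_2 = 1098912677
t_44_3 = 1098910808

gap_s = t_44_2 - t_44_3
print(gap_s, gap_s / 60)  # 1869 31.15

def rank(entered_current_status, current_time):
    # Idle-job case of the RANK expression: whole minutes since EnteredCurrentStatus.
    return (current_time - entered_current_status) // 60

# Rank gap for every CurrentTime in the hour after 44.2 entered its status.
diffs = {rank(t_44_3, now) - rank(t_44_2, now)
         for now in range(t_44_2, t_44_2 + 3600)}
print(sorted(diffs))  # [31, 32]
```

The gap never drops below 31, so 44.3 outranks 44.2 regardless of when the expression is evaluated.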
So here's a question for the Condor team: if I were a "sneaky user", I
could write a job that, after its processing was complete, sent me an
email and then went to sleep for a long, long time. On receiving that
email, I could use condor_rm to terminate the job and hang on to the
resource it was using to run another job on it, even if another job
from another user had a higher rank, because condor_rm seems to
prevent the machine from renegotiating. This would give me indefinite
access to a resource. Can this happen?
Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: October 27, 2004 5:18 PM
To: Condor-Users Mail List
Subject: RE: [Condor-users] Adjusting machine RANK classad expr based
on total queue time for a job
Hmm. So I went with the RANK expression:
RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime - TARGET.EnteredCurrentStatus)/60))
My plan was to make sure queued jobs rank higher the longer they've
been in the queued (idle) state: in this case, +1 for every minute
they've been sitting idle.
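To show how the expression behaves case by case, here is a small Python emulation. It assumes JobStatus 1 means Idle and that ClassAd integer division truncates; None stands in for an undefined JobStatus, which the meta-equal operator =?= treats as simply "not equal to 1" rather than producing UNDEFINED. The function name machine_rank is hypothetical, chosen for this sketch.

```python
from typing import Optional

def machine_rank(job_status: Optional[int], entered_current_status: int,
                 current_time: int) -> int:
    # =?= never yields UNDEFINED: a missing JobStatus just compares as FALSE (0).
    is_idle = 1 if job_status == 1 else 0
    # Whole minutes since the job entered its current status (truncating division).
    return is_idle * ((current_time - entered_current_status) // 60)

entered = 1_000_000  # hypothetical timestamp
print(machine_rank(1, entered, entered + 59))     # 0: not yet a full minute
print(machine_rank(1, entered, entered + 60))     # 1
print(machine_rank(1, entered, entered + 600))    # 10
print(machine_rank(2, entered, entered + 600))    # 0: running, not idle
print(machine_rank(None, entered, entered + 600)) # 0: JobStatus undefined
```

Under these assumptions the rank steps up by one each full minute of idleness and is 0 for any job that is not idle.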
To test this I submitted some jobs in the held state. The jobs are
simple: go to the machine and sleep for an hour.
I released three of the held jobs. My machine immediately picked up
44.0 from the cluster and started running.
I let the other two released jobs build up some queue time while 44.0
slept on a machine. At one point I did see condor_status show my 44.0
as being in the "Retiring" state instead of the "Busy" state, which is
good news. We have a long MaxJobRetirementTime, so this is expected.
I let about 8 minutes elapse and then issued the commands:
condor_hold 44.1
condor_release 44.1
This reset the EnteredCurrentStatus time on 44.1. I now have 44.0
running (but retiring), and the remaining two jobs have
EnteredCurrentStatus values as follows:
44.1 1098910859
44.2 1098910279
Based on these values I expect 44.2 to have the higher rank. 44.0 was
still running, so I removed it with:
condor_rm 44.0
I expected the machine to pick up 44.2 as the next job because its
rank is higher, having been queued for longer than 44.1.
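Evaluating the expression for these two timestamps confirms that, by RANK alone, 44.2 should win: its EnteredCurrentStatus is 580 seconds (about 9.7 minutes) earlier than 44.1's. The Python sketch below assumes a hypothetical CurrentTime and truncating integer division; any CurrentTime after both timestamps gives 44.2 the lead.

```python
# EnteredCurrentStatus values quoted above (Unix timestamps).
entered = {"44.1": 1098910859, "44.2": 1098910279}
now = 1098911000  # hypothetical CurrentTime

def machine_rank(entered_current_status, current_time):
    # Idle-job case of the RANK expression: whole minutes since EnteredCurrentStatus.
    return (current_time - entered_current_status) // 60

ranks = {job: machine_rank(t, now) for job, t in entered.items()}
print(ranks)                      # {'44.1': 2, '44.2': 12}
print(max(ranks, key=ranks.get))  # 44.2
```

Since the arithmetic favors 44.2, the unexpected pick likely has another cause; Dan's reply above suggests the schedd may simply have run 44.1 on the claim it already held, without the RANK expression being consulted.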
Not so. The machine picked up 44.1. I'm the only user in the system,
so it's not a matter of EUP (effective user priority). What's up? Why
didn't 44.2 rank higher?
Can anyone see where my prediction of the next job to run went wrong?
I'm stumped. I thought I had it all figured out.
Thanks!
Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: October 27, 2004 11:34 AM
To: Condor-Users Mail List
Subject: [Condor-users] Adjusting machine RANK classad expr based on
total queue time for a job
I'm toying with adjusting the RANK expression to achieve a more
FIFO-like ordering when Condor runs jobs. The idea is to rank jobs on
machines based on their time in the queue.
I wanted to bounce the rank expression and idea off the list.
The rank expression for machines I'm thinking of using is:
RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime - TARGET.EnteredCurrentStatus)/600))
This would give a job queued 10 minutes longer than another job a
higher rank on the machine.
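Dividing by 600 makes the rank move in 10-minute steps. Assuming ClassAd integer division truncates, jobs whose idle times fall in the same 10-minute window tie. A Python sketch with hypothetical timestamps (the name rank_10min is chosen for illustration):

```python
def rank_10min(entered_current_status, current_time):
    # Idle-job case of:
    # RANK = ((TARGET.JobStatus =?= 1) * ((CurrentTime - TARGET.EnteredCurrentStatus)/600))
    # Assumes ClassAd integer division truncates (same as // for these values).
    return (current_time - entered_current_status) // 600

now = 10_000        # hypothetical CurrentTime, in seconds
a = now - 25 * 60   # idle for 25 minutes
b = now - 15 * 60   # idle for 15 minutes (10 minutes less than a)
c = now - 21 * 60   # idle for 21 minutes (4 minutes less than a)

print(rank_10min(a, now), rank_10min(b, now), rank_10min(c, now))  # 2 1 2
```

Here a outranks b by 1, as intended, but a and c tie even though a has waited 4 minutes longer. The /60 variant used later in the thread gives a finer, per-minute ordering.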
The other option is:
RANK = ((CurrentTime - TARGET.QDate)/600)
But this would track cumulative queue time (if the job queued, ran for
a bit, then got sent back to the queue), right? Or is QDate reset
every time a job returns to the queue, not just the first time it's
queued up by condor_submit?
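To make the distinction concrete, assume QDate stays fixed at submission (this is the open question above, not something the thread confirms) and that EnteredCurrentStatus is updated on every status change. Under those assumptions, the QDate-based RANK measures time since submission, which also counts time spent running, so it is not strictly cumulative queue time. A Python sketch over a hypothetical timeline:

```python
# Hypothetical timeline for one job, in seconds.
# Assumes QDate stays at submit time; EnteredCurrentStatus updates on each change.
q_date = 0                      # submitted with condor_submit; job is idle
started_running = 600           # matched and started running
entered_current_status = 1800   # sent back to the queue (idle again)
now = 3000                      # hypothetical CurrentTime

rank_qdate = (now - q_date) // 600                 # time since submission
rank_ecs = (now - entered_current_status) // 600   # current idle stretch only
idle_total = (started_running - q_date) + (now - entered_current_status)
rank_idle = idle_total // 600                      # true cumulative idle time

print(rank_qdate, rank_ecs, rank_idle)  # 5 2 3
```

Neither expression yields the cumulative idle time (3 here) directly; the QDate form overcounts by the 1200 s spent running, and the EnteredCurrentStatus form counts only the latest idle stretch.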
Comments? Opinions? Much appreciated.
Ian
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users