Re: [Condor-users] Priority issue
- Date: Tue, 12 Dec 2006 13:52:44 +0100
- From: Nicolas GUIOT <nicolas.guiot@xxxxxxx>
- Subject: Re: [Condor-users] Priority issue
More details (sorry for not waiting before writing again; I'm in quite a hurry to get these results):
Some new CPUs became available to Condor (they were previously in the Owner state), and some 74 jobs started on them.
But on the CPUs where 72 jobs were running and have since finished, Condor doesn't start 74 jobs; it keeps starting new 72 jobs instead. How can I change this?
Thanks in advance
Nicolas
----------------
On Tue, 12 Dec 2006 11:31:48 +0100
Nicolas GUIOT <nicolas.guiot@xxxxxxx> wrote:
> Details (hope this can help):
>
> For the next 74 job in the list, condor_q -better-analyze gives the following (a 72 job gives approximately the same):
>
> root@rhea:~# condor_q -better-analyze 74.25
>
>
> -- Submitter: rhea.my.domain : <172.XX.XX.XX:32772> : rhea.my.domain
> AddConstraint: Condition value not literal   (this line is printed 19 times)
> ---
> 074.025: Run analysis summary. Of 29 machines,
> 2 are rejected by your job's requirements
> 8 reject your job because of their own requirements
> 19 match but are serving users with a better priority in the pool
> 0 match but reject the job for unknown reasons
> 0 match but will not currently preempt their existing job
> 0 are available to run your job
> No successful match recorded.
> Last failed match: Tue Dec 12 11:16:42 2006
> Reason for last match failure: no match found
>
> The Requirements expression for your job is:
>
> ( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
> ( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
> ( target.HasFileTransfer )
>
> Condition                                Machines Matched    Suggestion
> ---------                                ----------------    ----------
> 1  ( target.Arch == "INTEL" )            27
> 2  ( target.OpSys == "LINUX" )           29
> 3  ( target.Disk >= 686 )                29
> 4  ( ( 1024 * target.Memory ) >= 571 )   29
> 5  ( target.HasFileTransfer )            29
>
> The following attributes are missing from the job ClassAd:
>
> CheckpointPlatform
>
>
> ----------------
> On Tue, 12 Dec 2006 11:07:57 +0100
> Nicolas GUIOT <nicolas.guiot@xxxxxxx> wrote:
>
> > Hi,
> >
> > I first submitted a job (cluster 72) made up of about 150 queued jobs.
> > Later I submitted a second one (74), whose results I need first.
> > So, once it was submitted, I raised 74's priority with:
> > condor_prio -p 500 74
> > I also set 72's priority to -15.
> >
> > Now my problem is that only one 74 job runs, and the other CPUs are used by 72 jobs. Even when a 72 job finishes, as long as a 74 job is running, no new 74 job is launched.
> >
> > Here is the submit description file (both are similar):
> >
> > Universe = vanilla
> >
> > Executable = /nfs/rhea/attract
> > arguments = T27_R_M-mutate.pdb T27_L.red $(Process)
> > output = /nfs/MC2/output.$(Process).txt
> > error = /nfs/MC2/ERROR.$(Process)
> > Log = /nfs/MC2/LOG.$(Process)
> >
> >
> > should_transfer_files = YES
> > when_to_transfer_output = ON_EXIT
> > transfer_input_files = T27_R_M-mutate.pdb, T27_L.red,translat.dat,attract.inp,aminon.par,rotation.dat,standard.pdb
> > notify_user = user@xxxxxxxxx
> > notification = error
> >
> > queue 147
> >
> >
> > Here is the (truncated) condor_q output:
> >
> > -- Submitter: rhea.my.domain : <172.XX.XX.XX:32772> : rhea.my.domain
> > ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
> > 72.37 saladin 12/9 17:49 0+01:29:47 R -15 701.1 attract T27_R_M-mu
> > 72.38 saladin 12/9 17:49 0+00:13:23 R -15 0.6 attract T27_R_M-mu
> > 72.41 saladin 12/9 17:49 0+00:12:25 R -15 0.6 attract T27_R_M-mu
> > 72.42 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 72.72 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 72.73 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 72.74 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 72.145 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 72.146 saladin 12/9 17:49 0+00:00:00 I -15 0.6 attract T27_R_M-mu
> > 74.24 saladin 12/11 12:43 0+00:06:35 R 500 0.6 attract T27_R_M-mu
> > 74.25 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.26 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.27 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.28 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.29 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.30 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.31 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.32 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.33 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> > 74.34 saladin 12/11 12:43 0+00:00:00 I 500 0.6 attract T27_R_M-mu
> >
> > 190 jobs; 171 idle, 19 running, 0 held
> > root@rhea:~#
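The ordering Nicolas expects after running condor_prio can be sketched as a toy model. It assumes, for illustration only, that one user's idle jobs are considered highest job priority first and then in submission order (cluster, then proc); under that assumption every idle 74 job (PRI 500) should be picked before any idle 72 job (PRI -15). This is not the schedd's actual matching algorithm, and the tuple layout and the `expected_order` helper are hypothetical; the job IDs and priorities are copied from the truncated listing above.

```python
# Toy model of the ordering the poster expects (NOT Condor's real algorithm).
# Idle jobs from the truncated condor_q listing, as (cluster, proc, priority).
idle = [
    (72, 42, -15), (72, 72, -15), (72, 73, -15), (72, 74, -15),
    (72, 145, -15), (72, 146, -15),
    (74, 25, 500), (74, 26, 500), (74, 27, 500), (74, 28, 500),
    (74, 29, 500), (74, 30, 500), (74, 31, 500), (74, 32, 500),
    (74, 33, 500), (74, 34, 500),
]


def expected_order(jobs):
    """Hypothetical helper: sort one user's idle jobs by priority
    (highest first), then by cluster and proc (submission order)."""
    return sorted(jobs, key=lambda j: (-j[2], j[0], j[1]))


order = expected_order(idle)
print([f"{c}.{p}" for c, p, _ in order[:3]])
```

Under this model the first three jobs picked are 74.25, 74.26 and 74.27, which is the behavior the thread reports is not happening in practice.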
> >
> > Thanks for any help.
> > Nicolas
> >
> > ----------------------------------------------------
> > CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
> > Institut de Biologie Physico-Chimique
> > 13 rue Pierre et Marie Curie
> > 75005 PARIS - FRANCE
> >
> > Tel : +33 158 41 51 70
> > Fax : +33 158 41 50 26
> > ----------------------------------------------------
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at either
> > https://lists.cs.wisc.edu/archive/condor-users/
> > http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
> >
>