Re: [Condor-users] [ExternalEmail] Re: Windows jobs not running after upgrade 7.2.4 to 7.4.3
- Date: Tue, 31 Aug 2010 14:43:20 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: Re: [Condor-users] [ExternalEmail] Re: Windows jobs not running after upgrade 7.2.4 to 7.4.3
And the command
condor_status -constraint "( ( ( 1024 * target.Memory ) >= 1250 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory =!= undefined,JobVMMemory,1.220703125000000E+000)) ) >= 1250 ) )"
produces many matches. So why isn't it working?
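For reference, the arithmetic in that constraint can be checked by hand. Below is a minimal Python sketch, not Condor code: the function names are made up for illustration, JobVMMemory is assumed to be undefined (as the constant 1.220703125 = 1250 / 1024 suggests), and ImageSize is taken as 1250 KB from the job ad.

```python
import math

IMAGE_SIZE_KB = 1250  # ImageSize from the job ad, in KB


def request_memory_mb(image_size_kb, job_vm_memory=None):
    """Mimic ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.0)).

    job_vm_memory=None stands in for an undefined JobVMMemory (an assumption).
    """
    value = job_vm_memory if job_vm_memory is not None else image_size_kb / 1024.0
    return math.ceil(value)


def condition_matches(machine_memory_mb, image_size_kb=IMAGE_SIZE_KB):
    """Evaluate both halves of the constraint for one machine's Memory (in MB)."""
    machine_ok = 1024 * machine_memory_mb >= image_size_kb
    job_ok = 1024 * request_memory_mb(image_size_kb) >= image_size_kb
    return machine_ok and job_ok


print(IMAGE_SIZE_KB / 1024.0)            # 1.220703125
print(request_memory_mb(IMAGE_SIZE_KB))  # 2
print(condition_matches(512))            # True
```

Under these assumptions the second half of the constraint is always true (2 * 1024 >= 1250), so the only part that depends on the machine is its Memory value. That fits with condor_status returning many matches.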
The collector/negotiator is Linux running 7.2.3, the submit node is Windows running 7.4.3, and the execute nodes are Windows running 7.2.4.
Greg
Maybe some relevant info that might be useful.
Following are the different requirements generated by condor for the 2 versions:
7.2.4
RequestMemory = ceiling(ImageSize / 1024.000000)
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
7.4.3
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (Disk >= DiskUsage) && (((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)) && (HasFileTransfer)
Cheers
Greg
We have a user who is submitting the same jobs with the same requirements expression that all worked with 7.2.4 but apparently do not now with 7.4.3:
requirements = (POOL == "VIC") && (Kflops > 1200000)
The jobs sit idle and never execute because no machines match, as shown by condor_q -better-analyze:
013.006:  Run analysis summary.  Of 1229 machines,
    371 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    839 match but reject the job for unknown reasons
     19 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
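As a quick sanity check on that summary, the buckets can be added up to confirm they account for every machine. This is only an illustrative Python sketch. The dictionary keys are shortened labels of my own, and the counts are copied from the output above.

```python
TOTAL_MACHINES = 1229

# Counts copied from the condor_q -better-analyze summary; labels are abbreviated.
buckets = {
    "rejected by job requirements": 371,
    "reject job by own requirements": 0,
    "match but serving better-priority users": 0,
    "match but reject for unknown reasons": 839,
    "match but will not preempt": 19,
    "match but offline": 0,
    "available": 0,
}

accounted = sum(buckets.values())
print(accounted == TOTAL_MACHINES)  # True

# Share of the pool that lands in the "unknown reasons" bucket.
unknown_share = buckets["match but reject for unknown reasons"] / TOTAL_MACHINES
print(f"{unknown_share:.1%}")
```

The buckets sum to 1229, so nothing is missing from the report. The bulk of the pool sits in the "unknown reasons" bucket rather than being rejected by the job's own requirements.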
The Requirements expression for your job is:
( ( target.POOL == "VIC" ) && ( target.Kflops > 1200000 ) && ( target.OpSys == "WINNT51" ) ) && ( target.Arch == "INTEL" ) && ( target.Disk >= DiskUsage ) && ( ( ( target.Memory * 1024 ) >= ImageSize ) && ( ( RequestMemory * 1024 ) >= ImageSize ) ) && ( target.HasFileTransfer )
    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( ( ( 1024 * target.Memory ) >= 1250 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,1.220703125000000E+000)) ) >= 1250 ) )
                                      0                   REMOVE
2   ( target.Kflops > 1200000 )       988
3   ( target.POOL == "VIC" )          1071
4   ( target.OpSys == "WINNT51" )     1214
5   ( target.HasFileTransfer )        1223
6   ( target.Arch == "INTEL" )        1228
7   ( target.Disk >= 1500 )           1229
I've not taken a lot of notice before of the "extra" requirements that condor adds itself, but am wondering about the "RequestMemory" requirement, as googling seems to show that a bug was fixed in 7.2 that made it 1024 times too big due to a mix-up between MB and KB.
Could this still be a problem?
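One way to reason about the units question is to look only at the job-side clause ((RequestMemory * 1024) >= ImageSize) and see what an inflated RequestMemory would do to it. The Python sketch below is purely illustrative: the `scale` parameter is a made-up stand-in for a hypothetical 1024x units mix-up, and it says nothing about how a 7.2-series negotiator or startd actually treats RequestMemory.

```python
import math


def memory_clause(image_size_kb, scale=1):
    """Evaluate ((RequestMemory * 1024) >= ImageSize).

    RequestMemory is computed as ceiling(ImageSize / 1024.0) and then
    multiplied by `scale` to model a hypothetical units mix-up.
    """
    request_memory_mb = math.ceil(image_size_kb / 1024.0) * scale
    return request_memory_mb * 1024 >= image_size_kb


print(memory_clause(1250))        # correct units: True
print(memory_clause(1250, 1024))  # hypothetical 1024x inflation: True
```

In this simplified model an oversized RequestMemory only makes the clause easier to satisfy, so it would not by itself explain a clause that matches zero machines. A units problem could still matter wherever RequestMemory is compared against the machine's Memory, which this sketch does not cover.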
I've also added below the results from a condor_q -l command if that's at all relevant.
I'll keep looking into it but thought I'd try the users-list as well.
Thanks for any info
Cheers
Greg
Err = "cvferr_6.txt"
LastJobStatus = 0
Out = "cvfout_6.txt"
ProcId = 6
Shortjob = TRUE
UserLog = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm\\cvflog_6.txt"
JobStatus = 1
GlobalJobId = "wan110a-hr.nexus.csiro.au#13.6#1283211977"
Args = "CRNTUL1_RF.txt climate.txt pred.txt TUCjfm006 6 0 1"
ServerTime = 1283219235
ClusterId = 13
CompletionDate = 0
NTDomain = "NEXUS"
WindowsMajorVersion = 6
WindowsMinorVersion = 0
WindowsBuildNumber = 6002
WindowsServicePackMajorVersion = 2
WindowsServicePackMinorVersion = 0
WindowsProductType = 3
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.3 Aug 4 2010 BuildID: 261829 $"
CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"
Iwd = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm"
MinHosts = 1
MaxHosts = 1
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
EnteredCurrentStatus = 1283211977
User = "pok008@xxxxxxxx"
NiceUser = FALSE
Environment = ""
JobNotification = 3
WantRemoteIO = TRUE
CoreSize = 0
Rank = ConsoleIdle
In = "/dev/null"
TransferIn = FALSE
StreamOut = FALSE
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT_OR_EVICT"
TransferFiles = "ALWAYS"
TransferInput = "CRNTUL1_RF.txt,climate.txt,pred.txt"
ImageSize_RAW = 1204
ExecutableSize_RAW = 1204
ExecutableSize = 1250
DiskUsage_RAW = 1374
DiskUsage = 1500
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = ((POOL == "VIC") && (Kflops > 1200000) && (OpSys == "WINNT51")) && (Arch == "INTEL") && (Disk >= DiskUsage) && (((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)) && (HasFileTransfer)
JobLeaseDuration = 300
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
> == 0)
LeaveJobInQueue = FALSE
Owner = "pok008"
JobPrio = 3000
ImageSize = 1250
QDate = 1283211977
RemoteUserCpu = 0
RemoteWallClockTime = 0
Cmd = "C:\\Users\\pok008\\cvcluster_select.exe"
JobUniverse = 5