[Condor-users] Windows jobs not running after upgrade 7.2.4 to 7.4.3
- Date: Tue, 31 Aug 2010 09:58:11 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: [Condor-users] Windows jobs not running after upgrade 7.2.4 to 7.4.3
We have a user who is submitting the same jobs with the same requirements expression that all worked with 7.2.4 but apparently are not now with 7.4.3:

requirements = (POOL == "VIC") && (Kflops > 1200000)

The jobs sit idle and never execute because no machines match, as shown by condor_q -better-analyze:
013.006:  Run analysis summary.  Of 1229 machines,
    371 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
    839 match but reject the job for unknown reasons
     19 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job

The Requirements expression for your job is:

( ( target.POOL == "VIC" ) && ( target.Kflops > 1200000 ) &&
( target.OpSys == "WINNT51" ) ) && ( target.Arch == "INTEL" ) &&
( target.Disk >= DiskUsage ) && ( ( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) ) && ( target.HasFileTransfer )
    Condition                              Machines Matched    Suggestion
    ---------                              ----------------    ----------
1   ( ( ( 1024 * target.Memory ) >= 1250 ) && ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,1.220703125000000E+000)) ) >= 1250 ) )
                                           0                   REMOVE
2   ( target.Kflops > 1200000 )            988
3   ( target.POOL == "VIC" )               1071
4   ( target.OpSys == "WINNT51" )          1214
5   ( target.HasFileTransfer )             1223
6   ( target.Arch == "INTEL" )             1228
7   ( target.Disk >= 1500 )                1229
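
As a quick consistency check on the analyzer output above, the following Python sketch re-enters the numbers by hand (the dictionary keys are shorthand labels chosen here, not Condor identifiers). It confirms that the summary categories add up to the pool size and picks out the one condition reported as matching no machines. It only restates what the output says; it does not explain why condition 1 is reported that way.

```python
# Counts copied by hand from the condor_q -better-analyze output in this post.
TOTAL_MACHINES = 1229

summary = {
    "rejected by job requirements": 371,
    "reject job because of own requirements": 0,
    "match but serving better-priority users": 0,
    "match but reject for unknown reasons": 839,
    "match but will not preempt existing job": 19,
    "match but offline": 0,
    "available": 0,
}

# Machines matched per numbered condition in the table.
matched = {1: 0, 2: 988, 3: 1071, 4: 1214, 5: 1223, 6: 1228, 7: 1229}

summary_total = sum(summary.values())
passing_job_requirements = TOTAL_MACHINES - summary["rejected by job requirements"]
zero_match_conditions = [n for n, count in matched.items() if count == 0]

print("summary adds up to pool size:", summary_total == TOTAL_MACHINES)
print("machines not rejected by the job's requirements:", passing_job_requirements)
print("conditions matching no machines:", zero_match_conditions)
```

Going by these numbers, the summary counts 858 machines as not rejected by the job's requirements, while the per-condition table reports condition 1 as matching none, so the two views of the same expression appear to disagree.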

I've not taken a lot of notice of the "extra" requirements that Condor adds itself before, but am wondering about the "RequestMemory" requirement, as googling seems to show that a bug was fixed in 7.2 that made it 1024 times too big due to a mix-up between MB and KB. Could this still be a problem?
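
For reference, the arithmetic behind the RequestMemory clause can be checked by hand. The Python sketch below is only an illustration: it mirrors the RequestMemory expression from the job ad, assumes JobVMMemory is undefined for this job, and assumes ImageSize is in KB while RequestMemory is in MB. The function names are made up for the example and are not Condor APIs.

```python
import math

# Value copied from the job ad in this post; assumed to be in KB.
IMAGE_SIZE_KB = 1250


def request_memory_mb(image_size_kb, job_vm_memory_mb=None):
    """Mirror: ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.0))."""
    if job_vm_memory_mb is not None:
        value = job_vm_memory_mb
    else:
        value = image_size_kb / 1024.0
    return math.ceil(value)


def memory_clause_holds(image_size_kb, job_vm_memory_mb=None):
    """Mirror the job-side clause: (RequestMemory * 1024) >= ImageSize."""
    return request_memory_mb(image_size_kb, job_vm_memory_mb) * 1024 >= image_size_kb


request_mb = request_memory_mb(IMAGE_SIZE_KB)
clause_ok = memory_clause_holds(IMAGE_SIZE_KB)
print(request_mb, request_mb * 1024, clause_ok)  # prints: 2 2048 True
```

Under those assumptions the clause works out to 2048 >= 1250, which is true, so the values in the job ad alone would not make it fail. Whether 7.4.3 evaluates the expression the same way during matchmaking is a separate question that this sketch cannot answer.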

I've also added below the results from a condor_q -l command, in case that's at all relevant.

I'll keep looking into it but thought I'd try the users list as well.

Thanks for any info.

Cheers
Greg

Err = "cvferr_6.txt"
LastJobStatus = 0
Out = "cvfout_6.txt"
ProcId = 6
Shortjob = TRUE
UserLog = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm\\cvflog_6.txt"
JobStatus = 1
GlobalJobId = "wan110a-hr.nexus.csiro.au#13.6#1283211977"
Args = "CRNTUL1_RF.txt climate.txt pred.txt TUCjfm006 6 0 1"
ServerTime = 1283219235
ClusterId = 13
CompletionDate = 0
NTDomain = "NEXUS"
WindowsMajorVersion = 6
WindowsMinorVersion = 0
WindowsBuildNumber = 6002
WindowsServicePackMajorVersion = 2
WindowsServicePackMinorVersion = 0
WindowsProductType = 3
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.3 Aug 4 2010 BuildID: 261829 $"
CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"
Iwd = "C:\\Users\\pok008\\TUCA\\PredSel\\jfm"
MinHosts = 1
MaxHosts = 1
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
EnteredCurrentStatus = 1283211977
User = "pok008@xxxxxxxx"
NiceUser = FALSE
Environment = ""
JobNotification = 3
WantRemoteIO = TRUE
CoreSize = 0
Rank = ConsoleIdle
In = "/dev/null"
TransferIn = FALSE
StreamOut = FALSE
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT_OR_EVICT"
TransferFiles = "ALWAYS"
TransferInput = "CRNTUL1_RF.txt,climate.txt,pred.txt"
ImageSize_RAW = 1204
ExecutableSize_RAW = 1204
ExecutableSize = 1250
DiskUsage_RAW = 1374
DiskUsage = 1500
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = ((POOL == "VIC") && (Kflops > 1200000) && (OpSys == "WINNT51")) && (Arch == "INTEL") && (Disk >= DiskUsage) && (((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize)) && (HasFileTransfer)
JobLeaseDuration = 300
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
> == 0)
LeaveJobInQueue = FALSE
Owner = "pok008"
JobPrio = 3000
ImageSize = 1250
QDate = 1283211977
RemoteUserCpu = 0
RemoteWallClockTime = 0
Cmd = "C:\\Users\\pok008\\cvcluster_select.exe"
JobUniverse = 5