[Condor-users] Windows jobs not running on 7.4.4
- Date: Thu, 28 Oct 2010 08:44:43 -0200
- From: kschwarz@xxxxxxxxxxxxxx
I am trying to upgrade from version 7.2.4 to 7.4.4
(Windows and Linux execute machines, Linux central manager).
After I submit a simple job, it stays stuck in the queue.
condor_status gives:
Name               OpSys   Arch   State     Activity LoadAv Mem  ActvtyTime
slot1@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle     0.000  1983 0+03:42:06
slot2@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle     0.000  1983 0+15:42:33
<snip>
slot1@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle     0.070  1983 0+01:46:35
slot2@xxxxxxxxxxxx LINUX   X86_64 Unclaimed Idle     0.000  1983 1+18:46:32
CONDOR-XXX-PC2.xxx WINNT52 INTEL  Unclaimed Idle     0.000  1023 0+00:27:05
condor-xxx-pc1.xxx WINNT52 INTEL  Unclaimed Idle     0.000  1023 0+00:35:04
slot1@xxxxxxxxxxxx WINNT61 INTEL  Owner     Idle     0.120  4061 0+01:01:34
slot2@xxxxxxxxxxxx WINNT61 INTEL  Owner     Idle     0.000  4061 0+01:01:35

                Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT52       2     0       0         2       0          0        0
INTEL/WINNT61       2     2       0         0       0          0        0
X86_64/LINUX       18     0       0        18       0          0        0
Total              22     2       0        20       0          0        0
condor_q -better-analyze 3.0 gives:
-- Submitter: PC303344.xxxxx : <10.3.29.182:19936> : PC303344.xxxxx
---
003.000: Run analysis summary. Of 22 machines,
     21 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      1 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
The Requirements expression for your job is:
( target.machine == "condor-xxx-pc1.xxxxx" &&
( target.OpSysGeneric == "WINNT" ) ) &&
( target.Arch == "INTEL" ) &&
( target.Disk >= DiskUsage ) &&
( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )
    Condition                                   Machines Matched   Suggestion
    ---------                                   ----------------   ----------
1   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,9.765625000000000E-004)) ) >= 1 )
                                                0                  REMOVE
2   target.machine == "condor-xxx-pc1.xxxxx"    1
3   ( target.OpSysGeneric == "WINNT" )          4
4   ( target.Arch == "INTEL" )                  4
5   ( target.Disk >= 1 )                        22
6   ( ( 1024 * target.Memory ) >= 1 )           22
7   ( target.HasFileTransfer )                  22
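As a cross-check, the per-condition counts for conditions 3, 4 and 6 can be reproduced from the condor_status output above. The Python sketch below is only an illustration: it assumes that every WINNTxx platform advertises OpSysGeneric == "WINNT", and that the snipped Linux slots have the same memory as the ones shown. The helper name opsys_generic is made up for this example.

```python
# (OpSys, Arch, Memory in MB, number of slots), taken from the condor_status totals.
# Assumption: the snipped Linux slots look like the four that are listed.
pool = [
    ("LINUX", "X86_64", 1983, 18),
    ("WINNT52", "INTEL", 1023, 2),
    ("WINNT61", "INTEL", 4061, 2),
]

def opsys_generic(opsys):
    # Assumption: every WINNTxx platform reports OpSysGeneric == "WINNT".
    return "WINNT" if opsys.startswith("WINNT") else opsys

# One dictionary per slot, standing in for a machine ad.
machines = [
    {"OpSys": opsys, "OpSysGeneric": opsys_generic(opsys), "Arch": arch, "Memory": mem}
    for opsys, arch, mem, count in pool
    for _ in range(count)
]

# Each clause is tested on its own, the way the analyzer lists them.
clauses = {
    'target.OpSysGeneric == "WINNT"': lambda m: m["OpSysGeneric"] == "WINNT",
    'target.Arch == "INTEL"': lambda m: m["Arch"] == "INTEL",
    "( 1024 * target.Memory ) >= 1": lambda m: 1024 * m["Memory"] >= 1,
}

counts = {text: sum(1 for m in machines if test(m)) for text, test in clauses.items()}
for text, n in counts.items():
    print(f"{text:35} {n}")
```

Under these assumptions the counts come out as 4, 4 and 22, the same as in the table, so those three conditions behave as expected.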
The analyzer suggests removing condition 1.
Searching the condor-users list for related problems, I found that in
version 7.4.3 the RequestMemory expression was changed to:
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
My ImageSize is 1, so machines are available (22 match the memory condition).
Condition 1 shows "JobVMMemory isnt undefined", so JobVMMemory =!= UNDEFINED
seems to evaluate to True!
However, I cannot find JobVMMemory in the condor_q -l 3.0 output.
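The arithmetic behind the RequestMemory expression can be checked by hand. The Python sketch below mimics only the two ClassAd functions involved, with None standing in for UNDEFINED. The function names are made up for illustration and are not part of Condor, and the sketch assumes that JobVMMemory really is absent from the job ad.

```python
import math

UNDEFINED = None  # stand-in for the ClassAd UNDEFINED value (assumption of this sketch)

def request_memory(image_size, job_vm_memory=UNDEFINED):
    """Mimic ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.0))."""
    if job_vm_memory is not UNDEFINED:   # JobVMMemory =!= UNDEFINED
        value = job_vm_memory
    else:
        value = image_size / 1024.0      # 1 / 1024 = 9.765625e-04, the constant in condition 1
    return math.ceil(value)

def memory_clause(image_size, job_vm_memory=UNDEFINED):
    """Mimic the job-side clause ( ( RequestMemory * 1024 ) >= ImageSize )."""
    return request_memory(image_size, job_vm_memory) * 1024 >= image_size

image_size = 1  # ImageSize from the job ad
print(image_size / 1024.0)         # 0.0009765625
print(request_memory(image_size))  # 1
print(memory_clause(image_size))   # True
```

In this sketch the clause is true when JobVMMemory is undefined, so the arithmetic alone does not explain why condition 1 matches 0 machines.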
How could I overcome this issue?
Thanks,
Klaus
Output from condor_q -l 3.0
-- Submitter: PC303344.xxxxx : <10.3.29.182:19936> : PC303344.xxxxx
ClusterId = 3
QDate = 1288175033
CompletionDate = 0
Owner = "ZZZZZZZZ"
NTDomain = "XXXXX"
WindowsMajorVersion = 6
WindowsMinorVersion = 1
WindowsBuildNumber = 7600
WindowsServicePackMajorVersion = 0
WindowsServicePackMinorVersion = 0
WindowsProductType = 1
RemoteWallClockTime = 0.000000
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts_RAW = 0
NumCkpts = 0
NumJobStarts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
LastSuspensionTime = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
CondorVersion = "$CondorVersion: 7.4.4 Oct 13 2010 BuildID: 279383 $"
CondorPlatform = "$CondorPlatform: INTEL-WINNT50 $"
Iwd = "C:\Condor_Test"
JobUniverse = 5
Cmd = "C:\Condor_Test\simple.bat"
MinHosts = 1
MaxHosts = 1
CurrentHosts = 0
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RequestCpus = 1
EnteredCurrentStatus = 1288175033
JobPrio = 0
User = "ZZZZZZZZ@xxxxxxxxxxxxxx"
NiceUser = FALSE
Environment = ""
JobNotification = 2
WantRemoteIO = TRUE
UserLog = "C:\Condor_Test\simple.log"
CoreSize = 0
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "simple.out"
StreamOut = FALSE
Err = "simple.err"
StreamErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT"
TransferFiles = "ONEXIT"
ImageSize_RAW = 1
ImageSize = 1
ExecutableSize_RAW = 1
ExecutableSize = 1
DiskUsage_RAW = 1
DiskUsage = 1
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
Requirements = (machine == "condor-xxx-pc1.xxxxx" && (OpSysGeneric == "WINNT")) && (Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && ((RequestMemory * 1024) >= ImageSize) && (HasFileTransfer)
JobLeaseDuration = 1200
PeriodicHold = FALSE
PeriodicRelease = FALSE
PeriodicRemove = FALSE
LeaveJobInQueue = FALSE
Args = "4 12"
GlobalJobId = "PC303344.xxxxx#3.0#1288175033"
LastJobStatus = 0
JobStatus = 1
ProcId = 0
ScheddBday = 1288256462
ServerTime = 1288259804
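To double-check whether an attribute such as JobVMMemory is defined in the job ad itself, rather than only mentioned inside another expression, the long listing can be parsed into attribute names. The following Python sketch is a minimal illustration: the function name is made up, the sample holds a few lines copied from the ad above, and it assumes each attribute sits on a single unwrapped line.

```python
def parse_classad_long(text):
    """Parse 'Attr = value' lines into a dict, keeping the values as raw strings."""
    ad = {}
    for line in text.splitlines():
        # Split at the first " = " only; "=!=" inside an expression is left alone.
        name, sep, value = line.partition(" = ")
        if sep and name.strip().isidentifier():
            ad[name.strip()] = value.strip()
    return ad

# A few attributes copied from the job ad, one per line.
sample = """\
ImageSize = 1
DiskUsage = 1
RequestMemory = ceiling(ifThenElse(JobVMMemory =!= UNDEFINED, JobVMMemory, ImageSize / 1024.000000))
RequestDisk = DiskUsage
JobUniverse = 5
"""

ad = parse_classad_long(sample)
print("JobVMMemory" in ad)   # False: it only appears inside the RequestMemory expression
print(ad["RequestMemory"])
```

Checking the dictionary keys instead of searching the raw text avoids a false hit on the RequestMemory line, which mentions JobVMMemory without defining it.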