
Re: [HTCondor-users] Why does my job not match?



On Tue, Jun 20, 2023 at 7:38 AM Jason Patton via HTCondor-users
<htcondor-users@xxxxxxxxxxx> wrote:
>
> Hi Larry,
>
> I think the problem may be in the START expression:
>
> Start = (true && (EVJobType =?= Composition))
>
> I think Composition needs to be quoted so that the expression compares the value of EVJobType against the string "Composition":
>
> Start = (true && (EVJobType =?= "Composition"))
>
> Without the quotes, the expression is referencing a (Machine) ClassAd attribute named Composition, which is undefined in the Machine ad provided.

Thank you for the reply. The issue turned out to be that we had this
in our config file:

SLOT_TYPE_3_START = ($(START) && (EVJobType =?= 'Composition'))

And it needed to be:

SLOT_TYPE_3_START = ($(START) && (EVJobType =?= "Composition"))

Double quotes instead of single.

Your reply got me looking at the quoting more carefully.
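
For anyone who finds this thread later, here is a rough sketch of why the
quoting matters. As far as I understand the ClassAd language, single
quotes delimit an attribute name rather than a string, which would
explain why the Start expression in the machine ad showed up as a bare
Composition. The Python below is only a toy model of the =?= operator
and of attribute lookup, not the real ClassAd evaluator; the names
UNDEFINED, lookup and meta_equals are made up for illustration, and the
toy ignores details such as case-insensitive attribute names.

```python
# Toy model (assumption: simplified semantics, not the real ClassAd library).
UNDEFINED = object()  # stands in for the ClassAd "undefined" value


def lookup(name, my_ad, target_ad):
    """Resolve a bare attribute reference: MY ad first, then TARGET ad."""
    if name in my_ad:
        return my_ad[name]
    if name in target_ad:
        return target_ad[name]
    return UNDEFINED


def meta_equals(a, b):
    """Model of =?=: true only for same type and same value, never undefined."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return type(a) is type(b) and a == b


# Trimmed-down versions of the two ads from this thread.
machine_ad = {"VcCompSlot": True, "Cpus": 1}
job_ad = {"EVJobType": "Composition", "RequestCpus": 1}

# EVJobType =?= Composition  (attribute reference, undefined in both ads)
start_unquoted = meta_equals(
    lookup("EVJobType", machine_ad, job_ad),
    lookup("Composition", machine_ad, job_ad),
)

# EVJobType =?= "Composition"  (string literal)
start_quoted = meta_equals(
    lookup("EVJobType", machine_ad, job_ad),
    "Composition",
)

print("unquoted:", start_unquoted)
print("quoted:  ", start_quoted)
```

In this model the unquoted form compares the string "Composition" with
undefined and so never matches, which lines up with the "EVJobType is
Composition" step matching 0 clusters in the -better-analyze output.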


> On Tue, Jun 20, 2023 at 9:27 AM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>
>> I am submitting a job that I think should match. I have enough memory,
>> disk, and cpu and the job type and other requirements seem to be met.
>> Can anyone tell me why it's not matching?
>>
>> Output of condor_status --long, condor_q --long, and condor_q
>> -better-analyze -reverse -machine shown below.
>>
>> TIA!
>>
>> condor_status --long:
>>
>> AcceptedWhileDraining = false
>> Activity = "Idle"
>> AddressV1 = "{[ p=\"primary\"; a=\"172.20.11.75\"; port=9618;
>> n=\"Internet\"; spid=\"2683471_7ba1_3\"; noUDP=true; ], [ p=\"IPv4\";
>> a=\"172.20.11.75\"; port=9618; n=\"Internet\";
>> spid=\"2683471_7ba1_3\"; noUDP=true; ]}"
>> Arch = "X86_64"
>> AuthenticatedIdentity = "unauthenticated@unmapped"
>> CanHibernate = true
>> CheckpointPlatform = "LINUX X86_64 5.15.0-1037-aws normal N/A avx avx2
>> ssse3 sse4_1 sse4_2"
>> ChildAccountingGroup = {  }
>> ChildActivity = {  }
>> ChildCpus = {  }
>> ChildCurrentRank = {  }
>> ChildDisk = {  }
>> ChildEnteredCurrentState = {  }
>> ChildGPUs = {  }
>> ChildMemory = {  }
>> ChildName = {  }
>> ChildRemoteOwner = {  }
>> ChildRemoteUser = {  }
>> ChildRetirementTimeRemaining = {  }
>> ChildState = {  }
>> ClockDay = 1
>> ClockMin = 1351
>> COLLECTOR_HOST_STRING = "xxxx.biz"
>> CondorLoadAvg = 0.0
>> CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $"
>> CondorVersion = "$CondorVersion: 8.8.13 Mar 23 2021 BuildID:
>> Debian-8.8.13-1.1 PackageID: 8.8.13-1.1 Debian-8.8.13-1.1 $"
>> ConsoleIdle = 3600
>> CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.5)
>> CpuBusyTime = 0
>> CpuCacheSize = 36608
>> CpuFamily = 6
>> CpuIsBusy = false
>> CpuModelNumber = 85
>> Cpus = 1
>> CUDACapability = 7.5
>> CUDAClockMhz = 1590.0
>> CUDAComputeUnits = 40
>> CUDACoresPerCU = 64
>> CUDADeviceName = "Tesla T4"
>> CUDADevicePciBusId = "0000:00:1E.0"
>> CUDADeviceUuid = "81034430-ff8b-f682-edc5-86505c21f36c"
>> CUDADriverVersion = 12.0
>> CUDAECCEnabled = true
>> CUDAGlobalMemoryMb = 15102
>> CurrentRank = 0.0
>> DaemonCoreDutyCycle = 6.164174810319167E-05
>> DaemonLastReconfigTime = 1687224682
>> DaemonStartTime = 1687224682
>> DetectedCpus = 8
>> DetectedGPUs = 0
>> DetectedMemory = 31640
>> Disk = 2545577
>> EnteredCurrentActivity = 1687224689
>> EnteredCurrentState = 1687224689
>> ExpectedMachineGracefulDrainingBadput = 0
>> ExpectedMachineGracefulDrainingCompletion = 1687224689
>> ExpectedMachineQuickDrainingBadput = 0
>> ExpectedMachineQuickDrainingCompletion = 1687224689
>> FileSystemDomain = "poc.cloud.elucid.biz"
>> GPUs = 0
>> HardwareAddress = "0a:7a:f6:ec:88:b7"
>> has_avx = true
>> has_avx2 = true
>> has_sse4_1 = true
>> has_sse4_2 = true
>> has_ssse3 = true
>> HasEncryptExecuteDirectory = true
>> HasFileTransfer = true
>> HasFileTransferPluginMethods = "file,ftp,http,data,https"
>> HasIOProxy = true
>> HasJava = true
>> HasJICLocalConfig = true
>> HasJICLocalStdin = true
>> HasJobDeferral = true
>> HasMPI = true
>> HasPerFileEncryption = true
>> HasReconnect = true
>> HasSelfCheckpointTransfers = true
>> HasTDP = true
>> HasTransferInputRemaps = true
>> HasVM = false
>> HibernationLevel = 0
>> HibernationState = "NONE"
>> HibernationSupportedStates = "S4,S5"
>> IsLocalStartd = false
>> IsValidCheckpointPlatform = (TARGET.JobUniverse =!= 1 ||
>> ((MY.CheckpointPlatform =!= undefined) &&
>> ((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
>> (TARGET.NumCkpts == 0))))
>> IsWakeAble = false
>> IsWakeOnLanEnabled = false
>> IsWakeOnLanSupported = false
>> JavaMFlops = 1533.733398
>> JavaSpecificationVersion = "11"
>> JavaVendor = "Amazon.com Inc."
>> JavaVersion = "11.0.19"
>> JobPreemptions = 0
>> JobRankPreemptions = 0
>> JobStarts = 0
>> JobUserPrioPreemptions = 0
>> KeyboardIdle = 2504
>> KFlops = 1587902
>> LastBenchmark = 1687224718
>> LastFetchWorkCompleted = 0
>> LastFetchWorkSpawned = 0
>> LastHeardFrom = 1687228292
>> LastUpdate = 1687224718
>> LoadAvg = 0.0
>> Machine = "processor.poc.cloud.elucid.biz"
>> MachineMaxVacateTime = 10 * 60
>> MachineResources = "Cpus Memory Disk Swap GPUs"
>> MaxJobRetirementTime = 0
>> Memory = 7142
>> Mips = 25819
>> MonitorSelfAge = 3608
>> MonitorSelfCPUUsage = 0.01249920167160802
>> MonitorSelfImageSize = 20556
>> MonitorSelfRegisteredSocketCount = 0
>> MonitorSelfResidentSetSize = 14116
>> MonitorSelfSecuritySessions = 6
>> MonitorSelfTime = 1687228289
>> MyAddress = "<172.20.11.75:9618?addrs=172.20.11.75-9618&noUDP&sock=2683471_7ba1_3>"
>> MyCurrentTime = 1687228292
>> MyType = "Machine"
>> Name = "slot3@xxxxxxxx"
>> NextFetchWorkDelay = -1
>> NUM_DETECTED_GPUs = 1
>> NumDynamicSlots = 0
>> NumPids = 0
>> OpSys = "LINUX"
>> OpSysAndVer = "Ubuntu20"
>> OpSysLegacy = "LINUX"
>> OpSysLongName = "Ubuntu 20.04.6 LTS"
>> OpSysMajorVer = 20
>> OpSysName = "Ubuntu"
>> OpSysShortName = "Ubuntu"
>> OpSysVer = 2004
>> PartitionableSlot = true
>> PROSERVER = "PROSERVER_PROCESSOR"
>> PslotRollupInformation = true
>> Rank = 0
>> RecentDaemonCoreDutyCycle = 5.879344050940816E-05
>> RecentJobPreemptions = 0
>> RecentJobRankPreemptions = 0
>> RecentJobStarts = 0
>> RecentJobUserPrioPreemptions = 0
>> Requirements = (START) && (IsValidCheckpointPlatform) && (WithinResourceLimits)
>> RetirementTimeRemaining = 0
>> SlotID = 3
>> SlotType = "Partitionable"
>> SlotTypeID = 3
>> SlotWeight = Cpus
>> Start = (true && (EVJobType =?= Composition))
>> StartdIpAddr = "<172.20.11.75:9618?addrs=172.20.11.75-9618&noUDP&sock=2683471_7ba1_3>"
>> StarterAbilityList =
>> "HasFileTransferPluginMethods,HasEncryptExecuteDirectory,HasVM,HasJava,HasMPI,HasFileTransfer,HasJobDeferral,HasPerFileEncryption,HasReconnect,HasTDP,HasJICLocalStdin,HasTransferInputRemaps,HasSelfCheckpointTransfers,HasJICLocalConfig"
>> State = "Unclaimed"
>> SubnetMask = "255.255.252.0"
>> TargetType = "Job"
>> TimeToLive = 2147483647
>> TotalCondorLoadAvg = 0.0
>> TotalCpus = 7.0
>> TotalDisk = 21213140
>> TotalGPUs = 0
>> TotalLoadAvg = 0.01
>> TotalMemory = 28568
>> TotalSlotCpus = 1
>> TotalSlotDisk = 2545577.0
>> TotalSlotGPUs = 0
>> TotalSlotMemory = 7142
>> TotalSlots = 3
>> TotalTimeUnclaimedIdle = 3603
>> TotalVirtualMemory = 32399928
>> UidDomain = "poc.cloud.elucid.biz"
>> Unhibernate = MY.MachineLastMatchTime =!= undefined
>> UpdateSequenceNumber = 14
>> UpdatesHistory = "00000000000000000000000000000000"
>> UpdatesLost = 0
>> UpdatesSequenced = 4112
>> UpdatesTotal = 4116
>> UtsnameMachine = "x86_64"
>> UtsnameNodename = "processor.poc.cloud.elucid.biz"
>> UtsnameRelease = "5.15.0-1037-aws"
>> UtsnameSysname = "Linux"
>> UtsnameVersion = "#41~20.04.1-Ubuntu SMP Mon May 22 18:18:00 UTC 2023"
>> VcCompSlot = true
>> VirtualMemory = 10799976
>> WakeOnLanEnabledFlags = "NONE"
>> WakeOnLanSupportedFlags = "NONE"
>> WithinResourceLimits = (ifThenElse(TARGET._condor_RequestCpus =!=
>> undefined,MY.Cpus > 0 && TARGET._condor_RequestCpus <=
>> MY.Cpus,ifThenElse(TARGET.RequestCpus =!= undefined,MY.Cpus > 0 &&
>> TARGET.RequestCpus <= MY.Cpus,1 <= MY.Cpus)) &&
>> ifThenElse(TARGET._condor_RequestMemory =!= undefined,MY.Memory > 0 &&
>> TARGET._condor_RequestMemory <=
>> MY.Memory,ifThenElse(TARGET.RequestMemory =!= undefined,MY.Memory > 0
>> && TARGET.RequestMemory <= MY.Memory,false)) &&
>> ifThenElse(TARGET._condor_RequestDisk =!= undefined,MY.Disk > 0 &&
>> TARGET._condor_RequestDisk <= MY.Disk,ifThenElse(TARGET.RequestDisk
>> =!= undefined,MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk,false)) &&
>> (TARGET.RequestGPUs =?= undefined || MY.GPUs >=
>> ifThenElse(TARGET._condor_RequestGPUs =?=
>> undefined,TARGET.RequestGPUs,TARGET._condor_RequestGPUs)))
>>
>> condor_q --long
>>
>> Arguments = "args"
>> AutoClusterAttrs =
>> "_condor_RequestCpus,_condor_RequestDisk,_condor_RequestGPUs,_condor_RequestMemory,Composition,EVJobType,JobUniverse,LastCheckpointPlatform,MachineLastMatchTime,NumCkpts,Offline,RemoteOwner,RequestCpus,RequestDisk,RequestGPUs,RequestMemory,TotalJobRuntime,ConcurrencyLimits,FlockTo,Rank,Requirements,KFlops,FileSystemDomain"
>> AutoClusterId = 1
>> ClusterId = 327
>> Cmd = "cmd"
>> CommittedSlotTime = 0
>> CommittedSuspensionTime = 0
>> CommittedTime = 0
>> CondorPlatform = "$CondorPlatform: X86_64-Ubuntu_20.04 $"
>> CondorVersion = "$CondorVersion: 10.4.0 2023-04-06 BuildID: 638308
>> PackageID: 10.4.0-1.1 $"
>> CoreSize = 0
>> CumulativeRemoteSysCpu = 0.0
>> CumulativeRemoteUserCpu = 0.0
>> CumulativeSlotTime = 0
>> CumulativeSuspensionTime = 0
>> CurrentHosts = 0
>> DiskUsage = 3500
>> DiskUsage_RAW = 3376
>> EncryptExecuteDirectory = false
>> EnteredCurrentStatus = 1687224704
>> Environment = "EVCFG=/inst/web/zzzz/config.ini"
>> Err = "/inst/web/logs/vc_dont_delete/ev_327.0.err"
>> EVJobType = "Composition"
>> ExecutableSize = 3500
>> ExecutableSize_RAW = 3376
>> ExitBySignal = false
>> ExitStatus = 0
>> FileSystemDomain = "xxxx.biz"
>> GlobalJobId = "xxxx.biz#327.0#1687224704"
>> ImageSize = 3500
>> ImageSize_RAW = 3376
>> In = "/dev/null"
>> Iwd = "some_dir"
>> JobLeaseDuration = 2400
>> JobMaxRetries = 0
>> JobNotification = 0
>> JobPrio = 100000
>> JobStatus = 1
>> JobSubmitMethod = 0
>> JobUniverse = 5
>> LastRejMatchReason = "no match found "
>> LastRejMatchTime = 1687228905
>> LastSuspensionTime = 0
>> LeaveJobInQueue = false
>> MaxHosts = 1
>> MinHosts = 1
>> MyType = "Job"
>> NumCkpts = 0
>> NumCkpts_RAW = 0
>> NumJobCompletions = 0
>> NumJobStarts = 0
>> NumRestarts = 0
>> NumSystemHolds = 0
>> OnExitHold = false
>> OnExitRemove = NumJobCompletions > JobMaxRetries || ExitCode =?= 0
>> Out = "/inst/web/logs/vc_dont_delete/ev_327.0.out"
>> Owner = "prod_user"
>> PeriodicHold = false
>> PeriodicRelease = false
>> PeriodicRemove = false
>> ProcId = 0
>> QDate = 1687224704
>> Rank = 0.0
>> RemoteSysCpu = 0.0
>> RemoteUserCpu = 0.0
>> RemoteWallClockTime = 0.0
>> RequestCpus = 1
>> RequestDisk = 2048
>> RequestMemory = 2048
>> Requirements = (TARGET.VcCompSlot) && (TARGET.Arch == "X86_64") &&
>> (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
>> (TARGET.Memory >= RequestMemory) && (TARGET.FileSystemDomain ==
>> MY.FileSystemDomain)
>> ServerTime = 1687228934
>> ShouldTransferFiles = "NO"
>> StreamErr = false
>> StreamOut = false
>> TargetType = "Machine"
>> TotalSubmitProcs = 1
>> TotalSuspensions = 0
>> TransferIn = false
>> TransferInputSizeMB = 3
>> User = "prod_user@xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
>> UserLog = "/inst/web/logs/vc_dont_delete/ev_327.0.log"
>>
>>
>> Here is the output of condor_q -better-analyze -reverse -machine xxxx:
>>
>> The Requirements expression for this slot is
>>     (START) && (IsValidCheckpointPlatform) &&
>>         (WithinResourceLimits)
>> START is
>>     (true &&
>>       (EVJobType is Composition))
>> WithinResourceLimits is
>>     (ifThenElse(TARGET._condor_RequestCpus isnt undefined,MY.Cpus > 0 &&
>>         TARGET._condor_RequestCpus <=
>> MY.Cpus,ifThenElse(TARGET.RequestCpus isnt undefined,MY.Cpus > 0 &&
>>           TARGET.RequestCpus <= MY.Cpus,1 <= MY.Cpus)) &&
>>       ifThenElse(TARGET._condor_RequestMemory isnt undefined,MY.Memory > 0 &&
>>         TARGET._condor_RequestMemory <=
>> MY.Memory,ifThenElse(TARGET.RequestMemory isnt undefined,MY.Memory > 0
>> &&
>>           TARGET.RequestMemory <= MY.Memory,false)) &&
>>       ifThenElse(TARGET._condor_RequestDisk isnt undefined,MY.Disk > 0 &&
>>         TARGET._condor_RequestDisk <=
>> MY.Disk,ifThenElse(TARGET.RequestDisk isnt undefined,MY.Disk > 0 &&
>>           TARGET.RequestDisk <= MY.Disk,false)) &&
>>       (TARGET.RequestGPUs is undefined ||
>>         MY.GPUs >= ifThenElse(TARGET._condor_RequestGPUs is
>> undefined,TARGET.RequestGPUs,TARGET._condor_RequestGPUs)))
>>
>> This slot defines the following attributes:
>> CheckpointPlatform = "LINUX X86_64 5.15.0-1037-aws normal N/A avx avx2
>> ssse3 sse4_1 sse4_2"
>>     Cpus = 1
>>     Disk = 2545577
>>     GPUs = 0
>>     IsValidCheckpointPlatform = (TARGET.JobUniverse =!= 1 ||
>> ((MY.CheckpointPlatform =!= undefined) &&
>> ((TARGET.LastCheckpointPlatform =?= MY.CheckpointPlatform) ||
>> (TARGET.NumCkpts == 0))))
>>     Memory = 7142
>> Job 327.0 has the following attributes:
>>     TARGET.EVJobType = "Composition"
>>     TARGET.JobUniverse = 5
>>     TARGET.NumCkpts = 0
>>     TARGET.RequestCpus = 1
>>     TARGET.RequestDisk = 2048
>>     TARGET.RequestMemory = 2048
>> The Requirements expression for this slot reduces to these conditions:
>>            Clusters
>> Step    Matched  Condition
>> -----  --------  ---------
>> [1]           0  EVJobType is Composition
>> slot3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Run analysis summary of 1 jobs.
>>     0 (0.00 %) match both slot and job requirements.
>>     0 match the requirements of this slot.
>>     1 have job requirements that match this slot.
>>
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/htcondor-users/