[HTCondor-users] About idle job on Condor
- Date: Fri, 07 Apr 2017 02:14:39 +0000
- From: "Bansal, Vikas" <Vikas.Bansal@xxxxxxxx>
- Subject: [HTCondor-users] About idle job on Condor
Hi,
I am new to Condor. I searched the archives for my problem; I suspect it is not a new one, but I was not able to find a clear solution.
Novice question.
1. Why do I see a job in the Idle state? I suspect there is no general answer and it depends on the case. Is that right?
Here is an example of an idle job:
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
686.0 dirac 3/29 17:30 0+06:12:22 I 0 0.0 DIRAC_HJzfpX_pilot
As far as I can tell, 686.0 never ran (it was submitted on March 29) and has always been in the Idle state.
Let's analyze the job:
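The RUN_TIME column above is printed as days+HH:MM:SS, which the condor_q manual describes as wall-clock time accumulated by the job so far. A minimal Python sketch for converting such a value to seconds, so it can be compared against expectations (the function name is illustrative, not part of HTCondor):

```python
def run_time_to_seconds(run_time: str) -> int:
    """Convert a condor_q RUN_TIME string in days+HH:MM:SS form to seconds."""
    days, clock = run_time.split("+", 1)
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((int(days) * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    # RUN_TIME reported for job 686.0 above
    print(run_time_to_seconds("0+06:12:22"))  # 22342
```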
$ condor_q -analyze 686.0
-- Submitter: dirac-crt.hep.pnnl.gov : <192.101.107.250:10594?noUDP> : dirac-crt.hep.pnnl.gov
Last successful match: Fri Apr 7 01:47:54 2017
The Requirements expression for your job is:
( TARGET.Name == "slot11@xxxxxxxxxxxxxxxxx" ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( ( TARGET.HasFileTransfer ) ||
( TARGET.FileSystemDomain == MY.FileSystemDomain ) )
Suggestions:
    Condition                                                     Machines Matched    Suggestion
    ---------                                                     ----------------    ----------
1   ( TARGET.Name == "slot11@xxxxxxxxxxxxxxxxx" )                 1
2   ( TARGET.Arch == "X86_64" )                                   1298
3   ( TARGET.OpSys == "LINUX" )                                   1298
4   ( TARGET.Disk >= 45 )                                         1298
5   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )
                                                                  1298
6   ( ( TARGET.HasFileTransfer ) || ( TARGET.FileSystemDomain == "dirac-crt.hep.pnnl.gov" ) )
                                                                  1298
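The -analyze table counts, for each clause separately, how many machine ads satisfy it; a job can only match a machine that satisfies all clauses at once. Because clause 1 pins the job to a single slot name, the whole && expression can match at most that one slot. The Python sketch below mimics this counting under stated assumptions: the machine ads are hypothetical dictionaries, slot11@node-a is a made-up stand-in for the elided slot name, and RequestMemory is taken as 1, the fallback value of the ifthenelse(...) shown above. It is an illustration, not how HTCondor's ClassAd evaluator works.

```python
from typing import Callable, Dict, List

MachineAd = Dict[str, object]

# Job attributes taken from the -analyze output (RequestMemory assumed to be 1).
JOB = {"RequestDisk": 45, "RequestMemory": 1, "FileSystemDomain": "dirac-crt.hep.pnnl.gov"}

# Hypothetical stand-in for the elided slot name the job is pinned to.
PINNED_SLOT = "slot11@node-a"

# One predicate per row of the -analyze table, in the same order.
CLAUSES: List[Callable[[MachineAd], bool]] = [
    lambda m: m["Name"] == PINNED_SLOT,
    lambda m: m["Arch"] == "X86_64",
    lambda m: m["OpSys"] == "LINUX",
    lambda m: m["Disk"] >= JOB["RequestDisk"],
    lambda m: m["Memory"] >= JOB["RequestMemory"],
    lambda m: bool(m["HasFileTransfer"]) or m["FileSystemDomain"] == JOB["FileSystemDomain"],
]


def make_slot(name: str) -> MachineAd:
    """Build a hypothetical Linux x86_64 slot ad with ample disk and memory."""
    return {"Name": name, "Arch": "X86_64", "OpSys": "LINUX", "Disk": 100_000,
            "Memory": 2048, "HasFileTransfer": True, "FileSystemDomain": "example.org"}


def per_clause_counts(machines: List[MachineAd]) -> List[int]:
    """Count how many machines satisfy each clause on its own."""
    return [sum(1 for m in machines if clause(m)) for clause in CLAUSES]


def full_matches(machines: List[MachineAd]) -> int:
    """Count machines that satisfy every clause (the whole && expression)."""
    return sum(1 for m in machines if all(clause(m) for clause in CLAUSES))


if __name__ == "__main__":
    pool = [make_slot(f"slot{i}@node-a") for i in range(1, 17)]
    print(per_clause_counts(pool))  # [1, 16, 16, 16, 16, 16]
    print(full_matches(pool))  # 1
    # If the pinned slot is not in the pool, nothing can match.
    print(full_matches([m for m in pool if m["Name"] != PINNED_SLOT]))  # 0
```

If the pinned slot is missing from the pool, or is present but fails another clause, the full match count drops to zero even though every other clause still matches many machines.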
==
So there is a match: exactly one machine, as I expect.
Let's also look at the job details, listing only some relevant fields:
$ condor_q -l 686.0
LastRemoteHost = "slot11@xxxxxxxxxxxxxxxxx"
CondorVersion = "$CondorVersion: 8.2.10 Oct 27 2015 $"
LastRejMatchReason = "no match found"
==
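When the full condor_q -l output is long, it can help to pull out just the attributes of interest. A minimal Python sketch, assuming one Attr = value pair per line as in the excerpt above; it keeps values as raw text and does not handle escaped quotes inside strings. The parse_long_ad name is hypothetical; condor_q's -af (-autoformat) option, where available, can also print selected attributes directly.

```python
from typing import Dict


def parse_long_ad(text: str) -> Dict[str, str]:
    """Parse 'Attr = value' lines (as printed by condor_q -l) into raw strings.

    Surrounding double quotes are stripped from string values; nothing is evaluated.
    """
    ad: Dict[str, str] = {}
    for line in text.splitlines():
        if " = " not in line:
            continue  # skip blank or non-attribute lines
        key, value = line.split(" = ", 1)
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        ad[key.strip()] = value
    return ad


SAMPLE = '''LastRemoteHost = "slot11@xxxxxxxxxxxxxxxxx"
CondorVersion = "$CondorVersion: 8.2.10 Oct 27 2015 $"
LastRejMatchReason = "no match found"'''

if __name__ == "__main__":
    ad = parse_long_ad(SAMPLE)
    for attr in ("LastRemoteHost", "LastRejMatchReason"):
        print(f"{attr}: {ad.get(attr, '<undefined>')}")
```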
Why does it say "no match found"?
When I look at the actual node, it has CPU and memory available:
[cwn-o10 ~]$ top
top - 01:56:18 up 50 days, 12:50, 1 user, load average: 0.06, 0.07, 0.05
Tasks: 552 total, 1 running, 551 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 63742384 total, 35741192 free, 3025640 used, 24975552 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 60014588 avail Mem
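top reports memory in KiB, whereas a slot's Memory attribute (the value the Requirements clause TARGET.Memory >= ... is compared against) is in megabytes and describes the slot's share rather than the node's free memory. A small Python sketch that converts the top figures so they can be set beside the slot's advertised Memory (for example from condor_status -l); the function names are illustrative:

```python
import re
from typing import Dict

# Labels that follow each number in top's memory summary lines.
_FIELD = re.compile(r"(\d+)\s+(total|free|used|buff/cache|avail Mem)")


def parse_top_kib(line: str) -> Dict[str, int]:
    """Map each label in a top 'KiB Mem' or 'KiB Swap' line to its value in KiB."""
    return {label: int(num) for num, label in _FIELD.findall(line)}


def kib_to_mib(kib: int) -> int:
    """Convert KiB to whole MiB, rounded down (HTCondor's Memory is in megabytes)."""
    return kib // 1024


if __name__ == "__main__":
    mem = parse_top_kib("KiB Mem : 63742384 total, 35741192 free, 3025640 used, 24975552 buff/cache")
    swap = parse_top_kib("KiB Swap: 0 total, 0 free, 0 used. 60014588 avail Mem")
    print("total MiB:", kib_to_mib(mem["total"]))  # 62248
    print("available MiB:", kib_to_mib(swap["avail Mem"]))  # 58607
```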
==
What else can I look at to determine why the job is idle?
Any help debugging this is appreciated.
Thanks,
Vikas