Nagaraj,

Condor reads /proc/<pid>/stat to get the CPU utilization of a process. I've observed that this works under Scientific Linux 3, so I don't know why it is reporting no load for your processes.
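For reference, here is a minimal Python sketch of how CPU usage can be derived from /proc/<pid>/stat. It is not Condor's ProcAPI code; the function names are made up for illustration, and it assumes the standard Linux layout in which utime and stime are fields 14 and 15, measured in clock ticks.

```python
import os


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split after the last ')' instead of on whitespace alone.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return utime + stime


def read_cpu_ticks(pid):
    """Read the current CPU tick count for a live process."""
    with open("/proc/%d/stat" % pid) as f:
        return cpu_ticks_from_stat(f.read())


def percent_cpu(ticks_before, ticks_after, elapsed_seconds, hz=None):
    """Percent of one CPU used between two samples taken elapsed_seconds apart."""
    if hz is None:
        hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return 100.0 * (ticks_after - ticks_before) / (hz * elapsed_seconds)
```

Sampling read_cpu_ticks() twice for one of the job's pids, a few seconds apart, and passing both values to percent_cpu() gives a quick independent check of whether /proc reports any CPU time for that process.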
Looking at the code that reads /proc/<pid>/stat, I see that some failures to read the file are only reported if you add D_FULLDEBUG to your STARTD_DEBUG list. Could you please do that and see if there are errors containing the string "ProcAPI::getProcInfo()"?
Also, does the list of pids that it is checking make sense? In your log below, there is a message "Computing percent CPU usage with pids: 3559 3561 3566 3871 3929 3930 3934 4450 5239 5240 5242 12775". Presumably one or more of these was generating the cpu load. Can you please verify that? The question I am wondering about is whether the problem is the reading of process stat information or whether it is the list of pids itself that is the trouble. If there are pids associated with the job that are generating cpu load but which are not in the list, that would cause a problem similar to what you are observing. If you just use 'top' to see the top few pids and verify that these are in the list being watched by Condor, that should be enough to see if this is really the problem or not.
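If it helps, the comparison can be scripted. This is only a sketch: the function names are hypothetical, the busy pids have to be copied by hand from 'top', and it assumes the log message has exactly the wording quoted above.

```python
MARKER = "Computing percent CPU usage with pids:"


def watched_pids(log_text):
    """Collect every pid listed on a 'Computing percent CPU usage' log line."""
    pids = set()
    for line in log_text.splitlines():
        if MARKER in line:
            # Everything after the marker on this line is the pid list.
            for token in line.split(MARKER, 1)[1].split():
                if token.isdigit():
                    pids.add(int(token))
    # Union over all matching lines, i.e. over all vms and all log entries.
    return pids


def missing_pids(busy_pids, log_text):
    """Return the busy pids (e.g. taken from 'top') that Condor is not watching."""
    return sorted(set(busy_pids) - watched_pids(log_text))
```

An empty result from missing_pids() would mean the pid list is fine and the problem is more likely in reading the process stat information.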
--Dan

Nagaraj Panyam wrote:
Hi Dan,Yes, by "user process" I refer to job run by condor on behalf of a user. There is no other user process, because the nodes are dedicated compute nodes.I did try the D_LOAD setting. But I could not make out much out of the output - but the load values dont seem to agree with the the Claimed/Busy status. Here is a clipping from a node for which condor_status reports "vm1 owner/idle loadav=1.000" and "vm2 claimed/busy loadav=0.000". I request you to have a look at the output and see if you find a clue, or please advise me what to look out for./**** 5/9 11:34:33 Load avg: 1.00 1.00 0.96 5/9 11:34:33 vm1: LoadQueue: Adding 5 entries of value 0.0000005/9 11:34:33 vm1: LoadQueue: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0. 00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.005/9 11:34:33 vm1: LoadQueue: Size: 60 Avg value: 0.00 Share of system load: 0.00 5/9 11:34:33 vm2: Computing percent CPU usage with pids: 3559 3561 3566 3871 3929 3930 3934 4450 5239 5240 5242 12775 5/9 11:34:33 ProcAPI: new boottime = 1145611802; old_boottime = 1145611802; /proc/stat boottime = 1145611803; /proc/uptime boottime = 1145611802 5/9 11:34:33 vm2: Percent CPU usage for those pids is: 0.000000 5/9 11:34:33 vm2: LoadQueue: Adding 5 entries of value 0.0000005/9 11:34:33 vm2: LoadQueue: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0. 
00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.005/9 11:34:33 vm2: LoadQueue: Size: 60 Avg value: 0.00 Share of system load: 0.00 5/9 11:34:33 SystemLoad: 1.000 TotalCondorLoad: 0.000 TotalOwnerLoad: 1.0005/9 11:34:33 vm1: SystemLoad: 1.00 CondorLoad: 0.00 OwnerLoad: 1.00 5/9 11:34:33 vm2: SystemLoad: 0.00 CondorLoad: 0.00 OwnerLoad: 0.00 ***/Just to summarize my earlier mail - something seems to have changed with an OS upgrade, because with the older OS, the same node (same condor_config) a user process on each vm. Also, if I turn Hyperthreading on for the same node, two vm's run user process, and two are stuck in the "claimed/busy" state.Thanks a lot for helping me with this. Nagaraj Dan Bradley wrote:Nagaraj,Off the top of my head, I can't think of any reason for this change in behavior. When you say "user process" you are referring to a job run by Condor on behalf of a user, right? You are not talking about processes run by users outside of Condor. Just want to be sure I understand.You could try adding D_LOAD in your STARTD_DEBUG settings. This will show extra information about what's going on while monitoring system load.--Dan On May 8, 2006, at 12:46 PM, P. Nagaraj wrote:Hi, We have dual cpu nodes in our condor pool. Some of these run an older version of Linux (RH 7.1), and these take two user jobs as shown below There is a user process on each cpu here. CondorVersion is 6.6.5 and platforms are all Intel/Linux vm1 Claimed/Busy/LoadAv=1.000/Mem=502 vm2 Claimed/Busy/loadAv=1.020/Mem=502 When I upgrade the nodes (Scientific Linux 3), the behaviour changes, condor_config being unchanged. 
The VM that is running the user process shows up as vm2 below, while the vm that has no LoadAv shows up as Busy.

vm1 Claimed/Busy/LoadAv=0.000/Mem=500
vm2 Owner/Idle/LoadAv=1.000/Mem=500

Referring to the vm1 just above, its classads are as shown below.

CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
CondorLoadAvg = 0.000000
LoadAvg = 0.000000
TotalLoadAvg = 1.000000 --this from the other vm which has LoadAv=1
TotalCondorLoadAvg = 0.000000
CpuBusyTime = 0
CpuIsBusy = FALSE
State = "Claimed"
Activity = "Busy"
Start = (((LoadAvg - CondorLoadAvg) <= 0.300000) || (State != "Unclaimed" && State != "Owner"))
Requirements = START

Why is there this mismatch - CpuIsBusy = FALSE and Activity = "Busy"? The load averages indicate that a process can start, but something makes the Activity "Busy". How do I find out why one vm is always Busy, which was not so before the OS upgrade?

Similarly, if HT is enabled in BIOS, there are 4 vm's on a node. Two of these are Claimed/Busy (LoadAv=0) and the other two are Owner/Idle (doing user process, LoadAv=1).

Thanks in advance for any help on this
Nagaraj

--
+-----------------------------------+--------------------------------------+
Nagaraj Panyam                      | Office tel: +91-22-22782610
Dept of High Energy Physics         | Office fax: +91-22-22804610
Tata Instt. of Fundamental Research | Home tel  : +91-22-22804936
Mumbai - 400 005, INDIA             | Email     : pn@xxxxxxxxxxx
+-----------------------------------+--------------------------------------+
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users