Hi Dan,

Yes, by "user process" I mean a job run by Condor on behalf of a user. There is no other user process, because the nodes are dedicated compute nodes.
I did try the D_LOAD setting, but I could not make much of the output. The load values don't seem to agree with the Claimed/Busy status. Here is a clipping from a node for which condor_status reports "vm1 owner/idle loadav=1.000" and "vm2 claimed/busy loadav=0.000". Could you have a look at the output and see if you find a clue, or advise me what to look out for?
/**** 5/9 11:34:33 Load avg: 1.00 1.00 0.96 5/9 11:34:33 vm1: LoadQueue: Adding 5 entries of value 0.0000005/9 11:34:33 vm1: LoadQueue: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0. 00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.005/9 11:34:33 vm1: LoadQueue: Size: 60 Avg value: 0.00 Share of system load: 0.00 5/9 11:34:33 vm2: Computing percent CPU usage with pids: 3559 3561 3566 3871 3929 3930 3934 4450 5239 5240 5242 12775 5/9 11:34:33 ProcAPI: new boottime = 1145611802; old_boottime = 1145611802; /proc/stat boottime = 1145611803; /proc/uptime boo
ttime = 1145611802 5/9 11:34:33 vm2: Percent CPU usage for those pids is: 0.000000 5/9 11:34:33 vm2: LoadQueue: Adding 5 entries of value 0.0000005/9 11:34:33 vm2: LoadQueue: 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0. 00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0
0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.005/9 11:34:33 vm2: LoadQueue: Size: 60 Avg value: 0.00 Share of system load: 0.00 5/9 11:34:33 SystemLoad: 1.000 TotalCondorLoad: 0.000 TotalOwnerLoad: 1.000
5/9 11:34:33 vm1: SystemLoad: 1.00 CondorLoad: 0.00 OwnerLoad: 1.00 5/9 11:34:33 vm2: SystemLoad: 0.00 CondorLoad: 0.00 OwnerLoad: 0.00 ***/Just to summarize my earlier mail - something seems to have changed with an OS upgrade, because with the older OS, the same node (same condor_config) a user process on each vm. Also, if I turn Hyperthreading on for the same node, two vm's run user process, and two are stuck in the "claimed/busy" state.
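For anyone comparing several nodes, the per-vm summary lines at the end of a D_LOAD clipping can be pulled out mechanically. The following is only a rough sketch: the regular expression and the helper name `parse_vm_loads` are illustrative assumptions based on the two summary lines in this one clipping, not a documented Condor log format.

```python
import re

# Per-vm summary lines copied from the D_LOAD clipping in this message.
LOG = """\
5/9 11:34:33 vm1: SystemLoad: 1.00 CondorLoad: 0.00 OwnerLoad: 1.00
5/9 11:34:33 vm2: SystemLoad: 0.00 CondorLoad: 0.00 OwnerLoad: 0.00
"""

# Assumed line layout, inferred from the two lines above only.
PATTERN = re.compile(
    r"(?P<vm>vm\d+): SystemLoad: (?P<system>[\d.]+)\s+"
    r"CondorLoad: (?P<condor>[\d.]+)\s+OwnerLoad: (?P<owner>[\d.]+)"
)


def parse_vm_loads(text):
    """Return {vm_name: {"system": float, "condor": float, "owner": float}}."""
    loads = {}
    for match in PATTERN.finditer(text):
        loads[match["vm"]] = {
            key: float(match[key]) for key in ("system", "condor", "owner")
        }
    return loads


loads = parse_vm_loads(LOG)
for vm, values in sorted(loads.items()):
    print(vm, values)
```

In this clipping the whole system load of 1.00 is booked against vm1 as owner load, while vm2, whose job pids report 0.000000 percent CPU usage, gets no load at all.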
Thanks a lot for helping me with this.

Nagaraj

Dan Bradley wrote:
Nagaraj,

Off the top of my head, I can't think of any reason for this change in behavior. When you say "user process" you are referring to a job run by Condor on behalf of a user, right? You are not talking about processes run by users outside of Condor. Just want to be sure I understand.

You could try adding D_LOAD in your STARTD_DEBUG settings. This will show extra information about what's going on while monitoring system load.

--Dan

On May 8, 2006, at 12:46 PM, P. Nagaraj wrote:

Hi,

We have dual cpu nodes in our condor pool. Some of these run an older version of Linux (RH 7.1), and these take two user jobs as shown below. There is a user process on each cpu here. CondorVersion is 6.6.5 and platforms are all Intel/Linux.

vm1 Claimed/Busy/LoadAv=1.000/Mem=502
vm2 Claimed/Busy/loadAv=1.020/Mem=502

When I upgrade the nodes (Scientific Linux 3), the behaviour changes, condor_config being unchanged. The vm that is running the user process shows up as vm2 below, while the vm that has no LoadAv shows up as Busy.

vm1 Claimed/Busy/LoadAv=0.000/Mem=500
vm2 Owner/Idle/LoadAv=1.000/Mem=500

Referring to the vm1 just above, its classads are as shown below.

CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
CondorLoadAvg = 0.000000
LoadAvg = 0.000000
TotalLoadAvg = 1.000000 --this from the other vm which has LoadAv=1
TotalCondorLoadAvg = 0.000000
CpuBusyTime = 0
CpuIsBusy = FALSE
State = "Claimed"
Activity = "Busy"
Start = (((LoadAvg - CondorLoadAvg) <= 0.300000) || (State != "Unclaimed" && State != "Owner"))
Requirements = START

Why is there this mismatch - CpuIsBusy="FALSE" and Activity=Busy? The load averages indicate that a process can start, but something makes the Activity="Busy". How do I find out why one vm is always Busy, which was not so before the OS upgrade?

Similarly, if HT is enabled in BIOS, there are 4 vm's on a node. Two of these are Claimed/Busy (LoadAv=0) and the other two are Owner/Idle (doing user process, LoadAv=1).
Thanks in advance for any help on this

Nagaraj

--
+-----------------------------------+--------------------------------------+
Nagaraj Panyam                      | Office tel: +91-22-22782610
Dept of High Energy Physics         | Office fax: +91-22-22804610
Tata Instt. of Fundamental Research | Home tel  : +91-22-22804936
Mumbai - 400 005, INDIA             | Email     : pn@xxxxxxxxxxx
+-----------------------------------+--------------------------------------+

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
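As a cross-check on the ClassAd values quoted in the original message, the CpuBusy and Start expressions can be evaluated by hand. This is a minimal sketch in plain Python, not Condor's ClassAd evaluator. The function names are made up for illustration, and the CondorLoadAvg of 0.0 for vm2 is an assumption inferred from TotalCondorLoadAvg = 0.000000, since vm2's own ad was not posted.

```python
def cpu_busy(load_avg, condor_load_avg):
    # CpuBusy = ((LoadAvg - CondorLoadAvg) >= 0.500000)
    return (load_avg - condor_load_avg) >= 0.5


def start(load_avg, condor_load_avg, state):
    # Start = (((LoadAvg - CondorLoadAvg) <= 0.300000) ||
    #          (State != "Unclaimed" && State != "Owner"))
    non_condor_load = load_avg - condor_load_avg
    return non_condor_load <= 0.3 or state not in ("Unclaimed", "Owner")


# vm1 values are taken from the ClassAd listing in the original message.
vm1 = {"load_avg": 0.0, "condor_load_avg": 0.0, "state": "Claimed"}
# vm2: LoadAvg and State come from condor_status; condor_load_avg is ASSUMED.
vm2 = {"load_avg": 1.0, "condor_load_avg": 0.0, "state": "Owner"}

for name, vm in (("vm1", vm1), ("vm2", vm2)):
    busy = cpu_busy(vm["load_avg"], vm["condor_load_avg"])
    can_start = start(vm["load_avg"], vm["condor_load_avg"], vm["state"])
    print(f"{name}: CpuBusy={busy} Start={can_start}")
```

With these inputs the expressions are at least self-consistent: vm1 has no load attributed to it, so CpuBusy comes out false, while vm2 carries the full load as non-Condor load, so Start comes out false while it sits in the Owner state. That matches the observation that the load is being booked against the wrong vm, but it does not by itself explain why.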