
Re: [Condor-users] Condor 7.7.5: MasterLog is flooded with ProcAPI errors...



Dear Lukas,

On my Fedora 16/17 systems, the proc(5) man page states:


comm %s     The  filename of the executable, in parentheses. [...zip...]


It is not guaranteed that the string "%s" contains no spaces, only that the filename is enclosed in parentheses.

Filenames and executables can, in principle, have spaces in their names.
I'm not sure whether the same is true for parentheses, but spaces are more likely than parentheses.

Hence, I wonder whether "no spaces in the executable's filename" is a wrong assumption on Condor's part, even though it holds most of the time.
In this particular case, that assumption floods the MasterLog with an endless stream of ProcAPI lines.

Maybe relying on the rule that the filename is enclosed in parentheses would be a safer assumption for Condor?
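
To illustrate the idea, here is a minimal sketch in Python (not Condor's actual code; the function name parse_proc_stat is made up for this example). It assumes the fields after comm never contain a ")", which holds for the state character and the numeric fields that follow, so it takes everything between the first "(" and the last ")" as the filename:

```python
def parse_proc_stat(line):
    """Split a /proc/[pid]/stat line into (pid, comm, remaining_fields).

    Hypothetical helper for illustration only.
    """
    # comm may contain spaces (and even parentheses), so locate the first
    # "(" and the LAST ")" instead of splitting the whole line on whitespace.
    open_idx = line.index("(")
    close_idx = line.rindex(")")
    pid = int(line[:open_idx])            # int() tolerates the trailing space
    comm = line[open_idx + 1:close_idx]   # filename without the parentheses
    rest = line[close_idx + 1:].split()   # state, ppid, pgrp, ...
    return pid, comm, rest


# Shortened version of the /proc/999/stat line from this thread.
sample = "999 (ddclient - slee) S 1 988 988 0 -1 4202560"
pid, comm, rest = parse_proc_stat(sample)
print(pid, repr(comm), rest[0], rest[1])
```

With this approach the spaces inside "ddclient - slee" stay part of the filename, and the state field "S" is again the first field after it.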

Regards,
Rob.



----- Original Message -----
From: Lukas Slebodnik <slebodnik@xxxxxxxx>
To: condor-users@xxxxxxxxxxx
Cc: 
Sent: Thursday, May 24, 2012 4:40 PM
Subject: Re: [Condor-users] Condor 7.7.5: MasterLog is flooded with ProcAPI errors...

The problem occurred while parsing the stat file "/proc/999/stat".
Condor expects the string between the parentheses to contain no white space.

>999 (ddclient - slee) S 1 988 988 0 -1 4202560 29441 202449 0 0 69 54 90 208 20 0 1 0 5837 10022912 1249 4294967295 134512640 134516240 3217069360 3217068344 8217622 0 0 128 16385 3225853652 0 0 17 0 0 0 0 0 0 134520336 134520752 152264704
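
As a rough illustration of why such a line trips up a parser, the sketch below splits it on white space the way a scanf-style reader might. This is an assumption about the general failure mode, not Condor's actual ProcAPI code, and naive_parse is a made-up name:

```python
def naive_parse(line):
    """Hypothetical whitespace-based parser, for illustration only."""
    tokens = line.split()
    pid = int(tokens[0])
    comm = tokens[1]        # assumes the whole "(name)" is one token
    state = tokens[2]
    ppid = int(tokens[3])   # fails if comm contained spaces
    return pid, comm, state, ppid


# Shortened version of the /proc/999/stat line quoted above.
sample = "999 (ddclient - slee) S 1 988 988 0 -1 4202560"
try:
    naive_parse(sample)
except ValueError as err:
    # The fields have shifted: "slee)" sits where the parent PID is expected.
    print("parse failed:", err)
```

Because "(ddclient - slee)" becomes three tokens, every later field is shifted by two positions and the numeric conversion stops early.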

But the question is: why is there white space?
According to the manual page ("man 5 proc"), the field should be:

   The  filename  of  the  executable,  in parentheses.
   This is visible whether or  not  the  executable  is
   swapped out.

Regards,
Lukas

On Wed, May 23, 2012 at 08:37:12PM -0700, Rob wrote:
> Hi,
> 
> After starting the Master daemon, the MasterLog looks like this:
> 
> 
> 05/24/12 12:20:22 ******************************************************
> 05/24/12 12:20:22 ** condor_master (CONDOR_MASTER) STARTING UP
> 05/24/12 12:20:22 ** /usr/sbin/condor_master
> 05/24/12 12:20:22 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
> 05/24/12 12:20:22 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
> 05/24/12 12:20:22 ** $CondorVersion: 7.7.5 Mar 07 2012 $
> 05/24/12 12:20:22 ** $CondorPlatform: I686-Fedora_16 $
> 05/24/12 12:20:22 ** PID = 3347
> 05/24/12 12:20:22 ** Log last touched 5/24 12:20:07
> 05/24/12 12:20:22 ******************************************************
> 05/24/12 12:20:22 Using config source: /etc/condor/condor_config
> 05/24/12 12:20:22 Using local config sources: 
> 05/24/12 12:20:22    /etc/condor/config.d/00personal_condor.config
> 05/24/12 12:20:22    /etc/condor/config.d/01personal_condor.config
> 05/24/12 12:20:22 DaemonCore: command socket at <25.125.10.62:45300>
> 05/24/12 12:20:22 DaemonCore: private command socket at <25.125.10.62:45300>
> 05/24/12 12:20:22 Setting maximum accepts per cycle 8.
> 05/24/12 12:20:22 Started DaemonCore process "/usr/sbin/condor_collector", pid and pgroup = 3348
> 05/24/12 12:20:22 Waiting for /var/log/condor/.collector_address to appear.
> 05/24/12 12:20:23 Found /var/log/condor/.collector_address.
> 05/24/12 12:20:23 Started DaemonCore process "/usr/sbin/condor_negotiator", pid and pgroup = 3349
> 05/24/12 12:20:23 Started DaemonCore process "/usr/sbin/condor_schedd", pid and pgroup = 3350
> 05/24/12 12:20:24 ProcAPI: Unexpected short scan on /proc/999/stat, errno: 11.
> 05/24/12 12:20:24 ProcAPI: Unexpected short scan on /proc/999/stat, errno: 11.
> 05/24/12 12:20:24 ProcAPI: Unexpected short scan on /proc/999/stat, errno: 11.
> 05/24/12 12:20:24 ProcAPI: Unexpected short scan on /proc/999/stat, errno: 11.
> 05/24/12 12:20:24 ProcAPI: Unexpected short scan on /proc/999/stat, errno: 11.
> 
> and zillions more of the last lines continue to flood the MasterLog at a rate of 5 lines per second.
> 
> The program with PID 999 has nothing to do with Condor. The file /proc/999/stat contains
> 
> 999 (ddclient - slee) S 1 988 988 0 -1 4202560 29441 202449 0 0 69 54 90 208 20 0 1 0 5837 10022912 1249 4294967295 134512640 134516240 3217069360 3217068344 8217622 0 0 128 16385 3225853652 0 0 17 0 0 0 0 0 0 134520336 134520752 152264704
> 
> If relevant:
> this is currently happening on two computers with kernels 3.3.4-1.fc16 and 3.3.6-3.fc16; the first is the central Condor master (with a large pool), the second is a Condor master (without a pool) that flocks to the first one.
> 
> 
> Why is this happening?
> Or is this an indication that something is going wrong elsewhere?
> Any suggestions?
> 
> 
> Rob.
> 
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
> 