I have a strange problem with an RHEL 5 scheduler box. I've applied the
usual file descriptor tuning settings to this box via
/etc/sysctl.conf.
On boot they appear to be applied before system services are started:
May 18 07:51:52 hostname sysctl: net.ipv4.ip_forward = 0
May 18 07:51:52 hostname sysctl: net.ipv4.conf.default.rp_filter = 1
May 18 07:51:52 hostname sysctl: net.ipv4.conf.default.accept_source_route = 0
May 18 07:51:52 hostname sysctl: kernel.sysrq = 0
May 18 07:51:52 hostname sysctl: kernel.core_uses_pid = 1
May 18 07:51:52 hostname sysctl: kernel.pid_max = 4194303
May 18 07:51:52 hostname sysctl: fs.file-max = 262144
May 18 07:51:52 hostname sysctl: net.ipv4.ip_local_port_range = 1024 65535
May 18 07:51:52 hostname network: Setting network parameters: succeeded
May 18 07:51:52 hostname network: Bringing up loopback interface: succeeded
May 18 07:51:57 hostname ifup: Enslaving eth0 to bond0
May 18 07:51:57 hostname ifup: Enslaving eth1 to bond0
May 18 07:51:57 hostname network: Bringing up interface bond0: succeeded
May 18 07:52:17 hostname hpsmhd: smhstart startup succeeded
May 18 07:52:17 hostname condor: Starting up Condor
May 18 07:52:17 hostname rc: Starting condor: succeeded
May 18 07:52:17 hostname crond: crond startup succeeded
But the scheduler on the box, after boot, will still hit file descriptor
limits before it's even close to running as many jobs as it can handle:
5/18 08:12:52 Return from Handler <to startd <10.10.10.242:4208>>
5/18 08:12:52 Starting add_shadow_birthdate(1287113.13)
5/18 08:12:52 Started shadow for job 1287113.13 on "<10.10.10.35:1188>", (shadow pid = 19732)
**** PANIC -- OUT OF FILE DESCRIPTORS at line 781 in dprintf.c
The strange thing is: restarting Condor at this point fixes the problem.
The scheduler can grow running jobs well beyond the point where it hit
that file descriptor limit the first time. It's as if the file
descriptor settings weren't in place when the Condor processes were
started up on boot.
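
One way to test that hypothesis is to compare the per-process descriptor
limit (RLIMIT_NOFILE, the value `ulimit -n` reports) with the system-wide
fs.file-max. sysctl only sets the system-wide value; the per-process limit
is inherited from whatever starts the daemon, so it could differ between
the boot-time init environment and an interactive shell. The sketch below
is only a diagnostic aid, not anything from Condor itself. It assumes a
Python 3 interpreter is available (not the stock RHEL 5 Python), and the
function names are hypothetical.

```python
import resource


def read_system_file_max(path="/proc/sys/fs/file-max"):
    """Return the system-wide open file limit, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return int(handle.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def describe_fd_limits():
    """Collect the limits that apply to this process and to the whole system."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "soft": soft,  # per-process limit currently enforced
        "hard": hard,  # ceiling the soft limit may be raised to
        "system_file_max": read_system_file_max(),  # what fs.file-max controls
    }


def format_limit(value):
    """Render a limit value for printing."""
    if value is None:
        return "unknown"
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value)


if __name__ == "__main__":
    for name, value in describe_fd_limits().items():
        print(f"{name}: {format_limit(value)}")
```

Running this once from the environment that starts Condor at boot and once
from the shell used for the manual restart should show whether the two
start with different per-process limits.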
Has anyone else ever run into a problem like this before?
Regards,
- Ian
--
Ian Chesal
ichesal@xxxxxxxxxxxxxxxxxx
http://www.cyclecomputing.com/