Hi Chris,

On Thursday, 3 November 2011, at 4:30 PM, Christopher Martin wrote:

> Hi,
>
> We're getting errors in the job log files indicating that there are too
> many files open:
>
> ...
> 007 (196430.005.000) 11/03 08:13:00 Shadow exception!
>         Error from slot12@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx: Failed to open '/mnt/render/jobs/job_141798_rndrgatebegin_yko_120_0400_syanye/chr_all_rp_tcrender-196430-5-stdout.txt' as standard output: Too many open files (errno 24)
>         0  -  Run Bytes Sent By Job
>         0  -  Run Bytes Received By Job
>
> The file it's complaining about is the stdout from the job's executable.
> I've taken a look at the submit/scheduler machine and we're nowhere near
> the file limit. Same thing on the execution machine. We are, however,
> logging to a Windows share mounted to the submit/scheduler machine over
> CIFS. We've been experiencing extremely heavy load on the Windows filer
> that we're logging to, so I'm guessing it's a result of that, but I
> wanted to throw this out there in case anyone else has run into similar
> issues before.

Samba mount? I'm not particularly fond of Samba in large deployments -- it doesn't scale up well. Windows file access semantics use locks overzealously, and SMB is an aging protocol; Samba can't really keep up. It usually adds up to disaster above 200 concurrent handles or so, no matter how powerful the underlying hardware.
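One thing worth ruling out before you pin it all on the filer: errno 24 is EMFILE, the *per-process* descriptor limit, so the machine-wide numbers can look perfectly healthy while an individual condor_schedd or condor_shadow is sitting at its cap. A quick-and-dirty sketch along these lines (assuming Linux, run as root, on a kernel new enough to expose /proc/<pid>/comm and /proc/<pid>/limits) will tell you how close each daemon actually is:

    #!/usr/bin/env python
    # Rough sketch, not production code: walk /proc on the submit machine
    # and report how close each condor_* process is to its per-process
    # open-file limit. Run as root so every daemon's fd directory is
    # readable.
    import os

    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            name = open('/proc/%s/comm' % pid).read().strip()
            if not name.startswith('condor'):
                continue
            in_use = len(os.listdir('/proc/%s/fd' % pid))
            soft = '?'
            for line in open('/proc/%s/limits' % pid):
                if line.startswith('Max open files'):
                    soft = line.split()[3]   # soft limit column
                    break
            print('%s (pid %s): %d of %s descriptors in use'
                  % (name, pid, in_use, soft))
        except (IOError, OSError):
            pass  # the process exited while we were looking at it

If the daemons are all comfortably under their limits, the CIFS client or the filer itself becomes the prime suspect.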
Your best bet is to move logging to local disk. You could try NFS-mounted remote storage, but there are file-locking issues on NFS to contend with as well.
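If you do go the local-disk route, the change on your side is just pointing the submit description at a local path. Something along these lines -- the directory is hypothetical, substitute whatever local scratch you have on the submit node:

    # Hypothetical local-disk layout -- adjust to your scratch space.
    output = /var/condor/joblogs/$(Cluster).$(Process).out
    error  = /var/condor/joblogs/$(Cluster).$(Process).err
    log    = /var/condor/joblogs/$(Cluster).$(Process).log

You can always sweep the finished files off to the Windows share afterwards with a cron job, which keeps the open/write/close traffic off CIFS entirely.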
Regards,
- Ian

---
Ian Chesal
Cycle Computing, LLC

Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools