Fortunately, I got this file:

$ cat dprintf_failure.SCHEDD
dprintf() had a fatal error in pid 2716
Error writing debug log
errno: 27 (File too large)
euid: 37002, ruid: 0

$ for i in `pgrep -u condor condor_schedd`; do echo "pid: $i"; cat /proc/$i/limits ; done
pid: 6475
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             137215               256698               processes
Max open files            4096                 131072               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256698               256698               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
pid: 25338
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             137215               256698               processes
Max open files            4096                 131072               files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256698               256698               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
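
Since errno 27 (EFBIG) is what a write returns when it would exceed a file size limit, the only line that really matters in the output above is "Max file size". A minimal sketch for pulling just that value out, assuming the usual /proc/<pid>/limits layout shown above; the function name max_file_size_soft is made up for illustration:

```shell
# Print the soft "Max file size" limit from /proc/<pid>/limits-style text on stdin.
# Hypothetical helper, not part of HTCondor.
max_file_size_soft() {
    # Fields: "Max" "file" "size" <soft> <hard> <units>, so the soft limit is field 4.
    awk '/^Max file size/ { print $4 }'
}

# Example use against the running schedd processes (same pgrep as above):
#   for pid in $(pgrep -u condor condor_schedd); do
#       echo "pid $pid: $(max_file_size_soft < "/proc/$pid/limits")"
#   done
```

Both schedd processes report "unlimited" here, so the per-process limit does not look like the cause.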
No, definitely not. There is ample disk space.

BTW, condor_version is 8.1.1 Sep 11 2013 BuildID: 171174.

On Wed, Feb 5, 2014 at 4:56 PM, Greg Thain <gthain@xxxxxxxxxxx> wrote:

On 02/05/2014 03:52 PM, Rita wrote:

It seems the 8.0 scheduler is crashing (condor_schedd dies).
I narrowed it down to the SchedLog.
Basically, if I recreate the SchedLog ( > SchedLog) and restart all condor processes, everything resumes again.

Is it possible the disk the log is on is full? Condor daemons will refuse to run if the log disk partition can't be written to.

-Greg
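
For anyone wanting to check the log before emptying it, here is a rough sketch of the workaround described in the thread: look at how large the log has grown, then truncate it in place. The helper names log_size_bytes and truncate_log are made up for illustration, and the log path in the comment assumes the SCHEDD_LOG setting points at the SchedLog on your install.

```shell
# Print the size of a file in bytes (wc -c is used instead of stat for portability).
log_size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Empty a log file in place, keeping its inode, owner, and permissions.
# This is the same effect as the "> SchedLog" redirection mentioned in the thread.
truncate_log() {
    : > "$1"
}

# Example use (path lookup is an assumption about the local configuration):
#   log=$(condor_config_val SCHEDD_LOG)
#   echo "SchedLog is $(log_size_bytes "$log") bytes"
#   truncate_log "$log"
```

Recording the size before truncating is worth doing, since it shows whether the log had grown unusually large at the time of the "File too large" error.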
--
--- Get your facts first, then you can distort them as you please.--