Re: [HTCondor-users] /var/lib/condor/spool usage



On 4/1/2015 4:08 PM, Dimitri Maziuk wrote:
> On 04/01/2015 02:57 PM, Richard Pieri wrote:
>
>> If you need to reserve capacity for privileged processes on ext2/3/4
>> then 'tune2fs -m X /mount/point' will reserve X% of the file system's
>> capacity for root.
>
> My disk use on / went from ~70% @ 11:50 to 100% @ 12:10 this afternoon.
> The node stayed up: ext4 reserves enough blocks for root by default to
> keep it up for some time, but condor daemons keeled over. So the issue
> is, can you tell condor to not kill itself?

I think the only daemons that write to SPOOL are the schedd (and 
shadows) and the negotiator.  The point is that the condor_master 
should not write to SPOOL.  So if the filesystem specified by SPOOL 
fills, the schedd and/or negotiator may exit, but in that case the 
condor_master should keep running and periodically attempt to restart 
the schedd and/or negotiator.

So now the question becomes: can the schedd and/or negotiator keep 
running if SPOOL fills?  Well, these two daemons have persistent state 
that must be kept, i.e. the job queue for the schedd, and the accountant 
information for the negotiator.  Currently these daemons shut down if 
they cannot safely write this information (and the condor_master will 
attempt to periodically restart them); are you hoping for a mode where, 
for instance, the schedd would keep running without logging queue 
information to disk (so that if the schedd restarted, all that job 
information would be lost)?  Perhaps of interest is the JOB_QUEUE_LOG 
config knob that allows you to put the job queue on a volume other than 
SPOOL -- we use this to increase performance on submit hosts which have 
a solid state drive which is big enough to hold the frequently written 
to job_queue.log, but not big enough to hold the whole contents of the 
spool directory.
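
Since the daemons shut down once they can no longer write to SPOOL, it may help to notice the volume filling before that point. Below is a minimal Python sketch of such a check, not something HTCondor itself provides. The function names and the 90% threshold are assumptions chosen for illustration.

```python
import shutil

def percent_unavailable(path):
    """Percentage of the file system holding `path` that is not available.

    Based on the 'free' figure from shutil.disk_usage(); on POSIX this is
    the space available to unprivileged users, so blocks reserved for root
    (see 'tune2fs -m') count as unavailable here.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * (1.0 - usage.free / usage.total)

def is_nearly_full(path, threshold=90.0):
    """True if the file system holding `path` is at or above `threshold` percent."""
    return percent_unavailable(path) >= threshold
```

One could call is_nearly_full("/var/lib/condor/spool") from a cron job and send mail when it returns True; the path here is the one from the subject line and will differ if SPOOL is configured elsewhere.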

Currently the schedd makes a subdirectory in SPOOL for each running job 
that holds intermediate checkpoint files if the submit file for the job 
requests ON_EXIT_OR_EVICT for when_to_transfer_output.  I've long wanted 
the option to store these intermediate files in the home directory of 
the user instead of SPOOL so that the space for these intermediate files 
comes out of that user's own disk quota...
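
To see which of these subdirectories are consuming the space, something along the following lines could be run against the spool directory. This is only a sketch: the function names are made up for illustration, and it assumes nothing about how HTCondor names or lays out the subdirectories beyond their being directories under SPOOL.

```python
import os

def tree_size(path):
    """Total size in bytes of the files under `path` (symlinks are not followed)."""
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file disappeared while we were walking; skip it
    return total

def largest_subdirs(spool, top=10):
    """Return up to `top` (size_in_bytes, name) pairs, largest first."""
    sizes = []
    with os.scandir(spool) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.name))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Reading the spool directory will typically require running as root or as the condor user.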

regards,
Todd


--
Todd Tannenbaum <tannenba@xxxxxxxxxxx> University of Wisconsin-Madison
Center for High Throughput Computing   Department of Computer Sciences
HTCondor Technical Lead                1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                  Madison, WI 53706-1685