Re: [HTCondor-users] Issue: TotalDisk is not the current amount of the free disk space on the machines
- Date: Fri, 05 Mar 2021 17:08:55 +0000
- From: Carlos Luque <carlos.luque@xxxxxx>
On 2/25/21 1:12 PM, Todd Tannenbaum wrote:
On 2/15/2021 7:31 PM, Carlos Luque wrote:
Hello all,
   I'm looking into an issue with the free disk space detected by the
condor_startd daemon. The HTCondor version is 8.8.11, running on
GNU/Linux.
I found that the TotalDisk value reported for the execute machines does
not match their current free disk space; it is sometimes lower and
sometimes higher. For example, on one machine TotalDisk is 4529828 KiB,
but the current free disk space is 74357772 KiB. On another machine, the
free disk space is 4 KiB and the detected TotalDisk is 54742440 KiB.
None of the machines was running any job during the check.
Hi Carlos,
HTCondor manages the disk space for job scratch directories. These
directories are created in the subdirectory specified by the EXECUTE
config knob (usually /var/lib/condor/execute). HTCondor assumes that
it is the only service using disk space on the volume where the
EXECUTE directory lives (enter "condor_config_val execute" to see that
path). If you have other services or users running on your nodes that
can use up significant disk space on the same volume where the EXECUTE
directory lives, it could cause problems.
Here at the University of Wisconsin, for example, our execute nodes
have a separate disk partition for EXECUTE for exclusive use by HTCondor.
When the HTCondor service is started (specifically, when the
condor_startd launches), it examines the free disk space on the volume
where EXECUTE lives and publishes that as TotalDisk. In other words,
at startup it does the equivalent of setting TotalDisk to:
   df -k --output=avail `condor_config_val execute`
HTCondor then assumes that the available disk it discovered at startup is what
it should manage. If something other than HTCondor consumes a lot of
space, or frees a lot of space, on the disk volume where EXECUTE lives
after HTCondor is started, that could explain the behavior you see above.
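
To make the drift described above concrete, here is a minimal Python sketch. It is not part of HTCondor, and the function names are made up for illustration. It assumes, as explained above, that TotalDisk is a snapshot of the free space on the EXECUTE volume taken when the startd launched, and it compares such a snapshot with a fresh measurement in the same unit (KiB):

```python
import shutil


def free_kib(path: str) -> int:
    """Free space (in KiB) available to unprivileged users on the volume
    that holds `path`, comparable to `df -k --output=avail path`."""
    return shutil.disk_usage(path).free // 1024


def total_disk_drift_kib(total_disk_kib: int, current_free_kib: int) -> int:
    """Difference between the current free space and the startup snapshot.

    Positive: space was freed after the startd started.
    Negative: something consumed space after the startd started.
    """
    return current_free_kib - total_disk_kib


if __name__ == "__main__":
    # The two cases reported in this thread (all values in KiB).
    print(total_disk_drift_kib(4529828, 74357772))
    print(total_disk_drift_kib(54742440, 4))
    # "." is only a placeholder; on an execute node, pass the path printed
    # by `condor_config_val execute` instead.
    print(free_kib("."))
```

A large drift in either direction on an idle machine suggests that something other than HTCondor is writing to, or deleting from, the EXECUTE volume.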
If you are using static slots, you could try putting the following in
the config:
 # Tell the condor_startd to periodically (every ~10 min) update TotalDisk
 # based on available space on the EXECUTE volume. If this setting is
 # switched back to False (which is the default), then the startd only
 # sets TotalDisk once at startup.
 STARTD_RECOMPUTE_DISK_FREE = True
Setting STARTD_RECOMPUTE_DISK_FREE to True is not recommended with
partitionable slots. And to be honest, no matter what you do, if disk
space is tight enough that you need it carefully managed, then you
need to ensure nothing else besides jobs managed by HTCondor is
reading/writing files on the EXECUTE disk partition.
More below...
Hello Todd,
   Thanks so much for your reply. The information is very valuable, and
now I understand the behavior of the TotalDisk attribute.
Our machines have plenty of disk space, but the user applications also
consume a large amount of it.
Is it possible to increase the update period, for example to one hour or
one day? Every 10 minutes is far too short a period for our purposes.
Also, the description of the 'Disk' attribute in the Machine ClassAd
Attributes section says 23000 = 23 MiB. Is the unit of the Disk
attribute KiB or kB?
It is the number of bytes divided by 1024. So by the ISO/IEC 80000
standard it is KiB, and by the JEDEC standard it is KB.
OK
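
As a worked example of the unit discussed above, the following Python sketch converts a value expressed in KiB (bytes divided by 1024) into bytes, MiB and GiB. The helper names are hypothetical. Note that 23000 KiB comes out at roughly 22.46 MiB, so the "23000 = 23 MiB" figure quoted above is an approximation:

```python
KIB = 1024  # bytes per KiB


def kib_to_bytes(kib: int) -> int:
    """Convert KiB to bytes."""
    return kib * KIB


def kib_to_mib(kib: int) -> float:
    """Convert KiB to MiB (1 MiB = 1024 KiB)."""
    return kib / 1024


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


if __name__ == "__main__":
    # The example value from the attribute description.
    print(kib_to_bytes(23000), "bytes")
    print(round(kib_to_mib(23000), 2), "MiB")
    # The free space reported for the first machine in this thread.
    print(round(kib_to_gib(74357772), 2), "GiB")
```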
Could someone give me some hints to figure out this issue about the
amount of the free space in the TotalDisk?
Thanks in advance.
Hope the above helps,
regards,
Todd
Best regards,
Carlos Luque
--
Carlos Luque
Postdoc researcher - EuroCC - Specialized Informatics Services
Instituto de Astrofísica de Canarias (IAC)
C/ Vía Láctea, s/n - 38200 - La Laguna, Tenerife, Spain
Tel: +34 922 605 200 Ext. 5547
EuroCC Spain: http://eurocc-spain.res.es
EuroCC: https://www.eurocc-access.eu
SIE-IAC: http://research.iac.es