
Re: [HTCondor-users] startd_cron jobs that get stuck



Hi Christoph.
The test needs a slightly different design, because checking a hung mount blocks in a syscall that never returns.

The logic of the test should be something like this:

On start:
- If the temp_file does not exist, create it with the current time as its timestamp.
- Otherwise, look at the timestamp of the temp_file. If it is less than 1 hour old, set the classad to faulty and exit. If it is more than 1 hour old, delete the temp_file and continue with the test.

Do the test.

Delete the temp_file.
Exit.
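
The steps above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the attribute name FS_CHECK_OK and the helper run_check are made up for the example, the probe is whatever call touches the mount, and the sketch re-creates the temp_file after deleting a stale one so that a second hang is also noticed. A real startd_cron script would print the returned line on stdout.

```python
import os
import time

MAX_AGE = 3600  # seconds; the one-hour window from the description above


def run_check(marker, probe, now=None, max_age=MAX_AGE):
    """Return one 'Attr = value' line; `probe` is the call that may hang.

    FS_CHECK_OK is a hypothetical attribute name chosen for this sketch.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(marker)
    except FileNotFoundError:
        age = None  # no marker: the previous run finished cleanly

    if age is not None and age < max_age:
        # A recent marker means an earlier run never reached its cleanup,
        # so report the mount as faulty without touching it again.
        return "FS_CHECK_OK = False"

    if age is not None:
        os.remove(marker)  # stale marker: discard it and test again

    # Create the marker; its mtime is the time this run started.
    with open(marker, "w"):
        pass

    ok = probe()       # may block forever if the mount is hung
    os.remove(marker)  # only reached when the probe returned
    return "FS_CHECK_OK = " + ("True" if ok else "False")
```

A call such as print(run_check("/tmp/fs_check.marker", lambda: os.path.isdir("/cvmfs"))) would be the whole script body; both paths here are placeholders. The run that hits the hung mount still blocks, but it leaves the marker behind, and that is what lets the following runs report the problem.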


This way the script itself may still hang, but the next run will update the classad.
Once there is a problem with the mount, it will be reported as faulty for a minimum of 1 hour.

Please let me know if you have any questions. 
Thanks David





From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Beyer, Christoph <christoph.beyer@xxxxxxx>
Sent: Friday, December 20, 2024 1:14:29 PM
To: htcondor-users <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] startd_cron jobs that get stuck


Hi,

Among other things, we run some FS checks in startd_cron to make sure the mounted filesystems are responsive.

In the rare case of a failure (currently CVMFS is problematic), the test hangs. I tried to use the 'timeout' util in bash to wrap these checks, but it does not work as expected.

As a dirty workaround, I tried to monitor the check in question by adding some logic to another, more robust check, but it seems that once one startd_cron job is stuck, the other job is unable to propagate current startd ClassAds?

Best
christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/