[HTCondor-users] .update.ad problems after upgrade.
- Date: Mon, 20 Jul 2020 15:02:30 -0500
- From: Amy Bush <amy@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] .update.ad problems after upgrade.
Hello, htcondor peoples,
My general strategy is Never Upgrade, because upgrading always causes
problems. It's unavoidable, of course, so on Friday I upgraded from
condor 8.6.5 to condor 8.8.9. Things seemed to go well over the weekend,
possibly because nobody was submitting jobs, but that didn't last.
Currently I'm seeing a LOT of these in my log files:
07/20/20 14:56:07 (pid:10322) Failed to open '.update.ad' to read update ad: No such file or directory (2).
I'm also having users report jobs failing. Immediately following the
line above:
07/20/20 14:56:07 (pid:10322) All jobs have exited... starter exiting
07/20/20 14:56:07 (pid:10322) **** condor_starter (condor_STARTER) pid 10322 EXITING WITH STATUS 0
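To put a number on "a LOT", something like the following rough Python sketch could count the '.update.ad' failures per starter pid. The function name count_update_ad_failures is made up for illustration, and the sketch assumes each log entry sits on a single line in the real log file (the lines above are only wrapped by the mailer).

```python
import re
from collections import Counter

# Matches the failure message quoted above and captures the starter pid.
PATTERN = re.compile(r"\(pid:(\d+)\) Failed to open '\.update\.ad'")


def count_update_ad_failures(lines):
    """Count '.update.ad' open failures per pid in an iterable of log lines.

    Hypothetical helper for illustration; not part of HTCondor.
    """
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It takes any iterable of lines, so an open log file object can be passed in directly, and the resulting Counter shows whether the message comes from a few starters or from all of them.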
From what I've seen, this file should be created in /var/condor/execute,
which definitely exists on the node in question, and I believe the
permissions are fine:
angrist-14 14:59:24$ ls -al /var/condor/execute/
total 8
drwxr-xr-x 2 condor bin 4096 Jul 20 14:56 .
drwxr-xr-x 6 root root 4096 Jul 30 2019 ..
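The same check could be scripted to run across several nodes. This is only a sketch: describe_dir is a hypothetical helper name, and it reports the mode, owner, and group much as `ls -ld` does, without asserting what HTCondor actually requires of the directory.

```python
import grp
import os
import pwd
import stat


def describe_dir(path):
    """Return (mode string, owner, group) for path, similar to `ls -ld`.

    Hypothetical helper for illustration; raises FileNotFoundError
    if the path does not exist.
    """
    info = os.stat(path)
    mode = stat.filemode(info.st_mode)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        owner = str(info.st_uid)  # uid with no passwd entry
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        group = str(info.st_gid)  # gid with no group entry
    return mode, owner, group

# Example call, using the path from this message:
# describe_dir("/var/condor/execute")
```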
Google has not presented me with a wealth of fellow htcondor users
having this problem upon upgrade, so at this point I'm not positive this
IS a problem. Is it THE problem that's causing these jobs to fail? What
the heck can I do to diagnose/resolve this issue?
Any help would be incredibly appreciated. The cluster is being lightly
used right now, but things may get really loud and angry if certain
student researchers start using the cluster again.
--
amy