I have not measured the intervals at which the Quill daemons die in our
pool, but I get hundreds of emails stating that the Quill daemon died and
was restarted on each machine. I have also been trying to get Quill to
work on Windows, and I have been posting about it on this list. I
mentioned earlier that I have a Postgres database on the same server as
our CM. I was going to try installing Postgres on a different server, but
I have not gotten around to it yet. I am pretty sure this is not the
problem, but it is something for me to try. I have also noticed that the
Quill daemon on our CM does not seem to die, while the Quill daemons on
all worker nodes die on a regular basis. I have not determined why this
is the case; the only difference I know of is the OS. Our server runs
Windows Server 2008, and our worker nodes run 32/64-bit Windows XP and
Windows 7.
Mike
From: <Greg.Hitchen@xxxxxxxx>
To: <condor-users@xxxxxxxxxxx>
Date: 08/25/2010 08:07 PM
Subject: Re: [Condor-users] Quill++ assistance
Sent by: condor-users-bounces@xxxxxxxxxxx
That's correct, no other daemons are restarting, just condor_quill.
Interestingly, now that I have installed this version onto another
few PCs, the 1hr 25min is not EXACT. Two PCs that I "synched" yesterday
by restarting condor at the same time are now 2-3 minutes apart on
their condor_quill restarts. Maybe the condor_master restarting
condor_quill after 10secs isn't exact and the time diff gradually builds
up? I'll keep an eye on it.
Cheers
Greg
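As a rough check on the drift idea above, the arithmetic can be sketched
as follows. This assumes the two PCs were synchronised about one day
earlier and that the offset grows by the same amount every cycle; both
are assumptions, and the function names are made up for illustration.

```python
# Back-of-the-envelope estimate of per-cycle timing jitter.
# Assumptions (not confirmed in the thread): about one day has elapsed
# since the two PCs were synchronised, and the drift accumulates evenly.

RUN_SECONDS = 85 * 60      # condor_quill runs 1hr 25mins between exits
RESTART_DELAY = 10         # condor_master restarts it after about 10 secs
SECONDS_PER_DAY = 24 * 60 * 60


def cycles_per_day(run=RUN_SECONDS, delay=RESTART_DELAY):
    """Number of run-plus-restart cycles that fit into one day."""
    return SECONDS_PER_DAY / (run + delay)


def per_cycle_jitter(total_drift_seconds, cycles):
    """Average timing difference per cycle needed to reach the total drift."""
    return total_drift_seconds / cycles


if __name__ == "__main__":
    cycles = cycles_per_day()
    print(f"cycles per day: {cycles:.1f}")
    for drift_minutes in (2, 3):
        jitter = per_cycle_jitter(drift_minutes * 60, cycles)
        print(f"{drift_minutes} min apart -> about {jitter:.1f} s per cycle")
```

Under those assumptions there are about 17 cycles per day, so a 2-3
minute gap corresponds to roughly 7 to 11 seconds of difference per
cycle between the two machines.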
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Thursday, 26 August 2010 4:16 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance
And just to confirm, it's only Quill - none of the other daemons show
the same restart every hour and twenty-five minutes?
-Erik
On Wed, Aug 25, 2010 at 1:12 AM, <Greg.Hitchen@xxxxxxxx> wrote:
Hi Erik
The 1hr 25mins is definitely not related (as far as I can tell) to virus
scans/server activity/etc. I've checked all the scheduled activities
that our PCs get installed with and nothing "fits".
In addition, I have installed 7.4.3 onto several PCs now and they all
exhibit the 1hr 25mins restart of condor_quill, and it always starts
exactly 1hr 25mins after condor is started, i.e. any time I do a condor
net stop, condor net start on them, the first of the 1hr 25mins restarts
begins 1hr 25mins after this.
There is a dprintf_failure.QUILL file created but it is empty and 0 bytes
in size.
No core file is created and condor_quill quite happily gets restarted by
condor_master after 10 secs until the MasterLog again says it exits with
error 44 after the next 1hr 25mins.
Nothing gets logged in the QuillLog.
Cheers
Greg
________________________________
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Tuesday, 24 August 2010 3:46 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance
Greg: The "exit 44" issue is odd - status 44 means that Condor couldn't
log some piece of information (which is why you don't see anything in
the logs :). While I wouldn't rule anything in Condor out, 1:25:00 is not
a number that strikes me as special in any of the Condor code, so I'm not
sure what would happen on the Condor side with that periodicity. Are
there any file server/virus scans/etc sort of activity that might
interfere with writes to files that happen at your site?
Greg/Michael: the ACCESS_VIOLATION is happening in a strange spot. To
answer your question, the Quill daemon should run continuously - however,
if it is consistently crashing, the master will exponentially back off
trying to run it until it only tries once an hour - so it may be likely
that you'll see a core file with no Quill daemon running.
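For readers unfamiliar with the idea, an exponential back-off of this
kind can be sketched as below. This is a generic illustration, not the
condor_master source: the parameter names and the formula are
assumptions, with values picked only so that the first wait is the 10
secs seen in this thread and the cap is the once-an-hour limit described
above.

```python
# Generic exponential back-off sketch (hypothetical, not Condor code).
# constant, factor and ceiling are illustrative values chosen to match
# the figures mentioned in this thread.

def restart_delays(constant=9.0, factor=2.0, ceiling=3600.0, crashes=15):
    """Return the wait in seconds before each successive restart attempt."""
    delays = []
    for n in range(crashes):
        # The wait grows geometrically with the crash count, up to the cap.
        delays.append(min(constant + factor ** n, ceiling))
    return delays


if __name__ == "__main__":
    for attempt, delay in enumerate(restart_delays(), start=1):
        print(f"restart {attempt}: wait {delay:.0f} s")
```

With these assumed values the wait starts at 10 secs and stays at the
one-hour cap after about a dozen consecutive crashes, which is why a
machine can show a core file while no Quill daemon is running.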
If that's the case and it is consistently crashing, I would love to see
your full QuillLog, along with your sql.log file. We should be able to
play it back and see exactly why it's crashing.
Thanks,
-Erik
On Wed, Aug 11, 2010 at 8:48 PM, <Greg.Hitchen@xxxxxxxx> wrote:
Perhaps not much help Michael, but we've had similar problems with 7.2.4
on Windows (see first attached email). It behaved somewhat better with
7.4.1 (see second attached email) and at least ran, even though it
restarted condor_quill every 1hr 25mins, but a number of other
problems/issues with the 7.4 series have not allowed us to upgrade to
that version yet.
Cheers
Greg
________________________________
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael O'Donnell
Sent: Thursday, 12 August 2010 3:56 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Quill++ assistance
I have these specified already and I do not see any issues. The QuillLog
file shows SQL statements and success at populating the tables.
However, I am finding a file on all machines other than the central
manager that has an access violation error. I am not sure if the
condor_quill.exe daemon is supposed to run continuously, but I do not see
it running on any machines other than the central manager.
The file that is showing up in the log directory on each machine is
called core.QUILL.WIN32. Its contents are below (does this mean anything
to anyone else?):
<...>
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/