Subject: Re: [Condor-users] core.MASTER.WIN32 and core.CRED.WIN32
While trying to figure this out, I have noticed a couple of things. First,
my cred service is dying on the central manager, which produces the
core.CRED.WIN32 file. If I delete this file, the service will generally
restart, but sometimes I have to restart the Condor service to get the
cred service to start again.
I am also noticing that a core.STARTD.WIN32 file is created on my submit
machine, and this might be related to why jobs are remaining idle.
However, I do not know what any of this means. The load on the CM averages
about 30%, with spikes as high as 70%. This seems a little high since we
are not running any other services on the server. The collector is usually
at about 25%, and the spikes are caused by the other Condor services
(mainly the negotiator).
Google searches for access violations in C:\Windows\system32\ntdll.dll
and for memory problems turn up plenty of results, but because they vary
and because we were not having problems before, I am not making much
progress in figuring this out. It does seem like these files are related
to the inability of jobs to match even though I know that machines are
available.
thanks,
mike
From: "Michael O'Donnell" <odonnellm@xxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 02/09/2011 03:41 PM
Subject: [Condor-users] core.MASTER.WIN32 and core.CRED.WIN32
Sent by: condor-users-bounces@xxxxxxxxxxx
I have noticed that two files are being created on our central manager:
core.MASTER.WIN32 and core.CRED.WIN32
The header content of the files includes:
PID: 660
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll
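A minimal sketch, in Python, of how headers like the one above could be
summarized when several core.*.WIN32 files need comparing. It assumes each
file starts with the same three lines shown here; the function name
parse_core_header is hypothetical and is not part of Condor.

```python
import re

# Sample header copied from the excerpt above.
SAMPLE = r"""PID: 660
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll
"""

def parse_core_header(text):
    """Hypothetical helper: extract PID, exception and fault module.

    Assumes the three-line layout shown in the excerpt; any field
    that is not found is simply left out of the result.
    """
    info = {}
    m = re.search(r"^PID:\s*(\d+)", text, re.MULTILINE)
    if m:
        info["pid"] = int(m.group(1))
    m = re.search(r"^Exception code:\s*([0-9A-Fa-f]+)\s+(\S+)",
                  text, re.MULTILINE)
    if m:
        info["exception_code"] = int(m.group(1), 16)
        info["exception_name"] = m.group(2)
    m = re.search(
        r"^Fault address:\s*([0-9A-Fa-f]+)\s+"
        r"([0-9A-Fa-f]+):([0-9A-Fa-f]+)\s+(.+)$",
        text, re.MULTILINE)
    if m:
        info["fault_address"] = int(m.group(1), 16)  # absolute address
        info["section"] = int(m.group(2), 16)        # section index
        info["offset"] = int(m.group(3), 16)         # offset in section
        info["module"] = m.group(4).strip()          # faulting module path
    return info

if __name__ == "__main__":
    print(parse_core_header(SAMPLE))
```

Running the same function over core.MASTER.WIN32, core.CRED.WIN32 and
core.STARTD.WIN32 would show whether all three daemons fault in the same
module and at the same offset.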
If I delete the files, they are re-created, and I do not recall seeing the
files in the past. Does anyone know what this access violation is about?
Could there be a problem with antivirus or something similar? Our pool is
functioning, with the exception that all jobs remain idle. That started
after we expanded our pool from 100 cores to 200 cores (posted earlier
today as "[Condor-users] Job remains in idle (worked until I increased
pool size)"). I don't think the two are related, but I am trying to
troubleshoot this.
Thank you for your help,
Mike
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting https://lists.cs.wisc.edu/mailman/listinfo/condor-users