While trying to figure this out I have noticed a couple of things. First, my cred service is dying on the central manager, which produces the core.CRED.WIN32 file. If I delete this file the service will generally restart, but sometimes I have to restart the Condor service to get the cred service to start again.
I am also noticing that a core.STARTD.WIN32 file is created on my submit machine, which might be related to why jobs are remaining idle.
However, I do not know what any of this means. The load on the CM averages about 30%, with spikes as high as 70%. This seems a little high, since we are not running any other services on the server. The collector is usually at about 25%, and the spikes are caused by the other Condor services (mainly the negotiator).
Google searches for access violations in C:\Windows\system32\ntdll.dll and for memory problems return plenty of results, but because the causes vary and because we were not having problems before, I am not making much progress. It does seem that these files are related to jobs failing to match even though I know machines are available.
thanks,
mike
From: "Michael O'Donnell" <odonnellm@xxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 02/09/2011 03:41 PM
Subject: [Condor-users] core.MASTER.WIN32 and core.CRED.WIN32
Sent by: condor-users-bounces@xxxxxxxxxxx
I have noticed that two files are being created on our central manager:
core.MASTER.WIN32 and core.CRED.WIN32
The header content of the files includes:
PID: 660
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll
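When several of these core files need to be compared, the three header fields can be pulled out with a short script. The following is only a minimal Python sketch, assuming every file starts with the same `PID:`, `Exception code:`, and `Fault address:` layout quoted above. The function name `parse_core_header` and the dictionary keys are hypothetical and are not part of Condor.

```python
import re

# Patterns follow the three header lines quoted above; the layout of
# other core files is an assumption.
PID_RE = re.compile(r"^PID:\s*(\d+)", re.MULTILINE)
EXC_RE = re.compile(r"^Exception code:\s*([0-9A-Fa-f]+)\s+(\S+)", re.MULTILINE)
FAULT_RE = re.compile(r"^Fault address:\s*([0-9A-Fa-f]+)\s+(\S+)\s+(.+)$", re.MULTILINE)


def parse_core_header(text):
    """Return the header fields found in the text of a core.*.WIN32 file."""
    info = {}
    match = PID_RE.search(text)
    if match:
        info["pid"] = int(match.group(1))
    match = EXC_RE.search(text)
    if match:
        info["exception_code"] = match.group(1)
        info["exception_name"] = match.group(2)
    match = FAULT_RE.search(text)
    if match:
        info["fault_address"] = match.group(1)
        info["section_offset"] = match.group(2)
        info["module"] = match.group(3).strip()
    return info


# Sample text copied from the header quoted above (raw string keeps backslashes).
sample = r"""PID: 660
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 77427F1A 01:00066F1A C:\Windows\system32\ntdll.dll
"""

if __name__ == "__main__":
    for key, value in parse_core_header(sample).items():
        print(f"{key}: {value}")
```

Running the same function over core.MASTER.WIN32, core.CRED.WIN32, and core.STARTD.WIN32 would show whether all three daemons fault at the same address in the same module.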
If I delete the files they are re-created, and I do not recall seeing them in the past. Does anyone know what this access violation is about? Could there be a problem with antivirus software or something similar? Our pool is functioning, with the exception that all jobs remain idle, which started after expanding our pool from 100 cores to 200 cores (posted earlier today: "[Condor-users] Job remains in idle (worked until I increased pool size)"). I don't think this is related, but I am trying to troubleshoot it.
Thank you for your help,
Mike
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/