Re: [Condor-users] assertion, hang
- Date: Mon, 20 Mar 2006 14:06:24 -0500
- From: "Bryan S. Maher" <Bryan.Maher@xxxxxxxxxx>
- Subject: Re: [Condor-users] assertion, hang
Matt,
I don't know if a reboot is required for this to take effect or not. I
know on our machines, the modal dialog was always visible somewhere - I
can't remember if you had to log in first or if it appeared on top of
the login dialog. It's been two years since I did all the initial tweaking
of my execution nodes, so I'm fuzzy on some of the details. By the way,
we use value = 1. I don't know why we chose that over 2, but that's what
we use.
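
For anyone scripting this change across several execute nodes, here is a minimal Python sketch. It only builds the `reg add` command line for the ErrorMode setting named in this thread; it does not run it. The helper name is hypothetical, and the sketch assumes ErrorMode is a REG_DWORD value under the `Control\Windows` key and that 0, 1 and 2 are the accepted settings (check the KB article before relying on either).

```python
# Key path as quoted in this thread, written in reg.exe's HKLM shorthand.
# Assumption: ErrorMode is a REG_DWORD value stored under this key.
KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Windows"


def build_errormode_command(value):
    """Return the reg.exe argument list that would set ErrorMode.

    Hypothetical helper for illustration; it does not touch the registry.
    """
    if value not in (0, 1, 2):
        raise ValueError("ErrorMode is expected to be 0, 1 or 2")
    return ["reg", "add", KEY, "/v", "ErrorMode",
            "/t", "REG_DWORD", "/d", str(value), "/f"]


# Show the command for the value tried in this thread.
print(" ".join(build_errormode_command(2)))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` from an elevated prompt on each node. Whether a reboot is needed afterwards is still the open question in this thread.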
Try downloading pslist.exe from sysinternals.com; it's free. In fact,
get all their free tools; they are indispensable. Log onto a machine
with the assertion failure and run pslist -t to get a process tree.
Look at the process tree for condor_starter to see if there is anything
else attached to it like a debugger.
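
If reading the tree by eye across many nodes gets tedious, the check can be sketched in Python. This is a rough illustration that assumes `pslist -t` shows child processes indented beneath their parent with the process name as the first field. The function name and the sample text (including the PIDs) are made up for the example and are not real output.

```python
def descendants(tree_text, parent):
    """Return names of processes listed beneath the first `parent` entry.

    Assumes an indentation-based tree: deeper indent means child process.
    """
    found = []
    parent_indent = None
    for line in tree_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name = line.split()[0]
        if parent_indent is None:
            # Still searching for the parent entry.
            if name == parent:
                parent_indent = indent
        elif indent > parent_indent:
            # Anything indented deeper than the parent sits below it.
            found.append(name)
        else:
            # Back at the parent's level or shallower: subtree is finished.
            break
    return found


# Made-up sample tree for illustration only.
sample = """\
condor_master      1200
  condor_startd    1310
    condor_starter 2044
      cmd          2101
        exemilpNET 2150
  condor_schedd    1322
"""
print(descendants(sample, "condor_starter"))  # ['cmd', 'exemilpNET']
```

Anything unexpected in the returned list, such as a debugger or error-reporting process, would be the thing to report back.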
Let me know what you find either way.
-Bryan
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
Sent: Monday, March 20, 2006 12:20 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] assertion, hang
I tried changing this to value=2. However, it still seems to hang when I
hit an assertion failure. I logged onto the process nodes that had the
assertion failure. The e.xe is still in the process list, but there was
no dialog box that I could see. So, the registry change did not seem to
help. Do I need to reboot after the registry change?
Matt
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Bryan S. Maher
> Sent: Monday, March 20, 2006 10:02 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] assertion, hang
>
> Matt,
>
> In addition to the UNC cmd.exe registry change, we also make
> a registry change that suppresses the Windows modal dialog
> popup that occurs when an application crashes. I believe
> your application may be triggering this modal dialog and
> until the dialog is cleared, Condor will think the
> application is still running.
>
> Change the value of the following registry key:
>
> HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Windows\ErrorMode
>
> MS KB229012 explains the value settings:
>
> http://support.microsoft.com/?scid=http%3a%2f%2fwww.support.microsoft.com%2fkb%2f229012%2fen-us%2f
>
> Hope this helps.
>
> -Bryan
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
> Sent: Friday, March 17, 2006 10:14 PM
> To: Condor-Users Mail List
> Subject: [Condor-users] assertion, hang
>
> My condor pool consists of a set of machines running Windows
> 2003 Server. All of my input, executables and output are on a
> shared windows drive. Here is part of my sub:
>
> ====
> environment = PATH=\\ordsrv3\ormpdata\bin\WinXP-Debug;c:\WINDOWS\system32;c:\WINNT\system32
> executable = condor_exec.bat
> initialdir = \\ordsrv3\ormpdata\milprun\test_win
> transfer_executable = false
> should_transfer_files = NO
> requirements = (OpSys=="WINNT52")
>
> output = 10teams.out
> error = 10teams.err
> log = 10teams.log
> universe = vanilla
> arguments = --parm \\ordsrv3\ormpdata\parm\milpwin.parm --instance 10teams
> queue 1
>
>
> output = 22433.out
> error = 22433.err
> log = 22433.log
> universe = vanilla
> arguments = --parm \\ordsrv3\ormpdata\parm\milpwin.parm --instance 22433
> queue 1
> ====
>
> I am using condor_exec.bat as a wrapper to my executable. If
> I try to run the executable directly, I get Shadow Exception
> at "CreateProcess".
> The .bat file was suggested on this mailing list - it seems to work.
>
> condor_exec.bat:
> \\ordsrv3\ormpdata\bin\WinXP-Debug\exemilpNET.exe %*
>
>
> If my executable dies due to an assertion failure (this is a
> C app, using assert( )), then the failure correctly reports
> to stderr. However, the job seems to get hung. That is, it
> stays in the condor queue indefinitely, as if condor does not
> know that it is done - even after the assertion. Is there
> some way to handle this situation? I want condor to treat the
> assertion as a completion so that it moves on to the next in
> the queue.
>
> Thanks,
> Matt
>
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users