Mailing List Archives
[condor-users] Unexplained status=128
Hi all,
For some time now I have had an intermittent, unexplained problem on the
Windows platform: programs fail with status 128 on some nodes. I finally
tracked it down to our main GUI running on the node. Here is what happens:
we have a main GUI application that submits jobs to Condor and reports the
results back to the user. The program executed on each node is a batch
file that sets up the file shares and runs the non-GUI worker
applications.
Everything runs fine unless the node on which Condor tries to run the
worker programs also has the main GUI running. In that case the programs
exit with status 128 (DLL not found), even though all the DLLs are there.
I put directory listings in the batch commands and verified that the
file shares are mapped and all the needed DLLs are present. I also
used "depends" to write the dependencies to a file, which I analysed; it
confirms that all the DLLs are there when the programs run. The
only explanation I can think of is that one of the DLLs fails to load
while the program is running, because of its DllMain initialization or
delayed initialization. However, it is not easy to find which one, since
there are a lot of our own and third-party DLLs. For the time being I
have changed the submit file to reschedule the job if status 128 occurs,
using the "On_exit_remove" criteria.
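
For reference, a minimal sketch of the kind of submit-file line meant here. The exact expression in use is not reproduced in this message, so treat this as an assumption: it relies on the standard ExitBySignal and ExitCode job attributes and leaves the job in the queue (so it gets run again) whenever it exits with code 128.

```
# Sketch only: remove the job from the queue only when it exited normally
# (not killed by a signal) with an exit code other than 128. When the
# expression is false, the job stays queued and is rescheduled.
on_exit_remove = (ExitBySignal == False) && (ExitCode != 128)
```

Note that a job which fails with 128 every time would be retried indefinitely under such an expression, so this is only a workaround and not a fix.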
Does anybody have any ideas on why this happens and how to tackle the
problem? Or is there any way I could tell that the GUI is running on the
node, or that the user is actively logged in even though the keyboard
and mouse are not moving?
Thanks
BTB