HTCondor Project List Archives




Re: [Condor-devel] Which manufacturer/version of C/C++ compiler did you use?



>On Tue, Mar 24, 2009 at 12:33:45PM -0400, Ian Chesal wrote:
>> >On Tue, Mar 24, 2009 at 11:01:19AM -0400, Ian Chesal wrote:
>> >> I got asked the strangest question today by upper management:
>> >>
>> >> Which manufacturer/version of C/C++ compiler did you use? Specifically
>> >> for: 6.8.6, 7.0.2 and 7.2.1.
>> >
>> > Which OS/arch ports of Condor? It is a wide variety. We should be able
>> > to get this info for you, but it is only really useful for a couple
>> > of reasons:
>> >
>> > 1. What g++ runtimes need to be installed on machines where Condor is run.
>> > 2. People are using condor_compile or linking with libcondorapi.
>> > 2. People are using condor_compile or linking with libcondorapi.
>>
>> Oops! Thought I typed that in there: I'm only interested in the
>> compilers used to build the Windows ports.
>
> Ah, in that case:
>
> 7.2.1 was VC++ 9 (or 2008)
> 6.8.6 and 7.0.2 were VC++ 6
>
> Have a nice day.

Sigh. I wish. I knew the Director was asking for some malicious reason.

After, quite literally, millions of jobs run successfully through our
Condor pool there's still a handful of QA people here who like to blame
the pool first for all crashes, and our software last.

Here's my situation: we've got a crash in our software when we run a QA
job on our Condor pool. The Director is trying to "gather evidence" to
prove that the crash is being caused by Condor putting an out-of-date
runtime library into memory that is then being used by our software
(hence crashing the code).

Something along the lines of: 6.8.6 is using GeneralWindowsAPI::foo()
from the MS runtime library, so it gets loaded into memory. And when it
spawns the thread that spawns the thread that eventually spawns the
thread that runs our job, the out-of-date foo() is being used instead of
the new foo() that our software expects.

Now I'm fairly certain Condor isn't spawning the thread in the same
process space that condor_starter or condor_startd is running in. Is
this true?

How can I demonstrate to this Director that Condor is giving the cmd it
spawns its own process space, so that it loads its own DLLs? Here's a
typical process tree for our jobs:

C:\WINDOWS>ps -ef | grep condor
  SYSTEM   1768    760  0   Mar 08 con  5:04 d:/abc//condor/bin/condor_master.exe
  SYSTEM   1844   1768  0   Mar 08 con  5h01 condor_startd.exe -f
  SYSTEM   2804   1844  0 13:44:34 con  0:01 condor_starter -f -a vm1 ttc-schedd1.altera.com
       0   2824   2804  0 13:44:36 con  0:00 condor_exec.exe /Q /C condor_exec.bat /data/gquan/abc/qor/91080/stratixiv/wc-new/no_sweep_parameter/polyphaseddc
       0   2132   2824  0 13:44:36 con  0:00 perl -x -S condor_exec.bat /data/gquan/abc/qor/91080/stratixiv/wc-new/no_sweep_parameter/polyphaseddc
       0   2708   2880  0 13:57:30 con 29:38 D:\abc\condor\execute\dir_2804\quartus\bin\quartus_fit.EXE polyphaseddc --import_settings_files=on --export_settings_files=off

All our jobs are run via Perl scripts. And the executable that's
currently crashing is quartus_fit.exe.
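
The listing above can also be checked mechanically. What follows is only a
rough sketch in Python: the listing is pasted in with the mail client's line
wraps rejoined, and the names parse_ps and ancestry are made up for this
illustration, not part of Condor or of any ps tool. It assumes the column
layout UID, PID, PPID, C, STIME, TTY, TIME, CMD and that the TTY column is
always "con", as it is in this paste.

```python
import re

# Process listing from this message, wrapped lines rejoined (raw string so
# the Windows backslashes are kept literally).
PS_OUTPUT = r"""
  SYSTEM   1768    760  0   Mar 08 con  5:04 d:/abc//condor/bin/condor_master.exe
  SYSTEM   1844   1768  0   Mar 08 con  5h01 condor_startd.exe -f
  SYSTEM   2804   1844  0 13:44:34 con  0:01 condor_starter -f -a vm1 ttc-schedd1.altera.com
       0   2824   2804  0 13:44:36 con  0:00 condor_exec.exe /Q /C condor_exec.bat /data/gquan/abc/qor/91080/stratixiv/wc-new/no_sweep_parameter/polyphaseddc
       0   2132   2824  0 13:44:36 con  0:00 perl -x -S condor_exec.bat /data/gquan/abc/qor/91080/stratixiv/wc-new/no_sweep_parameter/polyphaseddc
       0   2708   2880  0 13:57:30 con 29:38 D:\abc\condor\execute\dir_2804\quartus\bin\quartus_fit.EXE polyphaseddc --import_settings_files=on --export_settings_files=off
"""

# STIME is either "Mar 08" (two tokens) or "13:44:34" (one token), so the
# literal "con" TTY column is used as the anchor between STIME and TIME.
LINE_RE = re.compile(
    r"^\s*(?P<uid>\S+)\s+(?P<pid>\d+)\s+(?P<ppid>\d+)\s+\d+\s+"
    r"(?P<stime>.*?)\s+con\s+(?P<time>\S+)\s+(?P<cmd>.*)$"
)


def parse_ps(text):
    """Return {pid: {"uid", "ppid", "cmd"}} for every line that parses."""
    procs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            procs[int(m["pid"])] = {
                "uid": m["uid"],
                "ppid": int(m["ppid"]),
                "cmd": m["cmd"],
            }
    return procs


def ancestry(procs, pid):
    """Follow parent links upward until the parent is not in the listing."""
    chain = [pid]
    while procs[chain[-1]]["ppid"] in procs:
        chain.append(procs[chain[-1]]["ppid"])
    return chain


procs = parse_ps(PS_OUTPUT)
for pid in sorted(procs):
    print(pid, "<-", " <- ".join(str(p) for p in ancestry(procs, pid)[1:]) or "(parent not listed)")
```

Each row is a separate PID, and the perl process chains back through
condor_exec.exe, condor_starter and condor_startd to condor_master. The
quartus_fit.EXE row reports parent 2880, which is not in the paste at all;
the listing alone cannot say whether grep filtered that process out or
something else is going on.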

I mean: the easy defence to show that this isn't Condor-related is that
it doesn't crash on *all* machines. Just *some* machines. And *all*
machines are running Condor 6.8.6 in this particular pool.

But I really want to end this type of dialog once and for all. Any help
showing that Condor isn't forcing the commands it spawns to inherit its
runtime linking dependencies is greatly appreciated.
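
For what it's worth, the general principle is easy to show in miniature. The
sketch below is an analogy only, in Python rather than anything Condor does:
a parent process loads a module, spawns a child as a new process, and the
child reports that it has a different PID and has not inherited the parent's
loaded module. The choice of the wave module as the stand-in "library" and
the helper name spawn_child are assumptions made purely for illustration.

```python
import json
import os
import subprocess
import sys

# Stand-in for "a runtime library the parent has loaded". Assumption: a fresh
# interpreter does not import this stdlib module on its own at startup.
import wave  # noqa: F401  (loaded in the parent only)

# Code the child runs: report its own PID and whether 'wave' is loaded there.
CHILD_CODE = (
    "import json, os, sys; "
    "print(json.dumps({'pid': os.getpid(), 'has_wave': 'wave' in sys.modules}))"
)


def spawn_child():
    """Start a brand-new interpreter process and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


parent = {"pid": os.getpid(), "has_wave": "wave" in sys.modules}
child = spawn_child()
print("parent:", parent)
print("child: ", child)
```

On Windows the same holds for DLLs: a new process gets its own virtual
address space and the loader resolves that process's imports itself. What a
child can inherit is its environment (PATH, working directory), which can
influence which copy of a DLL is found, so this does not by itself rule out
a per-machine configuration difference. Listing the modules actually loaded
in the crashing quartus_fit.exe, for example with tasklist /m or Process
Explorer, would be the direct evidence.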

Welcome to the corporate world. :)

Warm regards,
- Ian
