[Condor-devel] alternatives to #define for GCB support
- Date: Tue, 03 May 2005 17:38:18 -0500
- From: Derek Wright <wright@xxxxxxxxxxx>
- Subject: [Condor-devel] alternatives to #define for GCB support
disclaimer: this is long, somewhat complicated, and probably
uninteresting to most of you. however, i figured i should err on the
side of including more people so that folks with bright ideas could
chime in, or interested parties could learn something. if you don't
care, don't know, or are too busy, please ignore this message. don't
feel the need to reply, since todd and sonny and i are already working
on it. but, if anyone a) has a chance to read this and b) has any
comments, please let me know. thanks,
-d
p.s. 10,000 foot view of the problem: to make condor work with GCB, we
need to convert all syscalls that touch network sockets to use
gcb-aware versions of those methods.
p.p.s. i'll include sonny's original message on why not use #define at
the very bottom of this for the interested reader...
------- Forwarded Message
To: Se-Chang Son <sschang@xxxxxxxxxxx>
Subject: alternatives to #define for GCB support
cc: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Date: Tue, 03 May 2005 17:21:00 -0500
From: Derek Wright <wright@xxxxxxxxxxxxxxxxx>
i sent todd your email about why you don't think #define will work for
converting condor to use GCB. we agree. however, we brainstormed
some other possible solutions that we wanted to talk with you about.
are you free on friday (5/6) at 1pm CDT to talk about this stuff with
todd and myself? you could just go to todd's office and y'all can
call me. let me know if that time/day works ok for you (that's the
first time todd was available).
in the meantime, here's some stuff to think about...
there are 2 giant problems with "porting" condor to use GCB by the
method currently checked into the latest GCB branch:
1) it's a huge code-maintenance pain. we'll have tons of cvs
conflicts. it will be easy for people to forget to use the
"Generic_*" version of a method and break GCB support in a subtle
way that would be very difficult to diagnose and solve, etc, etc.
2) it doesn't work at all with externals we link against (e.g. globus,
kerberos, etc). for example, we can't claim condor is "GCBized" if
once you turn on kerberos authentication, there are socket calls
that don't know anything about GCB. :( having a GCB that only
works if you don't use globus or kerberos is worse than a GCB that
only works on a few platforms.
todd and i came up with 2 possible alternatives. they both address
these problems, though each has problems of their own. ;) i wrote up a
summary of our discussion and the proposed alternatives, including a
list of known pros + cons (as i see them). please read this over and
let me know what you think. if you agree that these seem reasonable,
i'm hoping you can spend some time this week "playing" with the
proposals and see in practice how difficult or easy they would be to
implement. sound good? thanks!
-d
------------------------------------------------------------
1) make libsocketstubs.a that we put on the link line right before we
include libc.a, libsocket.a, etc. this would basically be what
libgeneric.a does now, except all the functions would be named with
the "real" function names. we wouldn't change any code in condor,
we'd just let the linker resolve the functions correctly. in
libsocketstubs.a, we'd have separate .o files for each function,
and those would look something like:
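    #include <sys/types.h>
    #include <sys/socket.h>

    /* assumed prototypes for the helpers this stub relies on */
    int     is_gcb( void );
    ssize_t gcb_send( int s, const void *msg, size_t len, int flags );
    ssize_t SEND( int s, const void *msg, size_t len, int flags );

    ssize_t
    send( int s, const void *msg, size_t len, int flags )
    {
        if( is_gcb() ) {
            return gcb_send( s, msg, len, flags );  /* GCB-aware path */
        } else {
            return SEND( s, msg, len, flags );      /* renamed copy of libc's send() */
        }
    }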
the SEND() method would be "extracted" from the real libc.a or
libsocket.a and we'd use ToUpper to convert the symbol to the all-caps
version. so, we'd also have a libsocketreal.a (or something) that
contained all the .o files for all the all-caps versions of each
method. this is exactly the trick we use in the standard universe for
places in the ckpting code where we need to locally call a given
system call, instead of having it go remotely.
in the few cases (like the checkpointing code) where we don't always
want the GCB versions, we wouldn't link those things using
libsocketstubs.a and libsocketreal.a, we'd modify the code to directly
call the Generic_* version in the appropriate places.
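to make the is_gcb() check in the stub above a bit more concrete, here's a minimal sketch of one way it could work. it assumes GCB gets turned on by an environment variable, which i'm calling GCB_ENABLE purely as a placeholder (it's not a real condor or GCB setting), and it caches the answer so every stub doesn't pay for a getenv() on each call. gcb_force_off() is also hypothetical: one possible alternative to editing the special-case code, where something like the checkpointing code could flip GCB off for the rest of the process instead of calling the Generic_* versions directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* placeholder name: GCB is on when this is set to "1" or "true" */
#define GCB_ENV_VAR "GCB_ENABLE"

/* -1 = not checked yet, 0 = off, 1 = on */
static int gcb_state = -1;

/* look at the environment once and cache the result, so the
   send()/recv()/etc stubs only pay for getenv() the first time */
int
is_gcb( void )
{
    if( gcb_state < 0 ) {
        const char *val = getenv( GCB_ENV_VAR );
        gcb_state = ( val != NULL &&
                      ( strcmp( val, "1" ) == 0 || strcmp( val, "true" ) == 0 ) );
    }
    assert( gcb_state == 0 || gcb_state == 1 );
    return gcb_state;
}

/* hypothetical hook for the exceptions (e.g. the checkpointing code):
   after this, every stub falls through to the real system call */
void
gcb_force_off( void )
{
    gcb_state = 0;
}
```

this keeps the stubs themselves dumb; the only policy lives in is_gcb(). note it isn't thread-safe as written, which is fine for single-threaded daemons but would need a lock or a one-time init otherwise.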
pros:
- works with the externals
- means we only have to modify link-lines, not condor source, except
in the few special cases where we do *not* want GCB calls.
- works fine with c++ methods that have the same names, since c++ name
mangling gives them different symbols, so the linker can obviously tell
the difference between "read()" and "ReliSock::read()", etc, etc.
cons:
- major headache getting libsocketstubs.a to work on all platforms. for
example, if a given vendor's send.o file in libc.a defines send()
and _send(), we'd need our stub to do the same thing. getting this
right for a given platform is about 1/4 of the total work of making
a full vs. clipped port of condor. :(
- wouldn't work on solaris, where libsocket only comes as a dynamic
library, so there's no archive to extract .o files from.
2) GNU linker tricks
the GNU linker (ld) supports a "--wrap" option that lets the linker
itself handle the hard work of option #1. basically, if you do this:
"ld -o foo foo.o bar.o baz.o --wrap send -lc -lsocket"
then the linker will replace all unresolved references to "send()"
with calls to "__wrap_send()" instead. the "real" version of send()
will be made available as "__real_send()". so, in our case, the
proposed libsocketstubs.a library would be full of "__wrap" methods
that looked something like this:
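    #include <sys/types.h>
    #include <sys/socket.h>

    /* assumed prototypes; ld resolves __real_send to libc's send() */
    int     is_gcb( void );
    ssize_t gcb_send( int s, const void *msg, size_t len, int flags );
    ssize_t __real_send( int s, const void *msg, size_t len, int flags );

    ssize_t
    __wrap_send( int s, const void *msg, size_t len, int flags )
    {
        if( is_gcb() ) {
            return gcb_send( s, msg, len, flags );  /* GCB-aware path */
        } else {
            return __real_send( s, msg, len, flags );
        }
    }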
in this case, we wouldn't need libsocketreal.a at all (and all the
headache it would require to maintain such a thing), and it wouldn't
matter where on the link line we put libsocketstubs.a, since the
linker itself would convert all references to "send()" to call
"__wrap_send()" instead, and we can be sure that no condor, system, or
external library is going to define that.
again, in the few source files where we specifically do *NOT* want to
use GCB, we could either modify the source to call __real_send() or
gcb_send() directly as appropriate, or link those programs against our
own version of __wrap_send() that only calls __real_send(), etc, etc.
you can read the GNU ld man page for details on all of this.
pros:
- works with the externals
- means we only have to modify link-lines, not condor source (except
for the exceptions). ;)
- avoids all the trouble of extracting .o files, ToUpper, maintaining
all these stubs, etc.
- autoconf could determine if we're using the GNU linker on a given
platform and the build system could enable all this automatically
for the link line.
cons:
- only works on platforms where we can use the GNU linker to build
condor. this is unfortunate, but we think it's better to have
maintainable GCB support on a subset of platforms than to have a
nightmare on all platforms. ;) if GCB becomes widely used and it
becomes a problem that we don't support it on a platform where we
can't use GNU ld, we can look into a hybrid approach of #1 and #2
where we mostly use #2, but manually maintain a #1-style solution on
the few platforms where #2 is impossible.
------------------------------------------------------------
------- End of Forwarded Message
------- Forwarded Message
Date: Thu, 11 Mar 2004 12:08:22 CST
To: Derek Wright <wright@xxxxxxxxxxx>
cc: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
From: Se-Chang Son <sschang@xxxxxxxxxxx>
Subject: Re: getting GCB into 6.7.x
Hi Derek,
Since Todd is out of touch at the moment, let me bring this issue to you.
Todd and I decided to use the macro approach to make Condor GCB-aware.
The basic idea is as follows:
1) create a header file that looks like:
#define socket Generic_socket
#define bind Generic_bind
...
2) include the header file of 1) in "condor_common.h"
3) Generic_* functions check environment variables and call either
GCB_* or regular socket functions
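A minimal sketch of steps 1 to 3 for socket() alone might look like the following. It is only an illustration under assumptions: the GCB_ENABLE environment variable, the GCB_socket() stand-in (which here just counts the call and forwards to the regular socket()), and make_stream_socket() are hypothetical placeholders, not the real GCB or Condor interfaces. In a real build, Generic_socket() would live in its own file compiled without the #define header.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* how many times each path was taken (for illustration only) */
static int gcb_socket_calls = 0;
static int plain_socket_calls = 0;

/* hypothetical stand-in for the GCB library's socket call; it only
   records the call and forwards to the regular socket() */
static int
GCB_socket( int domain, int type, int protocol )
{
    gcb_socket_calls++;
    return socket( domain, type, protocol );
}

/* step 3: check an environment variable (GCB_ENABLE is a made-up name)
   and call either the GCB_* version or the regular socket function */
int
Generic_socket( int domain, int type, int protocol )
{
    const char *use_gcb = getenv( "GCB_ENABLE" );

    if( use_gcb != NULL && use_gcb[0] == '1' ) {
        return GCB_socket( domain, type, protocol );
    }
    plain_socket_calls++;
    return socket( domain, type, protocol );
}

/* steps 1 and 2: once this is seen (i.e. via condor_common.h), every
   later token spelled "socket" becomes Generic_socket */
#define socket Generic_socket

/* ordinary code that never mentions GCB */
int
make_stream_socket( void )
{
    return socket( AF_INET, SOCK_STREAM, 0 );
}
```

Note that the #define renames every later token spelled socket, not only calls to the system function, so a class method with the same name is renamed as well.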
With this approach, we believed that we could make Condor GCB-aware by
changing a relatively small number of places, which would make it easy
for you to merge my changes into 6.7.0.
However, it is becoming clear that the decision was wrong. Some reasons
are:
1) Pete says that some platforms implement socket calls as macros. If
this is true, our approach may not work at all, or may be extremely
difficult to make work.
2) We have many classes whose member functions have the same names as
socket calls, and the #define would rename those methods too. A few
examples are cedar, c++_util/memory_file, ckpt/CondorFile (and all its
derived classes), and ckpt/CondorFileTable. To handle this, I would
have to rename those methods. Unfortunately, some of these classes are
used by the stub generator.
3) Some parts of Condor use the 'ifstream' and 'ofstream' classes, which
happen to have member functions with the same names as the calls I am
trying to redefine.
4) Some parts of Condor must not use GCB. For example, the checkpointing
routine must not use GCB functions after calculating the image size.
Considering these cases, and maybe others I have not found yet, I believe
that this macro approach is no simpler for me, you, or the other team
members than changing each socket call into a Generic call.
Do you think I should still wait for Todd's decision?
How hard would it be to merge my changes into 6.7 if I take the
alternative approach of replacing socket calls with Generic calls case
by case?
Thank you.
------- End of Forwarded Message