Re: [Condor-devel] condor and a threaded library
- Date: Tue, 29 Jan 2008 10:54:16 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-devel] condor and a threaded library
There's always the worry that we're calling some non-threadsafe C library
function that the library has properly protected but Condor hasn't. That
should be less and less of a worry as time goes on.
You mean that the threaded library code calls a non-reentrant C lib
function, but protects itself in some sane way. However, Condor calls
the same function, except without the library's protection?
Correct. Typical culprits include functions to get host and user
information. Also, even if the library uses the thread-safe "*_r()"
version of function X, it is not clear whether the entire process needs
to use the *_r functions, or whether it is OK for Condor to keep calling
the thread-unsafe functions while the library calls the *_r ones.
Besides C library functions, the same applies to things like OpenSSL.
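To make the *_r distinction concrete, here is a minimal sketch for one of the
"user information" calls. getpwuid() returns a pointer into static storage
that a later call may overwrite, while getpwuid_r() writes into buffers the
caller owns. The helper name lookup_user_name and the 4096-byte scratch size
are assumptions made for illustration, not anything from the Condor source,
and the sketch does not settle whether mixing the two forms in one process
is safe.

```c
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: copy the login name for `uid` into `out`.
 * Returns 0 on success, -1 if the lookup fails or no entry exists. */
static int lookup_user_name(uid_t uid, char *out, size_t outlen)
{
    struct passwd pwd;              /* caller-owned result record */
    struct passwd *result = NULL;   /* set to &pwd on success, NULL if not found */
    char buf[4096];                 /* caller-owned string storage (assumed size;
                                       a larger entry would fail with ERANGE) */

    if (getpwuid_r(uid, &pwd, buf, sizeof buf, &result) != 0 || result == NULL)
        return -1;

    /* pwd.pw_name points into buf, so copy it out before buf goes away. */
    snprintf(out, outlen, "%s", pwd.pw_name);
    return 0;
}
```

Because nothing here lives in static storage owned by the C library, two
threads can each call lookup_user_name() with their own output buffer without
overwriting each other's result.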
Signal handling could become a pain as well. Condor daemons use signals
to communicate with each other --- combining signals and threads in one
process may require work. Useful post-mortem debugging could be
painful. (How well will the Google core dump library continue to work,
if at all? Which thread context gets dumped?)
It's also usually the case that in order to do anything useful, somewhere in
that library you have to call back into the Condor code, and then all bets are
off. Non-reentrant Condor utility functions are all over the place, and
there are many places that hold state between calls to the same function.
That's not a concern because the library will not call back to Condor.
If it will not call back into Condor, why link it into the Condor
daemons? Just to send outbound notifications? Depending upon what sort
of notifications and their frequency, perhaps a
better/simpler/safer/more modular design could be found....
Also, the library would not call exit(), which Condor redefines. Are
there other functions that Condor redefines I'm forgetting? open/close
are #define'd, so they are not a concern.
Sure, lots of them. See the util lib.
We've been burned time and time again by threads on Windows, even though
we were convinced that they don't interact with Condor code, and lo, it turns
out we still crash because of race conditions. It just doesn't seem worth it.
This is because of call-backs or Condor code itself being threaded, right?
That and the items listed above and others not enumerated. Plus we
don't necessarily want threading in Condor, except for very specific
tasks such as overlap of CPU and I/O, where what is going on in the
worker thread is small/simple to the point of nearly a single system
call. We may be forced to also do OpenSSL functions in a thread pool at
some point as well, but that doesn't make us happy...
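As an illustration of a worker thread that is "nearly a single system call",
the sketch below hands one blocking read() to a thread and reports completion
to an event loop by writing a byte to a pipe that the loop can poll()
alongside its other descriptors. The struct and function names are
hypothetical, and this is not how DaemonCore is implemented; it only shows
the shape of the idea.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical job record: one blocking read, handed to a worker thread. */
struct read_job {
    int     src_fd;      /* descriptor the worker reads from */
    int     notify_fd;   /* write end of a pipe the event loop polls */
    char    buf[256];    /* filled in by the worker */
    ssize_t nread;       /* result of read(); inspect after joining the worker */
};

/* Worker body: a single blocking system call, then a one-byte notification. */
static void *read_worker(void *arg)
{
    struct read_job *job = arg;
    char done = 1;

    job->nread = read(job->src_fd, job->buf, sizeof job->buf);
    if (write(job->notify_fd, &done, 1) != 1)
        job->nread = -1;   /* could not notify; report the job as failed */
    return NULL;
}

/* Start the worker; returns 0 on success, like pthread_create(). */
static int start_read_job(struct read_job *job, pthread_t *tid)
{
    return pthread_create(tid, NULL, read_worker, job);
}

/* Event-loop side: wait up to timeout_ms for the completion byte.
 * Returns 0 when a job has finished, -1 on timeout or error. */
static int wait_for_job(int notify_read_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = notify_read_fd, .events = POLLIN };
    char done;

    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;
    return read(notify_read_fd, &done, 1) == 1 ? 0 : -1;
}
```

The worker touches only its own job record and the pipe, so none of the
single-threaded code in the event loop has to become thread-safe; the loop
reads the result only after the notification arrives and the worker has been
joined.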
See http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf,
http://www.softpanorama.org/People/Ousterhout/Threads/ to understand
some of our thinking. What we have thought about down the road for
DaemonCore is a more hybrid thread/event model, something similar to
http://research.microsoft.com/Farsite/USENIX2002.ps but simpler.
What are you thinking about using?
It's a client messaging library, architected to manage its own threads,
which would never call back to Condor.
And it would be doing what?
thanks,
Todd
--
Todd Tannenbaum University of Wisconsin-Madison
Condor Project Research Department of Computer Sciences
tannenba@xxxxxxxxxxx 1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132 Madison, WI 53706-1685