HTCondor Project List Archives




Re: [Condor-devel] condor and a threaded library





There's always the worry that we're calling some non-threadsafe C library
function that the library has properly protected but Condor hasn't. That
should be less and less of a worry as time goes on.

You mean that the threaded library code calls a non-reentrant C lib function, but protects itself in some sane way. However, Condor calls the same function, except without the library's protection?


Correct. Typical culprits include functions to get host and user information. Also, even if the library uses the thread-safe "*_r()" version of function X, it is not clear whether the entire process then needs to use the *_r functions, or whether it is OK for Condor to keep using the thread-unsafe functions while the library uses the _r ones.
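
To make the distinction concrete, here is a minimal C sketch of the reentrant style, assuming a POSIX system. lookup_home_dir is a hypothetical helper, not a Condor or library function. The point is only that getpwnam() returns a pointer into static storage shared by every caller in the process, whereas getpwnam_r() writes into buffers the caller owns.

```c
#include <assert.h>
#include <pwd.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not part of Condor): copy the home directory of
 * `user` into `out`. Returns 0 on success, -1 if the user is unknown,
 * the lookup fails, or `out` is too small. */
int lookup_home_dir(const char *user, char *out, size_t outlen)
{
    struct passwd pw;              /* caller-owned result record */
    struct passwd *result = NULL;  /* set to &pw on success, NULL otherwise */
    char buf[16384];               /* caller-owned space for the strings */

    assert(user != NULL && out != NULL && outlen > 0);

    /* Unlike getpwnam(), nothing here lives in process-wide static storage. */
    int rc = getpwnam_r(user, &pw, buf, sizeof buf, &result);
    if (rc != 0 || result == NULL)
        return -1;

    if (strlen(pw.pw_dir) >= outlen)
        return -1;
    strcpy(out, pw.pw_dir);
    return 0;
}
```

With plain getpwnam(), a second lookup anywhere in the process may overwrite the record a first caller is still reading; with the _r form each caller has its own struct passwd. Whether mixing the two forms in one process is safe is exactly the open question above, and this sketch does not settle it.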

The same concern applies beyond the C library, to things like OpenSSL.

Signal handling could become a pain as well. Condor daemons use signals to communicate with each other, and combining signals and threads in one process may require work. Post-mortem debugging could also become painful: how well will the Google core dump library continue to work, if at all, and which thread context gets dumped?


It's also usually the case that in order to do anything useful, somewhere in
that library you have to call back into the Condor code, and then all bets are
off. Non-reentrant Condor utility functions are all over the place, and
there are many places that hold state between calls to the same function.

That's not a concern because the library will not call back to Condor.

If it will not call back into Condor, why link it into the Condor daemons? Just to send outbound notifications? Depending upon what sort of notifications and their frequency, perhaps a better/simpler/safer/more modular design could be found....

Also, the library would not call exit(), which Condor redefines. Are there other functions Condor redefines that I'm forgetting? open/close are #define'd, so they are not a concern.


Sure, lots of them. See the util lib.


We've been burned time and time again by threads on Windows, even though we were convinced that they don't interact with Condor code, and lo, it turns
out we still crash because of race conditions. It just doesn't seem worth it.

This is because of call-backs or Condor code itself being threaded, right?


That and the items listed above and others not enumerated. Plus we don't necessarily want threading in Condor, except for very specific tasks such as overlap of CPU and I/O, where what is going on in the worker thread is small/simple to the point of nearly a single system call. We may be forced to also do OpenSSL functions in a thread pool at some point as well, but that doesn't make us happy...
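
As a rough illustration of how small such a worker would be, here is a C sketch assuming POSIX threads. The write_job struct and the start_write/finish_write names are hypothetical, not Condor code. The worker makes exactly one system call and touches nothing except its own job record.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical job record: the only memory the worker thread touches. */
struct write_job {
    int fd;            /* destination file descriptor */
    const void *buf;   /* data to write; must stay valid until finish_write() */
    size_t len;        /* number of bytes to write */
    ssize_t result;    /* return value of write(), filled in by the worker */
    pthread_t tid;     /* worker thread handle */
};

/* Worker body: a single system call, no shared state, no callbacks. */
static void *write_worker(void *arg)
{
    struct write_job *job = arg;
    job->result = write(job->fd, job->buf, job->len);
    return NULL;
}

/* Start the write in the background. Returns 0 on success,
 * an error number otherwise. */
int start_write(struct write_job *job)
{
    assert(job != NULL);
    return pthread_create(&job->tid, NULL, write_worker, job);
}

/* Wait for the worker and return what write() returned, or -1 if the
 * join itself failed. */
ssize_t finish_write(struct write_job *job)
{
    assert(job != NULL);
    if (pthread_join(job->tid, NULL) != 0)
        return -1;
    return job->result;
}
```

The caller would do its CPU work between start_write() and finish_write(). Because the caller does not read the job record until after the join, no locking is needed in this sketch.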

See http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf, http://www.softpanorama.org/People/Ousterhout/Threads/ to understand some of our thinking. What we have thought about down the road for DaemonCore is a more hybrid thread/event model, something similar to http://research.microsoft.com/Farsite/USENIX2002.ps but simpler.


What are you thinking about using?

It's a client messaging library, architected to manage its own threads, which would never call back to Condor.



And it would be doing what?

thanks,
Todd


--
Todd Tannenbaum                       University of Wisconsin-Madison
Condor Project Research               Department of Computer Sciences
tannenba@xxxxxxxxxxx                  1210 W. Dayton St. Rm #4257
Phone: (608) 263-7132                 Madison, WI 53706-1685