
Re: [Condor-users] Condor SOAP bug: stopping server when there are pending transactions hangs daemons



Matthew Farrellee wrote:
> On May 1, 2006, at 5:42 PM, David E. Konerding wrote:
>
> > Hi,
> >
> > I am noticing a very inconvenient bug with Condor SOAP:
> >
> > If a transaction is begun, and has not yet expired, stopping the condor
> > master causes all the daemons to go to a zombie
> > state and hang around.
>
> This is probably the same problem as the condor_q issue below. All
> Condor daemons are single threaded, so if there is a SOAP transaction
> active no one can talk to the Schedd. I'm guessing that the Master
> just gives up trying to tell its children to shut down at some point
> and exits. If the children are shut down serially then a "hanging"
> Schedd at the beginning of the child list would account for this.

I'm confused by this answer. Nothing about a single-threaded application prevents a server from maintaining more than one simultaneous transaction (database servers do this all the time). Nor does anything prevent a server from listening on a port and responding to multiple requests (nearly) simultaneously. So does this mean that the Condor source base itself has the limitation of one transaction at a time?
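
To make the point about single-threaded servers concrete, here is a minimal sketch of a single-threaded process that keeps several transactions open at once and still answers reads in between. It is not Condor code: TransactionTable and all of its methods are hypothetical names invented for illustration, and the buffering scheme is only one assumed way to do it.

```python
import itertools


class TransactionTable:
    """Hypothetical example: several open transactions in one thread."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}   # transaction id -> buffered (key, value) writes
        self.committed = {}  # state visible to readers

    def begin(self):
        txn = next(self._ids)
        self._pending[txn] = []
        return txn

    def write(self, txn, key, value):
        # Writes are buffered per transaction, so nothing blocks.
        self._pending[txn].append((key, value))

    def commit(self, txn):
        for key, value in self._pending.pop(txn):
            self.committed[key] = value

    def abort(self, txn):
        self._pending.pop(txn, None)

    def read(self, key):
        # Readers only ever see committed state.
        return self.committed.get(key)


table = TransactionTable()
a = table.begin()
b = table.begin()                         # second transaction while the first is open
table.write(a, "job1", "idle")
table.write(b, "job2", "idle")
snapshot_during = table.read("job1")      # a query served while both are open
table.commit(b)
table.abort(a)
```

Each call returns immediately, so a single event loop could interleave requests from many clients without any one open transaction locking out the rest.
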
What happens when condor_q is run several times at once, or while
something is being submitted? Is condor_submit using transactions
internally, so that condor_q blocks while submits are in progress?
Finally, what does this mean for me writing a web service job submitter
and monitor, where multiple submitters and monitors will be accessing
the same Condor SOAP server?  From my perspective, it means all my
client code has to be aware of the single-transaction limit and has to
retry operations. It also has to be aggressive about asking for long
transaction times (I'm doing file transfers, and I don't want to lose
an entire job submission and file transfer transaction just because
there was a network dropout), yet be careful to close down those
transactions.  If a single client crashes with a long transaction
outstanding, it'll hose all the other clients.
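
The client-side discipline described above (retry while the server is busy, then always close the transaction) might look roughly like the following sketch. Everything here is an assumption for illustration: FakeSchedd is a stand-in that allows one transaction at a time, and begin, submit, commit, abort and with_transaction are hypothetical names, not the Condor SOAP API.

```python
import time


class Busy(Exception):
    """Raised by the stand-in server when another transaction is open."""


class FakeSchedd:
    """Hypothetical stand-in for a one-transaction-at-a-time server."""

    def __init__(self):
        self.open_txn = None
        self.pending = []
        self.jobs = []
        self._count = 0

    def begin(self, duration):
        # duration is accepted but ignored by this stand-in
        if self.open_txn is not None:
            raise Busy("another transaction is open")
        self._count += 1
        self.open_txn = self._count
        return self.open_txn

    def submit(self, txn, job):
        if txn != self.open_txn:
            raise ValueError("unknown transaction")
        self.pending.append(job)

    def commit(self, txn):
        if txn != self.open_txn:
            raise ValueError("unknown transaction")
        self.jobs.extend(self.pending)
        self.pending = []
        self.open_txn = None

    def abort(self, txn):
        if txn == self.open_txn:
            self.pending = []
            self.open_txn = None


def with_transaction(server, work, duration=600, retries=5, delay=0.01):
    """Retry begin() with exponential backoff; always close what we open."""
    for attempt in range(retries):
        try:
            txn = server.begin(duration)
        except Busy:
            time.sleep(delay * 2 ** attempt)
            continue
        try:
            result = work(txn)
        except Exception:
            server.abort(txn)   # never leave the transaction dangling
            raise
        server.commit(txn)
        return result
    raise Busy("server still busy after %d attempts" % retries)


schedd = FakeSchedd()
with_transaction(schedd, lambda txn: schedd.submit(txn, "job-1"))
```

Note that this only covers a client that fails cleanly. A client that crashes outright never reaches the abort call, so the server is still stuck until the transaction times out, which is exactly the concern raised above.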

Dave