Re: [HTCondor-devel] condor_startd protocol...


Date: Thu, 16 Jun 2016 12:03:48 -0500
From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] condor_startd protocol...

Hi Nick,

Are you trying to run the condor_startd on Android, but implement your own version of the schedd?

If so, you could possibly use the Computing On Demand (COD) features, which are now available in the Python bindings.  Snippets from the unit tests:

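        import os
        import htcondor

        # testdir and output_file are defined elsewhere in the unit test.

        # Ask the collector for the ads of all startds in the pool.
        coll = htcondor.Collector()
        ads = coll.locateAll(htcondor.DaemonTypes.Startd)

        # Attributes shared by every test job.
        job_common = {
            'Cmd': '/bin/sh',
            'JobUniverse': 5,  # 5 = vanilla universe
            'Iwd': os.path.abspath(testdir),
            'Out': 'testclaim.out',
            'Err': 'testclaim.err',
            'StarterUserLog': 'testclaim.log',
        }

        # Request a COD claim on the first startd, then activate a job on it.
        claim = htcondor.Claim(ads[0])
        claim.requestCOD()
        hello_world_job = dict(job_common)
        hello_world_job['Arguments'] = "-c 'echo hello world > %s'" % output_file
        claim.activate(hello_world_job)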

Given the appropriate COD configuration, that should work for a Python process talking directly to a startd.
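
As a rough sketch of how the snippet above could be wrapped for reuse: the helper names build_cod_job_ad and run_on_first_startd are hypothetical (they are not part of the bindings or the unit tests), and the second function assumes the htcondor Python bindings are installed and the startd is configured to accept COD claims. Only the calls already shown above (Collector, locateAll, Claim, requestCOD, activate) are used.

```python
import os


def build_cod_job_ad(iwd, shell_command, stem="testclaim"):
    """Return a plain dict of job attributes for a /bin/sh job.

    Hypothetical helper: it mirrors the job_common dict from the unit-test
    snippet. shell_command must not contain single quotes, because it is
    wrapped in single quotes for the shell.
    """
    job_common = {
        "Cmd": "/bin/sh",
        "JobUniverse": 5,  # 5 = vanilla universe
        "Iwd": os.path.abspath(iwd),
        "Out": stem + ".out",
        "Err": stem + ".err",
        "StarterUserLog": stem + ".log",
    }
    job = dict(job_common)  # copy, so the template could be reused
    job["Arguments"] = "-c '%s'" % shell_command
    return job


def run_on_first_startd(job_ad):
    """Claim the first startd the collector reports and activate job_ad.

    Assumes a reachable collector and a startd that allows COD claims.
    """
    import htcondor  # imported here so build_cod_job_ad works without it

    coll = htcondor.Collector()
    ads = coll.locateAll(htcondor.DaemonTypes.Startd)
    if not ads:
        raise RuntimeError("collector returned no startd ads")
    claim = htcondor.Claim(ads[0])
    claim.requestCOD()
    claim.activate(job_ad)
    return claim


if __name__ == "__main__":
    # Building the ad needs no pool; running it does.
    print(build_cod_job_ad(".", "echo hello world > hello.txt"))
```

Keeping the ad construction separate from the claim logic means the job description can be checked without a running pool; only run_on_first_startd actually talks to the collector and the startd.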

In terms of reverse-engineering and documenting the protocol with Wireshark: that would likely be a massive project, but hugely beneficial to the community!

Brian

> On Jun 16, 2016, at 11:56 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
> 
> There is no formal specification for the HTCondor communication protocols.  We (or you)  would have to reverse engineer the protocol by reading the C++ code.   
>  
> The protocols are complex, in part because of the security negotiation that happens at the start of most (all?) of them.  You could possibly implement a subset of this if you don't need your Android nodes to join any arbitrary HTCondor pool.
>  
> I should note that the protocol for claiming and running jobs on a STARTD is particularly complex. However the code is open source and we can point you at the relevant C++ modules if you still want to pursue it.
> Usually both sides of the conversation know the HTCondor version of the other side, and sometimes the protocol changes based on that knowledge.  You will see that when you look at the code.
>  
> As for ClassAds, I have no doubt that it would be useful for you to have an implementation, but it may not be necessary.  A lot of the data that HTCondor passes back and forth is ClassAds, but the communication protocols themselves are binary and usually of variable size. Much of the actual data passed is text.  ClassAds are (currently) passed as text, for instance, but preceded by a binary header.
>  
> -tj
>  
> From: HTCondor-devel [mailto:htcondor-devel-bounces@xxxxxxxxxxx] On Behalf Of Nick Ton
> Sent: Thursday, June 16, 2016 8:33 AM
> To: htcondor-devel@xxxxxxxxxxx; Todd Tannenbaum <tannenba@xxxxxxxxxxx>
> Subject: Re: [HTCondor-devel] condor_startd protocol...
>  
>  
> Hi Todd,
>  
> Thank you for your prompt response. I read through the links you provided, both from the HTCondor side and the BOINC side. So to use the HTCondor/BOINC approach, it seems that I would have to stand up a BOINC Server and a Central Manager in order to submit jobs to an Android device. While this isn't that much trouble, the focus of my research is on the protocol. I had thought HTCondor SOAP, GRAM or GAHP might allow an Android device to accept jobs, but these protocols are really for resource monitoring and job submission, not job acceptance.
>  
> So I noticed that on the HTCondor website there is a Java implementation of the ClassAd specification. I was going to use it to define the resource and universe to a Linux-based Central Manager. But I'm not sure how to send it via TCP. There doesn't seem to be a message block specification for the HTCondor socket communication. I guess I could use Wireshark to back out the protocol definition, but I was hoping that other people have done research in this area. Is there a message-level definition for HTCondor?
>  
> Many Thanks,
> Nick
>  
>  
>  
> On Wed, Jun 15, 2016 at 4:55 PM, Todd Tannenbaum <tannenba@xxxxxxxxxxx> wrote:
> On 6/15/2016 2:36 PM, Nick Ton wrote:
> 
> Hello,
> 
> I would like to first describe the problem I'm trying to tackle; perhaps
> someone has an elegant solution. I would like to turn a bunch of Android
> devices into compute resources that accept very simple jobs
> (e.g. send back their location). I know that it may sound like this type
> of problem can be better solved without HTCondor, but I will expand the
> jobs to be more interesting/complex once I have worked out the protocol.
> 
> 
> 
> I know I am not directly answering your question, but you may be interested to know that BOINC, a system for volunteer-based computing, runs on Android (see https://is.gd/2LTNA4), and HTCondor can delegate jobs to many other scheduling systems, including BOINC, via its grid universe. Some additional info:
> 
> 
> http://research.cs.wisc.edu/htcondor/manual/v8.4/5_3Grid_Universe.html#53039
> 
>   https://boinc.berkeley.edu/trac/wiki/CondorBoinc
> 
> regards
> Todd

