HTCondor Project List Archives




Re: [Condor-devel] follow up SOAP questions



(Input from other people wanted below...)

On Mar 5, 2007, at 10:11 AM, Rob.Futrick@xxxxxxxxxxxxxxxx wrote:

[snip]

>> It looks like this is a bug in birdbath. The logic in a test is
>> reversed. It sends an error on success and success on error. This
>> will be fixed immediately.
Thanks. I've updated my code to catch, log, and otherwise ignore the error and to generate an error upon success. I'll update it again once the bug is fixed. In the meantime, I could also fix it locally if you let me know where it is in the code.

soap_scheddStub.C:707 should be !abortJobsByConstraint...
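To illustrate the kind of reversed test described above, here is a minimal sketch in Python rather than the actual birdbath C++ source. It assumes the underlying call returns true on success; `abort_jobs_by_constraint`, `remove_buggy`, and `remove_fixed` are hypothetical stand-ins, not names from the Condor code.

```python
def abort_jobs_by_constraint(constraint: str) -> bool:
    """Hypothetical stand-in for the schedd call: True means the abort worked."""
    return constraint != ""


def remove_buggy(constraint: str) -> str:
    # Reversed test: the error branch is taken when the call succeeded.
    if abort_jobs_by_constraint(constraint):
        return "FAIL"
    return "SUCCESS"


def remove_fixed(constraint: str) -> str:
    # Corrected test: note the negation, matching the "!" in the fix above.
    if not abort_jobs_by_constraint(constraint):
        return "FAIL"
    return "SUCCESS"
```

With the negation in place, a successful abort reports success, so a client-side workaround that swaps the two outcomes has to be removed once the fix is deployed.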


>> > a) Add in appropriate requirements for file transfer, disk size,
>> > memory,
>> > platform (had some trouble figuring out exactly what needs to be
>> > checked/set but I think I have the appropriate ones now)
>>
>> Yup.
>>
>> > b) Set NTDomain attribute
>>
>> If on Windows, yes.
Yes, FYI, I'm currently using a Linux server interfacing via SOAP with a Windows scheduler, which parcels out the jobs to Windows startds. I've also used a Linux scheduler, which is how I discovered the file transfer requirement. :) There seem to be some others, though, such as Disk and Disk_RAW, which may need to be set, and I was hoping that I wasn't missing anything important.

>>
>> Yes, but I don't know what you mean by "safe" or why you would need
>> to change the names.
It seems that condor_submit changes the names of the output and error parameters from what they are in the submit file to _std_out and _std_error (or something similar) so that they can safely be created on the execute node. Then, when transferring the files back from the execute nodes, the scheduler, or condor_transfer_data, creates them locally under the names specified in the original submit file. At the moment, I just set them to something appropriate to our system, so there's no need to rename them when they're retrieved.

>>
>> Yes, most types should be easy to determine. The only trick being
>> things like Rank/Requirements/Periodic*/On*/etc which look like
>> strings but should be expressions.
Well, I've had a bit of difficulty with it, but it may just have been how I went about it. I'll put together a more comprehensive description of my algorithms when I get a chance.
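The name-based rule described in the quoted text could be sketched roughly as follows. This is only an illustration: the pattern list is taken from the attribute names mentioned in this thread, not from condor_submit, and `is_expression` and `format_value` are hypothetical helper names. The quoting shown for strings is a simplifying assumption.

```python
import fnmatch

# Attribute names that look like strings but should be sent as expressions.
# Assumed from the names mentioned in this thread; not an exhaustive list.
EXPRESSION_PATTERNS = ("Rank", "Requirements", "Periodic*", "On*")


def is_expression(name: str) -> bool:
    """Return True if the attribute name matches one of the expression patterns."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in EXPRESSION_PATTERNS)


def format_value(name: str, value) -> str:
    """Render a value for a job ad: expressions stay raw, strings get quoted."""
    if is_expression(name):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '\\"') + '"'
```

A broad pattern such as `On*` can also match attributes that are not expressions, so a real client would probably want an explicit list of names.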

>> I am not an expert on ClassAds, but I don't think you want to
>> actually replace the reference with a value, instead let ClassAd
>> evaluation do this for you later.
From looking at the source code, I thought that a lot of the replacement was done in the condor_submit binary. It seems that Cluster and Process are definitely replaced, as well as any custom attributes. Are these the only ones, or are there others that are also replaced?
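For a client that has to do this substitution itself, a minimal sketch might look like the following. It assumes the submit-file `$(Name)` reference syntax and case-insensitive names; `expand_macros` is a hypothetical helper, and this is not a description of what condor_submit does internally.

```python
import re

# Matches references of the form $(Name).
_MACRO = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_macros(text: str, values: dict) -> str:
    """Replace known $(Name) references; leave unknown ones untouched."""
    lookup = {key.lower(): val for key, val in values.items()}

    def replace(match):
        key = match.group(1).lower()
        # Unknown references are kept as-is so later evaluation can handle them.
        return str(lookup[key]) if key in lookup else match.group(0)

    return _MACRO.sub(replace, text)
```

For example, `expand_macros("out.$(Cluster).$(Process)", {"Cluster": 12, "Process": 0})` yields `out.12.0`, while a reference with no known value is left in place.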

>> I believe condor_submit just knows the type based on the name of the
>> attribute. For instance, maybe Rank and Requirements are always the
>> only expressions. I don't think there are any specific guidelines.
Thanks.  I'll take another look in the code and do some more testing.

Can anyone else add some input about this?


>> You should be able to hold multiple jobs in a single transaction, and
>> right now it seems that is what you have to do. We could add a hold/
>> release that takes a constraint instead of cluster+proc ids, if that
>> would be helpful, and I imagine it would be.
>>
>> As for the error message. That cannot easily be changed. The actual
>> reason for the failure is only logged, and is not immediately
>> accessible to the birdbath code. Bad, I know.
That would be helpful, but in the meantime, I don't have a problem continuing to hold jobs one at a time in a cluster. I'll see what the logs say, and get back to you.

I'll look into adding this functionality.

[snip]


Thank you very much for your help. I really appreciated it. As always, if I can do anything to help or if you need any more information, just let me know.

My pleasure.



matt