>> > a) Add in appropriate requirements for file transfer, disk size,
>> > memory,
>> > platform (had some trouble figuring out exactly what needs to be
>> > checked/set but I think I have the appropriate ones now)
>>
>> Yup.
>>
>> > b) Set NTDomain attribute
>>
>> If on Windows, yes.
Yes. FYI, I'm currently using a Linux server interfacing via SOAP
with a Windows scheduler, which parcels out the jobs to Windows
startds. I've also used a Linux scheduler, which is how I
discovered the file transfer requirement. :) There seem to be some
others, though, such as Disk and Disk_RAW, which may need to be set,
and I was hoping I wasn't missing anything important.
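To make the question concrete, here is a rough sketch of the kind of
job-ad attributes I mean; the values are placeholders, and the
Requirements expression is my guess at what condor_submit generates,
not something I've verified against the source:

```
ShouldTransferFiles = "YES"
WhenToTransferOutput = "ON_EXIT"
NTDomain = "MYDOMAIN"
DiskUsage = 10000
ImageSize = 10000
Requirements = (OpSys == "WINNT51") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
```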
>>
>> Yes, but I don't know what you mean by "safe" or why you would need
>> to change the names.
It seems that condor_submit changes the names of the output and
error parameters from what they are in the submit file to _std_out
and _std_error (or something similar) so that they can safely be
created on the execute node. Then, when transferring the files back
from the execute nodes, the scheduler, or condor_transfer_data,
creates them locally with the names specified in the original submit
file. At the moment, I just set them to something appropriate to
our system, so there's no need to rename them when they're retrieved.
>>
>> Yes, most types should be easy to determine. The only trick being
>> things like Rank/Requirements/Periodic*/On*/etc which look like
>> strings but should be expressions.
Well, I've had a bit of difficulty with it, but that may just be
how I went about it. I'll put together a more comprehensive
description of my algorithms when I get a chance.
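As a sketch of what I've been attempting, something along these
lines, where the list of expression-typed attributes is only what
we've discussed so far and is surely incomplete:

```python
# A sketch of how I'm classifying submit-file attributes: a few names
# (Rank, Requirements, Periodic*, On*) get inserted into the job ad as
# bare ClassAd expressions, everything else as a quoted string. The
# attribute list below is just what we've discussed, not exhaustive.
EXPRESSION_ATTRS = {"Rank", "Requirements"}
EXPRESSION_PREFIXES = ("Periodic", "On")

def is_expression_attr(name):
    """True if the attribute should be treated as a ClassAd expression."""
    return name in EXPRESSION_ATTRS or name.startswith(EXPRESSION_PREFIXES)

def to_classad_value(name, value):
    """Render a submit-file value for insertion into the job ClassAd."""
    if is_expression_attr(name):
        return value                                 # leave unquoted
    return '"%s"' % value.replace('"', '\\"')        # quote plain strings
```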
>> I am not an expert on ClassAds, but I don't think you want to
>> actually replace the reference with a value, instead let ClassAd
>> evaluation do this for you later.
From looking at the source code, I thought that a lot of the
replacement was done in the condor_submit binary. It seems that
Cluster and Process are definitely replaced, as well as any
custom attributes. Are these the only ones, or are there others
that are also replaced?
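For example, the substitution I'm describing looks roughly like this
(a sketch of my understanding, not the actual condor_submit code):

```python
import re

# A sketch of the $(Cluster)/$(Process) replacement as I understand it;
# this is my reading of condor_submit's behavior, not its actual code.
def expand_macros(value, cluster, proc):
    """Substitute $(Cluster) and $(Process) in a submit-file value,
    leaving any other $(name) macros untouched."""
    subs = {"Cluster": str(cluster), "Process": str(proc)}
    return re.sub(r"\$\((\w+)\)",
                  lambda m: subs.get(m.group(1), m.group(0)),
                  value)
```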
>> I believe condor_submit just knows the type based on the name of
>> the attribute. For instance, maybe Rank and Requirements are always
>> the only expressions. I don't think there are any specific guidelines.
Thanks. I'll take another look in the code and do some more testing.
>> You should be able to hold multiple jobs in a single transaction,
>> and right now it seems that is what you have to do. We could add a
>> hold/release that takes a constraint instead of cluster+proc ids,
>> if that would be helpful, and I imagine it would be.
>>
>> As for the error message: that cannot easily be changed. The actual
>> reason for the failure is only logged, and is not immediately
>> accessible to the birdbath code. Bad, I know.
That would be helpful, but in the meantime, I don't have a problem
continuing to hold jobs one at a time in a cluster. I'll see what
the logs say, and get back to you.
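In the meantime, the one-at-a-time loop I'm using is essentially
this, where hold_job stands in for the actual birdbath hold call
(whose real signature may differ):

```python
# What I'm doing now, in outline: hold each proc of a cluster one at a
# time. hold_job is a stand-in for the actual birdbath hold call; any
# failures are collected and returned rather than aborting the loop.
def hold_cluster(hold_job, cluster, num_procs):
    failures = []
    for proc in range(num_procs):
        try:
            hold_job(cluster, proc)
        except Exception as exc:
            failures.append((proc, exc))
    return failures
```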