Re: [HTCondor-users] Issues with transferring files from URLs
- Date: Mon, 3 Nov 2014 09:47:03 -0600
- From: Zachary Miller <zmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Issues with transferring files from URLs
On Mon, Nov 03, 2014 at 02:51:13PM +0000, Brian Candler wrote:
> The documentation says:
>
> "For vanilla and vm universe jobs only, a file may be specified by
> giving a URL, instead of a file name. The implementation for URL
> transfers requires both configuration and available plug-in."
>
> but these are indeed present (/etc/condor/condor_config has
> FILETRANSFER_PLUGINS which includes /usr/lib/condor/libexec/curl_plugin)
>
> WORKAROUND: I was able to make it work by setting "should_transfer_files
> = yes".
>
> However, is this right? Surely a URL should always be fetched,
> regardless of whether or not you are in the same filesystem domain,
> since URLs don't appear in the filesystem anyway?
What you did is correct. I agree this is confusing and needs better
documentation. The slightly more technical answer is that without
"should_transfer_files" turned on, nothing is fetched (i.e. all inputs are
assumed to be accessible via a shared filesystem), and that includes the
URLs. I think a good argument can be made that such behavior violates the
principle of least surprise.
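Here is a minimal Python sketch of that workaround, assuming you generate submit files from a script. The helper build_submit_description() is hypothetical (not part of HTCondor). It sets should_transfer_files = YES whenever any entry in transfer_input_files looks like a URL, and otherwise keeps the IF_NEEDED default. The submit commands it writes are standard ones. The executable name and URL are placeholders.

```python
from typing import Sequence
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    # Treat "scheme://..." inputs (http, https, ftp, file, ...) as URLs;
    # plain paths such as "params.txt" or "C:\\data.txt" are not.
    return bool(urlparse(path).scheme) and "://" in path


def build_submit_description(executable: str, inputs: Sequence[str]) -> str:
    # Hypothetical helper: keep the IF_NEEDED default for local inputs, but
    # force YES when any input is a URL, because with file transfer off
    # nothing (URLs included) is fetched.
    transfer = "YES" if any(is_url(p) for p in inputs) else "IF_NEEDED"
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"transfer_input_files = {', '.join(inputs)}",
        f"should_transfer_files = {transfer}",
        "when_to_transfer_output = ON_EXIT",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_submit_description(
        "analyze.sh", ["http://example.org/data.tar.gz", "params.txt"]), end="")
```

The same effect is of course achieved by simply writing "should_transfer_files = yes" by hand. The script only matters if submit files are generated.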
> (2) Given an unimplemented URL scheme (like "https"), I found a
> difference between my test personal condor node and my production condor
> node. The former would leave the job idle because of a ClassAd matching
> condition which was never true:
>
> 1 ( TARGET.HasFileTransfer &&
> stringListMember("https",HasFileTransferPluginMethods) )
>
> but the latter puts the job into a "held" (H) state, saying
>
> Hold reason: Error from slot1@xxxxxxxxxxxxxxxx: STARTER at 192.168.6.42
> failed to receive file /var/lib/condor/execute/dir_24716/xxxx.xxxx:
> FILETRANSFER:1:FILETRANSFER: plugin for type https not found!
>
> (Aside: if a plugin for https is not present, wouldn't it be better to
> abort the job rather than put it into a 'held' state indefinitely, as
> this isn't a condition which is likely to fix itself?)
Possibly. We generally like jobs to go on hold when there is a problem with
the input so that it's quite obvious to the user and they have a chance to do
something about it. And in this case, the job should never have matched in the
first place.
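To make the matching side concrete, here is a toy Python model of the requirement clause quoted above. It is an illustration, not the ClassAd evaluator. Slot ads are plain dicts, string_list_member() only approximates the ClassAd stringListMember() function, and the attribute values in the example are made up.

```python
from typing import Mapping


def string_list_member(item: str, string_list: str) -> bool:
    # Approximates ClassAd stringListMember(): split a comma-separated
    # list, drop surrounding spaces and empty entries, test exact membership.
    members = [m.strip() for m in string_list.split(",")]
    return item in [m for m in members if m]


def slot_matches(slot_ad: Mapping[str, object], scheme: str) -> bool:
    # Models: TARGET.HasFileTransfer &&
    #         stringListMember(scheme, HasFileTransferPluginMethods)
    # Missing attributes count as false/empty (a simplification of
    # ClassAd UNDEFINED semantics).
    has_transfer = slot_ad.get("HasFileTransfer") is True
    methods = slot_ad.get("HasFileTransferPluginMethods", "")
    return (has_transfer and isinstance(methods, str)
            and string_list_member(scheme, methods))
```

For example, a slot advertising HasFileTransferPluginMethods = "file,ftp,http,data" matches a job that needs http but not one that needs https. If no slot in the pool advertises https, the clause is false everywhere and the job simply stays idle. That is the behaviour seen on the personal condor node.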
> Anyway, I managed to drill down to find the difference, and it turns out
> to be different behaviour depending on whether you set
>
> should_transfer_files = if_needed
>
> or
>
> should_transfer_files = yes
>
> I cannot find this behaviour documented anywhere. Looking at
>
> http://research.cs.wisc.edu/htcondor/manual/current/2_5Submitting_Job.html#SECTION00354000000000000000
>
> it says the default value is "should_transfer_files = if_needed" and
> this will enable the file transfer mechanism if the machines are in
> different filesystem domains. This implies to me that if the machines
> are in different filesystem domains this should behave the same as
> "should_transfer_files = yes", but actually the generated requirements
> expressions are different in these two cases.
Again, I agree with you here... URLs should receive special treatment when
considering whether file transfer is needed.
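One rough way to picture the current rule next to the suggested special treatment for URLs is the following hypothetical Python model. Neither function is HTCondor code. current_rule() follows the documented meaning of if_needed, and proposed_rule() only changes the if_needed case when an input is a URL.

```python
from typing import Sequence


def current_rule(should_transfer_files: str, same_fs_domain: bool) -> bool:
    # Returns True if file transfer (and therefore URL fetching) is used.
    setting = should_transfer_files.strip().upper()
    if setting == "YES":
        return True
    if setting == "IF_NEEDED":
        # Transfer only when the machines are in different filesystem domains.
        return not same_fs_domain
    # "NO": everything, URLs included, is assumed to be on a shared filesystem.
    return False


def proposed_rule(should_transfer_files: str, same_fs_domain: bool,
                  inputs: Sequence[str]) -> bool:
    setting = should_transfer_files.strip().upper()
    # URLs never live on the shared filesystem, so with IF_NEEDED any URL
    # input counts as "transfer needed", even within one filesystem domain.
    if setting == "IF_NEEDED" and any("://" in p for p in inputs):
        return True
    return current_rule(should_transfer_files, same_fs_domain)
```

Under current_rule(), a personal condor node (same filesystem domain) with the if_needed default never fetches the URL, which matches the behaviour reported above.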
Thanks for your report and thoughtful analysis. I created a defect ticket
for these issues: https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=4692
While we may fix these issues in the current development series, the work may
be postponed because, in the longer term, I plan to redo the transfer plugin
architecture so that the plugins are user-supplied and not admin-supplied,
along with a number of other changes (e.g. status/keepalives and batching).
Hopefully your workarounds will suffice for now, and at the very least we
need to document the current behavior more clearly. Thanks again for your
report.
Cheers,
-zach