Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...
- Date: Fri, 15 Mar 2019 11:20:06 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...
> I also had to write a universal (Windows & Linux) wrapper
> script because (as far as I understood) it is impossible to use different
> executables (the executable directive) in a single parallel job.
If I recall correctly, the canonical approach here is to do
something like the following:
executable = my_executable_for_$$(ARCH)
so that the executable has a different name on Windows than it does on
Linux. I haven't tried this, though. :)
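A minimal submit-description sketch of that idea (untested; the executable names are hypothetical, and it substitutes $$(OpSys), which expands to values such as LINUX or WINDOWS, on the assumption that the split is per operating system rather than per CPU architecture):

    universe      = parallel
    # $$(OpSys) is filled in per matched machine, so each node runs
    # the binary named for its OS, e.g. my_executable_for_LINUX
    executable    = my_executable_for_$$(OpSys)
    machine_count = 2
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue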
> Well, here comes the tricky part. I need to submit a job with dozens of
> processes like in item 1 above, but one of these processes must run
> on the special node as in item 2. I tried to tell this special node
> that it is also a "dedicated" one, but this does not seem to work. So I
> am stuck here. I suppose my question is the following: is it possible to
> submit a parallel job in a way that one of these parallel processes
> flocks to a different pool?
Not as far as I know. You may, in this case, want to consider
startd flocking instead -- have the special node report to each of the
pools which need to be able to run jobs on it. (That is, add their
collectors to the COLLECTOR_HOST list.) This will probably result in the
special node being matched simultaneously in multiple pools, which can
have confusing results. (It should work -- the first schedd to contact
the startd will 'win' -- but may lead to starvation, if one of the schedds is
consistently faster/slower than the others.) However, since the special
node will be in the pools, it will probably be accessible to parallel
universe jobs.
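A configuration sketch for the special node, assuming the two central managers are reachable as cm-a.example.com and cm-b.example.com (hypothetical hostnames; real pools will also need matching security/authorization settings, which this omits):

    # Advertise this startd to both pools' collectors
    COLLECTOR_HOST = cm-a.example.com, cm-b.example.com

    # If the node should also accept parallel universe jobs, advertise
    # the dedicated scheduler of the schedd that submits them
    DedicatedScheduler = "DedicatedScheduler@submit.example.com"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler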
> To solve the previous item I tried condor_tail, and it does not seem to
> work at all. It simply hangs until the job finishes, then exits reporting
> that there is no such job. No output is provided. I could not make it
> work and I do not know how to debug it. Any ideas?
Try it with a vanilla universe job first? I don't know if
condor_tail is expected to work with parallel universe jobs.
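For a quick test, something along these lines (the job ID 123.0 is a placeholder for a running vanilla universe job):

    # Follow the job's stdout as it grows
    condor_tail -follow 123.0
    # Or watch its stderr instead
    condor_tail -stderr 123.0

If that works for a vanilla job but still hangs for a parallel one, that would narrow the problem down to the parallel universe.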
- ToddM