Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...
- Date: Fri, 15 Mar 2019 11:20:06 -0500 (CDT)
- From: Todd L Miller <tlmiller@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Parallel job with flocking, condor_tail does not work, upload/download to/from a running job, slots in Claimed-Idle state, ...
> I also had to write a universal (Windows & Linux) wrapper
> script because (as far as I understood) it is impossible to use different
> executables (the executable directive) in a single parallel job.
If I recall correctly, the canonical approach here is to do
something like the following:
executable = my_executable_for_$$(ARCH)
so that the executable has a different name on Windows than it does on
Linux. I haven't tried this, though. :)
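A minimal submit-description sketch of that idea (untested; the executable names are hypothetical, and it substitutes $$(OpSys), which expands to values such as LINUX or WINDOWS, on the assumption that the split is per operating system rather than per CPU architecture):

    universe      = parallel
    # $$(OpSys) is filled in per matched machine, so each node runs
    # the binary named for its OS, e.g. my_executable_for_LINUX
    executable    = my_executable_for_$$(OpSys)
    machine_count = 2
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue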
> Well, here comes the tricky part. I need to submit a job with dozens of
> processes like in item 1 above, but one of these processes must run
> on the special node as in item 2. I tried to tell this special node
> that it is also a "dedicated" one, but this does not seem to work. So I
> am stuck here. I suppose my question is the following: is it possible to
> submit a parallel job in a way that one of these parallel processes
> flocks to a different pool?
Not as far as I know. You may, in this case, want to consider
startd flocking instead -- have the special node report to each of the
pools which need to be able to run jobs on it. (That is, add their
collectors to the COLLECTOR_HOST list.) This will probably result in the
special node being matched simultaneously in multiple pools, which can
have confusing results. (It should work -- the first schedd to contact
the startd will 'win' -- but may lead to starvation, if one of the schedds is
consistently faster/slower than the others.) However, since the special
node will be in the pools, it will probably be accessible to parallel
universe jobs.
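A configuration sketch for the special node, assuming the two central managers are reachable as cm-a.example.com and cm-b.example.com (hypothetical hostnames; real pools will also need matching security/authorization settings, which this omits):

    # Advertise this startd to both pools' collectors
    COLLECTOR_HOST = cm-a.example.com, cm-b.example.com

    # If the node should also accept parallel universe jobs, advertise
    # the dedicated scheduler of the schedd that submits them
    DedicatedScheduler = "DedicatedScheduler@submit.example.com"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler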
> To solve the previous item I tried condor_tail, and it does not seem to
> work at all. It simply hangs until the job finishes, then exits reporting
> that there is no such job. No output is provided. I could not make it
> work and I do not know how to debug it. Any ideas?
Try it with a vanilla universe job first? I don't know if
condor_tail is expected to work with parallel universe jobs.
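For a quick test, something along these lines (the job ID 123.0 is a placeholder for a running vanilla universe job):

    # Follow the job's stdout as it grows
    condor_tail -follow 123.0
    # Or watch its stderr instead
    condor_tail -stderr 123.0

If that works for a vanilla job but still hangs for a parallel one, that would narrow the problem down to the parallel universe.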
- ToddM