
Re: [HTCondor-users] Submitting to one of several independent pools



Hello Collin,

 

You are right, the remote machine needs access to the .dag file (I was expecting HTCondor to copy it along with the other files that appear in the spool directory). In addition, I had to use an absolute path to the .dag file when submitting with condor_submit_dag, so that the generated .dag.condor.sub file references the full path on the network share.
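
In case it is useful for the archive, the working invocation ends up looking roughly like this (the schedd hostname and the share path are placeholders for our setup):

   condor_submit_dag -r testpool-schedd.example.com \\fileserver\share\dags\a.dag

so that the generated a.dag.condor.sub refers to the DAG by its full path on the network share, which the remote schedd can also reach.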

 

So issue solved and thanks a lot again Michael and Collin for your help.

Óscar

 

 

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On behalf of Collin Mehring
Sent: Tuesday, October 2, 2018 20:51
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submitting to one of several independent pools

 

Hi Oscar,

 

The condor_submit_dag -r option assumes that all of the necessary files will be visible on the remote machine.

 

Here's an excerpt from the manual page for condor_submit_dag on the '-r' flag that suggests something to try:

 

<...> Note that this option does not currently specify input files for condor_dagman, nor the individual nodes to be taken along! It is assumed that any necessary files will be present on the remote computer, possibly via a shared file system between the local computer and the remote computer. <...> If other options are desired, including transfer of other input files, consider using the -no_submit option, modifying the resulting submit file for specific needs, and then using condor_submit on that.
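
Untested on my end since we rely on the shared file system, but the alternative that excerpt describes would look something like this (file and host names are only placeholders):

   condor_submit_dag -no_submit a.dag
   # edit the generated a.dag.condor.sub, e.g. add the DAG file (and any node
   # submit files it references) to transfer_input_files so they get spooled
   condor_submit -remote test-schedd.example.com a.dag.condor.sub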

 

We use a shared file system so I haven't run into this personally, but hopefully that helps.


Good luck,

Collin

On Tue, Oct 2, 2018 at 3:47 AM Laborda Sanchez, Oscar (Volkswagen Group Services) <extern.Oscar.Laborda@xxxxxxx> wrote:

Michael, thank you for your reply.

 

From your message pointing me to the "-name" option, I have been trying both the -name and -remote options with condor_submit, and they work just fine. Unfortunately I use DAG jobs, and I cannot get them to run correctly with the "-r" option (as far as I know, it is equivalent to "condor_submit -remote", and there is no DAG equivalent to "condor_submit -name", right?).
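
Roughly, the commands in question look like this (the schedd hostname is a placeholder here):

   condor_submit     -name   testpool-schedd.example.com job.sub    # works
   condor_submit     -remote testpool-schedd.example.com job.sub    # works
   condor_submit_dag -r      testpool-schedd.example.com a.dag      # gets stuck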

 

I am submitting a simple a.dag file, but the DAG job just gets stuck and never runs, and I find the following in the a.dag.dagman.out file in the spool directory:

 

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAGMAN_LOG_ON_NFS_IS_ERROR setting: False

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Default node log file is: <C:\condor\spool\88\0\cluster88.proc0.subproc0\.\a.dag.nodes.log>

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Lockfile will be written to a.dag.lock

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) DAG Input file is a.dag

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing 1 dagfiles

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Parsing a.dag ...

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Could not open file a.dag for input (cwd) (errno 2, No such file or directory)

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Removing any/all submitted HTCondor jobs...

10/02/18 10:08:34 (fd:4) (pid:32476) (D_ALWAYS) Running: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88

10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Warning: failure: C:\condor\bin\condor_rm.exe -const DAGManJobId' '=?=' '88

10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS)  (my_pclose() returned 1 (errno 2, No such file or directory))

10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) ERROR: Warning is fatal error because of DAGMAN_USE_STRICT setting

10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Aborting DAG...

10/02/18 10:08:35 (fd:4) (pid:32476) (D_ALWAYS) Writing Rescue DAG to a.dag.rescue002...

 

The a.dag file certainly has not been copied into that directory.

In the a.dag.dagman.log I am also getting this:

 

        (0) Abnormal termination (signal -1073741819)

 

Any idea on how to fix this?

 

Thanks

Oscar

 

 

-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On behalf of Michael Pelletier
Sent: Tuesday, September 25, 2018 16:43
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Submitting to one of several independent pools

 

Oscar,

 

The "-name" option to condor_submit is what you're looking for:

 

       -name schedd_name

          Submit to the specified condor_schedd. Use this option to submit to
          a condor_schedd other than the default local one. schedd_name is the
          value of the Name ClassAd attribute on the machine where the
          condor_schedd daemon runs.

 

You would set up the workstation with a default scheduler, probably the production one, and then, when submitting for test, you'd add the "-name" option to the submission to specify the hostname of the test pool's schedd.
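
For example, with a placeholder hostname:

   condor_submit -name test-schedd.example.com myjob.sub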

 

If you want to avoid the command-line option while testing, so you don't have to change options when going from test to production, you can set the _CONDOR_SCHEDD_NAME environment variable to override the workstation's configuration file setting for the default scheduler.
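
Something along these lines, again with a placeholder hostname (use "set" instead of "export" in a Windows command prompt):

   export _CONDOR_SCHEDD_NAME=test-schedd.example.com
   condor_submit myjob.sub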

 

Michael V. Pelletier

Information Technology

Digital Transformation & Innovation

Integrated Defense Systems

Raytheon Company

 


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/


 

--

Collin Mehring | PE-JoSE - Software Engineer