[HTCondor-users] Help with condor_wait (stuck in Windows using HTCondor 8.4.5 and 8.4.6)
- Date: Tue, 17 May 2016 18:05:52 +0200
- From: Martí Coma Company <mcoma@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] Help with condor_wait (stuck in Windows using HTCondor 8.4.5 and 8.4.6)
Hello,
We are new to HTCondor and have been configuring a small pool since the
last course in Barcelona. Our tests submitting jobs manually have been
successful, and we are now trying to drive HTCondor from an in-house
optimization code written in C++. We have run into problems on
Windows 7 (so far, everything works as expected on Linux).
The problem is that after our code runs "condor_submit submit.condor"
through a system call, it calls "condor_wait log.condor", which
SOMETIMES never returns (we call the submit and wait commands in a
loop). At that point log.condor shows that all jobs have terminated,
condor_q returns no jobs at all, and the results of all calculations are
present and correct. If we kill condor_wait from the Task Manager, our
process continues without problems until condor_wait gets stuck again in
a later loop iteration. We have waited several hours for condor_wait to
return.
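For reference, the loop body described above can be sketched roughly as
follows. This is a minimal sketch, not our actual code: runBatch and
buildWaitCommand are hypothetical names, and the time limit is an
arbitrary value. It uses the "-wait <seconds>" option of condor_wait so
that a hung wait does not block the loop indefinitely; we have not
verified that the limit is honored when condor_wait gets stuck on
Windows.

```cpp
#include <cstdlib>
#include <string>

// Build the condor_wait command with a time limit in seconds.
// "-wait <seconds>" is a condor_wait option; the value passed in is
// chosen by the caller and is an assumption, not a recommended setting.
std::string buildWaitCommand(const std::string& logFile, int timeoutSeconds) {
    return "condor_wait -wait " + std::to_string(timeoutSeconds) + " " + logFile;
}

// Hypothetical helper: submit one batch of jobs, then wait for the log
// to report completion. Returns true only if both commands exit with 0.
bool runBatch(const std::string& submitFile,
              const std::string& logFile,
              int timeoutSeconds) {
    const std::string submitCommand = "condor_submit " + submitFile;
    if (std::system(submitCommand.c_str()) != 0) {
        return false;  // submission failed, nothing to wait for
    }
    // A non-zero status here means an error or that the time limit was hit.
    return std::system(buildWaitCommand(logFile, timeoutSeconds).c_str()) == 0;
}
```

A call such as runBatch("submit.condor", "log.condor", 600) would then
replace the two separate system calls in each loop iteration, and a
false return value could be used to trigger a retry or a manual check.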
We use initialdirs and relative paths in the submit file, so all jobs
are logged in the same file:
################################
# Condor submit file #
Universe = vanilla
Executable = mathcasesexe_windows.exe
Log = ../log.condor
Output = out.condor
Error = err.condor
initialdir = condorInd$(Process)
should_transfer_files = YES
transfer_input_files = Eval.DVs, ../prob.dat
when_to_transfer_output = ON_EXIT
transfer_output_files = Eval.individual, Cons.individual
Queue 24
################################
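Because all 24 jobs write to the same log, one possible check that does
not depend on condor_wait is to count the termination events in
log.condor directly. The sketch below is illustrative only:
countTerminatedJobs is a hypothetical helper, and it assumes the plain
text user log format in which each event line starts with a three-digit
code and "005" marks a terminated job.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Count "job terminated" events in an HTCondor user log stream.
// Assumption: event header lines begin with "005 (" for termination,
// as in the classic (non-XML) user log format.
int countTerminatedJobs(std::istream& log) {
    int count = 0;
    std::string line;
    while (std::getline(log, line)) {
        // Only the first five characters matter, so a trailing '\r'
        // from Windows line endings does not affect the match.
        if (line.compare(0, 5, "005 (") == 0) {
            ++count;
        }
    }
    return count;
}
```

Opening log.condor with std::ifstream and comparing the result against
the number of queued jobs (24 here) would indicate whether the batch has
finished. Since the log is appended to on every submission, the count is
only meaningful if the file is deleted between loop iterations.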
We have tested both HTCondor 8.4.5 and 8.4.6 on the submit node, with
the same issue in both versions. We have also tried deleting the
log.condor file between loop iterations, but the problem remains.
We found a report of the same problem from a Linux user in 2010; it
occurred in version 7.2.5 and was solved in version 7.4.3
(https://lists.cs.wisc.edu/archive/htcondor-users/2010-June/msg00221.shtml).
Could this be a bug in condor_wait for Windows?
How can we solve this problem? Any help would be much appreciated.
Thanks in advance,
Martí Coma
CIMNE Aerospace Engineering Group