Subject: Re: [Condor-users] condor_dagman.exe in idle after submit jobs completed
The disk is not full. I am writing all files (log, err, out, and the
files created by each job) to an NTFS SAN. I/O was a problem, so I used
maxjobs to throttle the number of concurrently running jobs.
There is no problem with the permissions for either Condor or the jobs.
The jobs also run via RunAsOwner.
The dprintf_failure.DAGMAN file is empty; it contains no information.
mike
From: "R. Kent Wenger" <wenger@xxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 06/15/2011 10:39 AM
Subject: Re: [Condor-users] condor_dagman.exe in idle after submit jobs completed
Sent by: condor-users-bounces@xxxxxxxxxxx
On Wed, 15 Jun 2011, Michael O'Donnell wrote:
> Thank you for your comments. It looks like I am going to have to spend a
> lot more time investigating this because it is not evident what has
> happened. Most of the jobs did complete, but something happened to the
> communication between the jobs and the condor_dagman.exe. I do not know
> the communication process yet, but I did not see any errors in the dagman
> log or anything. Basically the dagman went into recovery mode and could
> never exit this recovery loop. When it went into recovery mode it
> generated this file: dprintf_failure.DAGMAN. If I delete the file it would
> generate it again on the next recovery attempt.
>
> When I released the condor_dagman job, a recovery file was not generated.
> I then tried to rerun the dag and the following happened:
> dprintf_failure.DAGMAN was generated again
> condor_dagman job went into idle
> no dag jobs were submitted
> condor_dagman.exe would not exit without forcing it
Hmm, something else to check: is your disk full? And are file
permissions set to reasonable values? (DAGMan monitors the node jobs by
reading their user log files.)
Also, what are the contents of the dprintf_failure.DAGMAN file?
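
The three checks above (free disk space, access to the directory holding
the user log files, and the contents of dprintf_failure.DAGMAN) could be
scripted roughly as follows. This is only a minimal Python sketch and not
part of Condor: the function name check_log_dir, the 100 MB threshold, and
the assumption that dprintf_failure.DAGMAN sits in the directory being
checked are all hypothetical. Note also that on Windows os.access may not
reflect NTFS ACLs, so a True result there is a hint rather than proof.

```python
import os
import shutil


def check_log_dir(path, min_free_bytes=100 * 1024 * 1024):
    """Report basic health of a directory used for job log files.

    Hypothetical helper: the name, the threshold, and the location of
    dprintf_failure.DAGMAN are assumptions made for illustration.
    """
    usage = shutil.disk_usage(path)  # total, used, free (in bytes)
    report = {
        "free_bytes": usage.free,
        "enough_space": usage.free >= min_free_bytes,
        # Access as seen by the user running this script.
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }

    # Size of the failure file, or None if it does not exist.
    failure_file = os.path.join(path, "dprintf_failure.DAGMAN")
    if os.path.exists(failure_file):
        report["failure_file_size"] = os.path.getsize(failure_file)
    else:
        report["failure_file_size"] = None
    return report


if __name__ == "__main__":
    # Check the current directory; substitute the DAG's log directory.
    for key, value in check_log_dir(".").items():
        print(f"{key}: {value}")
```

A failure_file_size of 0 corresponds to the empty-file case; the script
should be run as the same user that DAGMan and the jobs run as, since the
access checks are per-user.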
Kent Wenger
Condor Team