On Thu, 3 Nov 2011, Christopher Martin wrote:
> So from what I can see it's like you say, it's as if the dagman can't tell
> that the jobs have completed successfully. The job logs do indicate
> completion though. I'm wondering, do the jobs all have to log to the same
> log file? Currently I have each job logging to its own log file. All logs
> for both the jobs and the dagman are written to the same directory.
It's fine to have any combination of jobs logging to their own log files vs. jobs logging to a common log file. It's important, though, that jobs in separate DAGs not share log files (unless you're 100% sure the DAGs won't be run at the same time).
> I've included snippets from a dagman.out that show the state of things
> before and after the schedd restart.
Can you send the following files?
* dagman.out
* the actual dag file
* the node job log files
If you do that, I'll take a look in more detail and see what I can figure out.
From your original email, it sounds like this problem happens consistently when your schedd restarts. Is that right? If so, that rules out my first guesses about the cause (e.g., some kind of transient log file reading error).
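As a rough way to check the log-sharing rule (jobs within one DAG may share a log, jobs in different DAGs should not), here is a minimal Python sketch. It assumes you have already collected, for each DAG, the list of node job log paths it uses; the function name `find_shared_logs` and that input shape are hypothetical, and parsing the DAG and submit files to build the input is left out.

```python
import os
from collections import defaultdict


def find_shared_logs(dag_logs):
    """Return {log_path: sorted DAG names} for logs used by more than one DAG.

    dag_logs is a hypothetical mapping: DAG name -> node job log paths.
    Jobs in the same DAG sharing a log is fine, so only logs that show up
    under two or more different DAGs are reported.
    """
    users = defaultdict(set)
    for dag, logs in dag_logs.items():
        for log in logs:
            # Resolve relative paths, "." components and symlinks so two
            # spellings of the same file count as one log.
            users[os.path.realpath(log)].add(dag)
    return {log: sorted(dags) for log, dags in users.items() if len(dags) > 1}


if __name__ == "__main__":
    example = {
        "a.dag": ["logs/a_job1.log", "logs/common.log"],
        "b.dag": ["logs/b_job1.log", "logs/./common.log"],
        "c.dag": ["logs/c.log", "logs/c.log"],  # shared within one DAG: OK
    }
    for log, dags in find_shared_logs(example).items():
        print(f"{log} is shared by {', '.join(dags)}")
```

A log reported here is only a problem if those DAGs can actually run at the same time; the sketch has no way of knowing that.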
Kent Wenger
Condor Team
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxedu with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
Attachment:
dagmanlogs.tar.gz
Description: GNU Zip compressed data