Hi All,
Does anyone know how to overcome this?
I have a simple DAG file that looks like this:
JOB A A.job
JOB B B.job
PARENT A CHILD B
Job A and Job B can each run on the Windows Condor cluster without any problem.
Here is what A.job looks like:
universe = vanilla
transfer_files=always
requirements =
executable = U:\runA.bat
Arguments =
output =A.out
log = A.log
error = A.err
notification = Error
initialdir = U:
run_as_owner = True
load_profile = True
queue 4
Now, when running the DAG job using condor_submit_dag.exe DAG.job, I get the following error:
07/20/10 10:20:25 WARNING: ProcessId not confirmed unique
07/20/10 10:20:25 Bootstrapping...
07/20/10 10:20:25 Number of pre-completed nodes: 0
07/20/10 10:20:25 Registering condor_event_timer...
07/20/10 10:20:26 Sleeping for one second for log file consistency
07/20/10 10:20:27 DAGMan::Job:8001:ERROR: Unable to monitor log file for node A|ReadMultipleUserLogs:9004:Error getting file ID in monitorLogFile()|ReadMultipleUserLogs:9004:Error initializing log file U:\A.log|MultiLogFiles:9001:Error (2, No such file or directory) opening file U:\A.log for creation or truncation
07/20/10 10:20:27 Of 2 nodes total:
07/20/10 10:20:27  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
07/20/10 10:20:27   ===     ===      ===     ===     ===        ===      ===
07/20/10 10:20:27     0       0        0       0       0          2        0
07/20/10 10:20:27 ERROR: a cycle exists in the DAG
07/20/10 10:20:27 ---------------------- Job ----------------------
07/20/10 10:20:27 Node Name: A
07/20/10 10:20:27 Noop: false
07/20/10 10:20:27 NodeID: 0
07/20/10 10:20:27 Node Status: STATUS_ERROR
07/20/10 10:20:27 Node return val: -1003
07/20/10 10:20:27 Error: Unable to monitor node job log file
07/20/10 10:20:27 Job Submit File: A.job
07/20/10 10:20:27 Condor Job ID: [not yet submitted]
07/20/10 10:20:27 Q_PARENTS: <END>
07/20/10 10:20:27 Q_WAITING: <END>
07/20/10 10:20:27 Q_CHILDREN: B, <END>
07/20/10 10:20:27 ---------------------- Job ----------------------
07/20/10 10:20:27 Node Name: B
07/20/10 10:20:27 Noop: false
07/20/10 10:20:27 NodeID: 1
07/20/10 10:20:27 Node Status: STATUS_READY
07/20/10 10:20:27 Node return val: -1
07/20/10 10:20:27 Job Submit File: B.job
07/20/10 10:20:27 Condor Job ID: [not yet submitted]
07/20/10 10:20:27 Q_PARENTS: A, <END>
07/20/10 10:20:27 Q_WAITING: A, <END>
07/20/10 10:20:27 Q_CHILDREN: <END>
07/20/10 10:20:27 --------------------------------------- <END>
07/20/10 10:20:27 Aborting DAG...
07/20/10 10:20:27 Writing Rescue DAG to dag.dag.rescue001...
07/20/10 10:20:27 Note: 0 total job deferrals because of -MaxJobs limit (0)
07/20/10 10:20:27 Note: 0 total job deferrals because of -MaxIdle limit (0)
07/20/10 10:20:27 Note: 0 total job deferrals because of node category throttles
07/20/10 10:20:27 Note: 0 total PRE script deferrals because of -MaxPre limit (0)
07/20/10 10:20:27 Note: 0 total POST script deferrals because of -MaxPost limit (0)
But the log doesn't tell me much. Can someone please comment on this?
This job is part of a Hadoop cluster that I'm trying to build.
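The one concrete failure in the log is error 2 (No such file or directory) while opening U:\A.log. One possible reading, which is only an assumption and is not confirmed by anything above, is that the process opening the log cannot see the U: drive, since mapped drives on Windows belong to a logon session. A minimal sketch of a check that could be run from the account in question is below. The helper name check_log_path and the scratch directory are hypothetical and used only for illustration. Note that the check creates the log file if it does not exist yet.

```python
import errno
import os
import tempfile


def check_log_path(log_path):
    """Try to create or open log_path for writing; return (ok, message)."""
    directory = os.path.dirname(log_path) or "."
    if not os.path.isdir(directory):
        # Roughly the situation errno 2 describes: the directory is not visible.
        return False, "directory not visible: " + directory
    try:
        # Append mode, so an existing log is not truncated by the check.
        with open(log_path, "a"):
            pass
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, "cannot open %s: errno %s (%s)" % (log_path, exc.errno, name)
    return True, "log path is usable: " + log_path


if __name__ == "__main__":
    # Hypothetical scratch location, used only to exercise the helper.
    scratch = tempfile.mkdtemp()
    print(check_log_path(os.path.join(scratch, "A.log")))
    print(check_log_path(os.path.join(scratch, "missing", "A.log")))
```

To test the real case, the call would be check_log_path(r"U:\A.log"). If it reports that the directory is not visible, pointing log and initialdir at a local path or a UNC path might be worth trying, though I have not verified that this resolves the DAGMan error.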
Thank you
Sassy