I am trying to run a simple DAG workflow (bash script below) that
submits two Condor jobs. My setup is a Personal Condor (7.7)
installed from the development Condor yum repo. The 'Executable'
command in the submit file uses the macro "simple.$$(OpSys).bat",
and I have the binary 'simple.LINUX.bat' in the same working tree.
The first job, 1.condor, finishes successfully, but the second job,
2.condor, is held with the error message "Cannot expand $$(OpSys)."
The curious thing is that the OpSys attribute does appear to be set
to "LINUX" (condor_status output below), and that the first job runs
while the second does not.
The log messages and my bash script are below. Any pointers as to
what my setup is missing would be a great help.
Thanks,
Poornima.
Here's what I have checked so far:
1. The Condor system mail message for the second job: "Attribute
$$(OpSys) cannot be expanded because this attribute was not found in
the machine ClassAd."
2. $ condor_status -long |grep -i machine
Machine = "xxx.xxx.xxx.xxx"
Unhibernate = MY.MachineLastMatchTime =!= undefined
MyType = "Machine"
$ condor_status -long |grep -i opsys
OpSysAndVer = "LINUX"
OpSysVer = 206
OpSys = "LINUX"
3. Messages from SchedLog, where the schedd runs job 80 (1.condor)
but puts job 81 (2.condor) on hold:
11/10/11 14:38:39 (pid:17416) Starting add_shadow_birthdate(80.0)
11/10/11 14:38:39 (pid:17416) Started shadow for job 80.0 on centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy, (shadow pid = 16301)
11/10/11 14:38:39 (pid:17416) Finished negotiating for ppreddy in local pool: 1 matched, 1 rejected
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 for job 80.0 reports job exit reason 100.
11/10/11 14:38:43 (pid:17416) match (centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy) switching to job 81.0
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 switching to job 81.0.
11/10/11 14:38:43 (pid:17416) Starting add_shadow_birthdate(81.0)
11/10/11 14:38:43 (pid:17416) Putting job 81.0 on hold - cannot expand $$(OpSys)
11/10/11 14:38:43 (pid:17416) Job 81.0 put on hold: Cannot expand $$(OpSys).
11/10/11 14:38:43 (pid:17416) Failed to expand job ad when switching shadow 16301 to new job 81.0
4. NegotiatorLog and MatchLog show job 81.0 rejected because no match
was found:
11/10/11 14:38:39 Rejected 81.0 <x.x.x.x:51994>: no match found

My bash script:

#!/bin/bash
runs=2
arg1=4
arg2=10

# generate one submit file per job; \$\$ keeps $$(OpSys) literal in the heredoc
for job in `seq $runs`
do
    cat > $job.condor << EOF
Universe = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments = $arg1 $arg2
Log = $job.log
Output = $job.out
Error = $job.error
Queue
EOF
    let arg1=$arg1+1
    let arg2=$arg2+10
    JOB_LIST="$JOB_LIST $job"
done

# generate condor dagman to manage jobs
for JOB_ID in $JOB_LIST; do
    echo "JOB job_$JOB_ID $JOB_ID.condor" >>master.dag
    echo "SCRIPT PRE job_$JOB_ID pre-job " >>master.dag
    echo "SCRIPT POST job_$JOB_ID post-job " >>master.dag
    echo "RETRY job_$JOB_ID 5" >>master.dag
done

condor_submit_dag -notification Never master.dag >condor_submit_dag.out
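
To rule out the heredoc escaping as the culprit, here is a minimal sketch of how the generated Executable line can be checked. It only tests what bash writes to the file and says nothing about how Condor expands the macro. The scratch file name check.condor and the variables line and result are made up for this check and are not part of the workflow.

```shell
#!/bin/bash
# Write a one-line submit fragment with the same escaping as the real script.
# check.condor is a hypothetical scratch file, created in a temp directory.
tmpdir=$(mktemp -d)
submit="$tmpdir/check.condor"

cat > "$submit" << EOF
Executable = simple.\$\$(OpSys).bat
EOF

# Show exactly what ended up in the file.
line=$(cat "$submit")
echo "$line"

# grep -F matches a fixed string, so '$' and '(' are not treated specially.
if grep -qF 'simple.$$(OpSys).bat' "$submit"; then
    result="literal"     # bash left $$(OpSys) intact for Condor to expand
else
    result="expanded"    # bash substituted $$ (the shell PID) itself
fi
echo "$result"

rm -rf "$tmpdir"
```

If this reports "literal", the submit files carry the macro through unchanged, which would be consistent with both 1.condor and 2.condor having an identical Executable line.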