I am trying to run a simple DAG workflow (bash script below) that
submits 2 Condor jobs. My setup is a Personal Condor (7.7) installed
from the development Condor yum repo. The 'Executable' command in the
submit file uses the macro "simple.$$(OpSys).bat", and I have the
binary 'simple.LINUX.bat' in the same working tree.
The first job, 1.condor, finishes successfully, but the second job,
2.condor, is held with the error message "Cannot expand $$(OpSys)." The
curious thing is that the OpSys attribute seems to be set to "LINUX"
(see the condor_status output below), and that the first job runs while
the second does not.
The log messages and my bash script are below. Any pointers as to what
my setup is missing would be a great help.
Thanks,
Poornima.
Here's what I poked into:
1. The Condor system mail msg for the second job: "Attribute $$(OpSys)
cannot be expanded because this attribute was not found in the machine
ClassAd."
2. $ condor_status -long |grep -i machine
Machine = "xxx.xxx.xxx.xxx"
Unhibernate = MY.MachineLastMatchTime =!= undefined
MyType = "Machine"
$ condor_status -long |grep -i opsys
OpSysAndVer = "LINUX"
OpSysVer = 206
OpSys = "LINUX"
3. Messages from SchedLog, where the schedd starts job 80 (1.condor) but
puts job 81 (2.condor) on hold:
11/10/11 14:38:39 (pid:17416) Starting add_shadow_birthdate(80.0)
11/10/11 14:38:39 (pid:17416) Started shadow for job 80.0 on
centos6.lab.ac.uab.edu <10.0.0.26:40521> for ppreddy, (shadow pid = 16301)
11/10/11 14:38:39 (pid:17416) Finished negotiating for ppreddy in local
pool: 1 matched, 1 rejected
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 for job 80.0 reports job
exit reason 100.
11/10/11 14:38:43 (pid:17416) match (centos6.lab.ac.uab.edu
<10.0.0.26:40521> for ppreddy) switching to job 81.0
11/10/11 14:38:43 (pid:17416) Shadow pid 16301 switching to job 81.0.
11/10/11 14:38:43 (pid:17416) Starting add_shadow_birthdate(81.0)
11/10/11 14:38:43 (pid:17416) Putting job 81.0 on hold - cannot expand
$$(OpSys)
11/10/11 14:38:43 (pid:17416) Job 81.0 put on hold: Cannot expand $$(OpSys).
11/10/11 14:38:43 (pid:17416) Failed to expand job ad when switching
shadow 16301 to new job 81.0
4. NegotiatorLog and MatchLog show job 81 rejected because no match was found:
11/10/11 14:38:39 Rejected 81.0 <x.x.x.x:51994>: no match found
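The grep in item 2 matches every attribute whose name contains "opsys", so it prints three lines. A minimal sketch of pulling out only the exact OpSys attribute is shown here. The helper name classad_attr is hypothetical (it is not a Condor tool), and the sample text is copied from the condor_status output above; the helper only assumes "Name = value" lines on standard input.

```shell
#!/bin/bash
# classad_attr NAME (hypothetical helper, not part of Condor):
# read "Attr = value" lines on stdin and print the value of the
# attribute whose name is exactly NAME, with surrounding quotes removed.
classad_attr() {
    awk -v name="$1" '
        {
            # Split each line at the first " = " into key and value
            idx = index($0, " = ")
            if (idx == 0) next
            key = substr($0, 1, idx - 1)
            val = substr($0, idx + 3)
            if (key == name) {
                gsub(/^"|"$/, "", val)   # strip the string quotes
                print val
            }
        }'
}

# Sample lines copied from the condor_status output in item 2
sample='OpSysAndVer = "LINUX"
OpSysVer = 206
OpSys = "LINUX"'

printf '%s\n' "$sample" | classad_attr OpSys
```

Against a live pool the same helper could be fed with "condor_status -long | classad_attr OpSys"; it would then print one line per machine ad.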
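#!/bin/bash
# Number of jobs to generate, and the arguments passed to the first job
runs=2
arg1=4
arg2=10
JOB_LIST=""
# Write one submit file per job: 1.condor, 2.condor, ...
for job in $(seq "$runs")
do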
cat > $job.condor << EOF
Universe = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments = $arg1 $arg2
Log = $job.log
Output = $job.out
Error = $job.error
Queue
EOF
# Bump the arguments for the next job and record this job's name
arg1=$((arg1 + 1))
arg2=$((arg2 + 10))
JOB_LIST="$JOB_LIST $job"
done
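# Generate the DAGMan input file that manages the jobs.
# Start from an empty file so that rerunning this script does not
# append duplicate entries to an existing master.dag.
: > master.dag
for JOB_ID in $JOB_LIST; do
    echo "JOB job_$JOB_ID $JOB_ID.condor" >> master.dag
    echo "SCRIPT PRE job_$JOB_ID pre-job " >> master.dag
    echo "SCRIPT POST job_$JOB_ID post-job " >> master.dag
    echo "RETRY job_$JOB_ID 5" >> master.dag
done
# Submit the DAG; condor_submit_dag's own output goes to a file
condor_submit_dag -notification Never master.dag > condor_submit_dag.out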
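Since the script above writes the submit files through an unquoted here-document, one thing worth ruling out is the shell itself mangling the macro (an unescaped $$ would expand to the shell's process ID). The following is a minimal dry-run sketch, assuming only bash and standard tools: it writes one submit file into a scratch directory exactly as the script does, runs no Condor commands, and prints the resulting Executable line. The scratch directory and the exe_line variable are illustrative names.

```shell
#!/bin/bash
# Dry run: generate one submit file the same way as the script above,
# but in a scratch directory and without calling any Condor command.
workdir=$(mktemp -d)
job=1
arg1=4
arg2=10

cat > "$workdir/$job.condor" << EOF
Universe = vanilla
Executable = simple.\$\$(OpSys).bat
Arguments = $arg1 $arg2
Log = $job.log
Output = $job.out
Error = $job.error
Queue
EOF

# grep -F treats the pattern as a fixed string, not a regular expression
exe_line=$(grep -F 'Executable' "$workdir/$job.condor")
echo "$exe_line"

# Remove the scratch directory again
rm -rf "$workdir"
```

If the printed line reads "Executable = simple.$$(OpSys).bat", the escaping in the here-document is working and the literal macro reaches Condor unchanged.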
------------------------------------------------------------------------
_______________________________________________
Condor-users mailing list