Hi,
I have a heterogeneous pool that includes IA64 and x86_64 machines. The
x86_64 server is the submitter, and the others are worker nodes. I
compiled my source file on the worker nodes and submitted the job from the
submitter. After submitting the job, I used condor_q to query it, and the
result is as follows:
 ID      OWNER    SUBMITTED     RUN_TIME   ST PRI SIZE CMD
7046.0   zhxue    9/14 17:41    0+00:27:19 R  0   0.0  ia64
7047.0   zhxue    9/14 17:41    0+00:00:01 H  0   0.0  data
Why is job 7047.0 generated?
Furthermore, I used the "condor_q -analyze" command, and it reports the
following:
7046.000: Request is being serviced
---
7047.000: Request is held.
Hold reason: Error from starter on slot2@**.**.**: Failed to execute '/home/zhxue/.globus/.gass_cache/local/md5/58/1da5713002eb7a2d6fe3f76e3f673a/md5/b5/f7f0ea2e16e03c4fdb16fcbbb5abd9/data': Exec format error It seems 7047.0 is the execution process, but it can not been scheduled to
IA64 servers. (slot2@**.**.** is a core with
x86_64 architecture).
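For reference, I assume the architecture mismatch could be confirmed by running "file" on the staged executable on the x86_64 node (the name "data" is taken from the hold message; the exact output will differ on your system):

    $ file data
    # if the binary was built on an IA64 node, file should report an IA-64 ELF
    # executable, which an x86_64 slot cannot run, hence the "Exec format error"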
I specified "requirements" in the submit script, but it does not seem to work.
The script is as follows:
universe=grid
grid_resource = gt2 ***.***.***:/jobmanager-condor
requirements = Arch == "IA64" && OpSys == "Linux"
output = ......
error=....
log = ......
queue
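In case it helps, I assume the requirements expression that actually ended up in the job ClassAd could be inspected with something like the following (7047.0 is the held job from above):

    $ condor_q -long 7047.0 | grep -i requirements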
Could you help me with this? Any suggestion is appreciated.
Thanks.