[Condor-users] New Cluster - match but reject the job for unknown reasons
- Date: Wed, 25 May 2011 11:32:34 -0400
- From: Dirk Colbry <colbrydi@xxxxxxx>
- Subject: [Condor-users] New Cluster - match but reject the job for unknown reasons
Hey Everyone,
I am setting up a new Condor cluster. The CONDOR_HOST is running on
RHEL 6.0 with Condor 7.4.4 from a basic yum install. All of my worker
nodes run Windows XP, also with Condor 7.4.4. When I submit jobs to
the Windows machines, they are always stuck in Idle, and I seem to be
reproducing the problem described at the following link:
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1645
I have seen similar problems posted to this email list in the past but
I was unable to determine the proper direction for a solution. My
output of condor_q with the -better-analyze flag is as follows:
==================================
> condor_q -better-analyze 36.0
-- Submitter: accumulator.hpcc.msu.edu : <10.1.1.24:49968> : accumulator.hpcc.msu.edu
---
036.000: Run analysis summary. Of 6 machines,
      1 are rejected by your job's requirements
      2 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      3 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
Last successful match: Wed May 25 09:20:30 2011
The Requirements expression for your job is:
( ( target.OpSys == "WINNT51" ) && ( target.Arch == "INTEL" ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) && ( target.HasFileTransfer )
    Condition                          Machines Matched    Suggestion
    ---------                          ----------------    ----------
1   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,0.0)) ) >= 0 )
                                       0                   REMOVE
2   ( target.OpSys == "WINNT51" )      5
3   ( target.Arch == "INTEL" )         5
4   ( target.Disk >= 75 )              6
5   ( ( 1024 * target.Memory ) >= 0 )  6
6   ( target.HasFileTransfer )         6
==================================
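As a sanity check on the summary above, the per-reason counts can be tallied to confirm that all 6 machines are accounted for and to see how many fall into the "unknown reasons" bucket. This is only a minimal sketch: the `SUMMARY` text is copied from the output above, and the helper name `parse_summary` is hypothetical, not part of Condor.

```python
import re

# Summary lines copied from the condor_q -better-analyze output above.
SUMMARY = """\
1 are rejected by your job's requirements
2 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
3 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
0 are available to run your job
"""

def parse_summary(text):
    """Return {reason: count} for every line of the form '<N> <reason>'."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*\S)\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts

counts = parse_summary(SUMMARY)
total = sum(counts.values())
unknown = counts["match but reject the job for unknown reasons"]
print(f"{total} machines accounted for, {unknown} in the unknown bucket")
```

The tally comes to 6, matching the "Of 6 machines" header. The 3 machines that get past both sets of requirements are exactly the 3 in the "unknown reasons" bucket, so none is reported as available.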
According to the ticket linked above, the REMOVE suggestion from the
-better-analyze flag is a red herring, and the problem is more likely in
the firewalls. To test that, I put my CONDOR_HOST and one of my Windows XP
boxes on a private network and turned off all of the firewalls. I still
get the same result, so it does not look like a firewall problem.
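One way to double-check the no-firewall conclusion is a plain TCP connect test between the two hosts, run in both directions. The sketch below assumes Python is available on the machine doing the test. The helper name `can_connect` is hypothetical, and it only checks TCP reachability of one address, which is narrower than everything the Condor daemons need from each other.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake;
        # refused connections and timeouts both surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, from the Windows XP node, `can_connect("10.1.1.24", 49968)` would probe the schedd address shown in the Submitter line above. That port appears to be dynamically assigned, so it may change when the daemon restarts. The reverse direction (from the CONDOR_HOST to the address the worker's startd advertises) is probably worth testing as well, since the daemons open connections to each other in both directions.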
Has anyone else seen this problem? Do you have any suggestions for
things I could try? Is there any more information I could be looking
for to make this problem easier to diagnose?
Thanks,
- Dirk