Mailing List Archives
Re: [Condor-users] New Cluster - match but reject the job for unknown reasons
- Date: Fri, 10 Jun 2011 15:31:51 -0400
- From: Dirk Colbry <colbrydi@xxxxxxx>
- Subject: Re: [Condor-users] New Cluster - match but reject the job for unknown reasons
Shahaan,
Thanks for your help. I did some digging and, as it turns out, it was a
firewall problem. The CONDOR_HOST was wide open to receive
connections but could not make any external requests. It was a very
frustrating bug to find. I wonder if there is any way that Condor
could return a better error message for this case?
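
For anyone hitting the same symptom, a rough way to check the outbound direction is to run a small TCP probe on the CONDOR_HOST itself and see whether it can open connections to the worker nodes. The sketch below is only an illustration and not part of Condor: the function names are made up for this example, and the host and port you pass in are assumptions you need to fill in from your own pool (for example, from the address a worker advertises).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Any socket-level failure (refused, timed out, unresolvable name)
    is reported as False. This only tests reachability; it does not
    speak the Condor protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(targets, timeout=3.0):
    """Map each (host, port) pair in targets to a reachability flag."""
    return {(host, port): can_connect(host, port, timeout)
            for host, port in targets}
```

Calling something like `report([("worker1.example.org", 9618)])` from the CONDOR_HOST (hostname and port are placeholders) returns a dictionary of pass/fail flags. A False entry for a worker that is up and listening would point toward a blocked outbound path like the one described above.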
Thanks again,
- Dirk
On Wed, May 25, 2011 at 8:40 PM, Shahaan Ayyub <shahaan@xxxxxxxxx> wrote:
> Hi Dirk,
> Have a careful look yourself, and also include the tail of the Sched and
> Negotiator logs in your next mail.
> regards,
> Shahaan
>
>
> On Thu, May 26, 2011 at 1:32 AM, Dirk Colbry <colbrydi@xxxxxxx> wrote:
>>
>> Hey Everyone,
>>
>> I am setting up a new Condor cluster. The CONDOR_HOST is running
>> RHEL 6.0 with Condor 7.4.4 using a basic yum install. All of my worker
>> nodes are running Windows XP, also with Condor 7.4.4. When I submit jobs to
>> the Windows machines they are always stuck in Idle, and I seem to be
>> reproducing the problem described at the following link:
>>
>> https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1645
>>
>> I have seen similar problems posted to this email list in the past but
>> I was unable to determine the proper direction for a solution. My
>> output of condor_q with the -better-analyze flag is as follows:
>>
>> ==================================
>> > condor_q -better-analyze 36.0
>>
>> -- Submitter: accumulator.hpcc.msu.edu : <10.1.1.24:49968> : accumulator.hpcc.msu.edu
>> ---
>> 036.000: Run analysis summary. Of 6 machines,
>> 1 are rejected by your job's requirements
>> 2 reject your job because of their own requirements
>> 0 match but are serving users with a better priority in the pool
>> 3 match but reject the job for unknown reasons
>> 0 match but will not currently preempt their existing job
>> 0 match but are currently offline
>> 0 are available to run your job
>> Last successful match: Wed May 25 09:20:30 2011
>>
>> The Requirements expression for your job is:
>>
>> ( ( target.OpSys == "WINNT51" ) && ( target.Arch == "INTEL" ) ) &&
>> ( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
>> ( ( RequestMemory * 1024 ) >= ImageSize ) && ( target.HasFileTransfer )
>>
>>     Condition                              Machines Matched    Suggestion
>>     ---------                              ----------------    ----------
>> 1   ( ( 1024 * ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,0.0)) ) >= 0 )
>>                                            0                   REMOVE
>> 2   ( target.OpSys == "WINNT51" )          5
>> 3   ( target.Arch == "INTEL" )             5
>> 4   ( target.Disk >= 75 )                  6
>> 5   ( ( 1024 * target.Memory ) >= 0 )      6
>> 6   ( target.HasFileTransfer )             6
>> ==================================
>>
>> According to the above link, the REMOVE suggestion from the
>> -better-analyze flag is a red herring and the problem is more likely in
>> the firewalls. I put my CONDOR_HOST and one of my Windows XP boxes on a
>> private network, turned off all of the firewalls, and still get the
>> same error, so it looks like I am not dealing with a firewall problem.
>>
>> Has anyone else seen this problem? Do you have any suggestions for
>> things I could try? Is there any more information I could be looking
>> for to make this problem easier to diagnose?
>>
>> Thanks,
>>
>> - Dirk
>> _______________________________________________
>> Condor-users mailing list
>> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>>
>> The archives can be found at:
>> https://lists.cs.wisc.edu/archive/condor-users/