Mailing List Archives
Re: [condor-users] newbie question
- Date: Fri, 23 Apr 2004 16:18:19 -0500
- From: Michael Remijan <remijan@xxxxxxxxxxxxx>
- Subject: Re: [condor-users] newbie question
The HOSTALLOW_* values are whatever the installation script set them to;
I have not edited them. Tailing NegotiatorLog, I see connect failures.
The central manager and one of the submit machines, 141.142.65.40, are
not firewalled. The other submit machine, 141.142.15.3, may be.
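One way to test the firewall theory is to try a plain TCP connection from the central manager to the schedd address the negotiator reports. The sketch below assumes Python 3 is available on the central manager; the function name `can_connect` is hypothetical and not part of Condor. On Linux, the "errno = 110" in the negotiator log is ETIMEDOUT, meaning the connect got no reply at all. That is consistent with a firewall silently dropping packets, whereas a closed port usually fails quickly with ECONNREFUSED.

```python
import errno
import socket
from typing import Tuple


def can_connect(host: str, port: int, timeout: float = 10.0) -> Tuple[bool, str]:
    """Attempt a TCP connection to host:port and report what happened.

    Returns (True, "connected") on success, otherwise (False, reason), where
    reason is "timed out" or the symbolic errno name (e.g. "ECONNREFUSED").
    """
    try:
        # create_connection resolves the name and does a blocking connect,
        # giving up after `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No reply at all: consistent with packets being silently dropped.
        return False, "timed out"
    except OSError as exc:
        # An explicit rejection (e.g. ECONNREFUSED) or a name lookup failure.
        return False, errno.errorcode.get(exc.errno, str(exc))
```

For example, on the central manager, `can_connect("141.142.15.3", 33875)` probes the schedd address from the log. The schedd's port may change when it restarts, so copy it from the latest log. If that times out while the same check against the 141.142.65.40 schedd succeeds, a firewall in front of 141.142.15.3 is the likely culprit.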
4/23 16:06:38 ---------- Started Negotiation Cycle ----------
4/23 16:06:38 Phase 1: Obtaining ads from collector ...
4/23 16:06:38 Getting all public ads ...
4/23 16:06:38 Sorting 15 ads ...
4/23 16:06:38 Getting startd private ads ...
4/23 16:06:38 Got ads: 15 public and 7 private
4/23 16:06:38 Public ads include 2 submitter, 7 startd
4/23 16:06:38 Phase 2: Performing accounting ...
4/23 16:06:38 Phase 3: Sorting submitter ads by priority ...
4/23 16:06:38 Phase 4.1: Negotiating with schedds ...
4/23 16:06:38 Negotiating with remijan@xxxxxxxxxxxxx at <141.142.15.3:33875>
4/23 16:07:08 select returns 0, connect failed
4/23 16:07:08 Will keep trying for 30 seconds...
4/23 16:07:09 Connect failed for 30 seconds; returning FALSE
4/23 16:07:09 Failed to connect to <141.142.15.3:33875>
4/23 16:07:09 Error: Ignoring schedd for this cycle
4/23 16:07:09 Negotiating with remijan@xxxxxxxxxxxxx at <141.142.65.40:35243>
4/23 16:07:09 Request 00004.00000:
4/23 16:10:18 Can't connect to <141.142.15.3:33876>:0, errno = 110
4/23 16:10:18 Will keep trying for 10 seconds...
4/23 16:10:19 Connect failed for 10 seconds; returning FALSE
4/23 16:10:19 ERROR:
SECMAN:2003:TCP connection to <141.142.15.3:33876> failed
4/23 16:10:19 condor_write(): Socket closed when trying to write buffer
4/23 16:10:19 Buf::write(): condor_write() failed
4/23 16:10:19 Could not send PERMISSION
4/23 16:10:19 Error: Ignoring schedd for this cycle
4/23 16:10:19 ---------- Finished Negotiation Cycle ----------
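When the NegotiatorLog grows long, it can help to pull out just the schedd addresses the negotiator failed to reach. Here is a minimal Python sketch, assuming the failure lines are worded as in the excerpt above. The regular expressions and the name `failed_addresses` are illustrative, not part of Condor, and other Condor versions may phrase these messages differently.

```python
import re
from collections import Counter
from typing import Iterable

# Failure messages as they appear in the NegotiatorLog excerpt above;
# other Condor versions may word them differently.
_FAILURE_PATTERNS = [
    re.compile(r"Failed to connect to <([\d.]+):(\d+)>"),
    re.compile(r"Can't connect to <([\d.]+):(\d+)>"),
    re.compile(r"TCP connection to <([\d.]+):(\d+)> failed"),
]


def failed_addresses(lines: Iterable[str]) -> Counter:
    """Count connection-failure lines per IP address in NegotiatorLog text."""
    counts: Counter = Counter()
    for line in lines:
        for pattern in _FAILURE_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each log line at most once
    return counts
```

Usage: `with open("NegotiatorLog") as log: print(failed_addresses(log))`. For the excerpt above, this counts three failures, all for 141.142.15.3, and none for 141.142.65.40. The ports differ between attempts, so grouping by IP makes the single problem host stand out.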
At 02:26 PM 4/23/2004, you wrote:
5 match, but prefer another specific job despite its worse user-priority
This message is misleading. It really means "something else is wrong."
Yup, that's vague and not useful.
Your ClassAds don't reveal anything interesting. Is there anything useful
in your job log file?
/home/remijan/condor-jobs/hello/job.log
Are your permissions set up correctly to allow you to access the central
manager? That is, are the HOSTALLOW_* variables set correctly? If they
aren't, you'll see permission denied errors in the CollectorLog files on
the central manager.
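To see what the installer actually put in these settings, `condor_config_val HOSTALLOW_WRITE` (and likewise `condor_config_val HOSTALLOW_READ`) prints the value in effect on a given machine. The Python sketch below roughly checks whether a host would match such a value. It is an approximation, assuming comma- or space-separated entries with simple `*` wildcards; Condor's own matching rules are richer. The function name `host_allowed` and the `example.edu` domain are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def host_allowed(setting: str, host: str) -> bool:
    """Rough check of whether `host` matches a HOSTALLOW_*-style list.

    Entries are split on commas and whitespace, and only simple '*'
    wildcards are honoured (e.g. "141.142.*" or "*.example.edu").
    Condor's own matching rules are richer, so treat the result as a hint.
    """
    entries = [e for e in re.split(r"[,\s]+", setting) if e]
    host = host.lower()
    return any(fnmatchcase(host, entry.lower()) for entry in entries)
```

For instance, `host_allowed("141.142.65.*", "141.142.15.3")` is False. A setting like that would let one submit machine in and keep the other out, and the CollectorLog would then show permission-denied errors rather than the connect timeouts seen here.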
If neither of the above helps, try this:
1) On your central manager:
tail -f NegotiatorLog
2) On the submit computer:
condor_reschedule
You should see the negotiator trying to look for a match with your jobs.
It may report errors, or it may fail. What do you see?
If we don't get anywhere with this, let's set up a VNC session/phone call,
and I'll help debug it more directly. (VNC will let us share an X window
so we can both type and see what is in it.)
-alain
Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>