Mailing List Archives
Re: [Condor-users] problems with startup and executing a job
- Date: Fri, 20 Oct 2006 14:43:32 +0900
- From: nini <nini663@xxxxxxx>
- Subject: Re: [Condor-users] problems with startup and executing a job
Also for Problem 2:
~>condor_q -analyze
007.000: Run analysis summary. Of 4 machines,
2 are rejected by your job's requirements
2 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
0 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
No successful match recorded.
Last failed match: Fri Oct 20 12:01:27 2006
Reason for last match failure: no match found
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >=
ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                  Machines Matched    Suggestion
    ---------                                  ----------------    ----------
1   ( TARGET.FileSystemDomain == "nini" )      2
2   ( target.Arch == "INTEL" )                 4
3   ( target.OpSys == "LINUX" )                4
4   ( target.Disk >= 10000 )                   4
5   ( ( 1024 * target.Memory ) >= 10000 )      4
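
The per-condition counts above can be reproduced by hand: evaluate each clause of the Requirements expression against every machine ad and count how many machines satisfy it. Below is a minimal sketch in Python. The four machine ads are hypothetical (their attribute values are invented for illustration and are not taken from this pool); only the clause constants come from the condor_q output.

```python
# Hypothetical machine ads; attribute values are invented for illustration.
machines = [
    {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 500000, "Memory": 512,
     "FileSystemDomain": "nini"},
    {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 500000, "Memory": 512,
     "FileSystemDomain": "nini"},
    {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 400000, "Memory": 1024,
     "FileSystemDomain": "other"},
    {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 400000, "Memory": 1024,
     "FileSystemDomain": "other"},
]

# Clauses of the job's Requirements, using the constants shown by condor_q -analyze.
clauses = [
    ('TARGET.FileSystemDomain == "nini"', lambda m: m["FileSystemDomain"] == "nini"),
    ('target.Arch == "INTEL"', lambda m: m["Arch"] == "INTEL"),
    ('target.OpSys == "LINUX"', lambda m: m["OpSys"] == "LINUX"),
    ("target.Disk >= 10000", lambda m: m["Disk"] >= 10000),
    ("( 1024 * target.Memory ) >= 10000", lambda m: 1024 * m["Memory"] >= 10000),
]


def count_matches(machines, clauses):
    """Return (clause text, number of machines satisfying it) for each clause."""
    return [(text, sum(1 for m in machines if test(m))) for text, test in clauses]


for text, n in count_matches(machines, clauses):
    print(f"{text:40s} {n}")
```

With ads like these, only the FileSystemDomain clause narrows the pool, which would be consistent with "2 are rejected by your job's requirements" above.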
On Fri, 2006-10-20 at 11:03 +0900, nini wrote:
> For Problem 2, the NegotiatorLog said:
>
> 10/20 10:26:01 ---------- Started Negotiation Cycle ----------
> 10/20 10:26:01 Phase 1: Obtaining ads from collector ...
> 10/20 10:26:01 Getting all public ads ...
> 10/20 10:26:01 Sorting 6 ads ...
> 10/20 10:26:01 Getting startd private ads ...
> 10/20 10:26:01 Got ads: 6 public and 2 private
> 10/20 10:26:01 Public ads include 1 submitter, 2 startd
> 10/20 10:26:01 Phase 2: Performing accounting ...
> 10/20 10:26:01 Phase 3: Sorting submitter ads by priority ...
> 10/20 10:26:01 Phase 4.1: Negotiating with schedds ...
> 10/20 10:26:01 Negotiating with condor@nini at <129.254.175.78:46913>
> 10/20 10:26:01 0 seconds so far
> 10/20 10:26:01 Request 00001.00000:
> 10/20 10:26:01 Rejected 1.0 condor@nini <129.254.175.78:46913>: no
> match found
> 10/20 10:26:01 Request 00006.00000:
> 10/20 10:26:01 Rejected 6.0 condor@nini <129.254.175.78:46913>: no
> match found
> 10/20 10:26:01 Got NO_MORE_JOBS; done negotiating
> 10/20 10:26:01 ---------- Finished Negotiation Cycle ----------
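
When the NegotiatorLog grows, the rejected requests can be pulled out mechanically. The sketch below is only an illustration: the regular expression is written against the "Rejected" lines quoted above, with the mail wrapping undone so each entry is on one line, and the function name is hypothetical.

```python
import re

# Matches entries such as:
# "10/20 10:26:01 Rejected 1.0 condor@nini <129.254.175.78:46913>: no match found"
REJECTED = re.compile(r"Rejected (\d+\.\d+) (\S+) <([^>]+)>: (.+)$")


def rejected_jobs(log_text):
    """Return (job id, submitter, reason) for every 'Rejected' line in the log."""
    found = []
    for line in log_text.splitlines():
        m = REJECTED.search(line)
        if m:
            job_id, submitter, _address, reason = m.groups()
            found.append((job_id, submitter, reason.strip()))
    return found


# Sample taken from the log above, with wrapped lines rejoined.
sample = """\
10/20 10:26:01 Request 00001.00000:
10/20 10:26:01 Rejected 1.0 condor@nini <129.254.175.78:46913>: no match found
10/20 10:26:01 Request 00006.00000:
10/20 10:26:01 Rejected 6.0 condor@nini <129.254.175.78:46913>: no match found
10/20 10:26:01 Got NO_MORE_JOBS; done negotiating
"""

for job in rejected_jobs(sample):
    print(job)
```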
>
>
>
> On Fri, 2006-10-20 at 10:59 +0900, nini wrote:
> > Dear all, I have two problems:
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~Problem 1~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > After starting condor_master as root on all machines in the pool, the
> > MasterLog on the central manager looks fine, but the one on the other
> > machine shows a problem:
> >
> > 10/20 09:48:17 ******************************************************
> > 10/20 09:48:17 ** condor_master (CONDOR_MASTER) STARTING UP
> > 10/20 09:48:17 ** /home/condor/condor/sbin/condor_master
> > 10/20 09:48:17 ** $CondorVersion: 6.8.1 Sep 17 2006 $
> > 10/20 09:48:17 ** $CondorPlatform: I386-LINUX_RHEL3 $
> > 10/20 09:48:17 ** PID = 2768
> > 10/20 09:48:17 ** Log last touched 10/18 17:52:01
> > 10/20 09:48:17 ******************************************************
> > 10/20 09:48:17 Using config
> > source: /home/condor/condor/etc/condor_config
> > 10/20 09:48:17 Using local config sources:
> > 10/20 09:48:17 /home/condor/condor_config.local
> > 10/20 09:48:17 DaemonCore: Command Socket at <129.254.187.125:42587>
> > 10/20 09:48:17 Started DaemonCore process
> > "/home/condor/condor/sbin/condor_startd", pid and pgroup = 2769
> > 10/20 09:48:18 Started DaemonCore process
> > "/home/condor/condor/sbin/condor_schedd", pid and pgroup = 2770
> > 10/20 09:48:23 attempt to connect to <129.254.187.125:9618> failed:
> > Connection refused (connect errno = 111).
> > 10/20 09:48:23 ERROR: SECMAN:2003:TCP connection to
> > <129.254.187.125:9618> failed
> >
> > 10/20 09:48:23 Failed to start non-blocking update to
> > <129.254.187.125:9618>.
> >
> >
> > The IP address above is the local machine's own IP. Is that expected? Can
> > anybody give hints about the failed connection?
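
"Connection refused" (errno 111) means nothing was listening on that address and port when the connection was attempted. A quick way to check this independently of Condor is a plain TCP connect. The sketch below is a generic probe, not a Condor tool; the helper name and the timeout are assumptions.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ECONNREFUSED (errno 111 on Linux), timeouts, unreachable hosts.
        return False


# Example call, using the address and port from the MasterLog above:
# print(can_connect("129.254.187.125", 9618))
```

If the probe fails against the local address but succeeds against the central manager's address on the same port, that would suggest this machine is sending its updates to itself rather than to the central manager.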
> >
> >
> > I have just restarted Condor with condor_master, and the MasterLog now shows:
> >
> > 10/20 10:54:48 ******************************************************
> > 10/20 10:54:48 ** condor_master (CONDOR_MASTER) STARTING UP
> > 10/20 10:54:48 ** /home/condor/condor/sbin/condor_master
> > 10/20 10:54:48 ** $CondorVersion: 6.8.1 Sep 17 2006 $
> > 10/20 10:54:48 ** $CondorPlatform: I386-LINUX_RHEL3 $
> > 10/20 10:54:48 ** PID = 3527
> > 10/20 10:54:48 ** Log last touched 10/20 10:54:18
> > 10/20 10:54:48 ******************************************************
> > 10/20 10:54:48 Using config
> > source: /home/condor/condor/etc/condor_config
> > 10/20 10:54:48 Using local config sources:
> > 10/20 10:54:48 /home/condor/condor_config.local
> > 10/20 10:54:48 FileLock::obtain(1) failed - errno 11 (Resource temporarily unavailable)
> > 10/20 10:54:48 ERROR "Can't get lock on "/home/condor/log/InstanceLock"" at line 976 in file master.C
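
On Linux, errno 11 is EAGAIN, which is what a non-blocking lock request returns when the lock is already held elsewhere. Here that would be consistent with the condor_master started at 09:48 still running when the second one was launched. The sketch below reproduces the generic behavior with flock(2) on a throwaway file. It is an illustration only: which locking call Condor's FileLock uses is not shown in the log, and the file name is hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Try an exclusive, non-blocking lock; return the fd, or None if already held."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return None  # someone else holds the lock
        raise


# Hypothetical lock file in a temporary directory, not Condor's InstanceLock.
lock_path = os.path.join(tempfile.mkdtemp(), "InstanceLock.demo")

first = try_lock(lock_path)   # plays the role of the instance already running
second = try_lock(lock_path)  # plays the role of the second start attempt

print("first got lock:", first is not None)
print("second got lock:", second is not None)
```

Once the first holder releases the lock (or exits), a later attempt succeeds, so checking for an already running master before starting another one is the natural first step.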
> >
> >
> > ~~~~~~~~~~~~~~~~~~~~~~~~~Problem 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > With Problem 1 still unsolved, I submitted jobs on the central manager. All
> > the jobs stay idle and none are executed. The jobs' logs contain only:
> >
> > 000 (007.000.000) 10/20 10:26:01 Job submitted from host:
> > <129.254.175.78:46913>
> > ...
> >
> > Condor is installed with all manager/submit/execute functions on the
> > central manager. I cannot work out what is causing this.
> >
> >
> > Thanks,
> >
> >
> >
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at either
> > https://lists.cs.wisc.edu/archive/condor-users/
> > http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
> >
>
>