Re: [Condor-users] Checkpoint server doesn't spawn :/
- Date: Mon, 20 Mar 2006 09:52:20 +0100
- From: Nicolas GUIOT <nicolas.guiot@xxxxxxx>
- Subject: Re: [Condor-users] Checkpoint server doesn't spawn :/
OK, that worked, but I'm quite surprised: why wasn't it in either the original condor_config file or condor_config.local.ckpt, even commented out?! Did I miss it somewhere? How could I have found it by myself?
Thanks anyway, now I'm testing my checkpoint server :)
Nicolas
----------------
On Fri, 17 Mar 2006 12:50:36 -0600
Dan Bradley <dan@xxxxxxxxxxxx> wrote:
> The checkpoint server from 6.7.10 should work fine. Your problem
> appears to simply be that you don't have CKPT_SERVER defined in your
> config file. A typical config entry would look like this:
>
> CKPT_SERVER = $(SBIN)/condor_ckpt_server
>
> --Dan
>
> Nicolas GUIOT wrote:
>
> >Forgot to mention :
> >
> >I'm using Condor 6.7.10, and I've seen something about this in the changelog for 6.7.14 : do you think I should upgrade ?
> >
> >And if necessary, is there a right way to upgrade without disturbing the "production" system? (All the execute and submit nodes run the shared /nfs/condor/sbin binaries.)
> >
> >Thanks in advance for your help..
> >
> >++
> >Nicolas
> >
> >
> >----------------
> >On Fri, 17 Mar 2006 10:13:36 +0100
> >Nicolas GUIOT <nicolas.guiot@xxxxxxx> wrote:
> >
> >
> >
> >>Hi all,
> >>
> >>I set up a machine with Condor, then edited the local config file and added the following line:
> >>DAEMON_LIST = MASTER, CKPT_SERVER
> >> and also the CKPT variables (as in example file for ckpt server)
> >>
> >>Then I try to start condor, and nothing runs (ps ax|grep condor gives nothing)
> >>
> >>I tried removing the CKPT_SERVER entry from the config file, and I can then see the master running:
> >>
> >># ps ax|grep condor
> >> 3229 ? Ss 0:00 /nfs/condor/sbin/condor_master
> >> 3239 pts/0 S+ 0:00 grep condor
> >>
> >>I then try to manually start the ckpt_server, and I get the following error :
> >># /ibpc/io/condor/sbin/condor_checkpoint
> >>Can't find address for local startd
> >>Perhaps you need to query another pool.
> >>
> >>
> >>I then found another error. In the first case (with CKPT_SERVER in condor_config.local), I can see a message in MasterLog:
> >>3/17 10:05:30 Using config file: /nfs/condor/etc/condor_config
> >>3/17 10:05:30 Using local config files: /scratch/condor/condor_config.local
> >>3/17 10:05:30 DaemonCore: Command Socket at <xxx.XXX.xxx.XXX:32796>
> >>3/17 10:05:30 CKPT_SERVER from your DAEMON_LIST is not defined the config files!
> >>3/17 10:05:30 ERROR "Must have the path to CKPT_SERVER defined." at line 1296 in file daemon.C
> >>
> >>--> Probably solving this will solve my whole problem : do you have any idea where I could define the "path to CKPT_SERVER", and what does this path represent ? server name, server dir ...?
> >>
> >>
> >>Thanks in advance
> >>Nicolas
-----------------------------------------------
CNRS - UPR 9080 : Laboratoire de Biochimie Theorique
Institut de Biologie Physico-Chimique
13 rue Pierre et Marie Curie
75005 PARIS - FRANCE
Tel : +33 158 41 51 70
Fax : +33 158 41 50 26
------------------------------------------------
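
The MasterLog error quoted above comes down to a name in DAEMON_LIST that has no path definition of its own. As a rough illustration only, the sketch below scans config text for that situation before the master is started. It is not part of Condor: the function names parse_config and undefined_daemons are hypothetical, the MASTER line in the sample is an assumed entry, and the parser is deliberately simplified (it compares names case-insensitively, and ignores line continuations, $(MACRO) expansion, and local config file layering).

```python
import re

# Matches simple "NAME = value" lines; anything fancier is out of scope here.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*)$")


def parse_config(text):
    """Collect NAME = value pairs; later definitions override earlier ones."""
    settings = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        match = ASSIGNMENT.match(stripped)
        if match:
            settings[match.group(1).upper()] = match.group(2).strip()
    return settings


def undefined_daemons(settings):
    """Return DAEMON_LIST entries that have no (non-empty) definition."""
    names = [n.strip().upper() for n in settings.get("DAEMON_LIST", "").split(",")]
    return [n for n in names if n and not settings.get(n)]


# Sample config text: the MASTER line is an assumed entry for illustration.
broken = """
MASTER = $(SBIN)/condor_master
DAEMON_LIST = MASTER, CKPT_SERVER
"""
# Same config with the entry Dan suggested appended.
fixed = broken + "CKPT_SERVER = $(SBIN)/condor_ckpt_server\n"

print(undefined_daemons(parse_config(broken)))  # ['CKPT_SERVER']
print(undefined_daemons(parse_config(fixed)))   # []
```

In practice the text would be read from the global and local config files together, since a definition in either one is enough to satisfy the master.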