Re: [Condor-users] How to solve problem between condor and globus?
- Date: Thu, 22 Dec 2005 16:44:29 +0800
- From: "Fu-Ming Tsai" <sary357@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] How to solve problem between condor and globus?
Hello, Pedro,
Please refer to the attached file; it's my global condor_config file.
The following is my submit file.
[root@osgc01 root]# more /home/sary357/job/job4.jdl
Universe = globus
globusscheduler = osgc01.grid.sinica.edu.tw/jobmanager-condor
Executable = job4.sh
Output = job4.out
Error = job4.err
Log = job4.log
Requirements = (Name=="vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
Queue
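For reference, here is a sketch of the same submit description using the attribute spellings documented for condor_submit (`should_transfer_files` is the documented name; all host and requirement values are the poster's, kept as-is):

```
# Sketch of a minimal globus-universe submit description.
universe                = globus
globusscheduler         = osgc01.grid.sinica.edu.tw/jobmanager-condor
executable              = job4.sh
output                  = job4.out
error                   = job4.err
log                     = job4.log
requirements            = (Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
should_transfer_files   = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
```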
[root@osgc01 root]# more /home/sary357/job/job4.sh
#!/bin/bash
/bin/hostname
Thank you for your attention!!
BR
On Wed, 21 Dec 2005 19:12:40 +0100, Pedro R. Br輍ger Taboada wrote
> I see many problems: staging, universe, and expressions. I need to see
> the submit file and the condor_config file. Perhaps then I can solve
> your problem.
>
> Pedro
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On behalf of Fu-Ming Tsai
> Sent: Tuesday, 20 December 2005 11:06
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] How to solve problem between condor and globus?
>
> Sorry, all,
> After trying so many times, I gave up and used NFS.
> However, I still cannot submit a Globus job to Condor,
> so I tried to get some debug information.
>
> [sary357@osgc01 job]$ condor_q -analyze
> ---
> 4206.000:  Run analysis summary.  Of 4 machines,
>       3 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>
> WARNING: Analysis is only meaningful for Globus universe jobs using
> matchmaking.
> ---
> 4207.000:  Run analysis summary.  Of 4 machines,
>       0 are rejected by your job's requirements
>       3 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>       Last successful match: Tue Dec 20 09:45:24 2005
>       Last failed match: Tue Dec 20 09:55:31 2005
>       Reason for last match failure: no match found
>
> == StarterLog.vm2==
> 12/20 17:35:36 Shadow version: $CondorVersion: 6.7.7 Apr 27 2005 $
> 12/20 17:35:36 Submitting machine is "osgc01.grid.sinica.edu.tw"
> 12/20 17:35:36 ShouldTransferFiles is "NO", NOT transfering files
> 12/20 17:35:36 Submit UidDomain: "grid.sinica.edu.tw"
> 12/20 17:35:36 Local UidDomain: "grid.sinica.edu.tw"
> 12/20 17:35:36 Initialized user_priv as "sary357"
>
> 12/20 17:35:36 Done moving to directory "/opt/osg/osgs01/execute/dir_6591"
>
> 12/20 17:35:36 JICShadow::initIOProxy(): Job does not define WantIOProxy
> 12/20 17:35:36 No StarterUserLog found in job ClassAd
> 12/20 17:35:36 Starter will not write a local UserLog
> 12/20 17:35:36 Starting a VANILLA universe job with ID: 4207.0
> 12/20 17:35:36 In OsProc::OsProc()
> 12/20 17:35:36 Main job KillSignal: 15 (SIGTERM)
> 12/20 17:35:36 Main job RmKillSignal: 15 (SIGTERM)
> 12/20 17:35:36 Main job HoldKillSignal: 15 (SIGTERM)
> 12/20 17:35:36 in VanillaProc::StartJob()
> 12/20 17:35:36 in OsProc::StartJob()
> 12/20 17:35:36 IWD: /home/sary357/gram_scratch_tUb21E3Wqv
> 12/20 17:35:36 Input file: /dev/null
> 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stdout' as standard output: No such file or directory (errno 2)
> 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stderr' as standard error: No such file or directory (errno 2)
> 12/20 17:35:36 Failed to open some/all of the std files...
> 12/20 17:35:36 Aborting OsProc::StartJob.
> 12/20 17:35:36 Failed to start job, exiting
> 12/20 17:35:36 ShutdownFast all jobs.
> 12/20 17:35:36 Got ShutdownFast when no jobs running.
> 12/20 17:35:36 Removing /opt/osg/osgs01/execute/dir_6591
>
> 12/20 17:35:36 Attempting to remove /opt/osg/osgs01/execute/dir_6591
> as SuperUser (root)
> =========================
>
> [sary357@osgc01 job]$ condor_q -better-analyze 4206
>
> -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> :
> osgc01.grid.sinica.edu.tw
> ---
> 4206.000:  Run analysis summary.  Of 4 machines,
>       3 are rejected by your job's requirements
>       0 reject your job because of their own requirements
>       0 match but are serving users with a better priority in the pool
>       1 match but reject the job for unknown reasons
>       0 match but will not currently preempt their existing job
>       0 are available to run your job
>
> The Requirements expression for your job is:
>
> ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
>
> Condition                                              Machines Matched  Suggestion
> ---------                                              ----------------  ----------
> 1   ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" )        1
>
> WARNING: Analysis is only meaningful for Globus universe jobs using
> matchmaking.
> [sary357@osgc01 job]$ condor_q -better-analyze 4207
>
> -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> :
> osgc01.grid.sinica.edu.tw
> Segmentation fault
>
> I'm sure the FILESYSTEM_DOMAIN on those two machines is the same.
> It looks like the output and error files cannot be created. Does
> anyone know why?
>
> BR
>
> ----------------------------------------------------------------------
> "Gravitation is not responsible for people falling in love."
>
> Fu-Ming Tsai
> Academia Sinica Computing Centre, Academia Sinica
> sary357@xxxxxxxxxxxxxxxxxx
> ------------------------------------------------------------------------
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
----------------------------------------------------------------------
"Gravitation is not responsible for people falling in love."
Fu-Ming Tsai
Academia Sinica Computing Centre, Academia Sinica
sary357@xxxxxxxxxxxxxxxxxx
------------------------------------------------------------------------
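The StarterLog failure quoted above ("Failed to open ... stdout ... No such file or directory (errno 2)") is exactly what any process gets when the parent directory of the file it wants to write does not exist on that machine. A minimal Python sketch (the path below is made up purely for illustration):

```python
import errno

# Opening a file for writing fails with ENOENT (errno 2) when its parent
# directory is missing -- the same error the condor_starter reports for the
# GRAM stdout/stderr paths under ~/.globus/job/... on the execute machine.
try:
    open("/nonexistent-dir-for-illustration/stdout", "w")
except OSError as e:
    print(e.errno == errno.ENOENT)  # → True
```

So the question to chase is why the GRAM job directory exists on the submit host but not (or not yet) on the execute host, despite the shared FILESYSTEM_DOMAIN.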
######################################################################
##
## condor_config
##
## This is the global configuration file for condor.
##
## The file is divided into four main parts:
## Part 1: Settings you MUST customize
## Part 2: Settings you may want to customize
## Part 3: Settings that control the policy of when condor will
## start and stop jobs on your machines
## Part 4: Settings you should probably leave alone (unless you
## know what you're doing)
##
## Please read the INSTALL file (or the Install chapter in the
## Condor Administrator's Manual) for detailed explanations of the
## various settings in here and possible ways to configure your
## pool.
##
## If you are installing Condor as root and then handing over the
## administration of this file to a person you do not trust with
## root access, please read the Installation chapter paying careful
## note to the condor_config.root entries.
##
## Unless otherwise specified, settings that are commented out show
## the defaults that are used if you don't define a value. Settings
## that are defined here MUST BE DEFINED since they have no default
## value.
##
## Unless otherwise indicated, all settings which specify a time are
## defined in seconds.
##
######################################################################
######################################################################
######################################################################
##
## ###### #
## # # ## ##### ##### ##
## # # # # # # # # #
## ###### # # # # # #
## # ###### ##### # #
## # # # # # # #
## # # # # # # #####
##
## Part 1: Settings you must customize:
######################################################################
######################################################################
## What machine is your central manager?
CONDOR_HOST = osgc01.grid.sinica.edu.tw
##--------------------------------------------------------------------
## Pathnames:
##--------------------------------------------------------------------
## Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /opt/osg/osg_0.2.0/condor
## Where is the local condor directory for each host?
## This is where the local config file(s), logs and
## spool/execute directories are located
#LOCAL_DIR = /opt/osg/osg_0.2.0/condor/home
LOCAL_DIR = /opt/osg/$(HOSTNAME)
## Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /opt/osg/$(HOSTNAME)/condor_config.local
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
## If the local config file is not present, is it an error?
## WARNING: This is a potential security issue.
## If not specified, the default is True
#REQUIRE_LOCAL_CONFIG_FILE = TRUE
##--------------------------------------------------------------------
## Mail parameters:
##--------------------------------------------------------------------
## When something goes wrong with condor at your site, who should get
## the email?
CONDOR_ADMIN = sary357@xxxxxxxxxxxxxxxxxx
## Full path to a mail delivery program that understands that "-s"
## means you want to specify a subject:
MAIL = /bin/mail
##--------------------------------------------------------------------
## Network domain parameters:
##--------------------------------------------------------------------
## Internet domain of machines sharing a common UID space. If your
## machines don't share a common UID space, set it to
## UID_DOMAIN = $(FULL_HOSTNAME)
## to specify that each machine has its own UID space.
#UID_DOMAIN = $(FULL_HOSTNAME)
UID_DOMAIN = grid.sinica.edu.tw
## Internet domain of machines sharing a common file system.
## If your machines don't use a network file system, set it to
## FILESYSTEM_DOMAIN = $(FULL_HOSTNAME)
## to specify that each machine has its own file system.
FILESYSTEM_DOMAIN = grid.sinica.edu.tw
## This macro is used to specify a short description of your pool.
## It should be about 20 characters long. For example, the name of
## the UW-Madison Computer Science Condor Pool is ``UW-Madison CS''.
COLLECTOR_NAME = My Pool
######################################################################
######################################################################
##
## ###### #####
## # # ## ##### ##### # #
## # # # # # # # #
## ###### # # # # # #####
## # ###### ##### # #
## # # # # # # #
## # # # # # # #######
##
## Part 2: Settings you may want to customize:
## (it is generally safe to leave these untouched)
######################################################################
######################################################################
##
## The user/group ID <uid>.<gid> of the "Condor" user.
## (this can also be specified in the environment)
#CONDOR_IDS=517.517
CONDOR_IDS=0.0
##--------------------------------------------------------------------
## Flocking: Submitting jobs to more than one pool
##--------------------------------------------------------------------
## Flocking allows you to run your jobs in other pools, or lets
## others run jobs in your pool.
##
## To let others flock to you, define FLOCK_FROM.
##
## To flock to others, define FLOCK_TO.
## FLOCK_FROM defines the machines where you would like to grant
## people access to your pool via flocking. (i.e. you are granting
## access to these machines to join your pool).
FLOCK_FROM = *.grid.sinica.edu.tw
## An example of this is:
#FLOCK_FROM = somehost.friendly.domain, anotherhost.friendly.domain
## FLOCK_TO defines the central managers of the pools that you want
## to flock to. (i.e. you are specifying the machines that you
## want your jobs to be negotiated at -- thereby specifying the
## pools they will run in.)
FLOCK_TO = *.grid.sinica.edu.tw
## An example of this is:
#FLOCK_TO = central_manager.friendly.domain, condor.cs.wisc.edu
## FLOCK_COLLECTOR_HOSTS should almost always be the same as
## FLOCK_NEGOTIATOR_HOSTS (as shown below). The only reason it would be
## different is if the collector and negotiator in the pool that you are
## flocking to are running on different machines (not recommended).
## The collectors must be specified in the same corresponding order as
## the FLOCK_NEGOTIATOR_HOSTS list.
FLOCK_NEGOTIATOR_HOSTS = $(FLOCK_TO)
FLOCK_COLLECTOR_HOSTS = $(FLOCK_TO)
## An example of having the negotiator and the collector on different
## machines is:
#FLOCK_NEGOTIATOR_HOSTS = condor.cs.wisc.edu, condor-negotiator.friendly.domain
#FLOCK_COLLECTOR_HOSTS = condor.cs.wisc.edu, condor-collector.friendly.domain
##--------------------------------------------------------------------
## Host/IP access levels
##--------------------------------------------------------------------
## Please see the administrator's manual for details on these
## settings, what they're for, and how to use them.
## What machines have administrative rights for your pool? This
## defaults to your central manager. You should set it to the
## machine(s) where whoever is the condor administrator(s) works
## (assuming you trust all the users who log into that/those
## machine(s), since this is machine-wide access you're granting).
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST)
## If there are no machines that should have administrative access
## to your pool (for example, there's no machine where only trusted
## users have accounts), you can uncomment this setting.
## Unfortunately, this will mean that administering your pool will
## be more difficult.
#HOSTDENY_ADMINISTRATOR = *
## What machines should have "owner" access to your machines, meaning
## they can issue commands that a machine owner should be able to
## issue to their own machine (like condor_vacate). This defaults to
## machines with administrator access, and the local machine. This
## is probably what you want.
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
## Read access. Machines listed as allow (and/or not listed as deny)
## can view the status of your pool, but cannot join your pool
## or run jobs.
## NOTE: By default, without these entries customized, you
## are granting read access to the whole world. You may want to
## restrict that to hosts in your domain. If possible, please also
## grant read access to "*.cs.wisc.edu", so the Condor developers
## will be able to view the status of your pool and more easily help
## you install, configure or debug your Condor installation.
## It is important to have this defined.
HOSTALLOW_READ = *
#HOSTALLOW_READ = *.your.domain, *.cs.wisc.edu
#HOSTDENY_READ = *.bad.subnet, bad-machine.your.domain, 144.77.88.*
## Write access. Machines listed here can join your pool, submit
## jobs, etc. Note: Any machine which has WRITE access must
## also be granted READ access. Granting WRITE access below does
## not also automatically grant READ access; you must change
## HOSTALLOW_READ above as well.
## If you leave it as it is, it will be unspecified, and effectively
## it will be allowing anyone to write to your pool.
HOSTALLOW_WRITE = *
#HOSTALLOW_WRITE = *.your.domain, your-friend's-machine.other.domain
#HOSTDENY_WRITE = bad-machine.your.domain
## Negotiator access. Machines listed here are trusted central
## managers. You should normally not have to change this.
HOSTALLOW_NEGOTIATOR = $(CONDOR_HOST)
## Now, with flocking we need to let the SCHEDD trust the other
## negotiators we are flocking with as well. You should normally
## not have to change this either.
HOSTALLOW_NEGOTIATOR_SCHEDD = $(CONDOR_HOST), $(FLOCK_NEGOTIATOR_HOSTS)
## Config access. Machines listed here can use the condor_config_val
## tool to modify all daemon configurations except those specified in
## the condor_config.root file. This level of host-wide access
## should only be granted with extreme caution. By default, config
## access is denied from all hosts.
#HOSTALLOW_CONFIG = trusted-host.your.domain
## Flocking Configs. These are the real things that Condor looks at,
## but we set them from the FLOCK_FROM/TO macros above. It is safe
## to leave these unchanged.
HOSTALLOW_WRITE_COLLECTOR = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_WRITE_STARTD = $(HOSTALLOW_WRITE), $(FLOCK_FROM)
HOSTALLOW_READ_COLLECTOR = $(HOSTALLOW_READ), $(FLOCK_FROM)
HOSTALLOW_READ_STARTD = $(HOSTALLOW_READ), $(FLOCK_FROM)
##--------------------------------------------------------------------
## Security parameters for setting configuration values remotely:
##--------------------------------------------------------------------
## These parameters define the list of attributes that can be set
## remotely with condor_config_val for the security access levels
## defined above (for example, WRITE, ADMINISTRATOR, CONFIG, etc).
## Please see the administrator's manual for further details on these
## settings, what they're for, and how to use them. There are no
## default values for any of these settings. If they are not
## defined, no attributes can be set with condor_config_val.
## Attributes that can be set by hosts with "CONFIG" permission (as
## defined with HOSTALLOW_CONFIG and HOSTDENY_CONFIG above).
## The commented-out value here was the default behavior of Condor
## prior to version 6.3.3. If you don't need this behavior, you
## should leave this commented out.
#SETTABLE_ATTRS_CONFIG = *
## Attributes that can be set by hosts with "ADMINISTRATOR"
## permission (as defined above)
#SETTABLE_ATTRS_ADMINISTRATOR = *_DEBUG, MAX_*_LOG
## Attributes that can be set by hosts with "OWNER" permission (as
## defined above) NOTE: any Condor job running on a given host will
## have OWNER permission on that host by default. If you grant this
## kind of access, Condor jobs will be able to modify any attributes
## you list below on the machine where they are running. This has
## obvious security implications, so only grant this kind of
## permission for custom attributes that you define for your own use
## at your pool (custom attributes about your machines that are
## published with the STARTD_EXPRS setting, for example).
#SETTABLE_ATTRS_OWNER = your_custom_attribute, another_custom_attr
## You can also define daemon-specific versions of each of these
## settings. For example, to define settings that can only be
## changed in the condor_startd's configuration by hosts with OWNER
## permission, you would use:
#STARTD_SETTABLE_ATTRS_OWNER = your_custom_attribute_name
##--------------------------------------------------------------------
## Network filesystem parameters:
##--------------------------------------------------------------------
## Do you want to use NFS for file access instead of remote system
## calls?
#USE_NFS = False
## Do you want to use AFS for file access instead of remote system
## calls?
#USE_AFS = False
##--------------------------------------------------------------------
## Checkpoint server:
##--------------------------------------------------------------------
## Do you want to use a checkpoint server if one is available? If a
## checkpoint server isn't available or USE_CKPT_SERVER is set to
## False, checkpoints will be written to the local SPOOL directory on
## the submission machine.
#USE_CKPT_SERVER = True
## What's the hostname of this machine's nearest checkpoint server?
#CKPT_SERVER_HOST = checkpoint-server-hostname.your.domain
## Do you want the starter on the execute machine to choose the
## checkpoint server? If False, the CKPT_SERVER_HOST set on
## the submit machine is used. Otherwise, the CKPT_SERVER_HOST set
## on the execute machine is used. The default is true.
#STARTER_CHOOSES_CKPT_SERVER = True
##--------------------------------------------------------------------
## Miscellaneous:
##--------------------------------------------------------------------
## Try to save this much swap space by not starting new shadows.
## Specified in megabytes.
#RESERVED_SWAP = 5
## What's the maximum number of jobs you want a single submit machine
## to spawn shadows for?
#MAX_JOBS_RUNNING = 200
## Condor needs to create a few lock files to synchronize access to
## various log files. Because of problems we've had with network
## filesystems and file locking over the years, we HIGHLY recommend
## that you put these lock files on a local partition on each
## machine. If you don't have your LOCAL_DIR on a local partition,
## be sure to change this entry. Whatever user (or group) condor is
## running as needs to have write access to this directory. If
## you're not running as root, this is whatever user you started up
## the condor_master as. If you are running as root, and there's a
## condor account, it's probably condor. Otherwise, it's whatever
## you've set in the CONDOR_IDS environment variable. See the Admin
## manual for details on this.
LOCK = $(LOG)
## If you don't use a fully qualified name in your /etc/hosts file
## (or NIS, etc.) for either your official hostname or as an alias,
## Condor wouldn't normally be able to use fully qualified names in
## places that it'd like to. You can set this parameter to the
## domain you'd like appended to your hostname, if changing your host
## information isn't a good option. This parameter must be set in
## the global config file (not the LOCAL_CONFIG_FILE from above).
#DEFAULT_DOMAIN_NAME = your.domain.name
## Condor can be told whether or not you want the Condor daemons to
## create a core file if something really bad happens. This just
## sets the resource limit for the size of a core file. By default,
## we don't do anything, and leave in place whatever limit was in
## effect when you started the Condor daemons. If this parameter is
## set and "True", we increase the limit to as large as it gets. If
## it's set to "False", we set the limit at 0 (which means that no
## core files are even created). Core files greatly help the Condor
## developers debug any problems you might be having.
#CREATE_CORE_FILES = True
## Condor Glidein downloads binaries from a remote server for the
## machines into which you're gliding. This saves you from manually
## downloading and installing binaries for every architecture you
## might want to glidein to. The default server is one maintained at
## The University of Wisconsin. If you don't want to use the UW
## server, you can set up your own and change the following values to
## point to it, instead.
#GLIDEIN_SERVER_URLS = \
# http://www.cs.wisc.edu/condor/glidein/binaries \
# gsiftp://gridftp.cs.wisc.edu/p/condor/public/binaries/glidein
GLIDEIN_SERVER_NAME = gridftp.cs.wisc.edu
GLIDEIN_SERVER_DIR = /p/condor/public/binaries/glidein
## If your site needs to use UID_DOMAIN settings (defined above) that
## are not real Internet domains that match the hostnames, you can
## tell Condor to trust whatever UID_DOMAIN a submit machine gives to
## the execute machine and just make sure the two strings match. The
## default for this setting is False, since it is more secure this
## way.
#TRUST_UID_DOMAIN = False
## If you would like to be informed in near real-time via condor_q when
## a vanilla/standard/java job is in a suspension state, set this attribute to
## TRUE. However, this real-time update of the condor_schedd by the shadows
## could cause performance issues if there are thousands of concurrently
## running vanilla/standard/java jobs under a single condor_schedd and they are
## allowed to suspend and resume.
#REAL_TIME_JOB_SUSPEND_UPDATES = False
## A standard universe job can perform arbitrary shell calls via the
## libc 'system()' function. This function call is routed back to the shadow
## which performs the actual system() invocation in the initialdir of the
## running program and as the user who submitted the job. However, since the
## user job can request ARBITRARY shell commands to be run by the shadow, this
## is a generally unsafe practice. This should only be made available if it is
## actually needed. If this attribute is not defined, then it is the same as
## it being defined to False. Set it to True to allow the shadow to execute
## arbitrary shell code from the user job.
#SHADOW_ALLOW_UNSAFE_REMOTE_EXEC = False
##--------------------------------------------------------------------
## Settings that control the daemon's debugging output:
##--------------------------------------------------------------------
##
## The flags given in ALL_DEBUG are shared between all daemons.
##
#ALL_DEBUG =
ALL_DEBUG = D_FULLDEBUG
MAX_COLLECTOR_LOG = 100000000
COLLECTOR_DEBUG =
#COLLECTOR_DEBUG = D_FULLDEBUG
MAX_KBDD_LOG = 100000000
KBDD_DEBUG =
MAX_NEGOTIATOR_LOG = 100000000
NEGOTIATOR_DEBUG = D_MATCH
MAX_NEGOTIATOR_MATCH_LOG = 100000000
MAX_SCHEDD_LOG = 100000000
SCHEDD_DEBUG = D_COMMAND
#SCHEDD_DEBUG = D_FULLDEBUG
MAX_SHADOW_LOG = 100000000
SHADOW_DEBUG =
#SHADOW_DEBUG =D_FULLDEBUG
MAX_STARTD_LOG = 100000000
STARTD_DEBUG = D_COMMAND
#STARTD_DEBUG = D_FULLDEBUG
MAX_STARTER_LOG = 100000000
STARTER_DEBUG = D_NODATE
#STARTER_DEBUG = D_FULLDEBUG
MAX_MASTER_LOG = 100000000
MASTER_DEBUG = D_COMMAND
#MASTER_DEBUG = D_FULLDEBUG
## When the master starts up, should it truncate its log file?
#TRUNC_MASTER_LOG_ON_OPEN = False
######################################################################
######################################################################
##
## ###### #####
## # # ## ##### ##### # #
## # # # # # # # #
## ###### # # # # # #####
## # ###### ##### # #
## # # # # # # # #
## # # # # # # #####
##
## Part 3: Settings that control the policy for running, stopping, and
## periodically checkpointing condor jobs:
######################################################################
######################################################################
## This section contains macros that help write legible
## expressions:
MINUTE = 60
HOUR = (60 * $(MINUTE))
StateTimer = (CurrentTime - EnteredCurrentState)
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
ActivationTimer = (CurrentTime - JobStart)
LastCkpt = (CurrentTime - LastPeriodicCheckpoint)
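## As a concrete illustration of how these macros expand (the expression
## name here is hypothetical, not a setting used below), a line written as
##   ExampleExpr = $(LastCkpt) > (3 * $(HOUR))
## becomes, after macro substitution:
##   ExampleExpr = (CurrentTime - LastPeriodicCheckpoint) > (3 * (60 * 60))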
## The JobUniverse attribute is just an int. These macros can be
## used to specify the universe in a human-readable way:
STANDARD = 1
PVM = 4
VANILLA = 5
MPI = 8
IsPVM = (TARGET.JobUniverse == $(PVM))
IsMPI = (TARGET.JobUniverse == $(MPI))
IsVanilla = (TARGET.JobUniverse == $(VANILLA))
IsStandard = (TARGET.JobUniverse == $(STANDARD))
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
BackgroundLoad = 0.3
HighLoad = 0.5
StartIdleTime = 15 * $(MINUTE)
ContinueIdleTime = 5 * $(MINUTE)
MaxSuspendTime = 10 * $(MINUTE)
MaxVacateTime = 10 * $(MINUTE)
KeyboardBusy = (KeyboardIdle < $(MINUTE))
ConsoleBusy = (ConsoleIdle < $(MINUTE))
CPUIdle = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
CPUBusy = ($(NonCondorLoadAvg) >= $(HighLoad))
KeyboardNotBusy = ($(KeyboardBusy) == False)
BigJob = (TARGET.ImageSize >= (50 * 1024))
MediumJob = (TARGET.ImageSize >= (15 * 1024) && TARGET.ImageSize < (50 * 1024))
SmallJob = (TARGET.ImageSize < (15 * 1024))
JustCPU = ($(CPUBusy) && ($(KeyboardBusy) == False))
MachineBusy = ($(CPUBusy) || $(KeyboardBusy))
## The RANK expression controls which jobs this machine prefers to
## run over others. Some examples from the manual include:
## RANK = TARGET.ImageSize
## RANK = (Owner == "coltrane") + (Owner == "tyner") \
## + ((Owner == "garrison") * 10) + (Owner == "jones")
## By default, RANK is always 0, meaning that all jobs have an equal
## ranking.
#RANK = 0
#####################################################################
## This is where you choose the configuration that you would like to
## use. It has no defaults so it must be defined. We start this
## file off with the UWCS_* policy.
######################################################################
## Also here is what is referred to as the TESTINGMODE_*, which is
## a quick hardwired way to test Condor.
## Replace UWCS_* with TESTINGMODE_* if you wish to do testing mode.
## For example:
## WANT_SUSPEND = $(UWCS_WANT_SUSPEND)
## becomes
## WANT_SUSPEND = $(TESTINGMODE_WANT_SUSPEND)
WANT_SUSPEND = $(UWCS_WANT_SUSPEND)
WANT_VACATE = $(UWCS_WANT_VACATE)
## When is this machine willing to start a job?
START = $(UWCS_START)
## When to suspend a job?
SUSPEND = $(UWCS_SUSPEND)
## When to resume a suspended job?
CONTINUE = $(UWCS_CONTINUE)
## When to nicely stop a job?
## (as opposed to killing it instantaneously)
PREEMPT = $(UWCS_PREEMPT)
## When to instantaneously kill a preempting job
## (e.g. if a job is in the pre-empting stage for too long)
KILL = $(UWCS_KILL)
PERIODIC_CHECKPOINT = $(UWCS_PERIODIC_CHECKPOINT)
PREEMPTION_REQUIREMENTS = $(UWCS_PREEMPTION_REQUIREMENTS)
PREEMPTION_RANK = $(UWCS_PREEMPTION_RANK)
NEGOTIATOR_PRE_JOB_RANK = $(UWCS_NEGOTIATOR_PRE_JOB_RANK)
NEGOTIATOR_POST_JOB_RANK = $(UWCS_NEGOTIATOR_POST_JOB_RANK)
MaxJobRetirementTime = $(UWCS_MaxJobRetirementTime)
#####################################################################
## This is the UWisc - CS Department Configuration.
#####################################################################
UWCS_WANT_SUSPEND = ( $(SmallJob) || $(KeyboardNotBusy) \
|| $(IsPVM) || $(IsVanilla) )
UWCS_WANT_VACATE = ( $(ActivationTimer) > 10 * $(MINUTE) \
|| $(IsPVM) || $(IsVanilla) )
# Only start jobs if:
# 1) the keyboard has been idle long enough, AND
# 2) the load average is low enough OR the machine is currently
# running a Condor job
# (NOTE: Condor will only run 1 job at a time on a given resource.
# The reasons Condor might consider running a different job while
# already running one are machine Rank (defined above), and user
# priorities.)
UWCS_START = ( (KeyboardIdle > $(StartIdleTime)) \
&& ( $(CPUIdle) || \
(State != "Unclaimed" && State != "Owner")) )
# Suspend jobs if:
# 1) the keyboard has been touched, OR
# 2a) The cpu has been busy for more than 2 minutes, AND
# 2b) the job has been running for more than 90 seconds
UWCS_SUSPEND = ( $(KeyboardBusy) || \
( (CpuBusyTime > 2 * $(MINUTE)) \
&& $(ActivationTimer) > 90 ) )
# Continue jobs if:
# 1) the cpu is idle, AND
# 2) we've been suspended more than 10 seconds, AND
# 3) the keyboard hasn't been touched in a while
UWCS_CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
&& (KeyboardIdle > $(ContinueIdleTime)) )
# Preempt jobs if:
# 1) The job is suspended and has been suspended longer than we want
# 2) OR, we don't want to suspend this job, but the conditions to
# suspend jobs have been met (someone is using the machine)
UWCS_PREEMPT = ( ((Activity == "Suspended") && \
($(ActivityTimer) > $(MaxSuspendTime))) \
|| (SUSPEND && (WANT_SUSPEND == False)) )
# Maximum time (in seconds) to wait for a job to finish before kicking
# it off (due to PREEMPT, a higher priority claim, or the startd
# gracefully shutting down). This is computed from the time the job
# was started, minus any suspension time. Once the retirement time runs
# out, the usual preemption process will take place. The job may
# self-limit the retirement time to _less_ than what is given here.
# By default, nice user jobs and standard universe jobs set their
# MaxJobRetirementTime to 0, so they will usually not wait in retirement.
UWCS_MaxJobRetirementTime = 0
# Kill jobs if they have taken too long to vacate gracefully
UWCS_KILL = $(ActivityTimer) > $(MaxVacateTime)
## Only define vanilla versions of these if you want to make them
## different from the above settings.
#SUSPEND_VANILLA = ( $(KeyboardBusy) || \
# ((CpuBusyTime > 2 * $(MINUTE)) && $(ActivationTimer) > 90) )
#CONTINUE_VANILLA = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
# && (KeyboardIdle > $(ContinueIdleTime)) )
#PREEMPT_VANILLA = ( ((Activity == "Suspended") && \
# ($(ActivityTimer) > $(MaxSuspendTime))) \
# || (SUSPEND_VANILLA && (WANT_SUSPEND == False)) )
#KILL_VANILLA = $(ActivityTimer) > $(MaxVacateTime)
## We use a simple Periodic checkpointing mechanism, but then
## again we have a very fast network.
UWCS_PERIODIC_CHECKPOINT = $(LastCkpt) > (3 * $(HOUR))
## You might want to checkpoint a little less often. A good
## example of this is below. For jobs smaller than 60 megabytes, we
## periodic checkpoint every 6 hours. For larger jobs, we only
## checkpoint every 12 hours.
#UWCS_PERIODIC_CHECKPOINT = ( (TARGET.ImageSize < 60000) && \
# ($(LastCkpt) > (6 * $(HOUR))) ) || \
# ( $(LastCkpt) > (12 * $(HOUR)) )
## The rank expressions used by the negotiator are configured below.
## This is the order in which ranks are applied by the negotiator:
## 1. NEGOTIATOR_PRE_JOB_RANK
## 2. rank in job ClassAd
## 3. NEGOTIATOR_POST_JOB_RANK
## 4. cause of preemption (0=user priority,1=startd rank,2=no preemption)
## 5. PREEMPTION_RANK
## The NEGOTIATOR_PRE_JOB_RANK expression overrides all other ranks
## that are used to pick a match from the set of possibilities.
## The following expression matches jobs to unclaimed resources
## whenever possible, regardless of the job-supplied rank.
UWCS_NEGOTIATOR_PRE_JOB_RANK = RemoteOwner =?= UNDEFINED
## The NEGOTIATOR_POST_JOB_RANK expression chooses between
## resources that are equally preferred by the job.
## The following example expression steers jobs toward
## faster machines and tends to fill a cluster of multi-processors
## breadth-first instead of depth-first. In this example,
## the expression is chosen to have no effect when preemption
## would take place, allowing control to pass on to
## PREEMPTION_RANK.
#UWCS_NEGOTIATOR_POST_JOB_RANK = \
# (RemoteOwner =?= UNDEFINED) * (KFlops - VirtualMachineID)
## The negotiator will not preempt a job running on a given machine
## unless the PREEMPTION_REQUIREMENTS expression evaluates to true
## and the owner of the idle job has a better priority than the owner
## of the running job. This expression defaults to true.
UWCS_PREEMPTION_REQUIREMENTS = $(StateTimer) > (1 * $(HOUR)) && RemoteUserPrio > SubmittorPrio * 1.2
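## (Illustrative:) since this expression must evaluate to true before
## the negotiator will preempt for user priority, a site that never
## wants priority-based preemption could instead set:
#UWCS_PREEMPTION_REQUIREMENTS = False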
## The PREEMPTION_RANK expression is used in a case where preemption
## is the only option and all other negotiation ranks are equal. For
## example, if the job has no preference, it is usually preferable to
## preempt a job with a small ImageSize instead of a job with a large
## ImageSize. The default is to rank all preemptable matches the
## same. However, the negotiator will always prefer to match the job
## with an idle machine over a preemptable machine, if all other
## negotiation ranks are equal.
UWCS_PREEMPTION_RANK = (RemoteUserPrio * 1000000) - TARGET.ImageSize
######################################################################
## This is a configuration that will cause your Condor jobs to
## always run. It is intended for testing only.
######################################################################
## This mode will cause your jobs to start on a machine and will let
## them run to completion. Condor will ignore everything happening
## on the machine (load average, keyboard activity, etc.)
TESTINGMODE_WANT_SUSPEND = False
TESTINGMODE_WANT_VACATE = False
TESTINGMODE_START = True
TESTINGMODE_SUSPEND = False
TESTINGMODE_CONTINUE = True
TESTINGMODE_PREEMPT = False
TESTINGMODE_KILL = False
TESTINGMODE_PERIODIC_CHECKPOINT = False
TESTINGMODE_PREEMPTION_REQUIREMENTS = False
TESTINGMODE_PREEMPTION_RANK = 0
######################################################################
######################################################################
##
## ###### #
## # # ## ##### ##### # #
## # # # # # # # # #
## ###### # # # # # # #
## # ###### ##### # #######
## # # # # # # #
## # # # # # # #
##
## Part 4: Settings you should probably leave alone:
## (unless you know what you're doing)
######################################################################
######################################################################
######################################################################
## Daemon-wide settings:
######################################################################
## Pathnames
LOG = $(LOCAL_DIR)/log
SPOOL = $(LOCAL_DIR)/spool
EXECUTE = $(LOCAL_DIR)/execute
BIN = $(RELEASE_DIR)/bin
LIB = $(RELEASE_DIR)/lib
INCLUDE = $(RELEASE_DIR)/include
SBIN = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/libexec
## If you leave HISTORY undefined (comment it out), no history file
## will be created.
HISTORY = $(SPOOL)/history
## Log files
COLLECTOR_LOG = $(LOG)/CollectorLog
KBDD_LOG = $(LOG)/KbdLog
MASTER_LOG = $(LOG)/MasterLog
NEGOTIATOR_LOG = $(LOG)/NegotiatorLog
NEGOTIATOR_MATCH_LOG = $(LOG)/MatchLog
SCHEDD_LOG = $(LOG)/SchedLog
SHADOW_LOG = $(LOG)/ShadowLog
STARTD_LOG = $(LOG)/StartLog
STARTER_LOG = $(LOG)/StarterLog
## Lock files
SHADOW_LOCK = $(LOCK)/ShadowLock
## This setting primarily allows you to change the port that the
## collector is listening on. By default, the collector uses port
## 9618, but you can set the port with a ":port", such as:
## COLLECTOR_HOST = $(CONDOR_HOST):1234
COLLECTOR_HOST = $(CONDOR_HOST)
## The NEGOTIATOR_HOST parameter has been deprecated. The port where
## the negotiator is listening is now dynamically allocated and the IP
## and port are now obtained from the collector, just like all the
## other daemons. However, if your pool contains any machines that
## are running version 6.7.3 or earlier, you can uncomment this
## setting to go back to the old fixed-port (9614) for the negotiator.
#NEGOTIATOR_HOST = $(CONDOR_HOST)
## How long are you willing to let daemons try their graceful
## shutdown methods before they do a hard shutdown? (30 minutes)
#SHUTDOWN_GRACEFUL_TIMEOUT = 1800
## How much disk space would you like reserved from Condor? In
## places where Condor is computing the free disk space on various
## partitions, it subtracts this many megabytes from the amount it
## actually finds. (If undefined, this defaults to 0.)
RESERVED_DISK = 5
## If your machine is running AFS and the AFS cache lives on the same
## partition as the other Condor directories, and you want Condor to
## reserve the space that your AFS cache is configured to use, set
## this to true.
#RESERVE_AFS_CACHE = False
## By default, if a user does not specify "notify_user" in the submit
## description file, any email Condor sends about that job will go to
## "username@UID_DOMAIN". If your machines all share a common UID
## domain (so that you would set UID_DOMAIN to be the same across all
## machines in your pool), *BUT* email to user@UID_DOMAIN is *NOT*
## the right place for Condor to send email for your site, you can
## define the default domain to use for email. A common example
## would be to set EMAIL_DOMAIN to the fully qualified hostname of
## each machine in your pool, so users submitting jobs from a
## specific machine would get email sent to user@xxxxxxxxxxxxxxxxxxx,
## instead of user@xxxxxxxxxxxx. In general, you should leave this
## setting commented out unless two things are true: 1) UID_DOMAIN is
## set to your domain, not $(FULL_HOSTNAME), and 2) email to
## user@UID_DOMAIN won't work.
#EMAIL_DOMAIN = $(FULL_HOSTNAME)
## If your site needs to use TCP updates to the collector, instead of
## UDP, you can enable this feature. HOWEVER, WE DO NOT RECOMMEND
## THIS FOR MOST SITES! In general, the only sites that might want
## this feature are pools made up of machines connected via a
## wide-area network where UDP packets are frequently or always
## dropped. If you enable this feature, you *MUST* turn on the
## COLLECTOR_SOCKET_CACHE_SIZE setting at your collector, and each
## entry in the socket cache uses another file descriptor. If not
## defined, this feature is disabled by default.
#UPDATE_COLLECTOR_WITH_TCP = True
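## (Illustrative:) enabling TCP updates together with the required
## socket cache at the collector might look like the following; the
## cache size here is only a guess and should be tuned to the number
## of machines in your pool:
#UPDATE_COLLECTOR_WITH_TCP = True
#COLLECTOR_SOCKET_CACHE_SIZE = 1000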
## HIGHPORT and LOWPORT let you set the range of ports that Condor
## will use. This may be useful if you are behind a firewall. By
## default, Condor uses port 9618 for the collector, 9614 for the
## negotiator, and system-assigned (apparently random) ports for
## everything else. HIGHPORT and LOWPORT only affect these
## system-assigned ports, but will restrict them to the range you
## specify here. If you want to change the well-known ports for the
## collector or negotiator, see COLLECTOR_HOST or NEGOTIATOR_HOST.
## Note that both LOWPORT and HIGHPORT must be at least 1024.
#HIGHPORT = 9700
#LOWPORT = 9600
######################################################################
## Daemon-specific settings:
######################################################################
##--------------------------------------------------------------------
## condor_master
##--------------------------------------------------------------------
## Daemons you want the master to keep running for you:
DAEMON_LIST = MASTER, STARTD, SCHEDD
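## (Illustrative:) on the central manager, you would typically also
## run the collector and negotiator:
#DAEMON_LIST = MASTER, STARTD, SCHEDD, COLLECTOR, NEGOTIATOR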
## Which daemons use the Condor DaemonCore library (i.e., not the
## checkpoint server or custom user daemons)?
## Note: Daemons in this list cannot use a static command port.
DC_DAEMON_LIST = \
MASTER, STARTD, SCHEDD, KBDD, COLLECTOR, NEGOTIATOR, EVENTD, \
VIEW_SERVER, CONDOR_VIEW, VIEW_COLLECTOR, HAWKEYE
## Where are the binaries for these daemons?
MASTER = $(SBIN)/condor_master
STARTD = $(SBIN)/condor_startd
SCHEDD = $(SBIN)/condor_schedd
KBDD = $(SBIN)/condor_kbdd
NEGOTIATOR = $(SBIN)/condor_negotiator
COLLECTOR = $(SBIN)/condor_collector
## When the master starts up, it can place its address (IP and port)
## into a file. This way, tools running on the local machine don't
## need to query the central manager to find the master. This
## feature can be turned off by commenting out this setting.
MASTER_ADDRESS_FILE = $(LOG)/.master_address
## Where should the master find the condor_preen binary? If you don't
## want preen to run at all, just comment out this setting.
PREEN = $(SBIN)/condor_preen
## How do you want preen to behave? The "-m" means you want email
## about files preen finds that it thinks it should remove. The "-r"
## means you want preen to actually remove these files. If you don't
## want either of those things to happen, just remove the appropriate
## one from this setting.
PREEN_ARGS = -m -r
## How often should the master start up condor_preen? (once a day)
#PREEN_INTERVAL = 86400
## If a daemon dies an unnatural death, do you want email about it?
#PUBLISH_OBITUARIES = True
## If you're getting obituaries, how many lines of the end of that
## daemon's log file do you want included in the obituary?
#OBITUARY_LOG_LENGTH = 20
## Should the master run?
#START_MASTER = True
## Should the master start up the daemons you want it to?
#START_DAEMONS = True
## How often do you want the master to send an update to the central
## manager?
#MASTER_UPDATE_INTERVAL = 300
## How often do you want the master to check the timestamps of the
## daemons it's running? If any daemons have been modified, the
## master restarts them.
#MASTER_CHECK_NEW_EXEC_INTERVAL = 300
## Once you notice new binaries, how long should you wait before you
## try to execute them?
#MASTER_NEW_BINARY_DELAY = 120
## What's the maximum amount of time you're willing to give the
## daemons to quickly shutdown before you just kill them outright?
#SHUTDOWN_FAST_TIMEOUT = 120
######
## Exponential backoff settings:
######
## When a daemon keeps crashing, we use "exponential backoff" so we
## wait longer and longer before restarting it. This is the base of
## the exponent used to determine how long to wait before starting
## the daemon again:
#MASTER_BACKOFF_FACTOR = 2.0
## What's the maximum amount of time you want the master to wait
## between attempts to start a given daemon? (With 2.0 as the
## MASTER_BACKOFF_FACTOR, you'd hit 1 hour in 12 restarts...)
#MASTER_BACKOFF_CEILING = 3600
## How long should a daemon run without crashing before we consider
## it "recovered"? Once a daemon has recovered, we reset the number
## of restarts so the exponential backoff stuff goes back to normal.
#MASTER_RECOVER_FACTOR = 300
##--------------------------------------------------------------------
## condor_startd
##--------------------------------------------------------------------
## Where are the various condor_starter binaries installed?
STARTER_LIST = STARTER, STARTER_PVM, STARTER_STANDARD
STARTER = $(SBIN)/condor_starter
STARTER_PVM = $(SBIN)/condor_starter.pvm
STARTER_STANDARD = $(SBIN)/condor_starter.std
## When the startd starts up, it can place its address (IP and port)
## into a file. This way, tools running on the local machine don't
## need to query the central manager to find the startd. This
## feature can be turned off by commenting out this setting.
STARTD_ADDRESS_FILE = $(LOG)/.startd_address
## When a machine is claimed, how often should we poll the state of
## the machine to see if we need to evict/suspend the job, etc?
#POLLING_INTERVAL = 5
## How often should the startd send updates to the central manager?
#UPDATE_INTERVAL = 300
## How long is the startd willing to stay in the "matched" state?
#MATCH_TIMEOUT = 300
## How long is the startd willing to stay in the preempting/killing
## state before it just kills the starter directly?
#KILLING_TIMEOUT = 30
## When a machine is unclaimed, when should it run benchmarks?
## LastBenchmark is initialized to 0, so this expression says as soon
## as we're unclaimed, run the benchmarks. Thereafter, if we're
## unclaimed and it's been at least 4 hours since we ran the last
## benchmarks, run them again. The startd keeps a weighted average
## of the benchmark results to provide more accurate values.
## Note, if you don't want any benchmarks run at all, either comment
## RunBenchmarks out, or set it to "False".
BenchmarkTimer = (CurrentTime - LastBenchmark)
RunBenchmarks : (LastBenchmark == 0 ) || ($(BenchmarkTimer) >= (4 * $(HOUR)))
#RunBenchmarks : False
## Normally, when the startd is computing the idle time of all the
## users of the machine (both local and remote), it checks the utmp
## file to find all the currently active ttys, and only checks access
## time of the devices associated with active logins. Unfortunately,
## on some systems, utmp is unreliable, and the startd might miss
## keyboard activity by doing this. So, if your utmp is unreliable,
## set this setting to True and the startd will check the access time
## on all tty and pty devices.
STARTD_HAS_BAD_UTMP = False
## This entry allows the startd to monitor console (keyboard and
## mouse) activity by checking the access times on special files in
## /dev. Activity on these files shows up as "ConsoleIdle" time in
## the startd's ClassAd. Just give a comma-separated list of the
## names of devices you want considered the console, without the
## "/dev/" portion of the pathname.
CONSOLE_DEVICES = mouse, console
## The STARTD_EXPRS entry allows you to have the startd advertise
## arbitrary expressions from the config file in its ClassAd. Give
## the comma-separated list of entries from the config file you want
## in the startd ClassAd.
## Note: because of the different syntax of the config file and
## ClassAds, you might have to do a little extra work to get a given
## entry into the ClassAd. In particular, ClassAds require double
## quotes around strings. Numeric values can go in directly, as can
## boolean expressions. For example, if you wanted the startd to
## advertise its list of console devices, when it's configured to run
## benchmarks, and how often it sends updates to the central manager,
## you'd have to define the following helper macro:
#MY_CONSOLE_DEVICES = "$(CONSOLE_DEVICES)"
## Note: this must come before you define STARTD_EXPRS because macros
## must be defined before you use them in other macros or
## expressions.
## Then, you'd set the STARTD_EXPRS setting to this:
#STARTD_EXPRS = MY_CONSOLE_DEVICES, RunBenchmarks, UPDATE_INTERVAL
##
## STARTD_EXPRS and STARTD_ATTRS can be defined on a per-VM basis.
## The startd builds the list of things to advertise by combining
## the lists in this order: STARTD_EXPRS, VMx_STARTD_EXPRS,
## STARTD_ATTRS, VMx_STARTD_ATTRS. In the example below, the startd
## ad for VM1 will have the values of favorite_color, favorite_season,
## and favorite_movie, and VM2 will have favorite_color, favorite_season,
## and favorite_song.
##
#STARTD_EXPRS = favorite_color, favorite_season
#VM1_STARTD_EXPRS = favorite_movie
#VM2_STARTD_EXPRS = favorite_song
##
## Attributes themselves in the STARTD_EXPRS and STARTD_ATTRS list can
## also be on a per-VM basis. In the below example, the startd ads will be:
## VM1 - favorite_color = "blue"; favorite_season = "spring"
## VM2 - favorite_color = "green"; favorite_season = "spring"
## VM3 - favorite_color = "blue"; favorite_season = "summer"
##
#favorite_color = "blue"
#favorite_season = "spring"
#STARTD_EXPRS = favorite_color, favorite_season
#VM2_favorite_color = "green"
#VM3_favorite_season = "summer"
#
COLLECTOR_HOST_STRING = "$(COLLECTOR_HOST)"
STARTD_EXPRS = COLLECTOR_HOST_STRING
## When the startd is claimed by a remote user, it can also advertise
## arbitrary attributes from the ClassAd of the job it's working on.
## Just list the attribute names you want advertised.
## Note: since this is already a ClassAd, you don't have to do
## anything funny with strings, etc. This feature can be turned off
## by commenting out this setting (there is no default).
STARTD_JOB_EXPRS = ImageSize, ExecutableSize, JobUniverse, NiceUser
## If you want to "lie" to Condor about how many CPUs your machine
## has, you can use this setting to override Condor's automatic
## computation. If you modify this, you must restart the startd for
## the change to take effect (a simple condor_reconfig will not do).
## Please read the section on "condor_startd Configuration File
## Macros" in the Condor Administrators Manual for a further
## discussion of this setting. Its use is not recommended. This
## must be an integer ("N" isn't a valid setting, that's just used to
## represent the default).
#NUM_CPUS = N
## Normally, Condor will automatically detect the amount of physical
## memory available on your machine. Define MEMORY to tell Condor
## how much physical memory (in MB) your machine has, overriding the
## value Condor computes automatically. For example:
#MEMORY = 128
## How much memory would you like reserved from Condor? By default,
## Condor considers all the physical memory of your machine as
## available to be used by Condor jobs. If RESERVED_MEMORY is
## defined, Condor subtracts it from the amount of memory it
## advertises as available.
#RESERVED_MEMORY = 0
######
## SMP startd settings
##
## By default, Condor will evenly divide the resources in an SMP
## machine (such as RAM, swap space and disk space) among all the
## CPUs, and advertise each CPU as its own "virtual machine" with an
## even share of the system resources. If you want something other
## than this, there are a few options available to you. Please read
## the section on "Configuring The Startd for SMP Machines" in the
## Condor Administrator's Manual for full details. The various
## settings are only briefly listed and described here.
######
## The maximum number of different virtual machine types.
#MAX_VIRTUAL_MACHINE_TYPES = 10
## Use this setting to define your own virtual machine types. This
## allows you to divide system resources unevenly among your CPUs.
## You must use a different setting for each different type you
## define. The "<N>" in the name of the macro listed below must be
## an integer from 1 to MAX_VIRTUAL_MACHINE_TYPES (defined above),
## and you use this number to refer to your type. There are many
## different formats these settings can take, so be sure to refer to
## the section on "Configuring The Startd for SMP Machines" in the
## Condor Administrator's Manual for full details. In particular,
## read the section titled "Defining Virtual Machine Types" to help
## understand this setting. If you modify any of these settings, you
## must restart the condor_startd for the change to take effect.
#VIRTUAL_MACHINE_TYPE_<N> = 1/4
#VIRTUAL_MACHINE_TYPE_<N> = cpus=1, ram=25%, swap=1/4, disk=1/4
# For example:
#VIRTUAL_MACHINE_TYPE_1 = 1/8
#VIRTUAL_MACHINE_TYPE_2 = 1/4
## If you define your own virtual machine types, you must specify how
## many virtual machines of each type you wish to advertise. You do
## this with the setting below, replacing the "<N>" with the
## corresponding integer you used to define the type above. You can
## change the number of a given type being advertised at run-time,
## with a simple condor_reconfig.
#NUM_VIRTUAL_MACHINES_TYPE_<N> = M
# For example:
#NUM_VIRTUAL_MACHINES_TYPE_1 = 6
#NUM_VIRTUAL_MACHINES_TYPE_2 = 1
## The number of evenly-divided virtual machines you want Condor to
## report to your pool (if less than the total number of CPUs). This
## setting is only considered if the "type" settings described above
## are not in use. By default, all CPUs are reported. This setting
## must be an integer ("N" isn't a valid setting, that's just used to
## represent the default).
#NUM_VIRTUAL_MACHINES = N
## How many of the virtual machines the startd is representing should
## be "connected" to the console (in other words, notice when there's
## console activity)? This defaults to all virtual machines (N in a
## machine with N CPUs). This must be an integer ("N" isn't a valid
## setting, that's just used to represent the default).
#VIRTUAL_MACHINES_CONNECTED_TO_CONSOLE = N
## How many of the virtual machines the startd is representing should
## be "connected" to the keyboard (for remote tty activity, as well
## as console activity). Defaults to 1.
#VIRTUAL_MACHINES_CONNECTED_TO_KEYBOARD = 1
## If there are virtual machines that aren't connected to the
## keyboard or the console (see the above two settings), the
## corresponding idle time reported will be the time since the startd
## was spawned, plus the value of this parameter. It defaults to 20
## minutes. We do this because, if the virtual machine is configured
## not to care about keyboard activity, we want it to be available to
## Condor jobs as soon as the startd starts up, instead of having to
## wait for 15 minutes or more (which is the default time a machine
## must be idle before Condor will start a job). If you don't want
## this boost, just set the value to 0. If you change your START
## expression to require more than 15 minutes before a job starts,
## but you still want jobs to start right away on some of your SMP
## nodes, just increase this parameter.
#DISCONNECTED_KEYBOARD_IDLE_BOOST = 1200
######
## Settings for computing optional resource availability statistics:
######
## If STARTD_COMPUTE_AVAIL_STATS = True, the startd will compute
## statistics about resource availability to be included in the
## classad(s) sent to the collector describing the resource(s) the
## startd manages. The following attributes will always be included
## in the resource classad(s) if STARTD_COMPUTE_AVAIL_STATS = True:
## AvailTime = What proportion of the time (between 0.0 and 1.0)
## has this resource been in a state other than "Owner"?
## LastAvailInterval = What was the duration (in seconds) of the
## last period between "Owner" states?
## The following attributes will also be included if the resource is
## not in the "Owner" state:
## AvailSince = At what time did the resource last leave the
## "Owner" state? Measured in the number of seconds since the
## epoch (00:00:00 UTC, Jan 1, 1970).
## AvailTimeEstimate = Based on past history, this is an estimate
## of how long the current period between "Owner" states will
## last.
#STARTD_COMPUTE_AVAIL_STATS = False
## If STARTD_COMPUTE_AVAIL_STATS = True, STARTD_AVAIL_CONFIDENCE sets
## the confidence level of the AvailTimeEstimate. By default, the
## estimate is based on the 80th percentile of past values.
#STARTD_AVAIL_CONFIDENCE = 0.8
## STARTD_MAX_AVAIL_PERIOD_SAMPLES limits the number of samples of
## past available intervals stored by the startd to limit memory and
## disk consumption. Each sample requires 4 bytes of memory and
## approximately 10 bytes of disk space.
#STARTD_MAX_AVAIL_PERIOD_SAMPLES = 100
##--------------------------------------------------------------------
## condor_schedd
##--------------------------------------------------------------------
## Where are the various shadow binaries installed?
SHADOW_LIST = SHADOW, SHADOW_PVM, SHADOW_STANDARD
SHADOW = $(SBIN)/condor_shadow
SHADOW_PVM = $(SBIN)/condor_shadow.pvm
SHADOW_STANDARD = $(SBIN)/condor_shadow.std
## When the schedd starts up, it can place its address (IP and port)
## into a file. This way, tools running on the local machine don't
## need to query the central manager to find the schedd. This
## feature can be turned off by commenting out this setting.
SCHEDD_ADDRESS_FILE = $(LOG)/.schedd_address
## How often should the schedd send an update to the central manager?
#SCHEDD_INTERVAL = 300
## How long should the schedd wait between spawning each shadow?
#JOB_START_DELAY = 2
## How often should the schedd send a keep alive message to any
## startds it has claimed? (5 minutes)
#ALIVE_INTERVAL = 300
## This setting controls the maximum number of times that a
## condor_shadow process can have a fatal error (exception) before
## the condor_schedd will simply relinquish the match associated with
## the dying shadow.
#MAX_SHADOW_EXCEPTIONS = 5
## Estimated virtual memory size of each condor_shadow process.
## Specified in kilobytes.
SHADOW_SIZE_ESTIMATE = 1800
## The condor_schedd can renice the condor_shadow processes on your
## submit machines. How "nice" do you want the shadows to be? (1-19)
## The higher the number, the lower priority the shadows have.
## This feature can be disabled entirely by commenting it out.
SHADOW_RENICE_INCREMENT = 10
## By default, when the schedd fails to start an idle job, it will
## not try to start any other idle jobs in the same cluster during
## that negotiation cycle. This makes negotiation much more
## efficient for large job clusters. However, in some cases other
## jobs in the cluster can be started even though an earlier job
## can't. For example, the jobs' requirements may differ, because of
## different disk space, memory, or operating system requirements.
## Or, machines may be willing to run only some jobs in the cluster,
## because their requirements reference the jobs' virtual memory size
## or other attribute. Setting NEGOTIATE_ALL_JOBS_IN_CLUSTER to True
## will force the schedd to try to start all idle jobs in each
## negotiation cycle. This will make negotiation cycles last longer,
## but it will ensure that all jobs that can be started will be
## started.
#NEGOTIATE_ALL_JOBS_IN_CLUSTER = False
## This setting controls how often, in seconds, the schedd considers
## periodic job actions given by the user in the submit file.
## (Currently, these are periodic_hold, periodic_release, and periodic_remove.)
PERIODIC_EXPR_INTERVAL = 60
######
## Queue management settings:
######
## How often should the schedd truncate its job queue transaction
## log? (Specified in seconds, once a day is the default.)
#QUEUE_CLEAN_INTERVAL = 86400
## How often should the schedd commit "wall clock" run time for jobs
## to the queue, so run time statistics remain accurate when the
## schedd crashes? (Specified in seconds, once per hour is the
## default. Set to 0 to disable.)
#WALL_CLOCK_CKPT_INTERVAL = 3600
## Which users do you want to grant super user access to this job
## queue? (These users will be able to remove other users' jobs.)
## By default, this only includes root.
QUEUE_SUPER_USERS = root, condor
##--------------------------------------------------------------------
## condor_shadow
##--------------------------------------------------------------------
## If the shadow is unable to read a checkpoint file from the
## checkpoint server, it keeps trying only if the job has accumulated
## more than MAX_DISCARDED_RUN_TIME seconds of CPU usage. Otherwise,
## the job is started from scratch. Defaults to 1 hour. This
## setting is only used if USE_CKPT_SERVER (from above) is True.
#MAX_DISCARDED_RUN_TIME = 3600
## Should periodic checkpoints be compressed?
#COMPRESS_PERIODIC_CKPT = False
## Should vacate checkpoints be compressed?
#COMPRESS_VACATE_CKPT = False
## Should we commit the application's dirty memory pages to swap
## space during a periodic checkpoint?
#PERIODIC_MEMORY_SYNC = False
## Should we write vacate checkpoints slowly? If nonzero, this
## parameter specifies the speed at which vacate checkpoints should
## be written, in kilobytes per second.
#SLOW_CKPT_SPEED = 0
##--------------------------------------------------------------------
## condor_shadow.pvm
##--------------------------------------------------------------------
## Where is the condor pvm daemon installed?
PVMD = $(SBIN)/condor_pvmd
## Where is the condor pvm group server daemon installed?
PVMGS = $(SBIN)/condor_pvmgs
##--------------------------------------------------------------------
## condor_starter
##--------------------------------------------------------------------
## The condor_starter can renice the processes of remote Condor
## jobs on your execute machines. If you want this, uncomment the
## following entry and set it to how "nice" you want the user jobs
## to be (1-19). The larger the number, the lower the priority the
## process gets on your machines.
#JOB_RENICE_INCREMENT = 10
## Should the starter do local logging to its own log file, or send
## debug information back to the condor_shadow where it will end up
## in the ShadowLog?
#STARTER_LOCAL_LOGGING = TRUE
## If the UID_DOMAIN settings match on both the execute and submit
## machines, but the UID of the user who submitted the job isn't in
## the passwd file of the execute machine, the starter will normally
## exit with an error. Do you want the starter to just start up the
## job with the specified UID, even if it's not in the passwd file?
SOFT_UID_DOMAIN = FALSE
##--------------------------------------------------------------------
## condor_submit
##--------------------------------------------------------------------
## If you want condor_submit to automatically append an expression to
## the Requirements expression or Rank expression of jobs at your
## site, uncomment these entries.
#APPEND_REQUIREMENTS = (expression to append job requirements)
#APPEND_RANK = (expression to append job rank)
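## (Illustrative:) for example, to keep all jobs off machines with
## less than 64 megabytes of advertised memory, you might append:
#APPEND_REQUIREMENTS = (Memory > 64)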
## If you want expressions only appended for either standard or
## vanilla universe jobs, you can uncomment these entries. If any of
## them are defined, they are used for the given universe, instead of
## the generic entries above.
#APPEND_REQ_VANILLA = (expression to append to vanilla job requirements)
#APPEND_REQ_STANDARD = (expression to append to standard job requirements)
#APPEND_RANK_STANDARD = (expression to append to standard job rank)
#APPEND_RANK_VANILLA = (expression to append to vanilla job rank)
## This can be used to define a default value for the rank expression
## if one is not specified in the submit file.
#DEFAULT_RANK = (default rank expression for all jobs)
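## (Illustrative:) for example, to steer jobs toward faster machines
## whenever the submit file gives no rank of its own:
#DEFAULT_RANK = KFlops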
## If you want universe-specific defaults, you can use the following
## entries:
#DEFAULT_RANK_VANILLA = (default rank expression for vanilla jobs)
#DEFAULT_RANK_STANDARD = (default rank expression for standard jobs)
## If you want condor_submit to automatically append expressions to
## the job ClassAds it creates, you can uncomment and define the
## SUBMIT_EXPRS setting. It works just like the STARTD_EXPRS
## described above with respect to ClassAd vs. config file syntax,
## strings, etc. One common use would be to have the full hostname
## of the machine where a job was submitted placed in the job
## ClassAd. You would do this by uncommenting the following lines:
#MACHINE = "$(FULL_HOSTNAME)"
#SUBMIT_EXPRS = MACHINE
## Condor keeps a buffer of recently-used data for each file an
## application opens. This macro specifies the default maximum number
## of bytes to be buffered for each open file at the executing
## machine.
#DEFAULT_IO_BUFFER_SIZE = 524288
## Condor will attempt to consolidate small read and write operations
## into large blocks. This macro specifies the default block size
## Condor will use.
#DEFAULT_IO_BUFFER_BLOCK_SIZE = 32768
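These site-wide defaults can also be overridden per job in the submit description file. A minimal sketch, with illustrative values only:

```
# Submit-file sketch: per-job override of the site-wide I/O buffer
# defaults above; the numbers here are examples, not recommendations.
buffer_size       = 1048576
buffer_block_size = 65536
```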
##--------------------------------------------------------------------
## condor_preen
##--------------------------------------------------------------------
## Who should condor_preen send email to?
#PREEN_ADMIN = $(CONDOR_ADMIN)
## What files should condor_preen leave in the spool directory?
VALID_SPOOL_FILES = job_queue.log, job_queue.log.tmp, history, \
Accountant.log, Accountantnew.log, \
local_univ_execute
## What files should condor_preen remove from the log directory?
INVALID_LOG_FILES = core
##--------------------------------------------------------------------
## Java parameters:
##--------------------------------------------------------------------
## If you would like this machine to be able to run Java jobs,
## then set JAVA to the path of your JVM binary. If you are not
## interested in Java, there is no harm in leaving this entry
## empty or incorrect.
JAVA = /opt/osg/osg_0.2.0/jdk1.4/bin/java
## Some JVMs need to be told the maximum amount of heap memory
## to offer to the process. If your JVM supports this, give
## the argument here, and Condor will fill in the memory amount.
## If left blank, your JVM will choose some default value,
## typically 64 MB. The default (-Xmx) works with the Sun JVM.
JAVA_MAXHEAP_ARGUMENT = -Xmx
## JAVA_CLASSPATH_DEFAULT gives the default set of paths in which
## Java classes are to be found. Each path is separated by spaces.
## If your JVM needs to be informed of additional directories, add
## them here. However, do not remove the existing entries, as Condor
## needs them.
JAVA_CLASSPATH_DEFAULT = $(LIB) $(LIB)/scimark2lib.jar .
## JAVA_CLASSPATH_ARGUMENT describes the command-line parameter
## used to introduce a new classpath:
JAVA_CLASSPATH_ARGUMENT = -classpath
## JAVA_CLASSPATH_SEPARATOR describes the character used to mark
## one path element from another:
JAVA_CLASSPATH_SEPARATOR = :
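Taken together, the three macros above describe how the classpath portion of the JVM command line is assembled. A hypothetical sketch of the resulting invocation (wrapper details and class names are illustrative):

```
# Illustrative only: with the settings above, the JVM would be invoked
# roughly as follows (MainClass stands in for the actual job class):
#   java -classpath $(LIB):$(LIB)/scimark2lib.jar:. MainClass <args>
```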
## JAVA_BENCHMARK_TIME describes the number of seconds for which
## to run Java benchmarks. A longer time yields a more accurate
## benchmark, but consumes more otherwise useful CPU time.
## If this time is zero or undefined, no Java benchmarks will be run.
JAVA_BENCHMARK_TIME = 2
## If your JVM requires any special arguments not mentioned in
## the options above, then give them here.
JAVA_EXTRA_ARGUMENTS =
##
##--------------------------------------------------------------------
## Condor-G settings
##--------------------------------------------------------------------
## Where is the GridManager binary installed?
GRIDMANAGER = $(SBIN)/condor_gridmanager
GAHP = $(SBIN)/gahp_server
GRID_MONITOR = $(SBIN)/grid_monitor.sh
##--------------------------------------------------------------------
## Settings that control the daemon's debugging output:
##--------------------------------------------------------------------
##
## Note that the Gridmanager runs as the user, not as a Condor daemon, so
## all users must have write permission to the directory that the
## Gridmanager will use for its log file. Our suggestion is to create a
## directory called GridLogs in $(LOG) with UNIX permissions 1777
## (just like /tmp )
## Another option is to use /tmp as the location of the GridManager log.
##
MAX_GRIDMANAGER_LOG = 1000000
GRIDMANAGER_DEBUG = D_COMMAND
#GRIDMANAGER_LOG = $(LOG)/GridLogs/GridmanagerLog.$(USERNAME)
GRIDMANAGER_LOG = /tmp/GridmanagerLog.$(USERNAME)
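The GridLogs suggestion above can be set up as follows; this is a sketch, and the path is an example stand-in for $(LOG)/GridLogs:

```shell
# Sketch: create a shared GridLogs directory that any user's Gridmanager
# can write to, with /tmp-like sticky-bit permissions (1777).
mkdir -p /tmp/condor-demo-log/GridLogs
chmod 1777 /tmp/condor-demo-log/GridLogs
ls -ld /tmp/condor-demo-log/GridLogs
```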
##--------------------------------------------------------------------
## Various other settings that the Condor-G can use.
##--------------------------------------------------------------------
## If we're talking to a Globus 2.0 resource, Condor-G will use the new
## version of the GRAM protocol. The first option controls how often to
## check the proxy on the submit side. If the GridManager discovers a new
## proxy, it will restart itself and use the new proxy for all jobs
## launched from then on. In seconds; defaults to 10 minutes.
GRIDMANAGER_CHECKPROXY_INTERVAL = 6000
## The GridManager will shut things down 3 minutes before losing contact
## because of an expired proxy.
## In seconds; defaults to 3 minutes.
#GRIDMANAGER_MINIMUM_PROXY_TIME = 180
## Condor requires that each submitted job be designated to run under a
## particular "universe". Condor-G is active when jobs are marked as
## "GLOBUS" universe jobs. The universe of a job is set in the submit file
## with the 'universe = GLOBUS' line.
##
## If no universe is specified in the submit file, Condor must pick one
## for the job to use. By default, it chooses the "standard" universe.
## The default can be overridden in the config file with the DEFAULT_UNIVERSE
## setting, which is a string to insert into a job submit description if the
## job does not define its own universe.
##
#DEFAULT_UNIVERSE = globus
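For reference, this is how the universe line appears in a submit description file. A minimal sketch; the gatekeeper hostname and file names are hypothetical:

```
# Minimal Condor-G submit description (sketch; hostname is hypothetical)
universe        = globus
globusscheduler = gatekeeper.example.org/jobmanager-condor
executable      = myjob.sh
output          = myjob.out
queue
```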
#
# CRED_MIN_TIME_LEFT is a first pass at making sure that Condor-G does
# not submit your job unless the proxy has enough time left for the job
# to finish. For example, if your job runs for 20 minutes and might spend
# 40 minutes in the queue, it is a bad idea to submit with less than an
# hour left before your proxy expires.
# 2 hours seemed like a reasonable default.
#
CRED_MIN_TIME_LEFT = 120
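The arithmetic behind that example, as a quick sketch; all numbers are the illustrative ones from the comment above, in minutes:

```python
# Sketch of the proxy-lifetime reasoning above (minutes, example values).
run_time = 20      # how long the job runs
queue_wait = 40    # how long it may sit in the queue
slack = 60         # extra margin before the proxy expires

min_proxy_lifetime = run_time + queue_wait + slack
print(min_proxy_lifetime)  # -> 120, matching CRED_MIN_TIME_LEFT above
```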
##
## The location of the wrapper for invoking
## GT3 GAHP server
##
GT3_GAHP = $(SBIN)/gt3_gahp
##
## The location of GT3 files. This should normally be lib/gt3
##
GT3_LOCATION = $(LIB)/gt3
##
## The location of the wrapper for invoking
## GT4 GAHP server
##
GT4_GAHP = $(SBIN)/gt4_gahp
##
## The location of GT4 files. This should normally be lib/gt4
##
GT4_LOCATION = $(LIB)/gt4
##
## gt4-gahp requires gridftp server. This should be the address of gridftp
## server to use
##
GRIDFTP_URL_BASE = gsiftp://$(FULL_HOSTNAME)
## Condor-G and CredD can use MyProxy to refresh GSI proxies which are
## about to expire.
#MYPROXY_GET_DELEGATION = /path/to/myproxy-get-delegation
##
##--------------------------------------------------------------------
## condor_credd credential management daemon
##--------------------------------------------------------------------
## Where is the CredD binary installed?
CREDD = $(SBIN)/condor_credd
## When the credd starts up, it can place its address (IP and port)
## into a file. This way, tools running on the local machine don't
## need an additional "-n host:port" command line option. This
## feature can be turned off by commenting out this setting.
CREDD_ADDRESS_FILE = $(LOG)/.credd_address
## Specify a remote credd server here.
#CREDD_HOST = $(CONDOR_HOST):$(CREDD_PORT)
## CredD startup arguments
## Start the CredD on a well-known port. Uncomment to simplify
## connecting to a remote CredD. Note that this interface may change
## in a future release.
CREDD_PORT = 9620
CREDD_ARGS = -p $(CREDD_PORT) -f
## CredD daemon debugging log
CREDD_LOG = $(LOG)/CredLog
CREDD_DEBUG = D_FULLDEBUG
MAX_CREDD_LOG = 4000000
## The credential owner submits the credential. This list specifies
## other users who are also permitted to see all credentials. Defaults
## to root on Unix systems, and Administrator on Windows systems.
#CRED_SUPER_USERS =
## Credential storage location. This directory must exist
## prior to starting condor_credd. It is highly recommended to
## restrict access permissions to _only_ the directory owner.
CRED_STORE_DIR = $(LOCAL_DIR)/cred_dir
## Index file path of saved credentials.
## This file will be automatically created if it does not exist.
#CRED_INDEX_FILE = $(CRED_STORE_DIR)/cred-index
## condor_credd will attempt to refresh credentials when their
## remaining lifespan is less than this value. Units = seconds.
#DEFAULT_CRED_EXPIRE_THRESHOLD = 3600
## condor_credd periodically checks the remaining lifespan of stored
## credentials at this interval.
#CRED_CHECK_INTERVAL = 60
##
##--------------------------------------------------------------------
## Stork data placement server
##--------------------------------------------------------------------
## Where is the Stork binary installed?
STORK = $(SBIN)/stork_server
## When Stork starts up, it can place its address (IP and port)
## into a file. This way, tools running on the local machine don't
## need an additional "-n host:port" command line option. This
## feature can be turned off by commenting out this setting.
STORK_ADDRESS_FILE = $(LOG)/.stork_address
## Specify a remote Stork server here.
#STORK_HOST = $(CONDOR_HOST):$(STORK_PORT)
## STORK_LOG_BASE specifies the basename for heritage Stork log files.
## Stork uses this macro to create the following output log files:
## $(STORK_LOG_BASE): Stork server job queue classad collection
## journal file.
## $(STORK_LOG_BASE).history: Used to track completed jobs.
## $(STORK_LOG_BASE).user_log: User level log, also used by DAGMan.
STORK_LOG_BASE = $(LOG)/Stork
## Modern Condor DaemonCore logging feature.
STORK_LOG = $(LOG)/StorkLog
STORK_DEBUG = D_FULLDEBUG
MAX_STORK_LOG = 4000000
## Stork startup arguments
## Start Stork on a well-known port. Uncomment to simplify
## connecting to a remote Stork. Note that this interface may change
## in a future release.
#STORK_PORT = 34048
STORK_PORT = 9621
STORK_ARGS = -p $(STORK_PORT) -f -Serverlog $(STORK_LOG_BASE)
## Stork environment. Stork modules may require external programs and
## shared object libraries, which are located using the PATH and
## LD_LIBRARY_PATH environment variables. In addition, some modules may
## require specific environment variables of their own. By default, Stork
## inherits a full environment when invoked from condor_master or the
## shell. If the default environment is not adequate for all Stork
## modules, specify a replacement environment here. This environment will
## be set by condor_master before starting Stork, but does not apply if
## Stork is started directly from the command line.
#STORK_ENVIRONMENT = TMP=/tmp;CONDOR_CONFIG=/special/config;PATH=/lib
## Limits the number of concurrent data placements handled by Stork.
#STORK_MAX_NUM_JOBS = 5
## Limits the number of retries for a failed data placement.
#STORK_MAX_RETRY = 5
## Limits the run time for a data placement job, after which the
## placement is considered failed.
#STORK_MAXDELAY_INMINUTES = 10
## Temporary credential storage directory used by Stork.
#STORK_TMP_CRED_DIR = /tmp
## Directory containing Stork modules.
#STORK_MODULE_DIR = $(LIBEXEC)
#TRUST_UID_DOMAIN = TRUE