[HTCondor-users] shim_dmtcp problems with command arguments.
- Date: Mon, 11 Feb 2013 15:29:45 -0600
- From: Russell Poyner <rpoyner@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] shim_dmtcp problems with command arguments.
We have a small Condor pool in Electrical Engineering here at UW Madison. The
users want to be able to use checkpointing, and I had high hopes for
dmtcp, but command arguments aren't working.
For testing I have a submit file that just runs uname under dmtcp. When
the command is /sbin/uname the job returns .err and .out files with the
expected content.
When the command is /sbin/uname -a, no such files are created. The uname
-a version of the submit file is below.
We are running Condor 7.4.4 and shim_dmtcp version 0.4.
Thanks
Russ Poyner
universe = vanilla
executable = /mnt/condor/bin/shim_dmtcp
###############################################################################
# Argument Meaning
#------------------
# --log log file name for actions in shim_dmtcp script, if n/a use /dev/null
# --stdin stdin file, if n/a use /dev/null
# --stdout stdout file, if n/a use /dev/null
# --stderr stderr file, if n/a use /dev/null
# --ckptint checkpointing interval in seconds
# 1 the executable name you should have transferred in
# 2+ arguments to the executable
###############################################################################
arguments = --log shim_dmtcp.$(CLUSTER).$(PROCESS).log \
--stdout uname_job.$(CLUSTER).$(PROCESS).out \
--stderr uname_job.$(CLUSTER).$(PROCESS).err \
--ckptint 1800 /bin/uname -a
requirements = (Machine == "<ahost>.ece.wisc.edu")
###############################################################################
# Enable file transfer. Here is where you "mixin" the user's input and
# output files along with what is needed for DMTCP. Don't forget to transfer
# the actual executable along.
###############################################################################
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
transfer_input_files = /usr/bin/dmtcp_checkpoint, \
/usr/bin/dmtcp_command, \
/usr/bin/dmtcp_coordinator, \
/usr/bin/dmtcp_restart, \
/usr/lib/dmtcp/dmtcphijack.so, \
/mnt/condor_pharm/dmtcp-1.2.6/mtcp/libmtcp.so, \
/usr/lib/libmtcp.so.1, \
/mnt/condor_pharm/dmtcp-1.2.6/mtcp/mtcp_restart
###############################################################################
# Set up various environment variables. If you need to specify more, mix them
# in here.
###############################################################################
environment=DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null; \
PATH=/bin:/usr/bin:/mnt/condor/bin:$(PATH); \
DMTCP_PREFIX_ID=$(CLUSTER)_$(PROCESS); \
DMTCP_BIN=/usr/bin/; \
DMTCP_LIB=/mnt/condor_pharm/dmtcp-1.2.6/mtcp/;
###############################################################################
# SIGINT is our soft checkpointing signal
###############################################################################
kill_sig = 2
###############################################################################
# Output and log files for the shim process which performs the work.
###############################################################################
output = shim.$(CLUSTER).$(PROCESS).out
error = shim.$(CLUSTER).$(PROCESS).err
log = shim.$(CLUSTER).$(PROCESS).log
Notification = Never
queue 1
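
For reference, the argument layout described in the "Argument Meaning" comment of the submit file can be sketched in Python. This only illustrates the split that the comment describes (shim options first, then the executable, then its arguments). It is not the actual shim_dmtcp code, and it does not explain the failure. The names split_shim_arguments and SHIM_OPTIONS are made up for the example, and the 1.0 cluster/process values are placeholders.

```python
from typing import Dict, List, Tuple

# Options listed in the submit file's "Argument Meaning" comment.
# Assumption: each option takes exactly one value.
SHIM_OPTIONS = {"--log", "--stdin", "--stdout", "--stderr", "--ckptint"}


def split_shim_arguments(argv: List[str]) -> Tuple[Dict[str, str], str, List[str]]:
    """Split argv into (shim options, executable, executable arguments)."""
    options: Dict[str, str] = {}
    i = 0
    # Consume leading "--option value" pairs until the first non-option token.
    while i < len(argv) and argv[i] in SHIM_OPTIONS:
        if i + 1 >= len(argv):
            raise ValueError(f"option {argv[i]} is missing its value")
        options[argv[i].lstrip("-")] = argv[i + 1]
        i += 2
    if i >= len(argv):
        raise ValueError("no executable given after the shim options")
    # Everything after the executable belongs to the job, even if it starts with "-".
    return options, argv[i], argv[i + 1:]


if __name__ == "__main__":
    line = ("--log shim_dmtcp.1.0.log --stdout uname_job.1.0.out "
            "--stderr uname_job.1.0.err --ckptint 1800 /bin/uname -a")
    opts, exe, exe_args = split_shim_arguments(line.split())
    print(opts)
    print(exe, exe_args)
```

Under this reading, the trailing -a should reach /bin/uname as its only argument rather than being interpreted by the shim.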