Mailing List Archives
[Condor-users] Condor expects files to spool, even if I tell it not to?
- Date: Wed, 22 Feb 2006 18:34:36 -0800
- From: Adam Lathers <alathers@xxxxxxxxxxxxxx>
- Subject: [Condor-users] Condor expects files to spool, even if I tell it not to?
Hi all,
As I baby-step along here, I find that I've now managed to set up my
pool, get jobs running within it, and make several people happy. Now
for the next step: I need to be able to submit a job to my pool from a
remote host, using GSI authentication. That part works (see my previous
e-mail, where I bungled around with the GRIDMAP macro).
Any help on my next step would really be SUPER appreciated. Thanks
in advance. Now, on to the issues I'm having. Feel free to point out
dumb mistakes as well... like I said, I'm still learning here. ;)
Now, my problem. When submitting from the remote host, I issue the
following command as a test:
condor_submit -verbose -pool schedd-host -r schedd-host hostname.submit
hostname.submit looks like this. (I know the requirements are a bit odd,
but they're there to work around some issues I've had with MATLAB core
dumping when run on i686 hosts via Condor, and to emulate a job that
handles its own data delivery.)
Universe = vanilla
Executable = /bin/hostname
Error = hostname.err
Log = hostname.log
GetEnv = False
Arguments = -f
Notification = Error
should_transfer_files = IF_NEEDED
transfer_executable = False
copy_to_spool = False
when_to_transfer_output = ON_EXIT
Requirements = (FileSystemDomain =!= "") && (Arch =!= "IA64") && (Memory >= ImageSize) && ((OpSys == "LINUX") || (OpSys == "SOLARIS29") || (OpSys == "SOLARIS5.10") ) && (Arch =!= "INTEL")
remote_universe = vanilla
+remote_ShouldTransferFiles = IN_NEEDED
+remote_TransferExecutable = False
+remote_WhenToTransferFiles = ON_EXIT
+remote_requirements = '(FileSystemDomain =!= "") && (Arch =!= "IA64") && (Memory >= ImageSize) && ((OpSys == "LINUX") || (OpSys == "SOLARIS29") || (OpSys == "SOLARIS5.10") ) && (Arch =!= "INTEL")'
+remote_copytospool = False
Queue
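This file sets each file-transfer option twice: once as a normal submit command and once as a `+remote_` attribute (the `+` prefix inserts the attribute directly into the job ClassAd). When the same setting lives in two places, the copies can drift apart. Below is a minimal Python sketch that compares them, assuming the pairings used in this particular file; the `PAIRS` table and both function names are hypothetical helpers, not part of Condor. On the file above it flags `IF_NEEDED` versus `IN_NEEDED`.

```python
# Hypothetical consistency check for a Condor submit description.
# PAIRS is an assumption taken from the submit file in this post,
# not a mapping defined by Condor itself.
PAIRS = {
    "should_transfer_files": "+remote_ShouldTransferFiles",
    "transfer_executable": "+remote_TransferExecutable",
    "copy_to_spool": "+remote_copytospool",
}


def parse_submit(text: str) -> dict[str, str]:
    """Return {key: value} for every 'key = value' line (keys lowercased)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and bare commands such as "Queue"
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any single or double quotes.
        settings[key.strip().lower()] = value.strip().strip("'\"")
    return settings


def find_mismatches(text: str) -> list[tuple[str, str, str]]:
    """List (local_key, local_value, remote_value) for pairs that disagree."""
    settings = parse_submit(text)
    mismatches = []
    for local_key, remote_key in PAIRS.items():
        local = settings.get(local_key)
        remote = settings.get(remote_key.lower())
        if local is not None and remote is not None and local.lower() != remote.lower():
            mismatches.append((local_key, local, remote))
    return mismatches


SUBMIT = """\
should_transfer_files = IF_NEEDED
transfer_executable = False
copy_to_spool = False
+remote_ShouldTransferFiles = IN_NEEDED
+remote_TransferExecutable = False
+remote_copytospool = False
Queue
"""

print(find_mismatches(SUBMIT))
```

Matching is case-insensitive because submit commands are not case-sensitive; the check says nothing about which of the two values Condor actually honours for a given submission path.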
When I run this job, I get the following in the various log files
on the schedd host I'm submitting to:
==> /opt/condor/local.divot/log/SchedLog <==
2/22 18:19:51 (pid:7249) DaemonCore: Command received via TCP from host <IP_ADDR:9677>
2/22 18:19:51 (pid:7249) DaemonCore: received command 488 (SPOOL_JOB_FILES_WITH_PERMS), calling handler (spoolJobFiles)
2/22 18:19:51 (pid:7735) Scheduler::spoolJobFilesWorkerThread(void *arg, Stream* s) NAP TIME
2/22 18:19:51 (pid:7249) DaemonCore: Command received via UDP from host <IP_ADDR:9637>
2/22 18:19:51 (pid:7249) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
2/22 18:19:51 (pid:7249) Sent ad to central manager for alathers@schedd-host
2/22 18:19:51 (pid:7249) Sent ad to 1 collectors for alathers@schedd-host
2/22 18:19:51 (pid:7249) Called reschedule_negotiator()
2/22 18:19:52 (pid:7249) Job 2722.0 released from hold: Data files spooled
2/22 18:19:52 (pid:7249) Called reschedule_negotiator()
2/22 18:19:56 (pid:7249) Sent ad to central manager for alathers@schedd-host
2/22 18:19:56 (pid:7249) Sent ad to 1 collectors for alathers@schedd-host
2/22 18:19:59 (pid:7249) Starting add_shadow_birthdate(2722.0)
2/22 18:19:59 (pid:7249) Started shadow for job 2722.0 on "<IP_ADDR:9652>", (shadow pid = 7737)
2/22 18:20:00 (pid:7249) Shadow pid 7737 for job 2722.0 exited with status 4
2/22 18:20:00 (pid:7249) ERROR: Shadow exited with job exception code!
2/22 18:20:01 (pid:7249) Sent ad to central manager for alathers@schedd-host
2/22 18:20:01 (pid:7249) Sent ad to 1 collectors for alathers@schedd-host
2/22 18:20:02 (pid:7249) Starting add_shadow_birthdate(2722.0)
2/22 18:20:02 (pid:7249) Started shadow for job 2722.0 on "<IP_ADDR:9652>", (shadow pid = 7738)
==> /opt/condor/local.divot/log/ShadowLog <==
2/22 18:19:59 ******************************************************
2/22 18:19:59 ** condor_shadow (CONDOR_SHADOW) STARTING UPschedd-host
2/22 18:19:59 ** /export/condor-6.7.13/sbin/condor_shadow
2/22 18:19:59 ** $CondorVersion: 6.7.13 Nov 7 2005 $
2/22 18:19:59 ** $CondorPlatform: I386-LINUX_RH9 $
2/22 18:19:59 ** PID = 7737
2/22 18:19:59 ******************************************************
2/22 18:19:59 Using config file: /export/condor/etc/condor_config
2/22 18:19:59 Using local config files: /export/condor-6.7.13/local.divot/condor_config.local
2/22 18:19:59 DaemonCore: Command Socket at <IP_ADDR:46242>
2/22 18:19:59 Initializing a VANILLA shadow for job 2722.0
2/22 18:19:59 (2722.0) (7737): Request to run on <IP_ADDR:9652> was ACCEPTED
2/22 18:20:00 (2722.0) (7737): ERROR "Error from starter on vm1@workernode: Failed to execute '/export/condor/local.divot/spool/cluster2722.proc0.subproc0/hostname condor_exec.exe -f': No such file or directory" at line 597 in file pseudo_ops.C
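Read side by side, the two logs show the schedd handling command 488 (SPOOL_JOB_FILES_WITH_PERMS) and releasing the job with "Data files spooled" despite `copy_to_spool = False`, after which the starter tried to run `hostname` out of the job's spool directory rather than `/bin/hostname`. A minimal Python sketch that pulls out those two pieces of evidence; the helper names and regular expression are hypothetical and match only the message text quoted here, not a documented Condor log format.

```python
import re
from typing import Optional

# Markers copied from the SchedLog excerpt in this post (assumed, not exhaustive).
SPOOL_MARKERS = ("SPOOL_JOB_FILES_WITH_PERMS", "Data files spooled")
# Matches the starter error text as it appears in the ShadowLog excerpt.
EXEC_ERROR = re.compile(r"Failed to execute '([^']*)': (.+?)\"")


def spooling_events(log_text: str) -> list[str]:
    """Return the log lines that show the schedd spooling job files."""
    return [line for line in log_text.splitlines()
            if any(marker in line for marker in SPOOL_MARKERS)]


def failed_executable(log_text: str) -> Optional[tuple[str, str]]:
    """Return (path the starter tried to run, error reason), if present."""
    match = EXEC_ERROR.search(log_text)
    if match is None:
        return None
    command, reason = match.groups()
    # Assumption: the first whitespace-separated token is the program path.
    return command.split()[0], reason


SCHEDD = """\
2/22 18:19:51 (pid:7249) DaemonCore: received command 488 (SPOOL_JOB_FILES_WITH_PERMS), calling handler (spoolJobFiles)
2/22 18:19:52 (pid:7249) Job 2722.0 released from hold: Data files spooled
2/22 18:20:00 (pid:7249) Shadow pid 7737 for job 2722.0 exited with status 4
"""
SHADOW = """\
2/22 18:20:00 (2722.0) (7737): ERROR "Error from starter on vm1@workernode: Failed to execute '/export/condor/local.divot/spool/cluster2722.proc0.subproc0/hostname condor_exec.exe -f': No such file or directory" at line 597 in file pseudo_ops.C
"""

for line in spooling_events(SCHEDD):
    print(line)
print(failed_executable(SHADOW))
```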
_______________________________________________________
Adam Lathers
NCMIR: National Center for Microscopy and Imaging Research
Distributed Systems Engineer
phone: (858) 534-7968
web: http://ncmir.ucsd.edu