
Re: [Condor-users] Args not found error



Ryan,

I would not have expected a 6.7.18 schedd to cause this problem, so there may be a bug. Unfortunately, I am just now leaving for a week's vacation, so I haven't been able to verify whether that is the case.

You should not see this problem when all daemons run the same version. Sorry I can't offer a better solution at the moment.

--Dan

On Apr 6, 2006, at 5:59 PM, Ryan Garver wrote:

In fact, I submitted from a 6.7.14 schedd to a 6.7.14 startd.  However,
the job originated from a Condor-C schedd running 6.7.18.  The job file
I used is:

universe = grid
executable = pi-compute
arguments= 5000000
output = out.$(Process)
log = log.$(Process)

# Condor
grid_resource = $$(Resource)
queue 1

Dan Bradley wrote:
Ryan,

There was a bug in 6.7.15 through 6.7.17 that caused an incompatibility
when jobs were submitted from those versions to an older starter
(in your case 6.7.14).  Am I guessing correctly that you submitted the
job from a 6.7.15-6.7.17 schedd?

Upgrading to 6.7.18 should solve the problem.  Another possible
workaround is to always use the "old style" arguments syntax in your
submit file (no quoting) and if you have no arguments at all, to
explicitly set arguments to an empty value in your submit file.
Example:

arguments=
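
For anyone who wants to check a submit file against this workaround, here is a minimal sketch. It assumes that "new style" means the whole arguments value is wrapped in double quotes. The function name check_arguments_workaround is hypothetical and not part of Condor.

```python
def check_arguments_workaround(submit_text):
    """Return warnings for a submit description that does not follow the workaround.

    Hypothetical helper, not part of Condor. Assumes new-style syntax is
    recognizable by a value that starts with a double quote.
    """
    warnings = []
    found = False
    for raw in submit_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        # Only look at "arguments = ..." lines (command names are case-insensitive).
        if not sep or key.strip().lower() != "arguments":
            continue
        found = True
        if value.strip().startswith('"'):
            warnings.append("arguments uses new-style quoting: " + line)
    if not found:
        warnings.append("no arguments line; add an explicit empty 'arguments='")
    return warnings


if __name__ == "__main__":
    example = "executable = pi-compute\narguments= 5000000\nqueue 1\n"
    print(check_arguments_workaround(example))
```

An empty list means the file already uses unquoted arguments or an explicit empty value.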

--Dan

On Apr 5, 2006, at 6:01 PM, Ryan Garver wrote:


I'm getting a weird error when I submit a job.  The program runs fine
from a local console; however, when run through condor (in the vanilla
universe) I get a strange error:

4/5 15:54:08 ******************************************************
4/5 15:54:08 ** condor_starter (CONDOR_STARTER) STARTING UP
4/5 15:54:08 ** /home/condor/6.7.14/sbin/condor_starter
4/5 15:54:08 ** $CondorVersion: 6.7.14 Dec 13 2005 $
4/5 15:54:08 ** $CondorPlatform: I386-LINUX_RH9 $
4/5 15:54:08 ** PID = 28678
4/5 15:54:08 ******************************************************
4/5 15:54:08 Using config file: /home/condor/condor_config
4/5 15:54:08 Using local config files: /home/condor/hosts/sei/condor_config.local
4/5 15:54:08 DaemonCore: Command Socket at <128.111.45.22:45276>
4/5 15:54:08 Done setting resource limits
4/5 15:54:08 Communicating with shadow <128.111.45.35:51873>
4/5 15:54:08 Submitting machine is "pompone.cs.ucsb.edu"
4/5 15:54:08 Starting a VANILLA universe job with ID: 1956.0
4/5 15:54:08 Args not found in JobAd.  Aborting OsProc::StartJob.
4/5 15:54:08 Failed to start job, exiting
4/5 15:54:08 ShutdownFast all jobs.
4/5 15:54:08 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0

This is funny because I do have an Arguments value set in my JobAd,
and the binary that ends up in the spool directory runs as expected:

$ condor_q -long
-- Submitter: pompone.cs.ucsb.edu : <128.111.45.35:34041> : pompone.cs.ucsb.edu
MyType = "Job"
TargetType = "Machine"
GlobalJobId = "pompone.cs.ucsb.edu#1144276112#1956.0"
RootDir = "/"
MinHosts = 1
WantRemoteSyscalls = FALSE
WantCheckpoint = FALSE
RemoteSpoolDir = "/tmp/home/rgarver/dynamic_condor/localcondor/conf.noir/spool/cluster2.proc0.subproc0"
JobPrio = 0
NiceUser = FALSE
WantRemoteIO = TRUE
CoreSize = 0
KillSig = "SIGTERM"
Rank = 0.000000
In = "/dev/null"
TransferIn = FALSE
Out = "out.0"
StreamOut = FALSE
Err = "/dev/null"
TransferErr = FALSE
BufferSize = 524288
BufferBlockSize = 32768
ShouldTransferFiles = "NO"
TransferFiles = "NEVER"
ImageSize = 12
ExecutableSize = 12
DiskUsage = 12
Requirements = TRUE
GlobusResubmit = FALSE
GlobusStatus = 32
NumGlobusSubmits = 0
JobUniverse = 5
QDate = 1144276072
CompletionDate = 0
LocalUserCpu = 0.000000
LocalSysCpu = 0.000000
RemoteUserCpu = 0.000000
RemoteSysCpu = 0.000000
ExitStatus = 0
NumCkpts = 0
NumRestarts = 0
NumSystemHolds = 0
CommittedTime = 0
TotalSuspensions = 0
CumulativeSuspensionTime = 0
ExitBySignal = FALSE
JobNotification = 0
LeaveJobInQueue = JobStatus == 4
User = "rgarver@xxxxxxxxxxx"
Owner = "rgarver"
PeriodicRemove = (StageInFinish > 0) =!= TRUE && CurrentTime > QDate + 28800
SubmitterId = "rgarver@xxxxxxxxxxxxxxxxxxx"
Arguments = "5000000"
Environment = ""
ClusterId = 1956
ProcId = 0
StageInStart = 1144276132
SUBMIT_Iwd = "/tmp/home/rgarver/condor_install/daisy/conf.pompone/spool/cluster2.proc0.subproc0"
Iwd = "/home/condor/hosts/pompone/spool/cluster1956.proc0.subproc0"
SUBMIT_Cmd = "/tmp/home/rgarver/condor_install/daisy/conf.pompone/spool/cluster2.proc0.subproc0/pi-compute"
Cmd = "/home/condor/hosts/pompone/spool/cluster1956.proc0.subproc0/pi-compute"
StageInFinish = 1144276133
ReleaseReason = "Data files spooled"
LastHoldReason = "Spooling input data files"
JobStartDate = 1144276138
PeriodicHold = FALSE
PeriodicRelease = FALSE
OnExitHold = FALSE
OnExitRemove = TRUE
WantMatchDiagnostics = TRUE
LastMatchTime = 1144277635
NumJobMatches = 7
OrigMaxHosts = 1
LastJobLeaseRenewal = 1144277648
JobLastStartDate = 1144277645
JobCurrentStartDate = 1144277648
JobRunCount = 30
RemoteWallClockTime = 14.000000
LastRemoteHost = "sei.cs.ucsb.edu"
LastClaimId = "<128.111.45.22:34762>#1140125522#550"
CurrentHosts = 0
JobStatus = 1
EnteredCurrentStatus = 1144277648
LastSuspensionTime = 0
MaxHosts = 1
ServerTime = 1144277881

Any suggestions?

--
Ryan Garver
<rgarver@xxxxxxxxxxx>

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users





--
Ryan Garver
<rgarver@xxxxxxxxxxx>
