Mailing List Archives
Re: [Condor-users] ProcAPI sanity failure, age = -98161996
- Date: Mon, 13 Mar 2006 16:45:05 -0500
- From: "Matthew Galati" <Matthew.Galati@xxxxxxx>
- Subject: Re: [Condor-users] ProcAPI sanity failure, age = -98161996
This problem seems to have been related to job privileges.
I followed the "Job Privileges" section of:
http://condor.optena.com/display/CONDOR/Common+Windows+Problems
and made it so Condor runs as me (a privileged user) on each machine. My "hello-world" job seems to work fine now.
Thanks!
Matt
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Matthew Galati
> Sent: Monday, March 13, 2006 12:56 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] ProcAPI sanity failure, age = -98161996
>
> Here is the corresponding StarterLog.vm2 on ORCLUS01.na.sas.com.
>
> Thanks,
> Matt
>
>
> 3/13 10:57:06 ******************************************************
> 3/13 10:57:06 ** condor_starter (CONDOR_STARTER) STARTING UP
> 3/13 10:57:06 ** C:\condor\bin\condor_starter.exe
> 3/13 10:57:06 ** $CondorVersion: 6.7.17 Feb 18 2006 $
> 3/13 10:57:06 ** $CondorPlatform: INTEL-WINNT50 $
> 3/13 10:57:06 ** PID = 3660
> 3/13 10:57:06 ******************************************************
> 3/13 10:57:06 Using config file: C:\condor\condor_config
> 3/13 10:57:06 Using local config files: C:\condor/condor_config.local
> 3/13 10:57:06 DaemonCore: Command Socket at <10.40.12.183:4696>
> 3/13 10:57:06 SEC_DEFAULT_SESSION_DURATION is undefined, using default value of 3600
> 3/13 10:57:06 Setting resource limits not implemented!
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> 3/13 10:57:06 Communicating with shadow <10.40.12.183:4689>
> 3/13 10:57:06 Shadow version: $CondorVersion: 6.7.17 Feb 18 2006 $
> 3/13 10:57:06 Submitting machine is "ORCLUS01.na.sas.com"
> 3/13 10:57:06 ShouldTransferFiles is "YES", transfering files
> 3/13 10:57:06 STARTER_ALLOW_RUNAS_OWNER is undefined, using default value of False
> 3/13 10:57:06 init_user_ids: want user 'nobody@.', current is '(null)@(null)'
> 3/13 10:57:06 Using dynamic user account.
> 3/13 10:57:06 dynuser: Re-enabling account (condor-reuse-vm2)
> 3/13 10:57:06 dynuser::createuser(condor-reuse-vm2) successful
> 3/13 10:57:06 perm::init() starting up for account (condor-reuse-vm2) domain (NULL)
> 3/13 10:57:06 perm::init: Found Account Name condor-reuse-vm2
> 3/13 10:57:06 Done moving to directory "C:\condor\execute\dir_3660"
> 3/13 10:57:06 TokenCache contents:
> condor-reuse-vm2@.
> 3/13 10:57:06 JICShadow::initIOProxy(): Job does not define WantIOProxy
> 3/13 10:57:06 No StarterUserLog found in job ClassAd
> 3/13 10:57:06 Starter will not write a local UserLog
> 3/13 10:57:06 Changing the executable name
> 3/13 10:57:06 entering FileTransfer::Init
> 3/13 10:57:06 entering FileTransfer::SimpleInit
> 3/13 10:57:06 TransferIntermediate="(none)"
> 3/13 10:57:06 entering FileTransfer::DownloadFiles
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> 3/13 10:57:06 entering FileTransfer::Download
> 3/13 10:57:06 About to sock duplicate, old sock=6C0 new sock=FFFFFFFF state=0
> 3/13 10:57:06 Socket duplicated, old sock=6C0 new sock=698 state=0
> 3/13 10:57:06 In win32_thread_start_func
> 3/13 10:57:06 entering FileTransfer::DownloadThread
> 3/13 10:57:06 entering FileTransfer::DoDownload sync=1
> 3/13 10:57:06 TokenCache contents:
> condor-reuse-vm2@.
> 3/13 10:57:06 get_file(): going to write to filename C:\condor/execute\dir_3660\condor_exec.exe
> 3/13 10:57:06 get_file: Receiving 473 bytes
> 3/13 10:57:06 get_file: wrote 473 bytes to file
> 3/13 10:57:06 ReliSock::get_file_with_permissions(): received null permissions from peer, not setting
> 3/13 10:57:06 ProcAPI sanity failure, cpuusage = -0.000000
> 3/13 10:57:06 ProcAPI sanity failure, age = -98162766
> 3/13 10:57:06 STARTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> 3/13 10:57:06 File transfer completed successfully.
> 3/13 10:57:07 Calling client FileTransfer handler function.
> 3/13 10:57:07 Job 13.1 set to execute immediately
> 3/13 10:57:07 DaemonCore: in SendAliveToParent()
> 3/13 10:57:07 DaemonCore: attempting to connect to '<10.40.12.183:1737>'
> 3/13 10:57:07 STARTER_TIMEOUT_MULTIPLIER is undefined, using default value of 0
> 3/13 10:57:07 SEC_TCP_SESSION_TIMEOUT is undefined, using default value of 20
> 3/13 10:57:07 Starting a VANILLA universe job with ID: 13.1
> 3/13 10:57:07 In OsProc::OsProc()
> 3/13 10:57:07 Main job KillSignal: 15 (Unknown)
> 3/13 10:57:07 Main job RmKillSignal: 15 (Unknown)
> 3/13 10:57:07 Main job HoldKillSignal: 15 (Unknown)
> 3/13 10:57:07 in VanillaProc::StartJob()
> 3/13 10:57:07 Executable is .bat, so running C:\WINDOWS\system32\cmd.exe /Q /C condor_exec.bat
> 3/13 10:57:07 in OsProc::StartJob()
> 3/13 10:57:07 IWD: C:\condor/execute\dir_3660
> 3/13 10:57:07 TokenCache contents:
> condor-reuse-vm2@.
> 3/13 10:57:07 Input file: NUL
> 3/13 10:57:07 Output file: C:\condor/execute\dir_3660\hello1.out
> 3/13 10:57:07 Error file: NUL
> 3/13 10:57:07 Renice expr "10" evaluated to 10
> 3/13 10:57:07 About to exec C:\WINDOWS\system32\cmd.exe condor_exec.exe /Q /C condor_exec.bat
> 3/13 10:57:07 Env = _CONDOR_SCRATCH_DIR=C:\condor\execute\dir_3660
> 3/13 10:57:07 GetBinaryType() returned 0
> 3/13 10:57:07 TokenCache contents:
> condor-reuse-vm2@.
> 3/13 10:57:07 Create_Process: CreateProcess failed, errno=5
> 3/13 10:57:07 ERROR "Create_Process(C:\WINDOWS\system32\cmd.exe,condor_exec.exe /Q /C condor_exec.bat, ...) failed" at line 373 in file ..\src\condor_starter.V6.1\os_proc.C
> 3/13 10:57:07 ShutdownFast all jobs.
> 3/13 10:57:07 Got ShutdownFast when no jobs running.
> 3/13 10:57:31 NET_REMAP_ENABLE is undefined, using default value of False
>
>
>
> > > Here's the shadow log on the submit machine - I am not sure if that helps...
> > >
> >
> > StarterLog.vm2 on ORCLUS01.na.sas.com would be more useful.
> >
> > >
> > > In the MasterLog, I also keep seeing the following: "ProcAPI sanity failure, age = xxxx". This error seems serious.
> >
> > I think we fixed this bug just this morning (the type we were using
> > didn't have enough precision, hence the bogus value). The fix will
> > be in 6.7.18.
> >
> > -Erik
>
> _______________________________________________
> Condor-users mailing list
> Condor-users@xxxxxxxxxxx
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
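Erik attributes the bogus negative `age` in the "ProcAPI sanity failure" messages to a type without enough precision. The sketch below is a hypothetical illustration of how that class of bug can produce a negative age; it is not Condor's ProcAPI code, and the function names are made up for this example. It assumes timestamps are counts of 100-ns ticks (the unit Windows FILETIME uses) and simulates storing each one in a 32-bit field, which wraps roughly every 429 seconds.

```python
# Hypothetical illustration only: not Condor's ProcAPI implementation.
TICKS_PER_SEC = 10_000_000  # 100-ns ticks, the unit Windows FILETIME uses
MASK_32 = 0xFFFFFFFF        # keeps only the low 32 bits of a value


def age_wide(now_ticks: int, created_ticks: int) -> int:
    """Process age in whole seconds from full-width tick counts."""
    return (now_ticks - created_ticks) // TICKS_PER_SEC


def age_narrow(now_ticks: int, created_ticks: int) -> int:
    """Same computation after storing each timestamp in a 32-bit field.

    If the low 32 bits wrap between the two readings, the stored
    'now' is smaller than the stored 'created', so the age goes negative.
    """
    now32 = now_ticks & MASK_32
    created32 = created_ticks & MASK_32
    return (now32 - created32) // TICKS_PER_SEC


if __name__ == "__main__":
    # No wrap between readings: both versions agree.
    print(age_wide(30 * TICKS_PER_SEC, 10 * TICKS_PER_SEC))    # 20
    print(age_narrow(30 * TICKS_PER_SEC, 10 * TICKS_PER_SEC))  # 20

    # Process created 10 s before the low 32 bits wrap, checked 20 s later.
    created = 2**32 - 10 * TICKS_PER_SEC
    now = created + 20 * TICKS_PER_SEC
    print(age_wide(now, created))    # 20
    print(age_narrow(now, created))  # -410
```

The narrow version gives the right answer until the low 32 bits wrap between the two readings, so a bug like this tends to show up only some of the time rather than on every check.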