Re: [Condor-users] Shadow exception errors
- Date: Tue, 14 Feb 2006 13:08:18 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: Re: [Condor-users] Shadow exception errors
OK, this one is fixed. The user had changed their password and
needed to run the condor_store_cred command again.
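
For anyone searching the archive with the same symptoms, here is a rough Python sketch (not part of Condor) that scans shadow or starter log text for the error strings quoted in this thread and prints a likely meaning. The function name `diagnose` and the hint wording are my own assumptions. The readings of 1326, 10054 and errno 2 come from the standard Windows, Winsock and C error code tables, not from the Condor manual.

```python
import re

# Error signatures seen in the logs quoted in this thread, each with a
# likely meaning. The hints are an interpretation of the OS error codes,
# not Condor documentation.
SIGNATURES = [
    (re.compile(r"LogonUser failed with NT Status 1326"),
     "Windows error 1326 (ERROR_LOGON_FAILURE): bad user name or password; "
     "the stored credential is probably stale"),
    (re.compile(r"recv\(\) returned -1, errno = 10054"),
     "Winsock error 10054 (WSAECONNRESET): connection reset by the remote side"),
    (re.compile(r"Failed to open file .*errno = 2\b"),
     "errno 2 (ENOENT): the file to be transferred does not exist"),
]


def diagnose(log_text):
    """Return the distinct hints whose pattern occurs in log_text, in first-seen order."""
    hints = []
    for line in log_text.splitlines():
        for pattern, hint in SIGNATURES:
            if pattern.search(line) and hint not in hints:
                hints.append(hint)
    return hints


if __name__ == "__main__":
    sample = (
        "2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326\n"
        "2/14 12:28:12 (2.0) (3060): init_user_ids() failed!\n"
    )
    for hint in diagnose(sample):
        print(hint)
```

In this thread the 1326 hint was the one that mattered: storing the new password again with condor_store_cred cleared it. The patterns match within one line at a time, so they assume one log entry per line, as in the log file itself, rather than the mail-wrapped copies.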
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> Hitchen, Greg (E&M, Kensington)
> Sent: Tuesday, 14 February 2006 9:34 AM
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] Shadow exception errors
>
>
>
> The shadowlog of the submitter is also giving these errors:
>
> 2/14 12:28:10 ******************************************************
> 2/14 12:28:10 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> 2/14 12:28:10 ** C:\Condor\bin\condor_shadow.exe
> 2/14 12:28:10 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> 2/14 12:28:10 ** $CondorPlatform: INTEL-WINNT50 $
> 2/14 12:28:10 ** PID = 3060
> 2/14 12:28:10 ******************************************************
> 2/14 12:28:11 Using config file: c:\condor\condor_config
> 2/14 12:28:11 Using local config files: C:\Condor/condor_config.local
> 2/14 12:28:11 DaemonCore: Command Socket at <130.155.67.83:9434>
> 2/14 12:28:12 Initializing a VANILLA shadow
> 2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
> 2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
> 2/14 12:28:12 (2.0) (3060): init_user_ids: LogonUser failed with NT Status 1326
> 2/14 12:28:12 (2.0) (3060): init_user_ids() failed!
> 2/14 12:28:12 (2.0) (3060): ERROR "set_user_priv() failed!" at line 400 in file ..\src\condor_c++_util\uids.C
>
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> > Hitchen, Greg (E&M, Kensington)
> > Sent: Monday, 13 February 2006 2:02 PM
> > To: condor-users@xxxxxxxxxxx
> > Subject: [Condor-users] Shadow exception errors
> >
> >
> >
> > Hi
> >
> > We have been setting up and experimenting with condor for a
> > while and now have some "real" users onboard using the system.
> >
> > This user has submitted a number of jobs that keep trying to
> > start, fail and start again. There are shadow exception
> > problems and eviction problems. Just concentrating on the
> > shadow exception problems for now, I have included logs from
> > the submitting machine and from 2 different execute machines.
> >
> > What problem is likely to cause this type of error message?
> >
> > The first example involves flocking to a different pool at a
> > different site. The second involves a job in the same pool,
> > but on machines still at a physically different site. In both
> > cases hardware firewalls (PIXes) sit between, but we have set
> > HIGHPORT and LOWPORT in the configs and enabled TCP/UDP for the
> > 9000-10000 port range.
> >
> > Thanks.
> >
> > Cheers
> >
> > Greg
> >
> > SHADOW LOG OF SUBMITTING MACHINE
> >
> > 2/13 10:54:09 ******************************************************
> > 2/13 10:54:09 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/13 10:54:09 ** C:\Condor\bin\condor_shadow.exe
> > 2/13 10:54:09 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 10:54:09 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 10:54:09 ** PID = 1268
> > 2/13 10:54:09 ******************************************************
> > 2/13 10:54:09 Using config file: c:\condor\condor_config
> > 2/13 10:54:09 Using local config files: C:\Condor/condor_config.local
> > 2/13 10:54:09 DaemonCore: Command Socket at <130.155.67.83:9091>
> > 2/13 10:54:32 Initializing a VANILLA shadow
> > 2/13 10:54:32 (7.0) (1268): Request to run on <130.116.147.52:9590> was ACCEPTED
> > 2/13 10:54:45 (7.0) (1268): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> > 2/13 10:54:45 (7.0) (1268): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 10:54:46 ******************************************************
> > 2/13 10:54:46 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/13 10:54:46 ** C:\Condor\bin\condor_shadow.exe
> > 2/13 10:54:46 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 10:54:46 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 10:54:46 ** PID = 2676
> > 2/13 10:54:46 ******************************************************
> > 2/13 10:54:47 Using config file: c:\condor\condor_config
> > 2/13 10:54:47 Using local config files: C:\Condor/condor_config.local
> > 2/13 10:54:47 DaemonCore: Command Socket at <130.155.67.83:9741>
> > 2/13 10:55:09 Initializing a VANILLA shadow
> > 2/13 10:55:09 (7.0) (2676): Request to run on <130.116.147.52:9590> was ACCEPTED
> > 2/13 10:55:14 (7.0) (2676): ReliSock: put_file: Failed to open file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, errno = 2.
> > 2/13 10:55:14 (7.0) (2676): ERROR "DoUpload: Failed to send file C:\Documents and Settings\odw010\.condorqueue\D78aUAA.egs, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 11:07:43 (5.0) (1076): Job 5.0 is being evicted
> > 2/13 11:07:43 (5.0) (1076): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107
> >
> > STARTER LOG OF EXECUTE MACHINE
> >
> > 2/13 06:40:56 ******************************************************
> > 2/13 06:40:56 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/13 06:40:56 ** C:\Condor\bin\condor_starter.exe
> > 2/13 06:40:56 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 06:40:56 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 06:40:56 ** PID = 4048
> > 2/13 06:40:56 ******************************************************
> > 2/13 06:40:56 Using config file: c:\condor\condor_config
> > 2/13 06:40:56 Using local config files: C:\Condor/condor_config.local
> > 2/13 06:40:56 DaemonCore: Command Socket at <130.116.147.52:9448>
> > 2/13 06:40:56 Setting resource limits not implemented!
> > 2/13 06:41:15 Starter communicating with condor_shadow <130.155.67.83:9691>
> > 2/13 06:41:15 Submitting machine is "student3-lu.minerals.csiro.au"
> > 2/13 06:41:33 File transfer completed successfully.
> > 2/13 06:41:33 Starting a VANILLA universe job with ID: 3.0
> > 2/13 06:41:33 IWD: C:\Condor/execute\dir_4048
> > 2/13 06:41:33 Output file: C:\Condor/execute\dir_4048\D7EG9AB.log
> > 2/13 06:41:34 Renice expr "10" evaluated to 10
> > 2/13 06:41:34 About to exec C:\Condor\execute\dir_4048\condor_exec.exe D7EG9AB.egs
> > 2/13 06:41:34 Create_Process succeeded, pid=2932
> > 2/13 07:10:28 Got SIGQUIT. Performing fast shutdown.
> > 2/13 07:10:28 ShutdownFast all jobs.
> > 2/13 07:10:28 Process exited, pid=2932, status=0
> > 2/13 07:10:28 Last process exited, now Starter is exiting
> > 2/13 07:10:28 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
> > 2/13 07:38:11 ******************************************************
> > 2/13 07:38:11 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/13 07:38:11 ** C:\Condor\bin\condor_starter.exe
> > 2/13 07:38:11 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/13 07:38:11 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/13 07:38:11 ** PID = 3688
> > 2/13 07:38:11 ******************************************************
> > 2/13 07:38:11 Using config file: c:\condor\condor_config
> > 2/13 07:38:11 Using local config files: C:\Condor/condor_config.local
> > 2/13 07:38:11 DaemonCore: Command Socket at <130.116.147.52:9413>
> > 2/13 07:38:11 Setting resource limits not implemented!
> > 2/13 07:38:11 Starter communicating with condor_shadow <130.155.67.83:9541>
> > 2/13 07:38:11 Submitting machine is "student3-lu.minerals.csiro.au"
> > 2/13 07:38:29 File transfer completed successfully.
> > 2/13 07:38:29 Starting a VANILLA universe job with ID: 7.0
> > 2/13 07:38:29 IWD: C:\Condor/execute\dir_3688
> > 2/13 07:38:29 Output file: C:\Condor/execute\dir_3688\D78aUAA.log
> > 2/13 07:38:29 Renice expr "10" evaluated to 10
> > 2/13 07:38:29 About to exec C:\Condor\execute\dir_3688\condor_exec.exe D78aUAA.egs
> > 2/13 07:38:29 Create_Process succeeded, pid=2716
> > 2/13 07:44:09 Process exited, pid=2716, status=0
> > 2/13 07:44:10 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3688\D78aUAA.condorlog, errno = 2.
> > 2/13 07:44:10 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3688\D78aUAA.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/13 07:44:10 ShutdownFast all jobs.
> > 2/13 07:44:10 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> >
> >
> > SHADOW LOG OF SUBMITTING MACHINE
> >
> > 2/12 16:55:49 ******************************************************
> > 2/12 16:55:49 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/12 16:55:49 ** C:\Condor\bin\condor_shadow.exe
> > 2/12 16:55:49 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/12 16:55:49 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/12 16:55:49 ** PID = 1068
> > 2/12 16:55:49 ******************************************************
> > 2/12 16:55:49 Using config file: c:\condor\condor_config
> > 2/12 16:55:49 Using local config files: C:\Condor/condor_config.local
> > 2/12 16:55:50 DaemonCore: Command Socket at <130.155.67.83:9698>
> > 2/12 16:56:12 Initializing a VANILLA shadow
> > 2/12 16:56:12 (5.0) (1068): Request to run on <138.194.10.81:9018> was ACCEPTED
> > 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:56:40 (5.0) (1068): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:56:41 (5.0) (1068): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> > 2/12 16:56:42 ******************************************************
> > 2/12 16:56:42 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 2/12 16:56:42 ** C:\Condor\bin\condor_shadow.exe
> > 2/12 16:56:42 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/12 16:56:42 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/12 16:56:42 ** PID = 492
> > 2/12 16:56:42 ******************************************************
> > 2/12 16:56:42 Using config file: c:\condor\condor_config
> > 2/12 16:56:42 Using local config files: C:\Condor/condor_config.local
> > 2/12 16:56:42 DaemonCore: Command Socket at <130.155.67.83:9289>
> > 2/12 16:57:04 Initializing a VANILLA shadow
> > 2/12 16:57:04 (5.0) (492): Request to run on <138.194.10.81:9018> was ACCEPTED
> > 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:57:12 (5.0) (492): condor_read(): recv() returned -1, errno = 10054, assuming failure.
> > 2/12 16:57:12 (5.0) (492): ERROR "Can no longer talk to condor_starter on execute machine (138.194.10.81)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
> >
> > STARTER LOG OF EXECUTING MACHINE
> >
> > 2/10 23:44:22 ******************************************************
> > 2/10 23:44:22 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/10 23:44:22 ** C:\Condor\bin\condor_starter.exe
> > 2/10 23:44:22 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/10 23:44:22 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/10 23:44:22 ** PID = 3508
> > 2/10 23:44:22 ******************************************************
> > 2/10 23:44:22 Using config file: C:\Condor\condor_config
> > 2/10 23:44:22 Using local config files: C:\Condor/condor_config.local
> > 2/10 23:44:22 DaemonCore: Command Socket at <138.194.10.81:9790>
> > 2/10 23:44:22 Setting resource limits not implemented!
> > 2/10 23:44:41 Starter communicating with condor_shadow <130.155.67.83:9344>
> > 2/10 23:44:41 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> > 2/10 23:44:47 File transfer completed successfully.
> > 2/10 23:44:47 Starting a VANILLA universe job with ID: 4.0
> > 2/10 23:44:47 IWD: C:\Condor/execute\dir_3508
> > 2/10 23:44:47 Output file: C:\Condor/execute\dir_3508\D7EG9AC.log
> > 2/10 23:44:47 Renice expr "10" evaluated to 10
> > 2/10 23:44:47 About to exec C:\Condor\execute\dir_3508\condor_exec.exe D7EG9AC.egs
> > 2/10 23:44:47 Create_Process succeeded, pid=3860
> > 2/10 23:45:08 Process exited, pid=3860, status=-1
> > 2/10 23:45:09 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, errno = 2.
> > 2/10 23:45:09 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3508\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/10 23:45:09 ShutdownFast all jobs.
> > 2/10 23:45:09 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> > 2/10 23:45:32 ******************************************************
> > 2/10 23:45:32 ** condor_starter (CONDOR_STARTER) STARTING UP
> > 2/10 23:45:32 ** C:\Condor\bin\condor_starter.exe
> > 2/10 23:45:32 ** $CondorVersion: 6.6.10 Jun 22 2005 $
> > 2/10 23:45:32 ** $CondorPlatform: INTEL-WINNT50 $
> > 2/10 23:45:32 ** PID = 3624
> > 2/10 23:45:32 ******************************************************
> > 2/10 23:45:32 Using config file: C:\Condor\condor_config
> > 2/10 23:45:32 Using local config files: C:\Condor/condor_config.local
> > 2/10 23:45:32 DaemonCore: Command Socket at <138.194.10.81:9438>
> > 2/10 23:45:32 Setting resource limits not implemented!
> > 2/10 23:45:33 Starter communicating with condor_shadow <130.155.67.83:9216>
> > 2/10 23:45:33 Submitting machine is "student3-lu.minerals.CSIRO.AU"
> > 2/10 23:45:39 File transfer completed successfully.
> > 2/10 23:45:39 Starting a VANILLA universe job with ID: 4.0
> > 2/10 23:45:39 IWD: C:\Condor/execute\dir_3624
> > 2/10 23:45:39 Output file: C:\Condor/execute\dir_3624\D7EG9AC.log
> > 2/10 23:45:39 Renice expr "10" evaluated to 10
> > 2/10 23:45:39 About to exec C:\Condor\execute\dir_3624\condor_exec.exe D7EG9AC.egs
> > 2/10 23:45:39 Create_Process succeeded, pid=4092
> > 2/10 23:45:39 Process exited, pid=4092, status=-1
> > 2/10 23:45:40 ReliSock: put_file: Failed to open file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, errno = 2.
> > 2/10 23:45:40 ERROR "DoUpload: Failed to send file C:\Condor/execute\dir_3624\D7EG9AC.condorlog, exiting at 1398 " at line 1397 in file ..\src\condor_c++_util\file_transfer.C
> > 2/10 23:45:40 ShutdownFast all jobs.
> > 2/10 23:45:40 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
> >
> > -----------------------------------------------------------------------
> > Greg Hitchen
> > greg.hitchen@xxxxxxxx
> > CSIRO Exploration and Mining                   phone: +61 8 6436 8663
> > Australian Resources Research Centre (ARRC)    fax:   +61 8 6436 8555
> > Postal address:                                mob:   0407 952 748
> > PO Box 1130, Bentley WA 6102, Australia
> > Street Address:
> > 26 Dick Perry Avenue, Kensington WA 6151
> > -----------------------------------------------------------------------
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
>