
Re: [condor-users] Shadow Exception!



Hi Roberto,

thanks for your advice, but I already tried this. By the way, as far as I
know this is the default value on Windows machines.
But I think I found the problem. I checked the ShadowLog (thanks to Colin)
and found the following:

10/10 18:57:51 ******************************************************
10/10 18:57:51 Using config file: C:\Condor\condor_config
10/10 18:57:51 Using local config files: C:\Condor\condor_config.local
10/10 18:57:51 DaemonCore: Command Socket at <128.176.208.220:4327>
10/10 18:57:52 Initializing a VANILLA shadow
10/10 18:57:52 (7.0) (888): Request to run on <128.176.206.148:1281> was ACCEPTED
10/10 18:58:45 (7.0) (888): perm::NetGetDCName() failed: DCNotFound (domain looked up: VORDEFINIERT)
10/10 18:58:52 (7.0) (888): perm::NetGetDCName() failed: DCNotFound (domain looked up: NT-AUTORITAT)
10/10 18:58:59 (7.0) (888): perm::NetGetDCName() failed: DCNotFound (domain looked up: VORDEFINIERT)
10/10 18:58:59 (7.0) (888): DoDownload: Permission denied to write file C:\BATCH\fort.19!
10/10 18:58:59 (7.0) (888): ERROR "Can no longer talk to condor_starter on execute machine (128.176.206.148)" at line 141 in file ..\src\condor_shadow.V6.1\NTreceivers.C
10/10 18:59:00 ******************************************************

I don't know why, but Condor seems to look up the domains VORDEFINIERT and
NT-AUTORITAT. Maybe our multi-language pack causes this problem. We have the
English version of Windows 2000 installed, but our interface language is
German. I will check this now and post my findings later.
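
As far as I can tell, VORDEFINIERT and NT-AUTORITAT correspond to the German-localized names of the built-in Windows account domains BUILTIN and NT AUTHORITY, which would fit the language-pack theory. To pull the failing domain names out of a ShadowLog, a minimal Python sketch could look like the following. The function name is hypothetical, and the pattern is an assumption based only on the log lines above, not on any documented Condor log format.

```python
import re

# Matches the domain in shadow log entries such as
#   perm::NetGetDCName() failed: DCNotFound (domain looked up: VORDEFINIERT)
# Using \s+ between words tolerates entries that a mail client wrapped
# across lines. The pattern is an assumption derived from the log above.
_DOMAIN_RE = re.compile(
    r"perm::NetGetDCName\(\)\s+failed:\s+DCNotFound\s+"
    r"\(domain\s+looked\s+up:\s+([^)]+)\)"
)


def failed_domain_lookups(log_text):
    """Return the distinct domains whose lookup failed, in first-seen order."""
    seen = []
    for match in _DOMAIN_RE.finditer(log_text):
        domain = match.group(1).strip()
        if domain not in seen:
            seen.append(domain)
    return seen
```

Passing the contents of the ShadowLog to failed_domain_lookups() gives a short list of the domain names Condor tried to resolve, which can then be compared with the account domains that actually exist on the submit machine.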

Thomas Bauer

-----Original Message-----
From: owner-condor-users@xxxxxxxxxxx
[mailto:owner-condor-users@xxxxxxxxxxx] On Behalf Of Roberto Gonzalez
Sent: Monday, 13 October 2003 08:41
To: condor-users@xxxxxxxxxxx
Subject: Re: [condor-users] Shadow Exception!


Try adding

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

to your .sub file.
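
A quick way to confirm that both settings are really present in a submit file is a small check like the one below. This is only a minimal Python sketch: the function name is hypothetical and not part of Condor, and it assumes simple "key = value" lines, with keys and these two values compared case-insensitively and lines starting with "#" treated as comments.

```python
# Settings suggested above; values are stored upper-case for comparison.
REQUIRED = {
    "should_transfer_files": "YES",
    "when_to_transfer_output": "ON_EXIT",
}


def missing_transfer_settings(submit_text):
    """Return the required settings that are absent or set to another value."""
    found = {}
    for line in submit_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment
        # (for example the final "queue" statement).
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        found[key.strip().lower()] = value.strip().upper()
    return [f"{k} = {v}" for k, v in REQUIRED.items() if found.get(k) != v]
```

An empty list means both lines are in place; otherwise the returned entries are the lines still to be added or corrected.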

Regards,

Roberto

Thomas Bauer wrote:

>Hello again,
>
>I am still trying to get Condor working on my small test pool of 4
>Intel Windows 2000 (SP4) machines. I found out that there seems to be a
>problem with writing the results back to the submitting machine, but I
>don't know how to solve it. I have a test program that creates three
>files (fort.19, fort.20, fort.21) with the results. This program
>works fine on any machine of the pool without using Condor. When I submit
>this program to the pool, everything works fine until the first result is
>calculated. The job's log file says the following:
>===========================================================================
>000 (007.000.000) 10/10 18:57:48 Job submitted from host: <x.x.x.x:4178>
>...
>001 (007.000.000) 10/10 18:57:56 Job executing on host: <y.y.y.y:1281>
>...
>006 (007.000.000) 10/10 18:58:05 Image size of job updated: 792
>...
>007 (007.000.000) 10/10 18:58:59 Shadow exception!
>	Can no longer talk to condor_starter on execute machine (y.y.y.y)
>	0  -  Run Bytes Sent By Job
>	528441  -  Run Bytes Received By Job
>...
>===========================================================================
>(The 528441 received bytes are exactly the size of the executable)
>
>To see what had happened, I checked the StarterLog on the execute
>machine:
>===========================================================================
>10/10 18:57:52 ******************************************************
>10/10 18:57:52 ** condor_starter (CONDOR_STARTER) STARTING UP
>10/10 18:57:52 ** $CondorVersion: 6.5.5 Sep 17 2003 $
>10/10 18:57:53 ** $CondorPlatform: INTEL-WINNT40 $
>10/10 18:57:53 ** PID = 652
>10/10 18:57:53 ******************************************************
>10/10 18:57:53 Using config file: C:\Condor\condor_config
>10/10 18:57:53 Using local config files: C:\Condor\condor_config.local
>10/10 18:57:53 DaemonCore: Command Socket at <y.y.y.y:1328>
>10/10 18:57:53 Setting resource limits not implemented!
>10/10 18:57:53 Starter communicating with condor_shadow <x.x.x.x:4327>
>10/10 18:57:53 Submitting machine is "COMPUTERNAME.DOMAIN.COM"
>10/10 18:57:55 File transfer completed successfully.
>10/10 18:57:56 Starting a VANILLA universe job with ID: 7.0
>10/10 18:57:56 IWD: C:\Condor\execute\dir_652
>10/10 18:57:56 Output file: C:\Condor\execute\dir_652\trapez.out
>10/10 18:57:56 Error file: C:\Condor\execute\dir_652\trapez.err
>10/10 18:57:56 Renice expr "10" evaluated to 10
>10/10 18:57:56 About to exec C:\Condor\execute\dir_652\condor_exec.exe
>10/10 18:57:56 Create_Process succeeded, pid=1416
>10/10 18:58:38 Process exited, pid=1416, status=0
>10/10 18:58:59 ReliSock: put_file: TransmitFile() failed, errno=10054
>10/10 18:58:59 ERROR "DoUpload: Failed to send file
>C:\Condor\execute\dir_652\fort.19, exiting at 1371
>" at line 1370 in file ..\src\condor_c++_util\file_transfer.C
>10/10 18:58:59 ShutdownFast all jobs.
>10/10 18:58:59 Error disabling account condor-reuse-vm1 (ACCESS DENIED)
>===========================================================================
>
>The failure seems to be in one of the last lines. The file fort.19 is
>calculated and created, but can't be sent back. I don't have the
>source code, so I don't know why the program exits at line
>1371.
>Then I tested a job that had a batch file (@echo Hello!) as its executable.
>This job executed without any problems. I didn't find any error messages,
>but the output (Hello!) was not written to hello.out, the file I
>designated as the output file.
>
>Does anybody know what I am doing wrong? I don't believe that this has
>anything to do with user rights, because I already ran tests with very low
>privileges required to write to those hard disks. Maybe one of you can tell
>me what is written on line 1370 of that source file?
>
>Thanks in advance,
>Thomas Bauer
>
>Condor Support Information:
>http://www.cs.wisc.edu/condor/condor-support/
>To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
>unsubscribe condor-users <your_email_address>
>
>
>
>



