Hi everyone,

Just so this goes on the record: the fix was setting VM1_USER and VM2_USER in condor_config.local to the Administrator account. I had them set to DOMAIN/Administrator, which, as I found out from the starter logs, was wrong; it needed to be DOMAIN\Administrator.

Thanks for your response, Roman.

Regards,
Mark

Roman Zubatyuk wrote:

Mark,

Not really a bug, but enhanced security on Windows 2003. Check the permissions on cmd.exe: by default, members of the Users group cannot execute this file.

Hope this helps,
Roman.

2007/1/15, Mark Ellul <mark@xxxxxxxxxxx>:

Hi everyone,

I think there might be a bug in Condor v6.8.3 when running on Windows 2003. I have two Windows 2003 servers and a Windows XP box connected to one pool. The pool manager is on a Windows 2003 box, which does not run any jobs.

My job is a batch file that runs a PHP script; PHP itself is copied onto the execute machine along with the job. In the same pool, when the Windows XP machine is assigned the job, it runs without problems. However, when the job is assigned to the Windows 2003 box, I get the error below (more info follows):

-------------------------------------------------------------------------------------------------------
001 (030.000.000) 01/15 16:51:18 Job executing on host: <192.168.2.202:4544>
...
007 (030.000.000) 01/15 16:51:18 Shadow exception!
    Error from starter on vm1@STAGING: Create_Process(C:\WINDOWS\system32\cmd.exe,/Q /C condor_exec.bat translate_desc_en_pt.php, VIDEOID, ...) failed
    0  -  Run Bytes Sent By Job
    8139560  -  Run Bytes Received By Job
...
---------------------------------------------------------------------------------------------
Submit file
---------------------------------------------------------------------------------------------
# file name: my_program.condor
# Condor submit description file for my_program
Executable = p.bat
Universe = vanilla
Error = logs/$(cluster).err.log
Output = logs/$(cluster).out.log
Log = logs/$(cluster).log
initialdir = files
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
transfer_input_files = translate_desc_en_pt.php,php.exe,gtkextra.dll,iconv.dll,intl.dll,libgdk-0.dll,libglade.dll,libglib-2.0-0.dll,libgmodule-2.0-0.dll,libgobject-2.0-0.dll,libgthread-2.0-0.dll,libgtk-0.dll,libxml2.dll,php4ts.dll,php.ini,php.ini-gtk,php_gtk.dll,php_gtk_combobutton.dll,php_gtk_extra.dll,php_gtk_libglade.dll,php_gtk_scintilla.dll,php_gtk_scrollpane.dll,php_gtk_spaned.dll,php_gtk_sqpane.dll,php_win.exe, php-cgi.exe,zlib.dll
Arguments = translate_desc_en_pt.php, VIDEOID
#Arguments = -?
Requirements = OpSys != "Dummy" && Arch != "Dummy"
Queue
---------------------------------------------------------------------------------------------
1/15 16:51:15 ******************************************************
1/15 16:51:15 ** condor_shadow (CONDOR_SHADOW) STARTING UP
1/15 16:51:15 ** C:\condor\bin\condor_shadow.exe
1/15 16:51:15 ** $CondorVersion: 6.8.3 Jan 5 2007 $
1/15 16:51:15 ** $CondorPlatform: INTEL-WINNT50 $
1/15 16:51:15 ** PID = 3948
1/15 16:51:15 ** Log last touched 1/15 16:51:13
1/15 16:51:15 ******************************************************
1/15 16:51:15 Using config source: C:\condor\condor_config
1/15 16:51:15 Using local config sources:
1/15 16:51:15    C:\condor/condor_config.local
1/15 16:51:15 DaemonCore: Command Socket at <192.168.2.124:4788>
1/15 16:51:15 Initializing a VANILLA shadow for job 30.0
1/15 16:51:15 (30.0) (3948): Request to run on <192.168.2.202:4544> was ACCEPTED
1/15 16:51:18 (30.0) (3948): ERROR "Error from starter on vm1@STAGING: Create_Process(C:\WINDOWS\system32\cmd.exe,/Q /C condor_exec.bat translate_desc_en_pt.php, VIDEOID, ...) failed" at line 643 in file ..\src\condor_shadow.V6.1\pseudo_ops.C
1/15 16:53:59 ******************************************************

I then looked up a previous user's post describing a similar problem. From the http://condor.optena.com/display/CONDOR/Common+Windows+Problems page, I can see that a VM1_USER entry is needed in the configuration, so I added one. With that in place, I now get the error below:

1/15 17:18:48 ******************************************************
1/15 17:18:48 ** condor_shadow (CONDOR_SHADOW) STARTING UP
1/15 17:18:48 ** C:\condor\bin\condor_shadow.exe
1/15 17:18:48 ** $CondorVersion: 6.8.3 Jan 5 2007 $
1/15 17:18:48 ** $CondorPlatform: INTEL-WINNT50 $
1/15 17:18:48 ** PID = 2080
1/15 17:18:48 ** Log last touched 1/15 17:11:04
1/15 17:18:48 ******************************************************
1/15 17:18:48 Using config source: C:\condor\condor_config
1/15 17:18:48 Using local config sources:
1/15 17:18:48    C:\condor/condor_config.local
1/15 17:18:48 DaemonCore: Command Socket at <192.168.2.124:1098>
1/15 17:18:48 Initializing a VANILLA shadow for job 32.0
1/15 17:18:48 (32.0) (2080): Request to run on <192.168.2.202:3310> was ACCEPTED
1/15 17:18:49 (32.0) (2080): condor_read(): recv() returned -1, errno = 10054, assuming failure reading 5 bytes from <192.168.2.202:3310>.
1/15 17:18:49 (32.0) (2080): Can no longer talk to condor_starter <192.168.2.202:3310>
1/15 17:18:49 (32.0) (2080): Trying to reconnect to disconnected job
1/15 17:18:49 (32.0) (2080): LastJobLeaseRenewal: 1168881529 Mon Jan 15 17:18:49 2007
1/15 17:18:49 (32.0) (2080): JobLeaseDuration: 1200 seconds
1/15 17:18:49 (32.0) (2080): JobLeaseDuration remaining: 1200
1/15 17:18:49 (32.0) (2080): Attempting to locate disconnected starter
1/15 17:18:49 (32.0) (2080): Found starter: <192.168.2.202:3362>
1/15 17:18:49 (32.0) (2080): Attempting to reconnect to starter <192.168.2.202:3362>
1/15 17:18:50 (32.0) (2080): attempt to connect to <192.168.2.202:3362> failed: connect errno = 10061 connection refused.
1/15 17:18:50 (32.0) (2080): Attempt to reconnect failed: Failed to connect to starter <192.168.2.202:3362>
1/15 17:18:50 (32.0) (2080): JobLeaseDuration remaining: 1199
1/15 17:18:50 (32.0) (2080): Scheduling another attempt to reconnect in 8 seconds
1/15 17:18:58 (32.0) (2080): Attempting to locate disconnected starter
1/15 17:18:58 (32.0) (2080): locateStarter(): ClaimId (<192.168.2.202:3310>#1168881462#1) and GlobalJobId ( cellast-cxo5mw2#1168881032#32.0 ) not found
1/15 17:18:58 (32.0) (2080): Reconnect FAILED: Job not found at execution machine
1/15 17:18:58 (32.0) (2080): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 107

My gut feeling is that this is a bug in transferring multiple files to Windows 2003. I suspect multiple files because a simple hello world job that transfers only hello.exe runs with no problems; the failure appears only when several files are transferred. The exact same job description works fine on the Windows XP machine in the same pool.

Any thoughts would be much appreciated.

Regards,
Mark

--
Mark Ellul
Research and Development Manager

This email and any attachments may be confidential or legally privileged. If you received this message in error or are not the intended recipient,
you should destroy the e-mail message and any attachments or copies, and you are prohibited from retaining, distributing, disclosing or using any information contained herein. Please inform us of the erroneous delivery by return e-mail. Thank you for your co-operation.

www.cellcast.tv
150 Great Portland Street
London W1W 6QD
UK
Tel: (020) 7190 0300
Fax: (020) 7190 0301

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
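For readers finding this thread in the archive: the fix Mark reports at the top of the thread comes down to two entries in condor_config.local on the Windows 2003 execute machine. The fragment below is a minimal sketch of that change. It assumes a machine with two VMs (vm1 and vm2, matching the VM1_USER/VM2_USER names in Mark's message) and uses DOMAIN as a placeholder for the actual Windows domain name. As Mark found from the starter logs, the account has to be written with a backslash (DOMAIN\Administrator); the forward-slash form did not work.

```
# condor_config.local on the Windows 2003 execute machine (sketch)
# DOMAIN is a placeholder: replace it with your own Windows domain name.
# Use a backslash here; DOMAIN/Administrator (forward slash) did not work
# according to the starter logs described in this thread.
VM1_USER = DOMAIN\Administrator
VM2_USER = DOMAIN\Administrator
```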