Hello,
I'm hoping that with your wisdom you can give me a lead. This is the version information:

C:\Projects\FMAChallanges\Challenge_Model_1\TUFLOW\runs]condor_status -version
$CondorVersion: 8.2.6 Dec 10 2014 BuildID: 287355 $
$CondorPlatform: x86_64_Windows8 $

CONFIGURATION

I am running a two-computer pool: a central manager and computer C1. I have no local configuration files set up. I have changed the IP addresses and host locations to Central Manager and C1 respectively for this example.

On my central manager I have the following config content:

#===============================================
#---------------------CENTRAL MANAGER-----------------------------
#
#
RELEASE_DIR = C:\condor
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
REQUIRE_LOCAL_CONFIG_FILE = FALSE
LOCAL_CONFIG_DIR = $(LOCAL_DIR)\config
use SECURITY : HOST_BASED
CONDOR_HOST = WEBR1525.xxxxx.xxxx
UID_DOMAIN = xxxxx
CONDOR_ADMIN = xxxx.xxx@xxxx
SMTP_SERVER = mailblahblah
COLLECTOR_NAME = FloodiesMod
COLLECTOR_HOST = $(CONDOR_HOST)
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *
ALLOW_ADMINISTRATOR = $(IP_ADDRESS)
ALLOW_NEGOTIATOR = $(IP_ADDRESS)
ALLOW_DAEMON = *
JAVA = C:\PROGRA~2\Java\JRE18~1.0_2\bin\java.exe
START = TRUE
SUSPEND = FALSE
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
PREEMPT = FALSE
DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR STARTD
#===============================================

On C1 I have the following config:

#===============================================
#-------------------------------C1-----------------------------------------
#
#
RELEASE_DIR = C:\condor
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
REQUIRE_LOCAL_CONFIG_FILE = FALSE
LOCAL_CONFIG_DIR = $(LOCAL_DIR)\config
use SECURITY : HOST_BASED
CONDOR_HOST = WEBR1525.xxxxx.xxxx
UID_DOMAIN = xxxxx
CONDOR_ADMIN = xxxx.xxx@xxxx
SMTP_SERVER = mailblahblah
COLLECTOR_NAME = FloodiesMod
COLLECTOR_HOST = $(CONDOR_HOST)
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *
ALLOW_ADMINISTRATOR = $(IP_ADDRESS)
JAVA = C:\PROGRA~2\Java\JRE18~1.0_2\bin\java.exe
START = TRUE
SUSPEND = FALSE
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
PREEMPT = FALSE
DAEMON_LIST = MASTER SCHEDD STARTD
#===============================================

RUN TESTING

I have managed to successfully run the following description file by submitting on the central manager and running on the central manager.
This results in the successful simulation of the exe C:\TUFLOW\w64\TUFLOW_iSP_w64.exe

#===============================================
##
## Runfile.txt
#===============================================
universe = vanilla
executable = C:\TUFLOW\w64\TUFLOW_iSP_w64.exe
arguments = "-b -x -s 15ft FMA_T1_~s1~_001.tcf"
output = TUFLOW.out
error = TUFLOW.err
log = example1.log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
#===============================================

Where the problems are starting...

If I try to run the same job on another computer, C1, from my central server using the requirements command as per RunfileC1.txt:

#===============================================
##
## RunfileC1.txt
#===============================================
universe = vanilla
executable = C:\TUFLOW\w64\TUFLOW_iSP_w64.exe
arguments = "-b -x -s 15ft FMA_T1_~s1~_001.tcf"
#input = input.in
output = TUFLOW.out
error = TUFLOW.err
log = example1.log
Requirements = (machine == "WEBR1436. .xxxxx.xxxx ")
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
queue
#===============================================

LOG ERRORS

I don't get any errors per se. The starter log on slot 2 suggests that the scratch directory execute\dir_2456 was set up successfully. However, when I go into this execute folder on computer C1 there are no files or folders within the execute
directory on computer C1.

StarterLog.slot2 on computer C1

01/20/15 18:35:17 (pid:2456) ******************************************************
01/20/15 18:35:17 (pid:2456) ** condor_starter (CONDOR_STARTER) STARTING UP
01/20/15 18:35:17 (pid:2456) ** C:\condor\bin\condor_starter.exe
01/20/15 18:35:17 (pid:2456) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
01/20/15 18:35:17 (pid:2456) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
01/20/15 18:35:17 (pid:2456) ** $CondorVersion: 8.2.6 Dec 10 2014 BuildID: 287355 $
01/20/15 18:35:17 (pid:2456) ** $CondorPlatform: x86_64_Windows8 $
01/20/15 18:35:17 (pid:2456) ** PID = 2456
01/20/15 18:35:17 (pid:2456) ** Log last touched 1/20 18:27:07
01/20/15 18:35:17 (pid:2456) ******************************************************
01/20/15 18:35:17 (pid:2456) Using config source: C:\condor\condor_config
01/20/15 18:35:17 (pid:2456) Using local config sources:
01/20/15 18:35:17 (pid:2456) C:\condor\condor_config.local
01/20/15 18:35:17 (pid:2456) config Macros = 48, Sorted = 47, StringBytes = 1190, TablesBytes = 1368
01/20/15 18:35:17 (pid:2456) CLASSAD_CACHING is OFF
01/20/15 18:35:17 (pid:2456) Daemon Log is logging: D_ALWAYS D_ERROR
01/20/15 18:35:17 (pid:2456) DaemonCore: command socket at <IP C1>
01/20/15 18:35:17 (pid:2456) DaemonCore: private command socket at < IP C1>
01/20/15 18:35:17 (pid:2456) GLEXEC_JOB not supported on this platform; ignoring
01/20/15 18:35:17 (pid:2456) Communicating with shadow < IP Central Master >
01/20/15 18:35:17 (pid:2456) Submitting machine is "webr1525.xxxxx"
01/20/15 18:35:17 (pid:2456) setting the orig job name in starter
01/20/15 18:35:17 (pid:2456) setting the orig job iwd in starter
01/20/15 18:35:17 (pid:2456) Chirp config summary: IO false, Updates false, Delayed updates true.
01/20/15 18:35:17 (pid:2456) Initialized IO Proxy.
01/20/15 18:35:17 (pid:2456) Setting resource limits not implemented!
01/20/15 18:35:18 (pid:2456) File transfer completed successfully.
01/20/15 18:35:19 (pid:2456) Job 121.0 set to execute immediately
01/20/15 18:35:19 (pid:2456) Starting a VANILLA universe job with ID: 121.0
01/20/15 18:35:19 (pid:2456) Tracking process family by login "condor-slot2"
01/20/15 18:35:19 (pid:2456) IWD: C:\condor\execute\dir_2456
01/20/15 18:35:19 (pid:2456) Output file: C:\condor\execute\dir_2456\_condor_stdout
01/20/15 18:35:19 (pid:2456) Error file: C:\condor\execute\dir_2456\_condor_stderr
01/20/15 18:35:19 (pid:2456) Renice expr "10" evaluated to 10
01/20/15 18:35:19 (pid:2456) About to exec C:\condor\execute\dir_2456\condor_exec.exe -b -x -s 15ft FMA_T1_~s1~_001.tcf
01/20/15 18:35:19 (pid:2456) Running job as user condor-slot2
01/20/15 18:35:19 (pid:2456) Create_Process succeeded, pid=3296
01/20/15 18:35:19 (pid:2456) Process exited, pid=3296, status=-1073741515
01/20/15 18:35:19 (pid:2456) Got SIGQUIT. Performing fast shutdown.
01/20/15 18:35:19 (pid:2456) ShutdownFast all jobs.
01/20/15 18:35:23 (pid:2456) **** condor_starter (condor_STARTER) pid 2456 EXITING WITH STATUS 0

ShadowLog on Central Server

01/20/15 18:35:17 ** condor_shadow (CONDOR_SHADOW) STARTING UP
01/20/15 18:35:17 ** C:\condor\bin\condor_shadow.exe
01/20/15 18:35:17 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
01/20/15 18:35:17 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
01/20/15 18:35:17 ** $CondorVersion: 8.2.6 Dec 10 2014 BuildID: 287355 $
01/20/15 18:35:17 ** $CondorPlatform: x86_64_Windows8 $
01/20/15 18:35:17 ** PID = 6404
01/20/15 18:35:17 ** Log last touched 1/20 18:29:37
01/20/15 18:35:17 ******************************************************
01/20/15 18:35:17 Using config source: C:\condor\condor_config
01/20/15 18:35:17 Using local config sources:
01/20/15 18:35:17 C:\condor\condor_config.local
01/20/15 18:35:17 config Macros = 47, Sorted = 47, StringBytes = 1205, TablesBytes = 400
01/20/15 18:35:17 CLASSAD_CACHING is OFF
01/20/15 18:35:17 Daemon Log is logging: D_ALWAYS D_ERROR
01/20/15 18:35:17 DaemonCore: command socket at <IP Central Master>
01/20/15 18:35:17 DaemonCore: private command socket at < IP Central Master >
01/20/15 18:35:17 Initializing a VANILLA shadow for job 121.0
01/20/15 18:35:17 (121.0) (6404): Request to run on slot2@WEBR1436. .xxxxx.xxxx < IP C1> was ACCEPTED
01/20/15 18:35:19 (121.0) (6404): Job 121.0 terminated: exited with status -1073741515
01/20/15 18:35:19 (121.0) (6404): Reporting job exit reason 100 and attempting to fetch new job.
01/20/15 18:35:19 (121.0) (6404): **** condor_shadow (condor_SHADOW) pid 6404 EXITING WITH STATUS 100

If you could help me out, that would be so helpful. If you have any other pointers regarding the config, that would also be much appreciated.

Kind regards,
Mitch.

Mitchell Smith
Tel: +61 7 3831 6744
BMT WBM Pty Ltd, Level 8, 200 Creek Street, Brisbane QLD 4000 Australia
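
P.S. A possible reading of the exit status in the logs above, offered as a sketch rather than a diagnosis: Windows exit statuses are often NTSTATUS values printed as signed 32-bit integers. Reinterpreted as unsigned, -1073741515 is 0xC0000135, which Windows documents as STATUS_DLL_NOT_FOUND. That would be consistent with the executable being transferred to C1 without a DLL it depends on, but nothing in the logs confirms this. The helper name `to_ntstatus` and the lookup table below are hypothetical and deliberately incomplete.

```python
def to_ntstatus(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex NTSTATUS string."""
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


# Hypothetical, incomplete lookup of a few well-known NTSTATUS codes.
KNOWN_NTSTATUS = {
    "0xC0000135": "STATUS_DLL_NOT_FOUND",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}

if __name__ == "__main__":
    code = to_ntstatus(-1073741515)  # status reported in StarterLog and ShadowLog
    print(code, KNOWN_NTSTATUS.get(code, "unknown"))
```

If that reading holds, one thing to check would be whether the DLLs that sit next to TUFLOW_iSP_w64.exe on the central manager are also available to the job on C1.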