Hello, zhaokun,

You gave me three pieces of advice, but I still have some questions:

1. MPI runs fine without Condor.
2. How do I add "echo ..." statements to trace errors? Can you explain in detail?
3. The logs are as follows:

7/8 10:41:34 ******************************************************
7/8 10:41:34 ** condor_shadow (CONDOR_SHADOW) STARTING UP
7/8 10:41:34 ** /usr/local/src/condor/sbin/condor_shadow
7/8 10:41:34 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
7/8 10:41:34 ** $CondorPlatform: I386-LINUX_RH9 $
7/8 10:41:34 ** PID = 6554
7/8 10:41:34 ** Log last touched 7/8 10:33:26
7/8 10:41:34 ******************************************************
7/8 10:41:34 Using config source: /usr/local/src/condor/etc/condor_config
7/8 10:41:34 Using local config sources:
7/8 10:41:34 /usr/local/src/condor/local.node1/condor_config.local
7/8 10:41:34 DaemonCore: Command Socket at <192.168.0.101:33644>
7/8 10:41:34 Initializing a PARALLEL shadow for job 44.0
7/8 10:41:35 (44.0) (6554): Request to run on <192.168.0.116:33302> was ACCEPTED
7/8 10:41:35 (44.0) (6554): Request to run on <192.168.0.101:32793> was ACCEPTED

7/8 10:41:35 ******************************************************
7/8 10:41:35 ** condor_starter (CONDOR_STARTER) STARTING UP
7/8 10:41:35 ** /usr/local/src/condor/sbin/condor_starter
7/8 10:41:35 ** $CondorVersion: 7.0.5 Sep 20 2008 BuildID: 105846 $
7/8 10:41:35 ** $CondorPlatform: I386-LINUX_RH9 $
7/8 10:41:35 ** PID = 6555
7/8 10:41:35 ** Log last touched 7/8 10:32:56
7/8 10:41:35 ******************************************************
7/8 10:41:35 Using config source: /usr/local/src/condor/etc/condor_config
7/8 10:41:35 Using local config sources:
7/8 10:41:35 /usr/local/src/condor/local.node1/condor_config.local
7/8 10:41:35 DaemonCore: Command Socket at <192.168.0.101:33651>
7/8 10:41:35 Done setting resource limits
7/8 10:41:36 Communicating with shadow <192.168.0.101:33644>
7/8 10:41:36 Submitting machine is "node1.localdomain"
7/8 10:41:36 setting the orig job name in starter
7/8 10:41:36 setting the orig job iwd in starter
7/8 10:41:36 Job has WantIOProxy=true
7/8 10:41:36 Initialized IO Proxy.
7/8 10:41:36 File transfer completed successfully.
7/8 10:41:37 Job 44.0 set to execute immediately
7/8 10:41:37 Starting a PARALLEL universe job with ID: 44.0
7/8 10:41:37 IWD: /usr/local/src/condor/local.node1/execute/dir_6555
7/8 10:41:37 Output file: /usr/local/src/condor/local.node1/execute/dir_6555/hello.out
7/8 10:41:37 Error file: /usr/local/src/condor/local.node1/execute/dir_6555/hello.err
7/8 10:41:37 About to exec /usr/local/src/condor/local.node1/execute/dir_6555/condor_exec.exe hello 2
7/8 10:41:37 Create_Process succeeded, pid=6557
7/8 10:41:37 IOProxy: accepting connection from 192.168.0.101
7/8 10:41:37 IOProxyHandler: closing connection to 192.168.0.101

What is wrong with it? I really need help!

Any help will be appreciated.
Regards,
Han

--- On Wed, July 8, 2009, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:

> From: zhaokun <zhaokun@xxxxxxxxxxxxx>
> Subject: Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
> To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
> Date: Wednesday, July 8, 2009, 10:55 AM
>
> Hi Condor-Users Mail List,
>
> Sorry to reply so late.
>
> 1. Check your MPI settings.
> 2. Add some "echo ..." statements to trace errors.
> 3. View the log files to get more info:
>    SchedLog, StartLog, StarterLog ...
> ------------------
> zhaokun
> 2009-07-08
>
> -------------------------------------------------------------
> From: Hehe cmesunoom@xxxxxxxx
> Date: 2009-07-07 09:36:01
> To: Condor-Users Mail List condor-users@xxxxxxxxxxx
> cc:
> Title: Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
>
> Hello, zhaokun,
> My MPI job submit description file is as follows:
> universe=parallel
> executable=/usr/local/condor/etc/examples/mp1script
> arguments=hello
> log=hello.log
> output=hello.out
> error=hello.err
> machine_count=2
> should_transfer_files=yes
> when_to_transfer_output=on_exit
> transfer_input_files=hello
> queue
>
> That is all. Does it have any problem?
> Thanks in advance.
> Han. (You are Chinese, right? If it is convenient, could we communicate directly in Chinese? My English is very poor.)
>
> --- On Tue, July 7, 2009, zhaokun <zhaokun@xxxxxxxxxxxxx> wrote:
>
> From: zhaokun <zhaokun@xxxxxxxxxxxxx>
> Subject: Re: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
> To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxxx>
> Date: Tuesday, July 7, 2009, 9:15 AM
>
> Hi Condor-Users Mail List,
>
> Please attach your job script file so we can find the reason.
> ------------------
> zhaokun
> 2009-07-07
>
> -------------------------------------------------------------
> From: Hehe cmesunoom@xxxxxxxx
> Date: 2009-07-06 18:47:50
> To: condor-users condor-users@xxxxxxxxxxx
> cc:
> Title: [Condor-users] THE MPI JOB ALWAYS IN "RUNNING"
>
> Hello, all,
> When I submit an MPI job on Condor, the job stays in the "running" state all the time.
>
> ************hello_log file***************
> Job submitted from host:<.......>
> Node 0 executing on host:<........>
> Job executing on host:MPI_job
>
> So I want to know the reason for it.
>
> Any help will be appreciated.
> Regards,
> Han
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
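
On question 2 in this thread (adding "echo ..." statements to trace errors), here is a minimal sketch of what such tracing might look like in a wrapper script like mp1script. The helper names trace and run_step are hypothetical and not part of Condor. The sketch assumes the script's stderr ends up in the file named by error= in the submit file (hello.err here), and that the parallel universe sets _CONDOR_PROCNO for each node; if that variable is not set, a "?" is printed instead.

```shell
#!/bin/sh
# Hypothetical tracing helpers for a wrapper script such as mp1script.
# All trace output goes to stderr, assumed to be captured in hello.err.

trace() {
    # Print a timestamped message, tagged with the node number if one is set.
    echo "[trace $(date '+%H:%M:%S') node=${_CONDOR_PROCNO:-?}] $*" >&2
}

run_step() {
    # Announce a command, run it, then report its exit status.
    trace "starting: $*"
    "$@"
    status=$?
    trace "finished: $* (exit status $status)"
    return $status
}

trace "script started on $(uname -n), arguments: $*"

# Stand-in commands; in a real script these would be the actual steps.
run_step true
run_step false || trace "a step failed; check the lines above"
```

Wrapping each step of the real script in run_step would show in hello.err which command was the last to start, which is usually enough to see where a job that stays "running" is actually stuck.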