
Re: [Condor-users] Help Me.



Dear Adarsh,
I resolved my problem by setting the following options in the Condor
config files of the machines in the Condor pool.

UID_DOMAIN=eng4.shirazu.ac.ir
TRUST_UID_DOMAIN=True
SOFT_UID_DOMAIN=True
FILESYSTEM_DOMAIN=eng4.shirazu.ac.ir
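A note on why this probably helps: the "Permission denied (errno 13)" in the quoted log is consistent with the job not running as the grid user on the execute machine, and these settings control whether Condor treats the pool as sharing one set of user IDs and one file system. Since the same values are needed on every machine in the pool, a small check like the one below can help. It is only a sketch in Python: parse_condor_config and differences are hypothetical helpers, not part of Condor, and the parser assumes plain NAME = value lines without macros or includes.

```python
# Hypothetical helpers for comparing a machine's config text against the
# four settings from this message. Not part of Condor.

EXPECTED = {
    "UID_DOMAIN": "eng4.shirazu.ac.ir",
    "TRUST_UID_DOMAIN": "True",
    "SOFT_UID_DOMAIN": "True",
    "FILESYSTEM_DOMAIN": "eng4.shirazu.ac.ir",
}


def parse_condor_config(text):
    """Parse simple NAME = value lines; skip blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Later definitions of the same name override earlier ones.
        settings[name.strip().upper()] = value.strip()
    return settings


def differences(settings, expected=EXPECTED):
    """Return {name: (found, wanted)} for settings that are missing or differ."""
    diffs = {}
    for name, wanted in expected.items():
        found = settings.get(name)
        if found is None or found.lower() != wanted.lower():
            diffs[name] = (found, wanted)
    return diffs


EXAMPLE = """
# local config of one pool machine
UID_DOMAIN=eng4.shirazu.ac.ir
TRUST_UID_DOMAIN=True
SOFT_UID_DOMAIN=True
FILESYSTEM_DOMAIN=eng4.shirazu.ac.ir
"""

if __name__ == "__main__":
    for name, (found, wanted) in differences(parse_condor_config(EXAMPLE)).items():
        print(f"{name}: found {found!r}, wanted {wanted!r}")
```

An empty result from differences() means that machine's text matches the four lines above. It says nothing about whether the daemons have re-read the config.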


On 4/25/07, Adarsh Patil <adarshvp@xxxxxxxxx> wrote:
Hi Mehdi,

1) I would say there is some problem with the Condor universe.

2) Check the permissions of the folder/directory where you are writing
stdout/stderr.

3) Check that the ports are open and that firewalls are not blocking them.

4) You are using pre-WS GRAM. Check its compatibility with Condor-G.
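For point 2, a quick way to test the directory is a few lines of Python, run on the execute machine as the user the job actually runs as (which may not be the submitting user). This is only a sketch; can_write_in is a hypothetical helper name, and the path in the example is the job directory from the RSL script quoted below.

```python
import os


def can_write_in(directory):
    """Return True if the current user may create files in `directory`.

    Creating a file needs both write and search (execute) permission
    on the directory itself.
    """
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Directory taken from the RSL script in this thread.
    path = "/home/grid/globusTest/GRAM/Test2"
    print(path, "writable:", can_write_in(path))
```

Note that os.access checks the real user ID, so the result is only meaningful when the script runs under the same account as the job.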

Please follow these links

http://vdt.cs.wisc.edu/releases/1.4.0/submitting_wsgram_jobs.html

http://www.cs.wisc.edu/condor/manual/v6.8/5_3Grid_Universe.html

Let me know if you succeed; otherwise we will have to find the reason!

Good Luck,
Adarsh


On 4/24/07, Mehdi Sheikhalishahi <mehdi.alishahi@xxxxxxxxx> wrote:
> Hi Adarsh,
> I installed Globus and Condor (as Central Manager) on a server named
> Server.eng4.shirazu.ac.ir. I can submit jobs to the fork jobmanager as
> the grid user. I want to submit a job to the Condor pool (ver 6.8.4) via
> Globus (ver 4.0.3) GRAM (non-WS GRAM). I defined the
> following RSL script and submitted the job with "globusrun -f test2.rsl"
> from the server itself as the client. The job goes to the Held state. My RSL
> script file (test2.rsl) is:
>
> ------------------------test2.rsl-----------------------------------
> +
> ( &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-condor")
>   (count=1)
>   (label="subjob 0")
>   (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
>                (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
>   (directory="/home/grid/globusTest/GRAM/Test2")
>   (executable="/bin/ls")
>   (arguments  = "-R" "/tmp")
>   (stdout="lsoutput")
>   (stderr="lserr")
> )
>
> -----------------------------------------------------------------------
>
> The content of the globus-condor.log file is:
> --------------------------globus-condor.log-----------------------------
> <c>
>    <a n="MyType"><s>SubmitEvent</s></a>
>    <a n="EventTypeNumber"><i>0</i></a>
>    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
> </c>
> <c>
>    <a n="MyType"><s>SubmitEvent</s></a>
>    <a n="EventTypeNumber"><i>0</i></a>
>    <a n="EventTime"><s>2007-04-18T10:47:58</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="SubmitHost"><s>&lt;192.168.1.254:47104&gt;</s></a>
> </c>
> <c>
>    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>    <a n="EventTypeNumber"><i>7</i></a>
>    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
> </c>
> <c>
>    <a n="MyType"><s>JobHeldEvent</s></a>
>    <a n="EventTypeNumber"><i>12</i></a>
>    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>    <a n="HoldReasonCode"><i>7</i></a>
>    <a n="HoldReasonSubCode"><i>7</i></a>
> </c>
> <c>
>    <a n="MyType"><s>ShadowExceptionEvent</s></a>
>    <a n="EventTypeNumber"><i>7</i></a>
>    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="Message"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>    <a n="SentBytes"><r>0.000000000000000E+00</r></a>
>    <a n="ReceivedBytes"><r>0.000000000000000E+00</r></a>
> </c>
> <c>
>    <a n="MyType"><s>JobHeldEvent</s></a>
>    <a n="EventTypeNumber"><i>12</i></a>
>    <a n="EventTime"><s>2007-04-18T10:48:02</s></a>
>    <a n="Cluster"><i>126</i></a>
>    <a n="Proc"><i>0</i></a>
>    <a n="Subproc"><i>0</i></a>
>    <a n="HoldReason"><s>Error from starter on localhost001: Failed to open '/home/grid/.globus/job/server.eng4.shirazu.ac.ir/15222.1176880678/stdout' as standard output: Permission denied (errno 13)</s></a>
>    <a n="HoldReasonCode"><i>7</i></a>
>    <a n="HoldReasonSubCode"><i>7</i></a>
> </c>
>
> -------------------------------------------------------------------
>
> Can you please help me?
>
>
> --
> Best Regards,
> S.Mehdi Sheikhalishahi,
> Web: http://www.cse.shirazu.ac.ir/~alishahi/
> Bye.
>




--
Best Regards,
S.Mehdi Sheikhalishahi,
Web: http://www.cse.shirazu.ac.ir/~alishahi/
Bye.