Mailing List Archives
Re: [Condor-users] How to solve problem between condor and globus?
- Date: Wed, 28 Dec 2005 11:22:28 +0800
- From: "Fu-Ming Tsai" <sary357@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] How to solve problem between condor and globus?
Thank you, Pedro,
I want to run Globus jobs because Condor here is used for OSG. Could you
also explain the meaning of the parameters "TESTINGMODE_*" and "UWCS_*"?
BR
On Fri, 23 Dec 2005 11:22:11 +0100, Pedro R. Brügger Taboada wrote
> Hi Fu-Ming (I suppose this is your first name)
>
> I can see in the condor_config that osgc01.grid.sinica.edu.tw is the
> central manager and that you're submitting the job to this host. I suppose
> you have Globus with Condor as the scheduler on it.
>
> So why do you use the Globus Universe and not Vanilla?
>
> More important are the requirements in the condor_config. Try
> replacing UWCS_* with TESTINGMODE_*. You can find the settings after
> Part 3, in this section:
> #####################################################################
> ## This where you choose the configuration that you would like to
> ## use. It has no defaults so it must be defined. We start this
> ## file off with the UWCS_* policy.
> ######################################################################
>
> I am sending you one of my config files as an example.
>
> Pedro
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
> On Behalf Of Fu-Ming Tsai
> Sent: Thursday, 22 December 2005 09:44
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] How to solve problem between condor and globus?
>
> Hello, Pedro,
> Please refer to the attached file. It's my global condor_config file.
>
> The following is my submit file.
> [root@osgc01 root]# more /home/sary357/job/job4.jdl
> Universe = globus
> globusscheduler = osgc01.grid.sinica.edu.tw/jobmanager-condor
> Executable = job4.sh
> Output = job4.out
> Error = job4.err
> Log = job4.log
> Requirements = (Name=="vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
> should_transer_file = IF_NEEDED
> when_to_transfer_output = ON_EXIT
> Queue
> [root@osgc01 root]# more /home/sary357/job/job4.sh
> #!/bin/bash
> /bin/hostname
>
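One detail in the submit file quoted above: the line `should_transer_file` does not match the documented Condor submit command, which is spelled `should_transfer_files`. The sketch below shows one way such a misspelling could be caught before submitting. It is only an illustration: `KNOWN_COMMANDS` is a hypothetical allow-list limited to the commands that appear in this thread, and `find_suspect_commands` is an invented helper, not part of Condor.

```python
import difflib

# Hypothetical allow-list: only the submit commands that appear in this thread.
KNOWN_COMMANDS = {
    "universe", "globusscheduler", "executable", "output", "error", "log",
    "requirements", "should_transfer_files", "when_to_transfer_output",
}

def find_suspect_commands(submit_text):
    """Return (found, suggestion) pairs for commands not in KNOWN_COMMANDS."""
    suspects = []
    for line in submit_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and bare statements such as "Queue".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip().lower()
        if key not in KNOWN_COMMANDS:
            close = difflib.get_close_matches(key, sorted(KNOWN_COMMANDS), n=1)
            suspects.append((key, close[0] if close else None))
    return suspects

# The submit file exactly as quoted in the message above.
SUBMIT_TEXT = """\
Universe = globus
globusscheduler = osgc01.grid.sinica.edu.tw/jobmanager-condor
Executable = job4.sh
Output = job4.out
Error = job4.err
Log = job4.log
Requirements = (Name=="vm2@xxxxxxxxxxxxxxxxxxxxxxxxx")
should_transer_file = IF_NEEDED
when_to_transfer_output = ON_EXIT
Queue
"""

if __name__ == "__main__":
    for found, suggestion in find_suspect_commands(SUBMIT_TEXT):
        print("unrecognized command:", found, "- did you mean", suggestion)
```

The fuzzy match only suggests a spelling; whether file transfer is the right mechanism for this pool is a separate question.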
> Thank you for your attention!!
>
> BR
> On Wed, 21 Dec 2005 19:12:40 +0100, Pedro R. Brügger Taboada wrote
> > I see several problems: staging, universe, and expression. I need to
> > see the submit file and the condor_config file. Perhaps then I can
> > solve your problem.
> >
> > Pedro
> >
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Fu-Ming Tsai
> > Sent: Tuesday, 20 December 2005 11:06
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] How to solve problem between condor and globus?
> >
> > Sorry, all,
> > After many attempts, I gave up and used NFS.
> > However, I still cannot submit a Globus job to Condor,
> > so I tried to gather some debug information.
> >
> > [sary357@osgc01 job]$ condor_q -analyze
> > ---
> > 4206.000: Run analysis summary. Of 4 machines,
> >       3 are rejected by your job's requirements
> >       0 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> >
> > WARNING: Analysis is only meaningful for Globus universe jobs using
> > matchmaking.
> > ---
> > 4207.000: Run analysis summary. Of 4 machines,
> >       0 are rejected by your job's requirements
> >       3 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> >     Last successful match: Tue Dec 20 09:45:24 2005
> >     Last failed match: Tue Dec 20 09:55:31 2005
> >     Reason for last match failure: no match found
> >
> > == StarterLog.vm2==
> > 12/20 17:35:36 Shadow version: $CondorVersion: 6.7.7 Apr 27 2005 $
> > 12/20 17:35:36 Submitting machine is "osgc01.grid.sinica.edu.tw"
> > 12/20 17:35:36 ShouldTransferFiles is "NO", NOT transfering files
> > 12/20 17:35:36 Submit UidDomain: "grid.sinica.edu.tw"
> > 12/20 17:35:36 Local UidDomain: "grid.sinica.edu.tw"
> > 12/20 17:35:36 Initialized user_priv as "sary357"
> >
> > 12/20 17:35:36 Done moving to directory "/opt/osg/osgs01/execute/dir_6591"
> >
> > 12/20 17:35:36 JICShadow::initIOProxy(): Job does not define WantIOProxy
> > 12/20 17:35:36 No StarterUserLog found in job ClassAd
> > 12/20 17:35:36 Starter will not write a local UserLog
> > 12/20 17:35:36 Starting a VANILLA universe job with ID: 4207.0
> > 12/20 17:35:36 In OsProc::OsProc()
> > 12/20 17:35:36 Main job KillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 Main job RmKillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 Main job HoldKillSignal: 15 (SIGTERM)
> > 12/20 17:35:36 in VanillaProc::StartJob()
> > 12/20 17:35:36 in OsProc::StartJob()
> > 12/20 17:35:36 IWD: /home/sary357/gram_scratch_tUb21E3Wqv
> > 12/20 17:35:36 Input file: /dev/null
> > 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stdout' as standard output: No such file or directory (errno 2)
> > 12/20 17:35:36 Failed to open '/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994/stderr' as standard error: No such file or directory (errno 2)
> > 12/20 17:35:36 Failed to open some/all of the std files...
> > 12/20 17:35:36 Aborting OsProc::StartJob.
> > 12/20 17:35:36 Failed to start job, exiting
> > 12/20 17:35:36 ShutdownFast all jobs.
> > 12/20 17:35:36 Got ShutdownFast when no jobs running.
> > 12/20 17:35:36 Removing /opt/osg/osgs01/execute/dir_6591
> >
> > 12/20 17:35:36 Attempting to remove /opt/osg/osgs01/execute/dir_6591 as SuperUser (root)
> > =========================
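The two "Failed to open" entries in the StarterLog above report errno 2 (ENOENT). When a file is opened for writing, that error typically means a directory component of the path does not exist on the machine where the starter runs. A minimal sketch for checking this on an execute node follows; `missing_parent_dirs` is a hypothetical helper, and the paths are the ones from the log with the line wrapping removed.

```python
import os

def missing_parent_dirs(paths):
    """Return the paths whose parent directory does not exist on this host."""
    return [p for p in paths if not os.path.isdir(os.path.dirname(p))]

if __name__ == "__main__":
    # Paths taken from the StarterLog excerpt above (line wrapping removed).
    job_dir = "/home/sary357/.globus/job/osgc01.grid.sinica.edu.tw/17186.1135070994"
    for path in missing_parent_dirs([job_dir + "/stdout", job_dir + "/stderr"]):
        print("parent directory missing for", path)
```

Running this on both the submit host and the execute host would show whether the `.globus/job/...` directory is visible from each of them.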
> >
> > [sary357@osgc01 job]$ condor_q -better-analyze 4206
> >
> > -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> > ---
> > 4206.000: Run analysis summary. Of 4 machines,
> >       3 are rejected by your job's requirements
> >       0 reject your job because of their own requirements
> >       0 match but are serving users with a better priority in the pool
> >       1 match but reject the job for unknown reasons
> >       0 match but will not currently preempt their existing job
> >       0 are available to run your job
> >
> > The Requirements expression for your job is:
> >
> > ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )
> >
> >     Condition                                                 Machines Matched    Suggestion
> >     ---------                                                 ----------------    ----------
> > 1   ( ( target.Name == "vm2@xxxxxxxxxxxxxxxxxxxxxxxxx" ) )    1
> >
> > WARNING: Analysis is only meaningful for Globus universe jobs using
> > matchmaking.
> > [sary357@osgc01 job]$ condor_q -better-analyze 4207
> >
> > -- Submitter: osgc01.grid.sinica.edu.tw : <140.109.98.41:41846> : osgc01.grid.sinica.edu.tw
> > Segmentation fault
> >
> > I'm sure the FileDomain on those 2 machines is the same.
> > It looks like the output file and error file cannot be created. Does
> > anyone know why?
> >
> > BR
> >
> > ----------------------------------------------------------------------
> > "Gravitation is not responsible for people falling in love."
> >
> > Fu-Ming Tsai
> > Academia Sinica Computing Centre, Academia Sinica
> > sary357@xxxxxxxxxxxxxxxxxx
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
>