Re: [Condor-users] Problems in Condor-C
- Date: Mon, 04 Jan 2010 15:51:07 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] Problems in Condor-C
Hi Hailong,
I have reproduced the problem you reported. I haven't fully understood
it yet, but I found that things work if the original job is submitted
with file transfer turned on in the submit file itself, rather than only
through the +remote_ attributes. In other words, change your
submit file to this:
universe = grid
grid_resource = condor euchina08.buaa.edu.cn euchina08.buaa.edu.cn
executable = simple.sh
output = simple.out
error = simple.err
log = simple.log
remote_universe = vanilla
+remote_requirements = True
ShouldTransferFiles = yes
WhenToTransferOutput = ON_EXIT
queue
--Dan
hailong.yang1115 wrote:
> Hi Alain,
> Attached are the corresponding log files from the execute node.
> -Hailong
> 2010-01-02
> ------------------------------------------------------------------------
> ***********************************************
> * Hailong Yang, PhD. Candidate
> * Sino-German Joint Software Institute,
> * School of Computer Science&Engineering, Beihang University
> * Phone: (86-010)82315908
> * Email: hailong.yang1115@xxxxxxxxx
> * Address: G413, New Main Building in Beihang University,
> * No.37 XueYuan Road,HaiDian District,
> * Beijing,P.R.China,100191
> ***********************************************
> ------------------------------------------------------------------------
> *From:* Alain Roy
> *Sent:* 2010-01-01 00:14:03
> *To:* Condor-Users Mail List
> *Cc:*
> *Subject:* Re: [Condor-users] Problems in Condor-C
> Hi Hailong,
> Do you have the corresponding logs from the execute side? The StartLog
> or StarterLog might have more detail on that error.
> -alain
> On Dec 31, 2009, at 9:53 AM, hailong.yang1115 wrote:
> > Hi everyone,
> >
> > Recently we configured two Condor pools to flock jobs using
> > Condor-C. The problem is that when the jobs appear in the remote Condor
> > pool, they stay idle indefinitely. There is an error in the ShadowLog file:
> > 06/07 12:43:20 ******************************************************
> > 06/07 12:43:20 ** condor_shadow (CONDOR_SHADOW) STARTING UP
> > 06/07 12:43:20 ** /opt/condor-7.4.1/sbin/condor_shadow
> > 06/07 12:43:20 ** SubsystemInfo: name=SHADOW type=SHADOW(6) class=DAEMON(1)
> > 06/07 12:43:20 ** Configuration: subsystem:SHADOW local:<NONE> class:DAEMON
> > 06/07 12:43:20 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
> > 06/07 12:43:20 ** $CondorPlatform: I386-LINUX_RHEL3 $
> > 06/07 12:43:20 ** PID = 11152
> > 06/07 12:43:20 ** Log last touched 6/7 12:43:20
> > 06/07 12:43:20 ******************************************************
> > 06/07 12:43:20 Using config source: /opt/condor-7.4.1/etc/condor_config
> > 06/07 12:43:20 Using local config sources:
> > 06/07 12:43:20 /opt/condor-7.4.1/local.euchina08/condor_config.local
> > 06/07 12:43:20 DaemonCore: Command Socket at <202.38.140.91:38889>
> > 06/07 12:43:20 Initializing a VANILLA shadow for job 5.0
> > 06/07 12:43:20 (5.0) (11152): Request to run on slot1@xxxxxxxxxxxxxxxxxxxxx <202.38.140.91:38395> was ACCEPTED
> > 06/07 12:43:20 (5.0) (11152): ERROR "Error from slot1@xxxxxxxxxxxxxxxxxxxxx: FileTransfer: DownloadFiles called on server side" at line 655 in file pseudo_ops.cpp
> >
> > Here is the job description file:
> > [ddg2@www simple_test]$ cat simple.submit
> > universe = grid
> > grid_resource = condor euchina08.buaa.edu.cn euchina08.buaa.edu.cn
> > executable = simple.sh
> > output = simple.out
> > error = simple.err
> > log = simple.log
> > remote_universe = vanilla
> > +remote_requirements = True
> > +remote_ShouldTransferFiles = "YES"
> > +remote_WhenToTransferOutput = "ON_EXIT"
> > queue
> >
> > [ddg2@www simple_test]$ cat simple.sh
> > #!/bin/sh
> > echo "Start to sleep for 5 seconds"
> > sleep 5
> > echo "All done"
> >
> > Any clue?
> >
> > -Hailong
> >
> > 2009-12-31
> > _______________________________________________
> > Condor-users mailing list
> > To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/condor-users/