Re: [Condor-users] Error funning jobs on hetrogenous cluster
- Date: Wed, 24 Oct 2007 09:37:21 +0100
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] Error funning jobs on hetrogenous cluster
I declare my own "OPSYS_FLAVOUR" attribute for each Linux distro in my pool.
You will also need to add it to the STARTD ClassAds.
It can then be specified in the REQUIREMENTS statement of the submit file.
See
http://epubs.cclrc.ac.uk/bitstream/1725/CondorGotchas.ppt
and
http://epubs.cclrc.ac.uk/bitstream/1723/Gotchas2.ppt
for hints and tips on this.
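
As a rough sketch of the steps above (the value "Debian" is only an
example, the attribute name is my own and not built into Condor, and
older Condor releases spell the list STARTD_EXPRS rather than
STARTD_ATTRS), the two pieces might look like this:

```
# --- local Condor config on one execute node (example value) ---
OPSYS_FLAVOUR = "Debian"
# Publish the attribute in this machine's STARTD ClassAd
STARTD_ATTRS = $(STARTD_ATTRS), OPSYS_FLAVOUR

# --- submit file: only match machines advertising that flavour ---
requirements = (TARGET.OPSYS_FLAVOUR == "Debian")
```

Each node gets its own value, so a binary built on Debian is only
matched to nodes that advertise the Debian flavour.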
I generally use it when building release tarballs, so that I know I have
a built version for each distro.
Unfortunately this is done manually. It's a shame that finer-grained
information isn't available from Condor by default, but I think the current
string is obtained in the same way as "uname -a".
If you use cron + Hawkeye for automatically updated ClassAds, then you can
always add something to "work out" what distro you have.
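
A minimal sketch of such a "work out the distro" script follows. It is
an illustration under assumptions, not something Condor or Hawkeye
ships: the marker files are the conventional ones I would expect on
Fedora, Ubuntu-family and Debian systems, the function names are
hypothetical, and it assumes the cron module's standard output is read
as "Attribute = value" lines to be merged into the machine ClassAd.

```python
#!/usr/bin/env python3
"""Print an OPSYS_FLAVOUR attribute line for the local machine (sketch)."""
import os

# Marker files checked in order (an assumption, extend for your pool).
# lsb-release comes before debian_version because Ubuntu-family systems
# normally carry both files.
MARKER_FILES = [
    ("/etc/fedora-release", "Fedora"),
    ("/etc/lsb-release", None),  # read DISTRIB_ID from the file itself
    ("/etc/debian_version", "Debian"),
]


def parse_lsb_release(text):
    """Return the DISTRIB_ID value from lsb-release style text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DISTRIB_ID":
            return value.strip().strip('"') or None
    return None


def detect_flavour(root="/"):
    """Guess the distro name from marker files found under root."""
    for path, name in MARKER_FILES:
        full = os.path.join(root, path.lstrip("/"))
        if not os.path.isfile(full):
            continue
        if name is not None:
            return name
        with open(full) as fh:
            found = parse_lsb_release(fh.read())
        if found:
            return found
    return "Unknown"


def classad_line(flavour):
    """Format the attribute as a quoted ClassAd string assignment."""
    return 'OPSYS_FLAVOUR = "%s"' % flavour


if __name__ == "__main__":
    print(classad_line(detect_flavour()))
```

Run by hand on a node, it prints a single line such as
OPSYS_FLAVOUR = "Debian", which is the form the submit file's
REQUIREMENTS expression can then test against.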
Cheers
JK
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Atle Rudshaug
> Sent: Wednesday, October 24, 2007 9:22 AM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Error funning jobs on hetrogenous cluster
>
>
> I have a test cluster with one Debian, one Kubuntu and one Fedora
> node. I get different errors on all the nodes. I guess I need a local
> executable on every node compiled for that specific distro? Is there
> some kind of requirement I can state in the submit file that
> specifies the distro the executable needs to run on? Is there some way
> to send my own libraries that my executable needs, or do I have to have
> them on the same path on each node? Can I have them on NFS? I guess I
> would then need to compile with NFS paths to the lib files in the Makefile?
>
> #Submit file
> universe = vanilla
> executable = dagoc
> output = dagoc.out.$(CLUSTER).$(PROCESS)
> error = dagoc.err.$(CLUSTER).$(PROCESS)
> log = dagoc.log.$(CLUSTER)
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
> transfer_input_files = /mnt/dagocproject/dbases/TEST.db
> arguments = -c -start=10 -stop=20 /mnt/dagocproject/setups/TEST_remote.sup
> queue 5
>
>
> What does the following error mean?
> dagoc.err.102.0 and dagoc.err.102.4
> ------------------------------------------------------------------------------------
> condor_exec.exe: symbol lookup error: condor_exec.exe: undefined symbol:
> _ZSt22__uninitialized_copy_aIN9__gnu_cxx17__normal_iteratorIPKSsSt6vectorISsSaISsEEEEPSsSsET0_T_SA_S9_SaIT1_E
>
>
> Here I need to compile the executable on the node that got this error.
> dagoc.err.102.1
> ------------------------------------------------------------------------------------
> condor_exec.exe: /lib/tls/i686/cmov/libc.so.6: version `GLIBC_2.4' not found (required by condor_exec.exe)
>
> dagoc.log.11:
> ------------------------------------------------------------------------------------
> 000 (102.000.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
> ...
> 000 (102.001.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
> ...
> 000 (102.002.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
> ...
> 000 (102.003.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
> ...
> 000 (102.004.000) 10/24 09:37:23 Job submitted from host: <xxx.247>
> ...
> 001 (102.000.000) 10/24 09:37:30 Job executing on host: <xxx.251>
> ...
> 005 (102.000.000) 10/24 09:37:32 Job terminated.
> (1) Normal termination (return value 127)
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 819 - Run Bytes Sent By Job
> 23796974 - Run Bytes Received By Job
> 819 - Total Bytes Sent By Job
> 23796974 - Total Bytes Received By Job
> ...
> 001 (102.001.000) 10/24 09:37:32 Job executing on host: <xxx.245>
> ...
> 005 (102.001.000) 10/24 09:37:32 Job terminated.
> (1) Normal termination (return value 1)
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 107 - Run Bytes Sent By Job
> 23796974 - Run Bytes Received By Job
> 107 - Total Bytes Sent By Job
> 23796974 - Total Bytes Received By Job
> ...
> 001 (102.002.000) 10/24 09:37:32 Job executing on host: <xxx.247>
> ...
> 001 (102.003.000) 10/24 09:37:34 Job executing on host: <xxx.247>
> ...
> 001 (102.004.000) 10/24 09:37:39 Job executing on host: <xxx.251>
> ...
> 005 (102.004.000) 10/24 09:37:39 Job terminated.
> (1) Normal termination (return value 127)
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 819 - Run Bytes Sent By Job
> 23796974 - Run Bytes Received By Job
> 819 - Total Bytes Sent By Job
> 23796974 - Total Bytes Received By Job
> ...
> 005 (102.002.000) 10/24 09:37:46 Job terminated.
> (1) Normal termination (return value 0)
> Usr 0 00:00:07, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:07, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 4536073 - Run Bytes Sent By Job
> 23796974 - Run Bytes Received By Job
> 4536073 - Total Bytes Sent By Job
> 23796974 - Total Bytes Received By Job
> ...
> 005 (102.003.000) 10/24 09:37:48 Job terminated.
> (1) Normal termination (return value 0)
> Usr 0 00:00:07, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> Usr 0 00:00:07, Sys 0 00:00:00 - Total Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
> 4536071 - Run Bytes Sent By Job
> 23796974 - Run Bytes Received By Job
> 4536071 - Total Bytes Sent By Job
> 23796974 - Total Bytes Received By Job
> ...