Mailing List Archives
Re: [Condor-users] Condor on X86_64 no run works
- Date: Mon, 5 Nov 2007 13:52:30 -0000
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] Condor on X86_64 no run works
Some thoughts:
1. You mention "flock". You shouldn't need this if you just have a
   single pool.
2. I notice you have vm1, vm2 ... vm5 mentioned, which implies more than
   4 processors per node. You might have hyperthreading turned on, in
   which case Condor will (possibly) register 8 slots per node.
3. Have you tried
     condor_q -analyze
   or
     condor_q -better-analyze
   to see why it isn't matching?
4. You do a "queue 5", but all the jobs write to the same error and
   output files, which may not be what you want. To write to different
   ones, use something like
     output = loop$(PROCESS).out
     error = loop$(PROCESS).err
5. I can't see a
     log = loop.log
   line. This is useful: have a look in there to see what is produced.
   [Note: don't use $(PROCESS) for this one.]
6. Have a look in the SchedLog of your submit node to see what is in
   there.
7. Are these nodes on a cluster, i.e. on a private network? If so, you
   will need full connectivity between all submit nodes and all execute
   nodes. See the paper and presentation at
     http://epubs.cclrc.ac.uk/work-details?w=34452
   for more details.
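
Points 4 and 5, applied to the submit file quoted below, might look
something like this. It is only a sketch: the file names follow the
pattern suggested above, and the Requirements line is carried over
unchanged from the original submit file.

    universe     = vanilla
    executable   = loop
    # one output/error file per job: loop0.out, loop1.out, ... loop4.out
    output       = loop$(PROCESS).out
    error        = loop$(PROCESS).err
    # a single shared log for all five jobs (no $(PROCESS) here)
    log          = loop.log
    Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
                   (Arch == "X86_64" && OpSys == "LINUX")
    queue 5

With "queue 5", $(PROCESS) takes the values 0 to 4, so each job gets its
own output and error file while loop.log records events for all of them.
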
Good luck
JK
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of jmferrer
> Sent: Monday, November 05, 2007 12:34 PM
> To: condor-users@xxxxxxxxxxx
> Subject: [Condor-users] Condor on X86_64 no run works
>
> Hi.
>
> I'm trying to build a cluster with:
>
> OpenSuse 10.2
> Condor-6.8.6
> Kernel suse 2.6.18.2-34-default
>
>
> System:
>
> 1 central manager, 1 CPU (P4) ----------> does not execute jobs, flocking enabled
> 19 nodes, each 2 x quad-core Intel X86_64
>
> I share /home from the central manager (over NFS, to all nodes).
>
> If I run condor_status
>
> gargamel:/home/condor # condor_status
>
> Name        OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime
>
> vm1@smurf0  LINUX  X86_64  Owner      Idle      0.000   996  0+00:06:45
> vm2@smurf0  LINUX  X86_64  Unclaimed  Idle      0.000   996  4+23:45:04
> vm3@smurf0  LINUX  X86_64  Unclaimed  Idle      0.000   996  4+23:45:05
> vm4@smurf0  LINUX  X86_64  Unclaimed  Idle      0.000   996  4+23:45:07
> vm5@smurf0  LINUX  X86_64  Unclaimed  Idle      0.000   996  4+23:45:08
> ..............................
> Total 87 1 0 86 0 0 0
>
> Some nodes are switched off.
>
> My submit file
> gargamel:/home/condor # cat /home/pepe/test_condor/loop.submit
> # automatically generated description file
> universe = vanilla
> executable = loop
> output = loop.out
> error = loop.err
> Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
>                (Arch == "X86_64" && OpSys == "LINUX")
> queue 5
>
>
>
>
> Can somebody show me how to make this work?
>
>
>
> Sorry for my English, I'm from Almeria IR.