[Condor-users] Ask a question about condor setting
- Date: Sun, 9 Apr 2006 18:44:16 +0800
- From: "Fu-Ming Tsai" <sary357@xxxxxxxxxxxxxxxxxx>
- Subject: [Condor-users] Ask a question about condor setting
Dear all,
I'd like to submit 2 Condor jobs to my test cluster. Both jobs use the
same script, which computes the square root of a number. There are 2
machines in my cluster, each with a P4 CPU with Hyper-Threading, so
condor_status shows two virtual machines (vm1 and vm2) per host:
[root@tb032 log]# condor_status
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@xxxxxxxxx LINUX  INTEL  Unclaimed  Idle      0.000    496  0+00:34:49
vm2@xxxxxxxxx LINUX  INTEL  Unclaimed  Idle      0.000    496  0+00:00:06
vm1@xxxxxxxxx LINUX  INTEL  Owner      Idle      0.000   1009  0+00:15:09
vm2@xxxxxxxxx LINUX  INTEL  Owner      Idle      0.000   1009  0+00:15:10

             Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX         4     2       0         2       0          0
      Total         4     2       0         2       0          0
However, when I submit the 2 jobs as follows:
[sary357@tb032 job]$ condor_submit job1.jdl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 71.
[sary357@tb032 job]$ condor_submit job2.jdl
Submitting job(s).
Logging submit event(s).
1 job(s) submitted to cluster 72.
I get a rather strange status:
[sary357@tb032 job]$ condor_q
-- Submitter: tb032.grid.sinica.edu.tw : <140.109.98.82:33472> : tb032.grid.sinica.edu.tw
ID    OWNER    SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
71.0  sary357  4/9 18:03   0+00:08:34  R   0    0.0   job1.sh
72.0  sary357  4/9 18:03   0+00:00:00  I   0    0.0   job1.sh
2 jobs; 1 idle, 1 running, 0 held
[root@tb032 log]# condor_status
Name          OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@xxxxxxxxx LINUX  INTEL  Claimed    Busy      0.000    496  0+00:00:03
vm2@xxxxxxxxx LINUX  INTEL  Owner      Idle      0.070    496  0+00:00:06
vm1@xxxxxxxxx LINUX  INTEL  Owner      Idle      0.020   1009  0+00:20:09
vm2@xxxxxxxxx LINUX  INTEL  Owner      Idle      0.000   1009  0+00:20:10

             Machines Owner Claimed Unclaimed Matched Preempting
INTEL/LINUX         4     3       1         0       0          0
      Total         4     3       1         0       0          0
I've tried displaying all the information for osgs01:
Start = ((KeyboardIdle > 15 * 60) && (((LoadAvg - CondorLoadAvg) <=
0.000000) || (State != "Unclaimed" && State != "Owner")))
I found that LoadAvg is 0.0 and CondorLoadAvg is also 0.0. KeyboardIdle
is 5707.
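To check those numbers by hand, here is a minimal Python sketch that
restates the Start expression above as an ordinary boolean function. It is
not Condor's ClassAd evaluator; the function name start_expr and its
parameter names are made up for illustration. The first call uses the
values reported above, with State assumed to be "Unclaimed". The second
call is a hypothetical case: a LoadAvg of 0.070 on a VM in the Owner state,
with CondorLoadAvg assumed to be 0.0.

```python
def start_expr(keyboard_idle, load_avg, condor_load_avg, state):
    """Plain-Python restatement of the Start expression quoted above.

    This only mirrors the boolean logic; it is not how Condor itself
    evaluates ClassAd expressions.
    """
    # KeyboardIdle > 15 * 60 (seconds)
    keyboard_ok = keyboard_idle > 15 * 60
    # (LoadAvg - CondorLoadAvg) <= 0.0: no load other than Condor's own
    no_outside_load = (load_avg - condor_load_avg) <= 0.0
    # State != "Unclaimed" && State != "Owner"
    not_unclaimed_or_owner = state != "Unclaimed" and state != "Owner"
    return keyboard_ok and (no_outside_load or not_unclaimed_or_owner)


# Values reported above; State "Unclaimed" is an assumption.
idle_case = start_expr(5707, 0.0, 0.0, "Unclaimed")

# Hypothetical case: LoadAvg 0.070 in the Owner state,
# with CondorLoadAvg assumed to be 0.0.
loaded_case = start_expr(5707, 0.070, 0.0, "Owner")

print(idle_case, loaded_case)
```

Under these assumptions the first call evaluates to True and the second to
False: with this expression, any load that is not attributed to Condor
makes Start false for a VM in the Unclaimed or Owner state.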
It's strange that only vm1@xxxxxxxxx runs a job, while the state of
vm2@xxxxxxxxx changes from Unclaimed to Owner without executing any job.
Of course, the load on host osgs01 is high while it runs a single job,
as shown below.
[root@osgs01 log]# top
18:06:36 up 14 days, 3:27, 1 user, load average: 0.94, 0.44, 0.16
61 processes: 59 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.0% 48.1% 1.9% 0.0% 0.0% 0.0% 49.9%
cpu00 0.0% 45.2% 1.8% 0.0% 0.0% 0.0% 52.9%
cpu01 0.0% 51.0% 2.0% 0.0% 0.0% 0.0% 47.0%
Mem: 1017308k av, 997708k used, 19600k free, 0k shrd, 188832k buff
365088k actv, 254800k in_d, 20000k in_c
Swap: 2096472k av, 0k used, 2096472k free 491280k cached
PID   USER    PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
32467 sary357  35 10  984 984   872 R N  50.1  0.0 2:56   0 condor_exec.e
...
But I do not understand why the state of vm2@osgs01 changed automatically.
Is it because of the load on osgs01.gr, or something else? Which setting
can I modify to use the full computing power? Does anyone know?
Best regards,
Fu-Ming
----------------------------------------------------------------------
"Gravitation is not responsible for people falling in love."
Fu-Ming Tsai
Academia Sinica Grid Computing Centre
sary357@xxxxxxxxxxxxxxxxxx
------------------------------------------------------------------------