[Condor-users] Resending: Solaris 10 - All jobs idling for ever...
- Date: Mon, 19 Sep 2005 15:08:59 -0400
- From: Bruno Goncalves <bgoncalves@xxxxxxxxx>
- Subject: [Condor-users] Resending: Solaris 10 - All jobs idling for ever...
Hi,
I'm trying to set up a pool in Solaris 10 (using the Solaris 9 distribution, since there doesn't seem to be a version 10 distro yet), but I'm running into a few problems. All the jobs I submit remain idle forever. I tried quick and dirty Unix commands like "sleep 10" and "date" just to try it out, but with no luck. What I'm seeing right now is this:
bgoncal@lab1a> condor_q
-- Submitter: lab1a : <170.140.151.110:60209> : lab1a
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1.0 bgoncal 9/14 13:24 0+00:00:00 I 0 0.0 sleep 10
1.1 bgoncal 9/14 13:24 0+00:00:00 I 0 0.0 sleep 10
1.2 bgoncal 9/14 13:24 0+00:00:00 I 0 0.0 sleep 10
1.3 bgoncal 9/14 13:24 0+00:00:00 I 0 0.0 sleep 10
.
.
.
bgoncal@lab1a> condor_q -analyze 1.0
-- Submitter: lab1a : <170.140.151.110:60209> : lab1a
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
001.000: Run analysis summary. Of 123 machines,
0 are rejected by your job's requirements
3 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
120 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Thu Sep 15 11:51:06 2005
bgoncal@lab1a>
bgoncal@lab1a> condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
vm1@lab1a SOLARIS5.10 SUN4u Owner Idle 0.020 512 0+00:10:17
vm2@lab1a SOLARIS5.10 SUN4u Unclaimed Idle 0.000 512 0+00:00:05
vm1@lab1b SOLARIS5.10 SUN4u Unclaimed Idle 0.010 512 0+00:03:04
vm2@lab1b SOLARIS5.10 SUN4u Unclaimed Idle 0.000 512 0+00:03:05
vm1@lab1c SOLARIS5.10 SUN4u Unclaimed Idle 0.000 512 0+00:03:25
.
.
.
bgoncal@lab3c> more condor/hosts/lab3c/log/SchedLog
9/14 12:00:49 (pid:11210) passwd_cache::cache_uid(): getpwnam("condor") failed: Error 0
9/14 12:00:49 (pid:11210) passwd_cache::cache_uid(): getpwnam("condor") failed: Error 0
9/14 12:00:49 (pid:11210) ******************************************************
9/14 12:00:49 (pid:11210) ** condor_schedd (CONDOR_SCHEDD) STARTING UP
9/14 12:00:49 (pid:11210) ** /home/bgoncal/condor/sbin/condor_schedd
9/14 12:00:49 (pid:11210) ** $CondorVersion: 6.7.10 Aug 3 2005 $
9/14 12:00:49 (pid:11210) ** $CondorPlatform: SUN4X-SOLARIS29 $
9/14 12:00:49 (pid:11210) ** PID = 11210
9/14 12:00:49 (pid:11210) ******************************************************
9/14 12:00:49 (pid:11210) Using config file: /home/bgoncal/condor/etc/condor_config
9/14 12:00:49 (pid:11210) Using local config files: /home/bgoncal/condor//hosts/lab3c/condor_config.local
9/14 12:00:49 (pid:11210) DaemonCore: Command Socket at <170.140.151.128:50890>
9/15 11:40:54 (pid:11210) DaemonCore: Command received via UDP from host <170.140.151.110:63801>
9/15 11:40:54 (pid:11210) DaemonCore: received command 60014 (DC_INVALIDATE_KEY), calling handler (handle_invalidate_key())
and in the StarterLog on the same machine we see:
9/14 13:21:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:21:05 Failed to obtain keyboard or mouse idle information.
9/14 13:21:05 Assuming the keyboard and mouse to be infinitely idle.
9/14 13:24:57 DaemonCore: Command received via UDP from host <170.140.151.110:56916>
9/14 13:24:57 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:24:57 vm1: match_info called
9/14 13:24:57 vm1: Received match <170.140.151.128:50889>#1126713649#3
9/14 13:24:57 vm1: State change: match notification protocol successful
9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched
9/14 13:24:58 DaemonCore: Command received via UDP from host <170.140.151.110:56925>
9/14 13:24:58 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:24:58 vm2: match_info called
9/14 13:24:58 vm2: Received match <170.140.151.128:50889>#1126713649#2
9/14 13:24:58 vm2: State change: match notification protocol successful
9/14 13:24:58 vm2: Changing state: Unclaimed -> Matched
9/14 13:25:01 DaemonCore: Command received via UDP from host <170.140.151.110:57022>
9/14 13:25:01 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:01 vm1: State change: received RELEASE_CLAIM command
9/14 13:25:01 vm1: Changing state: Matched -> Owner
9/14 13:25:01 vm1: State change: IS_OWNER is false
9/14 13:25:01 vm1: Changing state: Owner -> Unclaimed
9/14 13:25:02 DaemonCore: Command received via UDP from host <170.140.151.110:57030>
9/14 13:25:02 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:02 vm2: State change: received RELEASE_CLAIM command
9/14 13:25:02 vm2: Changing state: Matched -> Owner
9/14 13:25:02 vm2: State change: IS_OWNER is false
9/14 13:25:02 vm2: Changing state: Owner -> Unclaimed
9/14 13:25:52 DaemonCore: Command received via UDP from host <170.140.151.110:57143>
9/14 13:25:52 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:25:52 vm1: match_info called
9/14 13:25:52 vm1: Received match <170.140.151.128:50889>#1126713649#4
9/14 13:25:52 vm1: State change: match notification protocol successful
9/14 13:25:52 vm1: Changing state: Unclaimed -> Matched
9/14 13:25:53 DaemonCore: Command received via UDP from host <170.140.151.110:57151>
9/14 13:25:53 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:25:53 vm2: match_info called
9/14 13:25:53 vm2: Received match <170.140.151.128:50889>#1126713649#5
9/14 13:25:53 vm2: State change: match notification protocol successful
9/14 13:25:53 vm2: Changing state: Unclaimed -> Matched
9/14 13:25:56 DaemonCore: Command received via UDP from host <170.140.151.110:57246>
9/14 13:25:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:56 vm1: State change: received RELEASE_CLAIM command
9/14 13:25:56 vm1: Changing state: Matched -> Owner
9/14 13:25:56 vm1: State change: IS_OWNER is false
9/14 13:25:56 vm1: Changing state: Owner -> Unclaimed
9/14 13:25:56 DaemonCore: Command received via UDP from host <170.140.151.110:57254>
9/14 13:25:56 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:25:56 vm2: State change: received RELEASE_CLAIM command
9/14 13:25:56 vm2: Changing state: Matched -> Owner
9/14 13:25:56 vm2: State change: IS_OWNER is false
9/14 13:25:56 vm2: Changing state: Owner -> Unclaimed
9/14 13:26:05 Failed to open /proc/interrupts
9/14 13:26:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:26:05 Failed to obtain keyboard or mouse idle information.
9/14 13:26:05 Assuming the keyboard and mouse to be infinitely idle.
9/14 13:26:51 DaemonCore: Command received via UDP from host <170.140.151.110:57366>
9/14 13:26:51 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:26:51 vm1: match_info called
9/14 13:26:51 vm1: Received match <170.140.151.128:50889>#1126713649#6
9/14 13:26:51 vm1: State change: match notification protocol successful
9/14 13:26:51 vm1: Changing state: Unclaimed -> Matched
9/14 13:26:51 DaemonCore: Command received via UDP from host <170.140.151.110:57374>
9/14 13:26:51 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
9/14 13:26:51 vm2: match_info called
9/14 13:26:51 vm2: Received match <170.140.151.128:50889>#1126713649#7
9/14 13:26:51 vm2: State change: match notification protocol successful
9/14 13:26:51 vm2: Changing state: Unclaimed -> Matched
9/14 13:26:54 DaemonCore: Command received via UDP from host <170.140.151.110:57469>
9/14 13:26:54 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:26:54 vm1: State change: received RELEASE_CLAIM command
9/14 13:26:54 vm1: Changing state: Matched -> Owner
9/14 13:26:54 vm1: State change: IS_OWNER is false
9/14 13:26:54 vm1: Changing state: Owner -> Unclaimed
9/14 13:26:54 DaemonCore: Command received via UDP from host <170.140.151.110:57477>
9/14 13:26:54 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
9/14 13:26:54 vm2: State change: received RELEASE_CLAIM command
9/14 13:26:54 vm2: Changing state: Matched -> Owner
9/14 13:26:54 vm2: State change: IS_OWNER is false
9/14 13:26:54 vm2: Changing state: Owner -> Unclaimed
9/14 13:31:05 Failed to open /proc/interrupts
9/14 13:31:05 get_mouse_info(): Failed to open /proc/interrupts
9/14 13:31:05 Failed to obtain keyboard or mouse idle information.
9/14 13:31:05 Assuming the keyboard and mouse to be infinitely idle.
and it just goes on like this for a while: each VM is matched, the claim is released a few seconds later, and the VM returns to Unclaimed. Any ideas as to what is going on?
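In case it helps anyone looking at a longer log of this kind, here is a minimal Python sketch that tallies the state transitions per VM, which makes the repeating Unclaimed -> Matched -> Owner -> Unclaimed cycle easy to see. The regular expression and the function name count_transitions are only illustrative (they are not part of Condor), and the sketch assumes each log entry sits on a single, unwrapped line.

```python
import re
from collections import Counter

# Matches entries such as:
#   9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched
# (pattern is an assumption based on the log excerpt above)
TRANSITION = re.compile(r"(vm\d+): Changing state: (\w+) -> (\w+)")

def count_transitions(lines):
    """Count (vm, old_state, new_state) transitions in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[match.groups()] += 1
    return counts

# A few lines copied from the log excerpt above.
sample = [
    "9/14 13:24:57 vm1: Changing state: Unclaimed -> Matched",
    "9/14 13:25:01 vm1: State change: received RELEASE_CLAIM command",
    "9/14 13:25:01 vm1: Changing state: Matched -> Owner",
    "9/14 13:25:01 vm1: Changing state: Owner -> Unclaimed",
    "9/14 13:25:52 vm1: Changing state: Unclaimed -> Matched",
]

counts = count_transitions(sample)
for (vm, old, new), n in sorted(counts.items()):
    print(f"{vm}: {old} -> {new}: {n}")
```

To run it over a real file, pass an open file object instead of the sample list, for example count_transitions(open(path)) with path pointing at the log in question.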
Bruno
--
*******************************************
Bruno Miguel Tavares Goncalves, MS
PhD Candidate
Emory University
Department of Physics
Office No. N117-C
400 Dowman Drive
Atlanta, Georgia 30322
Homepage: www.bgoncalves.com
Email: bgoncalves@xxxxxxxxx
Phone: (404) 712-2441
Fax: (404) 727-0873
*******************************************