Re: [Condor-users] Problem running mpi job on Condor 6.7.5 Feb 28 2005, I386-LINUX_RH9
- Date: Thu, 21 Apr 2005 10:14:00 +0200
- From: Philipp Kolmann <kolmann@xxxxxxxxxxxxxxxx>
Good morning to you too.
Thanks for your answer. The hosts are rebooted every day, because this Condor
setup is a special one that uses Linux remote boot.
On Wed, Apr 20, 2005 at 03:33:43PM -0500, Greg Thain wrote:
> Thanks for your complete logs -- this really helps us debug this kind
> of thing. The key line in the log is this:
>
> > Found 0 potential dedicated resources
>
> This means that, despite your setup, the dedicated scheduler has not
> found any of the machines you have dedicated to it. Did you restart the
> startds after changing their configuration?
I have enabled full debug output for the startds. Here is the log, first after
a reconfig and then while I submit the MPI job:
reconfig:
4/21 10:06:15 Got SIGHUP. Re-reading config files.
4/21 10:06:15 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
4/21 10:06:15 Will use UDP to update collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:06:15 UidDomain = "ben.tuwien.ac.at"
4/21 10:06:15 FileSystemDomain = "ben.tuwien.ac.at"
4/21 10:06:15 Subnet = "193.170.74"
4/21 10:06:15 Swap space: 0
4/21 10:06:15 186015992 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:06:15 Looking up RESERVED_DISK parameter
4/21 10:06:15 Reserving 5120 kbytes for file system
4/21 10:06:15 Disk space: 186010872
4/21 10:06:16 MainConfig finish
4/21 10:06:16 CronMgr: Doing config (reconfig)
4/21 10:06:16 DaemonCore: in SendAliveToParent()
4/21 10:06:16 DaemonCore: attempting to connect to '<193.170.74.30:32768>'
4/21 10:06:16 STARTD_TIMEOUT_MULTIPLIER is undefined, using default value of 0
4/21 10:06:16 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:06:16 DaemonCore: No more children processes to reap.
4/21 10:06:20 Trying to update collector <193.170.74.44:9618>
4/21 10:06:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:06:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:06:20 Sent update to 1 collector(s)
4/21 10:07:35 Getting monitoring info for pid 4231
4/21 10:08:16 Swap space: 0
4/21 10:08:16 186015992 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:08:16 Looking up RESERVED_DISK parameter
4/21 10:08:16 Reserving 5120 kbytes for file system
4/21 10:08:16 Disk space: 186010872
4/21 10:08:20 Trying to update collector <193.170.74.44:9618>
4/21 10:08:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:08:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:08:20 Sent update to 1 collector(s)
submission:
4/21 10:10:16 Swap space: 0
4/21 10:10:16 186014528 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:10:16 Looking up RESERVED_DISK parameter
4/21 10:10:16 Reserving 5120 kbytes for file system
4/21 10:10:16 Disk space: 186009408
4/21 10:10:20 Trying to update collector <193.170.74.44:9618>
4/21 10:10:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:10:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:10:20 Sent update to 1 collector(s)
4/21 10:11:35 Getting monitoring info for pid 4231
4/21 10:12:16 Swap space: 0
4/21 10:12:16 186014520 kbytes available for "/grid/condor/hosts/zid30/execute"
4/21 10:12:16 Looking up RESERVED_DISK parameter
4/21 10:12:16 Reserving 5120 kbytes for file system
4/21 10:12:16 Disk space: 186009400
4/21 10:12:20 Trying to update collector <193.170.74.44:9618>
4/21 10:12:20 Attempting to send update via UDP to collector gridmaster.ben.tuwien.ac.at <193.170.74.44:9618>
4/21 10:12:20 SEC_DEBUG_PRINT_KEYS is undefined, using default value of False
4/21 10:12:20 Sent update to 1 collector(s)
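Both excerpts show the startd sending an update to the collector every two
minutes. For longer logs, a rough Python sketch like the one below could list
those updates. It is not a Condor tool: the function name collector_updates
and the sample text are only illustrative, and it assumes the log lines keep
the "date time message" layout shown above.

```python
import re

# Matches startd log lines such as "4/21 10:06:20 Sent update to 1 collector(s)".
UPDATE_RE = re.compile(r"^(\d+/\d+ \d+:\d+:\d+) Sent update to (\d+) collector\(s\)")


def collector_updates(log_text):
    """Return a (timestamp, collector_count) pair for each 'Sent update' line."""
    updates = []
    for line in log_text.splitlines():
        match = UPDATE_RE.match(line.strip())
        if match:
            updates.append((match.group(1), int(match.group(2))))
    return updates


# Illustrative sample taken from the log format above.
sample = """\
4/21 10:06:20 Trying to update collector <193.170.74.44:9618>
4/21 10:06:20 Sent update to 1 collector(s)
4/21 10:08:20 Sent update to 1 collector(s)
"""

for stamp, count in collector_updates(sample):
    print(stamp, count)
```

This only confirms that updates leave the startd. It says nothing about
whether the dedicated scheduler accepts the machines.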
Thanks for your help
Philipp Kolmann
--
If you have problems in Windows: REBOOT
If you have problems in Linux: BE ROOT