[Condor-users] AMD Opteron Crashes
- Date: Fri, 4 Mar 2005 16:15:57 +0100
- From: Steffen Prohaska <prohaska@xxxxxx>
- Subject: [Condor-users] AMD Opteron Crashes
Hi,
According to
https://lists.cs.wisc.edu/archive/condor-users/pre-2004-June/msg01368.shtml
it should be possible to run Condor on a 64-bit Opteron system using the
linux-x86-glibc23-dynamic binary.
Everything works fine until Condor tries to start a job. At that point
the condor_starter crashes with a segmentation fault (signal 11).
I tried this with condor-6.6.8-linux-x86-glibc22-dynamic.tar.gz,
condor-6.6.8-linux-x86-glibc23-dynamic.tar.gz, and
condor-6.7.5-linux-x86-glibc23-dynamic.tar.gz. The behaviour is similar
in each case. We are running SUSE Linux Enterprise Server 9 (x86_64).
User information is stored in LDAP. Excerpts from the log files are
included below; I can provide more details if that would help.
Any thoughts on this? Is anyone successfully running Condor on a
similar Opteron system?
Steffen
--- System info
acorn:/ # cat /etc/SuSE-release
SUSE LINUX Enterprise Server 9 (x86_64)
VERSION = 9
acorn:/ # uname -a
Linux acorn 2.6.5-7.139-smp #1 SMP Fri Jan 14 15:41:33 UTC 2005 x86_64
x86_64 x86_64 GNU/Linux
--- From StartLog:
StartLog:3/4 15:48:32 Starter pid 18488 died on signal 11 (signal 11)
--- From /var/log/messages
Mar 4 15:48:32 acorn kernel: condor_starter[18488]: segfault at
00000000a4e0efc5 rip 00000000559a4dac rsp 00000000ffffc4a8 error 4
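For readers unfamiliar with the kernel message, the trailing "error 4" is the x86 page-fault error code, a small bit field. The sketch below decodes it; the function name decode_page_fault_error is only illustrative and is not part of Condor or the kernel. For the value 4 it reports a user-mode read of a page that was not present, which fits an invalid pointer dereference but does not by itself say where the pointer came from.

```python
# Minimal sketch: decode the x86 page-fault error code printed by the kernel
# in a "segfault at ... error N" line. The function name is hypothetical.

def decode_page_fault_error(code: int) -> dict:
    """Return a readable description of the x86 page-fault error code bits."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = fault raised in kernel mode, 1 = in user mode
        "mode": "user" if code & 0x4 else "kernel",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit_set": bool(code & 0x8),
        # bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


if __name__ == "__main__":
    # The value taken from the /var/log/messages excerpt above.
    print(decode_page_fault_error(4))
```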
--- From StarterLog.vm2
3/4 15:48:29 (fd:9) PASSWD_CACHE_REFRESH is undefined, using default
value of 300
3/4 15:48:29 (fd:9) Finding local host information, calling
gethostname()
[...]
3/4 15:48:29 (fd:9) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found
3/4 15:48:29 (fd:9) passwd_cache::cache_uid(): getpwnam("condor")
failed: user not found
3/4 15:48:29 (fd:9) PRIV_UNKNOWN --> PRIV_CONDOR at
daemon_core_main.C:1382
3/4 15:48:29 (fd:9) KEYCACHE: created: 82ca8d8
3/4 15:48:29 (fd:9)
******************************************************
3/4 15:48:29 (fd:9) ** condor_starter (CONDOR_STARTER) STARTING UP
3/4 15:48:30 (fd:9) **
/vis/data/people/condor/linux-glibc23/sbin/condor_starter
3/4 15:48:30 (fd:9) ** $CondorVersion: 6.6.8 Jan 27 2005 $
3/4 15:48:30 (fd:9) ** $CondorPlatform: I386-LINUX_RH9 $
3/4 15:48:30 (fd:9) ** PID = 18488
3/4 15:48:30 (fd:9) ** Running as root: Privilege switching in effect
3/4 15:48:30 (fd:9)
******************************************************
[...]
TransferSocket = "<130.73.68.82:21118>"
ShadowVersion = "$CondorVersion: 6.6.8 Jan 27 2005 $"
UidDomain = "zib.de"
3/4 15:48:32 (fd:11) --- End of ClassAd ---
3/4 15:48:32 (fd:11) STARTER_TIMEOUT_MULTIPLIER is undefined, using
default value of 0
3/4 15:48:32 (fd:11) New Daemon obj (shadow) name: "onyx3.zib.de",
pool: "NULL", addr: "NULL"
3/4 15:48:32 (fd:11) Version of Shadow is $CondorVersion: 6.6.8 Jan 27
2005 $
3/4 15:48:32 (fd:11) Starter communicating with condor_shadow
<130.73.68.82:21118>
3/4 15:48:32 (fd:11) Submitting machine is "onyx3.zib.de"
3/4 15:48:32 (fd:11) Doing CONDOR_register_starter_info
3/4 15:48:32 (fd:11) ShouldTransferFiles is "NO", NOT transfering files
3/4 15:48:32 (fd:11) Submit UidDomain: "zib.de"
3/4 15:48:32 (fd:11) Local UidDomain: "zib.de"
3/4 15:48:32 (fd:11) Initialized user_priv as "..."
[ at this point the daemon crashes ]
--- End of log
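Since the StarterLog shows getpwnam("condor") failing, one quick check is whether the account lookup resolves at all on the execute machine. The following is a minimal sketch using Python's standard pwd module; the helper name lookup_user is hypothetical. One caveat, stated as an assumption: the lookup goes through the name-service modules matching the interpreter's own architecture, so a result from a 64-bit interpreter may not reflect what a 32-bit condor_starter sees.

```python
# Minimal sketch: check whether a user name resolves through the system's
# passwd lookup (local files, LDAP, or whatever the name service is set to).
# lookup_user is a hypothetical helper, not part of Condor.
import pwd
import sys
from typing import Optional, Tuple


def lookup_user(name: str) -> Optional[Tuple[int, int, str]]:
    """Return (uid, gid, home) for name, or None if the user is not found."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        # pwd.getpwnam raises KeyError when no matching entry exists.
        return None
    return (entry.pw_uid, entry.pw_gid, entry.pw_dir)


if __name__ == "__main__":
    # Default to the account name that appears in the StarterLog excerpt.
    user = sys.argv[1] if len(sys.argv) > 1 else "condor"
    result = lookup_user(user)
    if result is None:
        print(f"{user}: user not found")
    else:
        uid, gid, home = result
        print(f"{user}: uid={uid} gid={gid} home={home}")
```

Running it once for "condor" and once for the submitting user's account would show whether both resolve from that interpreter.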
--
Steffen Prohaska <prohaska@xxxxxx> <http://www.zib.de/prohaska/>
Zuse Institute Berlin, Takustraße 7, D-14195 Berlin-Dahlem, Germany
+49 (30) 841 85-337, fax -107
1024D/DA749299 print 8B59 83A8 A43D E0E2 DEDB D479 3157 2FEA DA74 9299