Re: [Condor-users] Problems with checkpointing.
- Date: Fri, 29 Apr 2005 10:39:25 -0500 (CDT)
- From: Paul Armor <parmor@xxxxxxxxxxxxxxxxxxxx>
Hi Alan,
> I would think that these would be identical RPMs, since we don't distribute
> different binaries for RedHat 9, Fedora Core 1, or Fedora Core 3: We build
> it on RedHat 9 and it just works on the Fedora Core 1-3. I know that the
> download web page lists them separately--this is to make it clear what to
> download. But they are identical.
OK, I was feeling "superstitious" ;-)
> I'm also a bit confused--you're installing the checkpoint server on all the
> execution computers?
Yes, I inherited the spec file and process, so... (P.S. we're installing
the same RPM on all nodes, using the same condor_config but a different
condor_config.local on each)
> Can you be more specific about the errors you are getting?
OK, I was waiting for more details from users... I'll attach a bunch of
stuff below, trying to show the lifecycle of jobs, but here's a typical log
entry when a job dies. I know this job was condor_compiled on an RH9
box; I don't know where it initially ran, but here it dies on an RH9 box:
001 (12450.852.000) 04/27 17:08:09 Job executing on host: <129.89.200.78:51017>
...
005 (12450.852.000) 04/27 17:08:14 Job terminated.
    (0) Abnormal termination (signal 11)
    (0) No core file
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        Usr 0 01:30:00, Sys 0 00:00:32  -  Total Remote Usage
        Usr 0 00:00:00, Sys 0 00:00:00  -  Total Local Usage
    304  -  Run Bytes Sent By Job
    58917520  -  Run Bytes Received By Job
    0  -  Total Bytes Sent By Job
    0  -  Total Bytes Received By Job
...
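For sifting a large user log for entries like the one above, something along these lines might help. It is only a rough sketch: it assumes every event starts with a header line in the "NNN (cluster.proc.subproc) MM/DD HH:MM:SS text" form seen in the excerpt, and that events are separated by "..." lines. The function name and the regular expressions are made up for illustration and are not part of Condor.

```python
import re

# Assumed header shape, taken from the excerpt above, e.g.:
# 005 (12450.852.000) 04/27 17:08:14 Job terminated.
EVENT_HEADER = re.compile(
    r"^(?P<code>\d{3}) \((?P<job>\d+\.\d+\.\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<text>.*)$"
)
SIGNAL = re.compile(r"Abnormal termination \(signal (\d+)\)")


def find_abnormal_terminations(log_text):
    """Return (job_id, date, time, signal) for each 005 event killed by a signal."""
    results = []
    current = None  # header match of the event we are currently inside
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == "...":  # event separator: leave the current event
            current = None
            continue
        header = EVENT_HEADER.match(stripped)
        if header:
            current = header
            continue
        # Only look for a signal inside a "Job terminated" (005) event.
        if current is not None and current.group("code") == "005":
            sig = SIGNAL.search(stripped)
            if sig:
                results.append((current.group("job"), current.group("date"),
                                current.group("time"), int(sig.group(1))))
    return results


# Trimmed copy of the log entry quoted in this message.
SAMPLE = """\
001 (12450.852.000) 04/27 17:08:09 Job executing on host: <129.89.200.78:51017>
...
005 (12450.852.000) 04/27 17:08:14 Job terminated.
    (0) Abnormal termination (signal 11)
    (0) No core file
...
"""

if __name__ == "__main__":
    for job, date, time, sig in find_abnormal_terminations(SAMPLE):
        print(f"{job} died with signal {sig} at {date} {time}")
```

Pairing each hit with the preceding 001 event for the same job ID would also give the execute host, which may help narrow down whether the failures cluster on particular machines.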
> Yeah--these are the same binaries. Sorry for the confusion. :(
No worries, I still would have probably become superstitious ;-)
> I think we need to see some log files to better help you.
Actually, what's the preferred method of overwhelming you with logs?
Shall I throw them up so as to be http-able? Or would you prefer email?
Cheers,
Paul