Re: [Condor-users] Job was not checkpointed
- Date: Wed, 17 Oct 2007 18:18:55 -0400 (EDT)
- From: "Brian Dandurand" <bdandur@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Job was not checkpointed
Hi,
I think we're having the same problem: jobs are not checkpointed when
they are evicted. We are working on a cluster of Solaris 9 stations.
The jobs are also consistently evicted every three hours.
Here is a sample from a job's log file. The error and output files are empty.
***************************************************************************
004 (106.000.000) 10/17 09:53:17 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
1322 - Run Bytes Sent By Job
133228 - Run Bytes Received By Job
...
001 (106.000.000) 10/17 09:55:09 Job executing on host:
<130.127.206.32:32788>
...
004 (106.000.000) 10/17 12:55:15 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
1322 - Run Bytes Sent By Job
133228 - Run Bytes Received By Job
...
001 (106.000.000) 10/17 12:57:40 Job executing on host:
<130.127.206.42:32781>
...
***************************************************************************
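The three-hour interval can be checked directly from the log excerpt above. The following is a minimal Python sketch, not a Condor tool: it assumes the event-header layout shown in the sample (001 for "Job executing", 004 for "Job was evicted"), and the names run_durations, EVENT_RE, and SAMPLE, as well as the year default, are hypothetical choices made for illustration, since the log lines carry no year.

```python
import re
from datetime import datetime

# Matches event header lines such as:
#   004 (106.000.000) 10/17 09:53:17 Job was evicted.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
)

def run_durations(log_text, year=2007):
    """Pair each execute event (001) with the next evict event (004).

    The year is an assumption supplied by the caller, because the
    log timestamps only contain month/day and time.
    """
    runs = []
    started = None
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # detail lines, host lines, and "..." separators
        code = m.group(1)
        when = datetime.strptime(f"{year}/{m.group(5)}", "%Y/%m/%d %H:%M:%S")
        if code == "001":
            started = when
        elif code == "004" and started is not None:
            runs.append((started, when, when - started))
            started = None
    return runs

# Event lines copied from the sample above (usage lines omitted).
SAMPLE = """\
004 (106.000.000) 10/17 09:53:17 Job was evicted.
(0) Job was not checkpointed.
...
001 (106.000.000) 10/17 09:55:09 Job executing on host:
<130.127.206.32:32788>
...
004 (106.000.000) 10/17 12:55:15 Job was evicted.
(0) Job was not checkpointed.
...
001 (106.000.000) 10/17 12:57:40 Job executing on host:
<130.127.206.42:32781>
...
"""

for start, end, duration in run_durations(SAMPLE):
    print(start.time(), "->", end.time(), "ran for", duration)
```

On the sample this reports a single completed run, from 09:55:09 to 12:55:15, which lasted 3:00:06. The first eviction has no matching start in the excerpt and the last start has no eviction yet, so both are skipped.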
Thank you for any help you can provide.
Brian Dandurand
> Hi,
>
> My jobs do not get migrated to the next available node; it gives me
> the error below:
>
> ___________________________________________________________________________________
> 011 (191.000.000) 07/03 11:06:07 Job was unsuspended.
> ...
> 004 (191.000.000) 07/03 11:06:07 Job was evicted.
> (0) Job was not checkpointed.
> Usr 0 00:00:11, Sys 0 00:00:00 - Run Remote Usage
> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
> 0 - Run Bytes Sent By Job
> 0 - Run Bytes Received By Job
>
> ___________________________________________________________________________________
>
> Plz help :(
>
----------------------------------------
Brian C. Dandurand
Clemson University
Department of Mathematical Sciences
Ph.D. Student
Office: Martin Hall E-6
Office Phone: (864)656-4749
----------------------------------------