Hi José,

Thanks for the feedback. In my tests I had to send the job a SIGSTOP before checkpointing it for the checkpoint to work. I shall look into it further and will probably contact the BLCR developers, but in the meantime I hope your alteration works for you.
Regards,
Mark

José M. Martín wrote:
Thanks, Mark.

I have a question about your code. Why do you stop the job before checkpointing it? I tried that, but the checkpoint crashes, so I have removed those commands. I can't find any instructions about that on the BLCR web page.

Regards,
José

On Wednesday 27 February 2008 10:15:12 Mark Calleja wrote:

Hi José,

I'm glad you've got a version to suit your needs. Just in case, I've updated my online version (with documentation to reflect it) to perform a similar function.

Cheers,
Mark

José M. Martín wrote:

Actually, I have adapted your version to my environment and added this "feature". We are now testing it on a production cluster with about 15 users and 150 jobs. Thanks for your time.

-- José.

On Tuesday 26 February 2008 10:48:20 Mark Calleja wrote:

Hi José,

Thanks for your feedback. What you suggest should not be too difficult to add, so if you're interested I can send you a modified version of the code for you to test.

Cheers,
Mark

José M. Martín wrote:

Thanks for your contribution. I used it on my Condor cluster and it works very well. Just one suggestion: you could trap Condor's signals to force your programs to make a checkpoint. When Condor vacates a program, it sends it a signal (killsig) (http://www.cs.wisc.edu/condor/manual/v7.0/2_7Priorities_Preemption.html#SECTION00373000000000000000). By trapping this signal, programs could make a checkpoint before they stop.

Cheers,
José

On Monday 04 February 2008 12:37:43 Mark Calleja wrote:

Hi,

In case it's of use or interest to anyone else on this mailing list, I've written some notes on how one can use Parrot and the BLCR kernel modules to transparently checkpoint Condor's vanilla universe jobs. The link is:

http://www.escience.cam.ac.uk/projects/camgrid/blcr.html

This is recent/ongoing work, so feedback and/or bug reports back to me please.

Cheers,
Mark

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
--
Cambridge eScience Centre, University of Cambridge
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WA
Tel. (+44/0) 1223 765317, Fax (+44/0) 1223 765900
http://www.escience.cam.ac.uk/~mcal00
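
The signal-trapping idea discussed in this thread can be sketched as a small wrapper around the job. This is only an illustration under stated assumptions, not Mark's or José's actual code: the names `make_vacate_handler` and `run_job` are hypothetical, SIGTERM is assumed to be the signal Condor sends on vacate (it may be configured differently for a given job), and `cr_checkpoint <pid>` is assumed to be the BLCR command that checkpoints the job. The `stop_first` flag reflects the open question in the thread: Mark needed a SIGSTOP before checkpointing, while José found that it made the checkpoint crash.

```python
import os
import signal
import subprocess


def make_vacate_handler(job_pid, checkpoint_cmd, stop_first=False,
                        run=subprocess.call, kill=os.kill):
    """Build a signal handler that checkpoints job_pid before it is vacated.

    checkpoint_cmd is the command prefix (assumed here: ["cr_checkpoint"]);
    the job's PID is appended as the last argument. `run` and `kill` are
    injectable so the logic can be exercised without BLCR installed.
    """
    def handler(signum, frame):
        if stop_first:
            # Optional: freeze the job first (needed in Mark's tests,
            # harmful in José's, so it is off by default).
            kill(job_pid, signal.SIGSTOP)
        # Take the checkpoint and remember the command's exit status.
        handler.last_status = run(list(checkpoint_cmd) + [str(job_pid)])
        # Forward the vacate signal so the job exits after the checkpoint.
        kill(job_pid, signum)
        if stop_first:
            # A stopped process only acts on the pending signal once resumed.
            kill(job_pid, signal.SIGCONT)

    handler.last_status = None
    return handler


def run_job(argv, vacate_signal=signal.SIGTERM):
    """Start the job, install the checkpoint-on-vacate trap, wait for exit."""
    job = subprocess.Popen(argv)
    handler = make_vacate_handler(job.pid, ["cr_checkpoint"])
    signal.signal(vacate_signal, handler)
    return job.wait()

# Hypothetical use from a wrapper script that Condor runs instead of the job:
#     import sys
#     sys.exit(run_job(sys.argv[1:]))
```

The wrapper, not the job, receives Condor's signal, so the job itself needs no modification; whether the resulting checkpoint can be restarted depends on how the job was launched under BLCR, which this sketch does not cover.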