Date: | Tue, 19 Jul 2022 11:02:20 +0900 |
---|---|
From: | Geonmo Ryu <geonmo@xxxxxxxxxxx> |
Subject: | [HTCondor-devel] [RE]Re: Delayed Transfer with Signals(self-checkpoint)and checkpoint_exit_code |

Hello, Todd.

I think I explained it in a confusing way.

When I tested it, there was an inconvenience because ON_EXIT_OR_EVICT sends transfer_output_files instead of transfer_checkpoint_files, as shown in the manual. We are supposed to put the final result file in transfer_output_files. However, when we actually wrote the test code, we found that the checkpoint occurred before the result file had been created, which led to the job being held.

To solve this problem, I modified the vanilla_proc.cpp file of condor_starter.V6.1. I added some code to send the checkpoint files at the point where isSoftKilling is checked, and we confirmed that it works as we intended.

I was wondering whether such a simple modification could cause any problems. We do not understand the whole HTCondor code base, so we were concerned that the change might have side effects.

If there is no particular issue, I would like to create a pull request for the code. Since the code is very simple, it will need to be refined, but I think it is a better way to use the delayed transfer with signals method.

Regards,
-- Geonmo

From : Todd L Miller <tlmiller@xxxxxxxxxxx>
To : "Geonmo Ryu" <geonmo@xxxxxxxxxxx>
Cc : <htcondor-devel@xxxxxxxxxxx>
Sent : 2022-07-19 01:44:20
Subject : Re: [HTCondor-devel] Delayed Transfer with Signals(self-checkpoint)and checkpoint_exit_code

I don't know what you're asking here.

The "delayed transfer with signals" method works -- when it does -- by setting the soft-kill signal to the one which causes the job to produce a checkpoint. If the job then produces a checkpoint before the soft-kill timeout, and when_to_transfer_output is set to ON_EXIT_OR_EVICT, and transfer_output_files includes the checkpoint, then it will be transferred as a result of the eviction. The last condition wasn't explicitly stated in that section of the manual, so my apologies if that caused you confusion.

There is currently no way to use transfer_checkpoint_files instead of transfer_output_files on an eviction.

- ToddM
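
For readers following the thread, here is a minimal sketch of the job side of the "delayed transfer with signals" method described above. It is not code from either message. It assumes the submit file sets kill_sig to SIGUSR1, sets when_to_transfer_output to ON_EXIT_OR_EVICT, and lists the checkpoint file in transfer_output_files. The file name job.ckpt and the helper names (request_stop, save_checkpoint, load_checkpoint, run) are hypothetical.

```python
import json
import os
import signal

# Hypothetical checkpoint name; per the thread it would have to be listed in
# transfer_output_files so that it is sent back on eviction.
CHECKPOINT = "job.ckpt"

stop_requested = False


def request_stop(signum, frame):
    """Soft-kill handler: only set a flag; the main loop writes the checkpoint."""
    global stop_requested
    stop_requested = True


def save_checkpoint(path, state):
    """Write to a temporary file, then rename, so a partial file is never left behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Resume from a previous checkpoint if one was transferred back in."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(total_steps, path=CHECKPOINT, should_stop=lambda: stop_requested):
    """Do the work; return True if finished, False if stopped at a checkpoint."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        if should_stop():
            save_checkpoint(path, state)
            return False
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
    save_checkpoint(path, state)
    return True


if __name__ == "__main__":
    # Assumption: the submit file contains kill_sig = SIGUSR1.
    signal.signal(signal.SIGUSR1, request_stop)
    finished = run(100_000)
    print("finished" if finished else "checkpointed before eviction")
```

The sketch only writes the checkpoint. If the final result file is also listed in transfer_output_files, it may not exist yet when the job is evicted, which is the situation reported above as leading to the job being held.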