On 7/30/2015 2:31 PM, Brian Bockelman wrote:
> On Jul 30, 2015, at 11:40 AM, Dimitri Maziuk <dmaziuk@xxxxxxxxxxxxx> wrote:
>> On 07/30/2015 10:01 AM, andrew.lahiff@xxxxxxxxxx wrote:
>>> Hi Greg,
>>>
>>> Ok, I didn't realize it worked like this. I had assumed HTCondor would do something like "docker stop", rather than send a signal to the actual executable running inside the container.
>>>
>>> Isn't this rather unsafe? It makes it very easy for people to run jobs that escape HTCondor's control: according to HTCondor the job has been killed, but the Docker container keeps running for as long as it wants.

Greg can correct me if I am wrong, but I believe the signal is sent only to give the job a chance to shut down "gracefully" (vacate). After HTCondor sends the signal, it sets a timer to follow up with a docker stop, so nothing is allowed to continue running forever. See the manual entries for MachineMaxVacateTime and JobMaxVacateTime; I think the default for these is 10 minutes.

So, to get the behavior you described right now, I think you could submit your docker universe job with something like:

    job_max_vacate_time = 2

and then HTCondor should run docker stop two seconds after sending the signal if the container is still lingering. I think Greg is considering changing the default JobMaxVacateTime for docker universe to something much smaller than 10 minutes.
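
For concreteness, a complete docker universe submit file with that setting might look roughly like the sketch below. It is untested, and the image, command, and output file names are placeholders chosen only for illustration; the one line that matters for this thread is job_max_vacate_time.

    # Hypothetical docker universe submit file (illustrative sketch only).
    # The image and command are placeholders: a job that would otherwise
    # run for an hour, so the vacate behavior is easy to observe.
    universe                = docker
    docker_image            = debian
    executable              = /bin/sleep
    arguments               = 3600
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = out.$(Process)
    error                   = err.$(Process)
    log                     = log.$(Process)
    request_memory          = 100M
    # Grace period in seconds after the soft-kill signal; if the container
    # is still running when it expires, HTCondor should follow up with a
    # docker stop.
    job_max_vacate_time     = 2
    queue 1

If you condor_rm such a job, the job log should show it being removed within a couple of seconds instead of after the 10-minute default.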

regards,
Todd