
Re: [HTCondor-users] Checkpointing job restarts in expiring pilot



The starter doesn't evaluate the startd's START expression when "restarting" self-checkpointing jobs.

I haven't been able to reproduce any problems with CurrentTime being wrong when evaluating the START expression.

It's also dangerous, generally, to assume that file transfer of your job can complete during retirement time (although in this case it looks like the glide-in is giving you an hour's warning?). You have no a priori idea how long your job will sit in the AP's transfer queue, even if the actual data transfer takes almost no time at all. This is why we have self-checkpointing jobs at all (because otherwise you would just set ON_EXIT_OR_REMOVE and have the signal sent appropriately).
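
For readers unfamiliar with the pattern, here is a rough sketch of what a self-checkpointing job can look like. It is an illustration only, not code from this thread: the file name state.json, the loop body, and the step counts are all made up, and the value 85 is assumed to match whatever checkpoint_exit_code the submit file sets. The job saves its own state, exits with the checkpoint code, and picks up from the saved state when it is started again.

```python
import json
import os
import sys

# Assumption: the submit file sets checkpoint_exit_code to this same value.
CHECKPOINT_EXIT_CODE = 85
# Hypothetical checkpoint file name; it would need to be listed among the
# files the job transfers as its checkpoint.
STATE_FILE = "state.json"


def load_state(path):
    """Resume from the checkpoint file if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(path, state):
    """Write to a temporary file and rename, so a kill cannot leave a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(path, last_step, checkpoint_every):
    """Do placeholder work; return the exit code the job should use."""
    state = load_state(path)
    while state["step"] < last_step:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % checkpoint_every == 0 and state["step"] < last_step:
            save_state(path, state)
            return CHECKPOINT_EXIT_CODE  # ask to be restarted from the checkpoint
    save_state(path, state)
    return 0  # finished


if __name__ == "__main__":
    sys.exit(run(STATE_FILE, last_step=1000, checkpoint_every=100))
```

Because the job decides when to checkpoint, it does not depend on a transfer finishing inside whatever retirement window happens to be left.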

-- ToddM