
Re: [HTCondor-users] Checkpointing job restarts in expiring pilot



The starter doesn't evaluate the startd's START expression when "restarting" self-checkpointing jobs.
	I haven't been able to reproduce any problems with CurrentTime 
being wrong when evaluating the START expression.
	It's also dangerous, generally, to assume that file transfer of 
your job can complete during retirement time (although in this case it 
looks like the glide-in is giving you an hour's warning?) -- you have no 
a priori idea how long your job will sit in the AP's transfer queue, even 
if the actual data transfer takes almost no time at all.  This is why we 
have self-checkpointing jobs at all (because otherwise you would just set 
ON_EXIT_OR_REMOVE and have the signal sent appropriately).
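
	For anyone following along, here is a minimal sketch of what a 
self-checkpointing job looks like from the job's side.  It assumes the 
submit file sets checkpoint_exit_code = 85 and lists the state file in 
transfer_checkpoint_files; the file name, step counts, and the "work" 
itself are made up for illustration.  The job saves its state on its own 
schedule and exits with the checkpoint code, rather than waiting for a 
signal at eviction time.

```python
import json
import os

# Assumption: must match checkpoint_exit_code in the submit file.
CHECKPOINT_EXIT_CODE = 85
# Hypothetical file name; it would be listed in transfer_checkpoint_files.
CHECKPOINT_FILE = "state.json"


def load_state(path=CHECKPOINT_FILE):
    """Resume from the checkpoint file if present, else start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT_FILE):
    """Write the checkpoint atomically so a partial file is never left behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(total_steps=10, steps_per_checkpoint=4, path=CHECKPOINT_FILE):
    """Do a bounded amount of work, then checkpoint and ask to be restarted.

    Returns the exit code the job should exit with: CHECKPOINT_EXIT_CODE
    after an intermediate checkpoint, 0 once all the work is done.
    """
    state = load_state(path)
    done_this_run = 0
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        done_this_run += 1
        if done_this_run >= steps_per_checkpoint and state["step"] < total_steps:
            save_state(state, path)
            return CHECKPOINT_EXIT_CODE
    save_state(state, path)
    return 0

# A real job script would finish with: sys.exit(run())
```

With these made-up numbers the job exits with 85 twice and then with 0 on 
the third invocation, picking up from state.json each time.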
-- ToddM