
[Condor-users] dagman capabilities



I was looking at 'job recovery: the rescue DAG' in the online Condor manual (2.10.6) but couldn't decide whether DAGMan is capable of handling the following situation: submitting N jobs (embarrassingly parallel, say) to the Vanilla universe (since we cannot link to the checkpointing for the Standard universe) and then resubmitting those which are killed (e.g. due to pre-emption by other work on the given nodes). The manual talks about handling jobs that do not finish due to a node failure, but before investing time and effort I wished to enquire whether this also covers pre-emption as outlined above.

thanks, M