On 03/15/2013 06:53 AM, Chris Filo Gorgolewski wrote:
Hi, I have run into a peculiar problem. Users are submitting jobs that submit more jobs. This is problematic because if such a job gets preempted and restarted, all the jobs it had submitted will be submitted again, causing general chaos. So how can I prevent jobs from submitting new jobs? Best, Chris
If the user's job runs with the user's identity/credentials there are no reasonable options.
If the user's job does not run w/ the user's identity/credentials you can lock down the schedd to not allow submissions from the identity/credentials that the jobs are using (possibly nobody or a slot user).
The root cause is users writing jobs that "misbehave" by your definition of misbehave (not cleaning up after themselves, or not checking whether they've already partially run). This is often where you have to step into policy and social engineering.
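As a rough illustration of "checking if they've already partially run", here is a minimal Python sketch of a job that guards its child submissions with a marker file. The function name submit_once, the marker path, and the submit callable are all hypothetical, and the sketch assumes the marker lives somewhere that survives a preemption and restart (for example a shared filesystem).

```python
import os
from typing import Callable


def submit_once(marker_path: str, submit: Callable[[], None]) -> bool:
    """Call submit() only if no earlier run of this job created marker_path.

    Returns True if submit() was called, False if it was skipped.
    """
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only the first run gets past this point.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # an earlier run already claimed the submission
    os.close(fd)
    submit()  # hypothetical: whatever the job does to submit its children
    return True
```

Note the trade-off in this sketch: the marker is created before the submission, so a job that is preempted between the two steps will never submit its children. Creating the marker afterwards flips the failure mode to a possible duplicate submission.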
You could explore disabling preemption for the users whose jobs submit more jobs. Condor also provides a tool called DAGMan that is basically a well-written job that submits more jobs; maybe your users should be using it. Alternatively, you can educate the users about your definition of misbehaving, give them guidance on how to behave properly, and then provide incentives by giving misbehaving users an overall lower priority in your pool (let the fair share algorithm have a memory).
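To give a feel for the DAGMan route, here is a small Python sketch that writes out a DAG description using DAGMan's JOB and PARENT ... CHILD lines. The helper build_dag, the node names, and the submit file names (prepare.sub, analyze.sub) are made up for illustration; the resulting file would be handed to condor_submit_dag instead of having one job submit the next.

```python
from typing import Dict, List


def build_dag(jobs: Dict[str, str], children: Dict[str, List[str]]) -> str:
    """Build DAG file text from node -> submit file and parent -> children maps."""
    # One JOB line per node, naming the submit description file for that node.
    lines = [f"JOB {name} {submit_file}" for name, submit_file in jobs.items()]
    # One PARENT/CHILD line per parent, listing the nodes that must wait for it.
    for parent, kids in children.items():
        lines.append(f"PARENT {parent} CHILD {' '.join(kids)}")
    return "\n".join(lines) + "\n"


# Hypothetical two-step workflow: "analyze" runs only after "prepare" finishes.
dag_text = build_dag(
    {"prepare": "prepare.sub", "analyze": "analyze.sub"},
    {"prepare": ["analyze"]},
)
```

With this layout the dependency lives in the DAG file rather than inside the user's job, so the job itself no longer needs to call the submit tools.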
Best, matt