
Re: [HTCondor-users] Move jobs of users from one to another worker node



Hi Guys,
        Thanks for your help! I was able to achieve it by modifying the job requirements in the submit file.

However, I still need to know how to restart jobs on another node from a checkpoint (where they left off) rather than from the beginning. How can we achieve this? We are using shared storage among all EPs.

Thanks,
Gagan

On Thu, Nov 27, 2025 at 10:03 PM Steffen Grunewald <steffen.grunewald@xxxxxxxxxx> wrote:
On Thu, 2025-11-27 at 14:18:06 +0530, gagan tiwari wrote:
> Hi Steffen,
>           Thanks for your response.
>
> So, please let me know how to achieve it even if the jobs start from the
> starting point on another node.
>
> Let's say user Tom has submitted 4 jobs. 2 of his jobs are running with
> cluster id 2021 and another 2 jobs with cluster id 2025. So, I need to
> move his job with cluster id 2025 to a different worker node.

This - to me at least - seems to involve other questions:
Do you want the node to be completely idle/unclaimed? condor_drain might help.
Do you want to match jobs with particular node features (special hardware,
licenses etc.)? Use Requirements on the job side, and/or START expressions
on the machine side.
If "a different node" means "any other", and none of the above applies, it
would be helpful to know about the "why".
If that "different node" is a particular one, then set the job Requirements
accordingly, and hold/release (or cancel/resubmit) the job to be re-matched.
If there's no rule that can be formalized, we cannot help I'm afraid.
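
A rough sketch of the "particular node" case, for illustration only: the
helper below just builds the command lines (it does not run them) for
pinning a cluster to one machine and forcing a re-match via hold/release.
The host name node02.example.com and the function name are made up; the
cluster id 2025 is the one from the example above. It assumes that
condor_qedit overwrites the job's whole Requirements expression, so any
clauses that condor_submit added automatically would have to be put back
by hand if they are still wanted.

```python
import shlex


def pin_cluster_commands(cluster_id, machine):
    """Return argv lists that pin a job cluster to one machine and re-match it.

    Hypothetical helper: it only assembles commands, it never executes them.
    """
    # ClassAd expression that matches only the named execute node.
    requirements = f'(TARGET.Machine == "{machine}")'
    return [
        # Overwrite the Requirements attribute of every job in the cluster.
        ["condor_qedit", str(cluster_id), "Requirements", requirements],
        # Holding removes the jobs from their current slots ...
        ["condor_hold", str(cluster_id)],
        # ... and releasing makes them idle again so they get re-matched.
        ["condor_release", str(cluster_id)],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for review before running anything.
    for argv in pin_cluster_commands(2025, "node02.example.com"):
        print(shlex.join(argv))
```

Printing the commands first makes it easy to check the quoting of the
Requirements expression before touching the queue.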

In each case, if the executable hasn't set up any checkpointing itself, it
will restart from scratch.
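
To illustrate what "set up checkpointing itself" could look like, here is
a minimal sketch, assuming the executable can write its state to a file on
storage that every execute node can see. The function name, the JSON state
layout and the stop_after parameter (which only simulates an eviction) are
all invented for this example; a real application would save whatever
state it needs to resume.

```python
import json
import os


def run_with_checkpoint(total_steps, ckpt_path, stop_after=None):
    """Sum 0..total_steps-1, saving progress so a restart resumes mid-way.

    Returns the final sum, or None if the run was cut short (stop_after
    simulates the job being evicted after that many steps).
    """
    state = {"next_step": 0, "total": 0}
    # On (re)start, pick up the saved state if a checkpoint exists.
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)

    steps_done = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and steps_done >= stop_after:
            return None  # simulated eviction; checkpoint stays on disk
        state["total"] += state["next_step"]
        state["next_step"] += 1
        steps_done += 1
        # Write to a temp file and rename, so a kill during the write
        # never leaves a half-written checkpoint behind.
        tmp_path = ckpt_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, ckpt_path)
    return state["total"]
```

With ckpt_path pointing at the shared filesystem, a job that is held and
released onto another node would find the file and continue from the
saved step instead of from zero.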

- S
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/