On 5/27/2013 4:53 PM, 钱晓明 wrote:
Hi, What is the difference between condor_suspend/continue and condor_hold/release for vanilla jobs? I see this for condor_hold in the manual: "A currently running job that is placed in the hold state by condor_hold is sent a hard kill signal." So I think that this job will be killed and put in the HOLD state. What does condor_suspend do to a running job?
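condor_hold kills the process(es) currently associated with the job and frees the machine slot that was running the job, so the slot can run another job. The job itself stays in the queue in the hold state until it is released with condor_release.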
condor_suspend does NOT kill the process(es) associated with the job, but instead tells the operating system to not schedule any CPU time for the job (in Unix-land, this means sending a SIGSTOP signal to the job). The job still keeps the machine slot occupied - it is still consuming RAM (or at least swap), kernel resources like file descriptors, and disk. But it is not consuming any CPU cycles until a condor_continue.
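If it helps to see the difference at the operating system level, here is a minimal Python sketch for Unix. It is only an illustration of the signals involved, not how HTCondor itself is implemented: the function name demo_suspend_vs_hard_kill and the sleeping child process are made up for the example, and SIGCONT is assumed to be the counterpart signal that a continue would send.

```python
import os
import signal
import subprocess
import sys


def demo_suspend_vs_hard_kill():
    """Hypothetical demo: suspend, continue, then hard-kill a child process."""
    # Start a child that just sleeps, standing in for a running job.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    try:
        # Suspend: the process stays in memory but gets no CPU time.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not only exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        # Continue: the very same process resumes where it left off.
        os.kill(child.pid, signal.SIGCONT)
        alive_after_continue = child.poll() is None

        # Hard kill: the process is gone and its resources are released.
        child.kill()
        returncode = child.wait()
    finally:
        # Clean up if something above failed before the kill.
        if child.poll() is None:
            child.kill()
            child.wait()
    return stopped, alive_after_continue, returncode


if __name__ == "__main__":
    print(demo_suspend_vs_hard_kill())
```

The point to notice is that after the stop/continue pair the child still has the same PID and all of its state, whereas after the hard kill there is no process left to resume.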
Weak analogy to playing a DVD of a movie: Think of condor_suspend like freeze-framing the playback of a movie DVD, while condor_hold is like ejecting the DVD and putting it back on a shelf to watch another day.
Regards, Todd