On 1/16/2014 12:02 PM, Andrey Kuznetsov wrote:
Hi, Here's the log file from a job that appears to be suspended, and I cannot resume it. Short of removing the job and resubmitting it, is there another way to force it to restart or continue?
What happened here is that your job landed on a machine that is configured to suspend jobs when some condition becomes true (e.g. activity on the keyboard or an increased non-Condor load average) and then to unsuspend or restart the job after X amount of time. This sort of policy is common when running jobs on non-dedicated desktop machines.
As a user submitting jobs, if you never want your jobs to suspend, your only recourse is to add a requirement to your submit file to avoid machines with such a policy (if there are any such machines in your pool).
If you are also the administrator of the machines in your pool, you could put

  SUSPEND = FALSE

into your condor_config file...

Todd
001 (1321.003.000) 01/15 15:57:23 Job executing on host: <128.114.*.*:9944>
...
006 (1321.003.000) 01/15 15:57:32 Image size of job updated: 24704
	2  -  MemoryUsage of job (MB)
	1572  -  ResidentSetSize of job (KB)
...
010 (1321.003.000) 01/15 15:58:56 Job was suspended.
	Number of processes actually suspended: 2
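
For anyone who wants to check a larger log for jobs left in this state, here is a rough Python sketch. It is not part of HTCondor. It assumes the user log has one event header per line in the form shown above, that event 010 means "suspended" and 011 means "unsuspended", and that events 004 (evicted), 005 (terminated) and 009 (aborted) also end a suspension. The function name suspended_jobs is made up for this example.

```python
import re

# Event header, e.g. "010 (1321.003.000) 01/15 15:58:56 Job was suspended."
EVENT_RE = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\)")

SUSPENDED = "010"
# Assumption: any of these events means the job is no longer sitting suspended.
CLEARS = {"011", "004", "005", "009"}


def suspended_jobs(log_text):
    """Return sorted (cluster, proc) pairs whose latest relevant event is a suspend."""
    suspended = set()
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue  # detail lines and "..." separators
        code = match.group(1)
        job = (int(match.group(2)), int(match.group(3)))
        if code == SUSPENDED:
            suspended.add(job)
        elif code in CLEARS:
            suspended.discard(job)
    return sorted(suspended)
```

Calling suspended_jobs(open("job.log").read()) on a log like the one above (job.log is a placeholder file name) would list job 1321.3 as still suspended. It only reports state; it does not resume anything.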