Hey all,

I was hoping to get some advice on this problem: we have some machines that occasionally refuse to run the DAG from a submitter's machine. In other words, the submitter will submit a DAG job and the condor_dagman job will just spin between IDLE and RUNNING, getting evicted every time it starts, e.g. (from the dagman output):

    001 (4634.000.000) 11/18 11:03:49 Job executing on host: <10.10.xxx.xxx:1118>
    ...
    004 (4634.000.000) 11/18 11:03:49 Job was evicted.
        (0) Job was not checkpointed.
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Remote Usage
            Usr 0 00:00:00, Sys 0 00:00:00  -  Run Local Usage
        0  -  Run Bytes Sent By Job
        0  -  Run Bytes Received By Job

(this continues to repeat over and over)...

These machines only submit jobs; they do not run any jobs from the pool. I don't know whether the DAGMan submission follows the same START rules as the machines in the Condor pool, but how do I ensure that, regardless of circumstances, a user's machine will not evict the condor_dagman job?

(We are using 7.04 on most user submit machines, but have been upgrading their condor_submit_dag executables to the latest. I'm pretty sure this issue has been seen by users on either version.)

As always, I appreciate the assistance :)

Steve