Hello all,

I am trying out a third-party application on our cluster. I have prepared a bash script to run the program, and I wrote a submit file that tells Condor to start that script with some arguments. However, it does not behave as I would like it to.

The Condor job starts and moves the input files as I expect. The process starts, runs for about 8-10 minutes, and is then killed by the kernel (oom-kill). The strange thing is that when I observe the running job on the startd machine (via ssh), it seems to use almost no resources at all (top reports %MEM below 0.5).

I have also tried starting the job manually (via ssh) on the startd machine, and it runs just fine. When I start it this way, it also consumes a lot more resources: top reports %CPU of about 200 and %MEM of about 4.

The program clearly demands some resources, but I find it strange that the kernel kills it when it is run through Condor, where I can see hardly any resource usage at all.

Thoughts?

P