Hello,
When submitting N instances of a job, generally N/2 jobs run in the expected time and the other N/2 jobs take longer to complete. The system has 10 nodes, each with 32 slots, and uses a shared filesystem (GlusterFS). All of the executables and data files are located on the shared filesystem; however, the problem does not seem to be an I/O or network bottleneck.

When submitting 2 instances, the two times are the following:

Instance 1:
real    7m13.950s
user    5m36.766s
sys     0m14.436s

Instance 2:
real    6m2.555s
user    5m35.747s
sys     0m13.170s

When submitting 22 instances, the difference in times is more drastic. The times fall into the following two categories:

Category 1:
real    18m28.193s
user    5m39.153s
sys     0m15.111s

Category 2:
real    6m12.578s
user    5m36.433s
sys     0m12.644s

Does anybody have insight into this issue?

Thanks,
Vishal