I have been struggling with a problem the whole day. It is
probably something stupid, but I would really appreciate some
help.

I have a computer (32 cores) that serves as a dedicated pool, and
we use it to run simulations. Today someone submitted a
simulation that reads and writes loads of tiny files, and it left
the computer almost idle because of the disk bottleneck. This
computer has 64 GB of RAM, so I figured I would set aside 20 GB
as a ramdisk and things would work as they should. The problem is
that after the jobs update their ImageSize for the first time,
they just go to the IDLE state and I get:

013.029:  Run analysis summary.  Of 64 machines,
     64 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
    Last successful match: Mon May 7 19:31:42 2012

WARNING:  Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( ( target.OpSys == "LINUX" ) && ( TARGET.Disk >= 0 ) ) &&
( TARGET.Arch == "X86_64" ) && ( ( TARGET.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) && ( TARGET.HasFileTransfer )

Job ClassAd Requirements expression evaluates to false

I figure it has something to do with ( TARGET.Memory * 1024 )
>= ImageSize. How can I change it? I don't really care about
checkpointing; I just need the end result, and if something
fails I will restart it from the beginning. Can I disable
checkpointing somehow in the vanilla universe? FYI: the jobs do
not use much memory.
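
To make the clause I suspect concrete, here is a minimal Python sketch of the two memory comparisons in the Requirements expression. It assumes the usual ClassAd units (Memory and RequestMemory in MiB, ImageSize in KiB, hence the factor of 1024). The function name and all the numbers are hypothetical, not values taken from my pool.

```python
def memory_clauses(target_memory_mib, request_memory_mib, image_size_kib):
    """Evaluate the two memory-related clauses of the Requirements expression.

    Assumed units: Memory and RequestMemory in MiB, ImageSize in KiB.
    """
    # ( TARGET.Memory * 1024 ) >= ImageSize
    slot_ok = target_memory_mib * 1024 >= image_size_kib
    # ( RequestMemory * 1024 ) >= ImageSize
    request_ok = request_memory_mib * 1024 >= image_size_kib
    return slot_ok, request_ok


# Hypothetical slot with 1000 MiB of memory and a matching request.
slot_memory_mib = 1000
request_memory_mib = 1000

# Hypothetical job whose reported image is 200 MiB: both clauses hold.
print(memory_clauses(slot_memory_mib, request_memory_mib, 200 * 1024))

# Same job after its reported image grows to 1500 MiB: both clauses fail.
print(memory_clauses(slot_memory_mib, request_memory_mib, 1500 * 1024))
```

If the reported ImageSize ever exceeds the slot's memory, both clauses become false for every slot. That would be consistent with all 64 machines being rejected after the first ImageSize update, but I have not confirmed that this is what happens here.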
Mac.