Steven Timm wrote:
> On Wed, 7 Jun 2006, Michael Thomas wrote:
>
>
>>I have a cluster of 50 nodes, 4 VMs per node. On all but one node I
>>have a certain directory mounted via read-only nfs. On the remaining
>>node the directory is mounted read-write.
>>
>>Every user coming into the system only needs read-only access to that
>>directory. But one special user always needs read-write access.
>>
>>How can I guarantee that this special user always gets sent to the one
>>node that has read-write access to this directory? Note that I don't
>>mind if other users also get sent to this read-write node.
>
>
> First, give that one node an extra attribute in its machine
> ClassAd:
>
> [root@fnpcsrv1 root]# grep IO /opt/condor/local/condor_config.local
> MachineClass = "IO"
> Class = "IO"
> START = JobClass =!= UNDEFINED && JobClass == "IO"
>
> For a non-grid job, the user should just add
> +JobClass = "IO"
> requirements = (MachineClass =!= UNDEFINED && MachineClass == "IO")
>
> to his Condor submit file.
>
> You can force an inbound grid job for that user to do the same
> by hacking condor.pm to add these two extra lines to the
> submit file it writes.
>
> Steve
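The real condor.pm change is Perl, and its internals are not shown in this thread. For illustration only, here is a Python sketch of the same edit applied to a list of submit-file lines (the function add_io_class and its constants are hypothetical): it ANDs the MachineClass clause into the existing Requirements line and places +JobClass before the queue command. Extending the existing expression matters because a later assignment to the same submit command replaces the earlier one rather than adding to it.

```python
import re

# Hypothetical illustration of the condor.pm edit; the real module is Perl.
IO_CLAUSE = '(MachineClass =!= UNDEFINED && MachineClass == "IO")'
JOBCLASS_LINE = '+JobClass = "IO"'
_REQUIREMENTS = re.compile(r"^\s*requirements\s*=\s*(.*)$", re.IGNORECASE)


def add_io_class(submit_lines):
    """Return a copy of submit_lines with the IO clause AND-ed into the first
    Requirements line (or a new one) and +JobClass placed before 'queue'."""
    out, merged = [], False
    for line in submit_lines:
        m = _REQUIREMENTS.match(line)
        if m and not merged:
            # Extend the existing expression; parenthesise it so an '||'
            # inside cannot change the meaning of the added '&&'.
            out.append(f"Requirements = ({m.group(1).strip()}) && {IO_CLAUSE}")
            merged = True
        else:
            out.append(line)
    extra = [JOBCLASS_LINE]
    if not merged:
        extra.append(f"Requirements = {IO_CLAUSE}")
    # Settings only apply to jobs queued after them, so insert before 'queue'.
    idx = next((i for i, l in enumerate(out)
                if l.strip().lower().startswith("queue")), len(out))
    return out[:idx] + extra + out[idx:]
```

For a file with no Requirements line at all, the sketch adds one containing only the IO clause.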
Thanks for the tip, Steve.
It almost works. I hacked condor.pm to add the +JobClass and
Requirements lines, and the job submit script on the CE shows that they get added:
...
Executable = /home/uscms01/.globus/.gass_cache/local/md5/4a/67571a70a8ae3d2291019518204cc1/md5/81/2e7051cca30e7ea792099078f56ae3/data
+JobClass = "IO"
Requirements = OpSys == "LINUX" && Arch == "INTEL" && (MachineClass =!= UNDEFINED && MachineClass == "IO")
X509UserProxy = /home/uscms01/.globus/job/citgrid3.cacr.caltech.edu/29347.1150304652/x509_up
...
condor_config.local on the compute node also has the MachineClass,
Class, and START settings:
MachineClass = "IO"
Class = "IO"
START = JobClass =!= UNDEFINED && JobClass == "IO"
But it seems that the job's requirements prevent it from running
anywhere. When I submit the job and run condor_q -better-analyze[1], it
shows that the MachineClass requirement is what causes the match to fail.
How can I query the remote machine to verify that it's loading the
condor_config.local settings as expected?
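One way to check is to look at what the node actually advertises to the collector rather than at the file: `condor_status -l <node>` prints the machine ClassAds for that node's VMs. The Python sketch below (the script name and helper functions are hypothetical) runs that command and reports, per ad, whether MachineClass is present. If it is missing, note that a macro set in condor_config.local is visible to the configuration but is only placed in the machine ClassAd when it is also listed in STARTD_EXPRS, e.g. `STARTD_EXPRS = $(STARTD_EXPRS), MachineClass`, followed by condor_reconfig on that node. A `grep IO` of the config file would not show such a line, since it contains no "IO".

```python
import subprocess
import sys


def parse_long_ads(text):
    """Split `condor_status -l` style output (ads separated by blank lines,
    one 'Attr = value' per line) into dicts of raw value strings."""
    ads, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            # ClassAd attribute names are case-insensitive, so key on lower case.
            current[name.strip().lower()] = value.strip()
    if current:
        ads.append(current)
    return ads


def machine_class_report(ads):
    """Map each ad's Name to its raw MachineClass value, or None if absent."""
    return {ad.get("name", "?"): ad.get("machineclass") for ad in ads}


def main(argv):
    if len(argv) < 2:
        print("usage: check_machine_class.py <node-hostname>")
        return
    # Ask the collector for the ads of one node; the hostname is the argument.
    out = subprocess.run(["condor_status", "-l", argv[1]],
                         capture_output=True, text=True, check=True).stdout
    for name, value in machine_class_report(parse_long_ads(out)).items():
        print(name, value if value is not None else "(MachineClass not advertised)")


if __name__ == "__main__":
    main(sys.argv)
```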
--Mike
[1]
59349.000: Run analysis summary. Of 8 machines,
      8 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 are available to run your job
        No successful match recorded.
        Last failed match: Wed Jun 14 10:17:29 2006
        Reason for last match failure: no match found

WARNING: Be advised:
   No resources matched request's constraints

The Requirements expression for your job is:

( target.OpSys == "LINUX" && target.Arch == "INTEL" &&
( target.MachineClass isnt undefined && target.MachineClass == "IO" ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( target.HasFileTransfer )

    Condition                                   Machines Matched    Suggestion
    ---------                                   ----------------    ----------
1   ( target.MachineClass isnt undefined &&
      target.MachineClass == "IO" )             0                   REMOVE
2   target.OpSys == "LINUX"                     8
3   target.Arch == "INTEL"                      8
4   ( target.Disk >= 76 )                       8
5   ( ( 1024 * target.Memory ) >= 1 )           8
6   ( target.HasFileTransfer )                  8
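Condition 1 matched 0 of the 8 machine ads. A minimal Python model of ClassAd three-valued logic (the UNDEFINED sentinel and all helper names are illustrative, not the real ClassAd library, and string case rules are ignored) shows what that guarded clause yields for an ad with MachineClass = "IO", with some other value, and with no MachineClass at all. Only the first is TRUE. The `isnt` guard turns the missing-attribute case into a definite FALSE instead of UNDEFINED, and a machine is matched only when the expression evaluates to TRUE.

```python
# Illustrative model of ClassAd three-valued logic; not the real ClassAd
# library. UNDEFINED and all helper names below are hypothetical.
UNDEFINED = object()  # stands in for the ClassAd UNDEFINED value


def attr(ad, name):
    """A missing attribute evaluates to UNDEFINED."""
    return ad.get(name, UNDEFINED)


def isnt(a, b):
    """=!= (printed as 'isnt'): never UNDEFINED; UNDEFINED isnt UNDEFINED is FALSE."""
    if a is UNDEFINED or b is UNDEFINED:
        return not (a is UNDEFINED and b is UNDEFINED)
    return a != b


def eq(a, b):
    """==: UNDEFINED if either side is UNDEFINED (string case rules ignored)."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def and_(a, b):
    """&&: FALSE wins over UNDEFINED; otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def io_clause(machine_ad):
    """( MachineClass isnt undefined && MachineClass == "IO" ) for one machine ad."""
    mc = attr(machine_ad, "MachineClass")
    return and_(isnt(mc, UNDEFINED), eq(mc, "IO"))


def matches(value):
    """Only an expression that evaluates to TRUE counts as a match."""
    return value is True


if __name__ == "__main__":
    for ad in ({"MachineClass": "IO"}, {"MachineClass": "CPU"}, {}):
        print(ad, io_clause(ad), matches(io_clause(ad)))
```

Without the guard, the bare comparison against a missing attribute would stay UNDEFINED; either way such a machine does not match, so 0 matches here means none of the 8 ads carries MachineClass = "IO".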