[Condor-users] Unsubscribe
- Date: Sat, 22 Nov 2008 03:16:25 +0900 (JST)
- From: sin260 <sin26018@xxxxxxxxx>
- Subject: [Condor-users] Unsubscribe
--- Vijay Shiv Kumar <vijayskumar@xxxxxxxxxxx> wrote:
>
> Dear all,
>
> I have set up a Condor pool spanning nodes from two clusters; each cluster
> has its own filesystem domain ('bmi.oar.net' and 'cse.oar.net'). The head
> node of one of the clusters serves as the central manager, dedicated
> scheduler, and submit node for the entire pool.
>
> Now, I wish to execute certain jobs only on a specific cluster. I tried to
> achieve this by having these jobs require a specific FileSystemDomain in
> their ClassAd. However, this request is never matched even though unclaimed
> candidate resources exist in the pool.
>
> A specific example: one of my jobs must be executed only on the cluster with
> filesystem domain 'cse.oar.net'. (The submit node and this target cluster do
> not share a common filesystem.)
>
> The job's specific requirements are as follows:
>
> [vijayskumar@bm-login ~]$ condor_q -long | grep Requirements
> Requirements = (regexp("*.cse.oar.net", FileSystemDomain, "i")) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
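
One possible reading of the Requirements line above, offered as a sketch rather than a diagnosis: "*.cse.oar.net" is written like a shell glob, whereas regexp() expects a regular expression, and a pattern that opens with "*" has nothing to repeat. The condor_status query quoted further down drops the leading "*" and does return machines. The Python snippet below illustrates the difference using Python's re module, which is assumed here to behave like Condor's regex engine on this point. The helper names and the REGEX_STYLE pattern are hypothetical.

```python
import re

# Machine-side values copied from the condor_status output quoted in this message.
DOMAINS = ["cs41.cse.oar.net", "cs42.cse.oar.net",
           "cs43.cse.oar.net", "cs44.cse.oar.net"]

GLOB_STYLE = "*.cse.oar.net"       # pattern used in the job's Requirements
REGEX_STYLE = r"\.cse\.oar\.net$"  # hypothetical regex spelling of the same intent


def is_valid_regex(pattern):
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def matching_domains(pattern, domains=DOMAINS):
    """Return the domains that a valid pattern matches, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [d for d in domains if rx.search(d)]


if __name__ == "__main__":
    print("glob-style pattern compiles:", is_valid_regex(GLOB_STYLE))
    print("regex-style pattern matches:", matching_domains(REGEX_STYLE))
```

If the same holds inside Condor, the regexp() call would not evaluate to TRUE for any machine, which would be consistent with all 20 machines being rejected by the job's requirements.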
>
> The requested resources for the job are available:
>
> [vijayskumar@bm-login ~]$ condor_status -const "regexp(\".cse.oar.net\", FileSystemDomain)"
>
> Name               OpSys Arch   State     Activity LoadAv Mem  ActvtyTime
>
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle     0.000  2009 0+00:50:04
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle     0.000  2009 0+00:50:04
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle     0.000  2009 0+00:50:04
> slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle     0.000  2009 0+00:50:04
>
> [vijayskumar@bm-login ~]$ condor_status -long | grep FileSys | grep cse
> FileSystemDomain = "cs41.cse.oar.net"
> FileSystemDomain = "cs42.cse.oar.net"
> FileSystemDomain = "cs43.cse.oar.net"
> FileSystemDomain = "cs44.cse.oar.net"
>
> However, a match never transpires, and the requests by the job keep getting
> rejected. (Here, 84544 is the jobID of the job that does not complete.)
>
> [vijayskumar@bm-login ~]$ condor_q -better-analyze
>
> ------------------------------------------------------------------
> 84544.000: Run analysis summary. Of 20 machines,
>     20 are rejected by your job's requirements
>      0 reject your job because of their own requirements
>      0 match but are serving users with a better priority in the pool
>      0 match but reject the job for unknown reasons
>      0 match but will not currently preempt their existing job
>      0 are available to run your job
>
> WARNING: Be advised:
>    No resources matched request's constraints
>
> The Requirements expression for your job is:
>
> ( regexp("*.cse.oar.net", FileSystemDomain, "i") ) && ( target.Arch == "X86_64" ) &&
> ( target.OpSys == "LINUX" ) && ( target.Disk >= DiskUsage ) &&
> ( ( target.Memory * 1024 ) >= ImageSize )
>
> Job ClassAd Requirements expression evaluates to false
>
> ----------------------------------------------------------------------
>
> Why does a match not occur? Is there something wrong with the regular
> expression in the job ClassAd? Any help is appreciated.
>
> Thanks for your time,
>
> -Vijay
>
> PS: Here is the complete ClassAd for the job that just refuses to get
> executed:
>
> MyType = "Job"
> TargetType = "Machine"
> ClusterId = 84544
> QDate = 1227117554
> CompletionDate = 0
> Owner = "vijayskumar"
> RemoteWallClockTime = 0.000000
> LocalUserCpu = 0.000000
> LocalSysCpu = 0.000000
> RemoteUserCpu = 0.000000
> RemoteSysCpu = 0.000000
> ExitStatus = 0
> NumCkpts_RAW = 0
> NumCkpts = 0
> NumJobStarts = 0
> NumRestarts = 0
> NumSystemHolds = 0
> CommittedTime = 0
> TotalSuspensions = 0
> LastSuspensionTime = 0
> CumulativeSuspensionTime = 0
> ExitBySignal = FALSE
> CondorVersion = "$CondorVersion: 7.0.1 Feb 26 2008 BuildID: 76180 $"
> CondorPlatform = "$CondorPlatform: X86_64-LINUX_RHEL3 $"
> RootDir = "/"
> Iwd = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001"
> JobUniverse = 5
> TransferExecutable = FALSE
> Cmd = "/home/vijayskumar/installed/pegasus/default/bin/kickstart"
> MinHosts = 1
> MaxHosts = 1
> CurrentHosts = 0
> WantRemoteSyscalls = FALSE
> WantCheckpoint = FALSE
> JobStatus = 1
> EnteredCurrentStatus = 1227117554
> JobPrio = 0
> User = "vijayskumar@.oar.net"
> NiceUser = FALSE
> EnvDelim = ";"
> JobNotification = 0
> WantRemoteIO = TRUE
> UserLog = "/tmp/Template_P10runC1-053166.log"
> CoreSize = 0
> KillSig = "SIGTERM"
> Rank = 0.000000
> In = "/dev/null"
> TransferIn = FALSE
> Out = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001/Template_P10runC1_0_cseri_cdir.out"
> StreamOut = FALSE
> Err = "/home/vijayskumar/pegasusrun/vijayskumar/pegasus/Template_P10runC1/run0001/Template_P10runC1_0_cseri_cdir.err"
> StreamErr = FALSE
> BufferSize = 524288
> BufferBlockSize = 32768
> ShouldTransferFiles = "NO"
> TransferFiles = "NEVER"
> ImageSize_RAW = 172
> ImageSize = 175
> ExecutableSize_RAW = 172
> ExecutableSize = 175
> DiskUsage_RAW = 172
> DiskUsage = 175
> Requirements = (regexp("*.cse.oar.net", FileSystemDomain, "i")) && (Arch == "X86_64") && (OpSys == "LINUX") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize)
> FileSystemDomain = ".bmi.oar.net"
> JobLeaseDuration = 1200
> PeriodicHold = FALSE
> PeriodicRelease = (NumSystemHolds <= 3)
> PeriodicRemove = (NumSystemHolds > 3)
> OnExitHold = FALSE
> OnExitRemove = TRUE
> LeaveJobInQueue = FALSE
> Arguments = "-n pegasus::dirmanager -N pegasus::dirmanager:1.0 -R cseri -w /home/vijayskumar/pegasusrun/work /home/vijayskumar/installed/pegasus/default/bin/dirmanager --create --dir /home/vijayskumar/pegasusrun/work/pegasusexec/vijayskumar/pegasus/Template_P10runC1/run0001"
> DAGNodeName = "Template_P10runC1_0_cseri_cdir"
> pegasus_job_id = "Template_P10runC1_0_cseri_cdir"
> pegasus_wf_xformation = "pegasus::dirmanager"
> pegasus_site = "cseri"
> pegasus_generator = "Pegasus"
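
A second thing that may be worth checking, as an assumption rather than a confirmed cause: the job ClassAd above carries its own FileSystemDomain (".bmi.oar.net"), and the -better-analyze output prefixes Arch, OpSys, Disk and Memory with "target." but leaves FileSystemDomain bare. If an unqualified attribute is resolved in the job's own ad before the machine's ad, the regexp would be tested against the job's domain instead of the machine's. The Python sketch below models that lookup order. The lookup helper and the pattern are hypothetical and are not Condor code.

```python
import re

# Values copied from the job ClassAd and the condor_status output in this message.
job_ad = {"FileSystemDomain": ".bmi.oar.net"}
machine_ad = {"FileSystemDomain": "cs41.cse.oar.net"}

# Hypothetical well-formed pattern for "ends in .cse.oar.net".
PATTERN = re.compile(r"\.cse\.oar\.net$", re.IGNORECASE)


def lookup(name, my_ad, target_ad):
    """Resolve an unqualified attribute: own ad first, then the other ad."""
    if name in my_ad:
        return my_ad[name]
    return target_ad.get(name)


def unqualified_match(job, machine):
    """Model a bare FileSystemDomain reference inside the job's Requirements."""
    value = lookup("FileSystemDomain", job, machine)
    return value is not None and bool(PATTERN.search(value))


def target_match(machine):
    """Model an explicit TARGET.FileSystemDomain reference."""
    value = machine.get("FileSystemDomain")
    return value is not None and bool(PATTERN.search(value))


if __name__ == "__main__":
    print("bare reference matches:  ", unqualified_match(job_ad, machine_ad))
    print("TARGET reference matches:", target_match(machine_ad))
```

Under this model, writing TARGET.FileSystemDomain in the Requirements expression would force the comparison onto the machine-side value.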
>
=== The following message has been omitted ===