Hi all,

Before I explain the problem I encountered, let me say that I worked with Condor more than 25 years ago, and I am really happy to see that it is still around and has grown so nicely. Since that was so long ago, I am effectively starting from scratch, so please consider me a newbie.

Now to my problem.

I have a Condor cluster with two machines, ubu2 and ubu3. They are identical VMware VMs running Ubuntu 16.04, each with 4 cores. (I include some command output below with the details of what I tried; see also the attached condor-profile.txt.)

ubu3 is the master, and I submit jobs only from it. As long as I specify machine_count <= 4, everything works well. But if I specify a higher machine count, so that ubu2's cores are needed, the job just remains idle and, as far as I can tell, Condor never attempts to run it.

I skimmed through mailing list posts and various documentation sources and couldn't find anything that helped. I am sure I am missing something in the configuration, but I can't figure out what.

Please help!

Thanks,

Oren

oren@shilo-ubu3:~/condor$ condor_q


OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
oren@shilo-ubu3:~/condor$ condor_status
Name                                OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime


                     Total Owner Claimed Unclaimed Matched Preempting Backfill Drain

        X86_64/LINUX     8     0       0         8       0          0        0     0

               Total     8     0       0         8       0          0        0     0
oren@shilo-ubu3:~/condor$ cat sleepp.sub
universe = parallel
executable = sleep.sh
log = logfile
output = outfile.$(Node)
error = errfile.$(Node)
machine_count = 5
request_cpus = 1
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
queue
oren@shilo-ubu3:~/condor$ condor_submit sleepp.sub
Submitting job(s).
1 job(s) submitted to cluster 68.
oren@shilo-ubu3:~/condor$ condor_q


OWNER    BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
oren     CMD: sleep.sh   8/12 13:40      _     _       1      1 68.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
oren@shilo-ubu3:~/condor$ condr_q -better-analyze
-bash: condr_q: command not found
oren@shilo-ubu3:~/condor$ condor_q -better-analyze


The Requirements expression for job 68.000 is

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )

Job 68.000 defines the following attributes:

    DiskUsage = 1
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)

The Requirements expression for job 68.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           8  TARGET.Arch == "X86_64"
[1]           8  TARGET.OpSys == "LINUX"
[3]           8  TARGET.Disk >= RequestDisk
[5]           8  TARGET.Memory >= RequestMemory
[7]           8  TARGET.HasFileTransfer


068.000:  Job has not yet been considered by the matchmaker.


068.000: Run analysis summary ignoring user priority. Of 8 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      8 are available to run your job
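
To sum up what I read from the analysis output: all 8 slots pass the job's requirements, yet the job has not been considered by the matchmaker, so a shortage of matching slots does not look like the cause. Below is a minimal Python sketch of that reading. The helper names `summarize_analysis` and `slots_short` are hypothetical (they are not HTCondor tools), and the patterns assume the exact wording of the output shown in this post, which may differ in other versions.

```python
import re

# Sample text copied from the condor_q -better-analyze output in this post.
SAMPLE = """\
068.000:  Job has not yet been considered by the matchmaker.

068.000: Run analysis summary ignoring user priority. Of 8 machines,
      0 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      8 are available to run your job
"""


def summarize_analysis(text):
    """Return (total_slots, available_slots, considered) from analysis text.

    Hypothetical helper: it relies only on the phrases visible in the
    output above and returns None for a count it cannot find.
    """
    total = re.search(r"Of (\d+) machines", text)
    available = re.search(r"(\d+) are available to run your job", text)
    considered = "has not yet been considered by the matchmaker" not in text
    return (
        int(total.group(1)) if total else None,
        int(available.group(1)) if available else None,
        considered,
    )


def slots_short(machine_count, available):
    """Number of slots the job is still missing (0 if there are enough)."""
    return max(0, machine_count - available)


if __name__ == "__main__":
    total, available, considered = summarize_analysis(SAMPLE)
    print("total slots:", total)
    print("available slots:", available)
    print("considered by matchmaker:", considered)
    # machine_count = 5 comes from sleepp.sub above.
    print("slots short for machine_count=5:", slots_short(5, available))
```

With the numbers from this post, the job asks for 5 slots and 8 are reported as available, so the sketch reports no shortfall. That is why I suspect the configuration and not the job's requirements.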