Hi,
I'm running condor.x86_64 (8.6.5-1.el7), installed via yum, on a cluster of Linux machines running RHEL 7.
To test the install, I wrote a small Python program (below) to submit to the pool.
As far as I can tell, the pool accepts the job, but condor_q then shows that the job "holds" indefinitely. Is there a config or submit detail I screwed up? I reread the install/config instructions and haven't found my error yet.
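If the job really is in the held (H) state, the schedd normally records why in the job's HoldReason attribute. Below is a minimal sketch of how that could be read from Python by calling condor_q. The helper names are made up for illustration, and "1.0" is a hypothetical cluster.proc ID that would need to be replaced by the one condor_q reports.

```python
import shutil
import subprocess


def hold_reason_command(job_id):
    """Build a condor_q call that prints only the HoldReason attribute of one job."""
    # The job ID goes before -af, because -af reads the arguments after it as attribute names.
    return ["condor_q", str(job_id), "-af", "HoldReason"]


def get_hold_reason(job_id):
    """Run the command if condor_q is on PATH; return its output, or None if it is not."""
    if shutil.which("condor_q") is None:
        return None
    result = subprocess.run(
        hold_reason_command(job_id),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=False,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # "1.0" is a hypothetical job ID, used here only as a placeholder.
    print(get_hold_reason("1.0"))
```

Running `condor_q -hold` directly on the submit machine should list the same reason for every held job.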
I'm submitting a job from my user (non-root) account on one of the cluster machines. All machines are eligible to submit. Do I need to start the job from shared (NFS) scratch space or something like that? I didn't see much about file structure in the install documentation.
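On the shared-space question: HTCondor's file transfer mechanism can ship input files to the execute machine, so a shared filesystem is not strictly required. Below is a rough sketch of a submit description that relies on file transfer, generated from Python. The interpreter path /usr/bin/python3, the script name hello.py, and the output file names are all assumptions made for illustration, not details from this pool.

```python
def build_submit_description(script_name, interpreter="/usr/bin/python3"):
    """Return submit-description text that transfers the script instead of using shared space."""
    lines = [
        "universe = vanilla",
        # Assumed interpreter location; it already exists on every machine, so it is not transferred.
        f"executable = {interpreter}",
        "transfer_executable = False",
        f"arguments = {script_name}",
        # Ask HTCondor to copy the script to the execute machine's scratch directory.
        "should_transfer_files = YES",
        f"transfer_input_files = {script_name}",
        "when_to_transfer_output = ON_EXIT",
        # Hypothetical file names for stdout, stderr, and the job event log.
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # hello.py is a placeholder name for the test program.
    print(build_submit_description("hello.py"), end="")
```

The printed text could be saved to a file and passed to condor_submit. Whether this matches the pool's actual configuration is an open question.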
Any suggestions would be appreciated!
Nathan
Here's the queue:
And here's the Python program (note that all machines have python3 available in PATH):
Within the pool, everything is unclaimed and idle:
[root@toulouse ~]# condor_status
Name             OpSys Arch   State     Activity LoadAv Mem  ActvtyTime
slot1@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:34:36
slot2@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot3@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot4@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot5@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot6@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot7@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
slot8@albatross  LINUX X86_64 Unclaimed Idle     0.000  2674 0+00:35:03
...
slot3@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04
slot4@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04
slot5@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04
slot6@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04
slot7@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04
slot8@wyandotte  LINUX X86_64 Unclaimed Idle     0.000  3988 0+00:30:04

             Machines Owner Claimed Unclaimed Matched Preempting Drain
X86_64/LINUX       56     0       0        56       0          0     0
Total              56     0       0        56       0          0     0