[HTCondor-users] condor 8.6.5


Hi,

I'm running condor.x86_64 (8.6.5-1.el7), installed via yum, on a cluster of Linux machines running RHEL7.

To test the install, I wrote a small Python program (below) to submit to the pool.

As far as I can tell, the pool accepts the job, but condor_q then shows the job as held indefinitely. Is there a config or submit detail I got wrong? I reread the install/config instructions and haven't found my error yet.

I'm submitting the job from my user (non-root) account on one of the cluster machines. All machines are eligible to submit. Do I need to start the job from shared (NFS) scratch space or something like that? I didn't see much about file structure in the install documentation.

Any suggestions would be appreciated!

Nathan

Here's the queue:

[nmoore@pilgrim condor_sub]$ condor_q

-- Schedd: pilgrim : <199.17.158.20:9618?... @ 08/04/17 11:03:15

OWNER   BATCH_NAME             SUBMITTED  DONE  RUN  IDLE  HOLD  TOTAL  JOB_IDS
nmoore  CMD: estimate_pi.py    8/4 10:07     _    _     _     3      3  2.0 ... 4.0
nmoore  CMD: estimate_pi-2.py  8/4 10:15     _    _     _     1      1  5.0

4 jobs; 0 completed, 0 removed, 0 idle, 0 running, 4 held, 0 suspended
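For anyone reproducing this, here is a minimal sketch of one way to look for the reason behind the holds. It assumes `condor_q` is on the PATH and relies on its `-hold` option, which lists held jobs together with their hold reason. The helper names `hold_reason_command` and `show_hold_reasons` are made up for illustration and are not part of HTCondor.

```python
import shutil
import subprocess


def hold_reason_command(job_id=None):
    """Build the condor_q command line that lists held jobs and why they are held."""
    cmd = ["condor_q", "-hold"]
    if job_id is not None:
        # Optionally restrict the listing to one job, e.g. "5.0" (cluster.proc).
        cmd.append(str(job_id))
    return cmd


def show_hold_reasons(job_id=None):
    """Run the command if condor_q is installed; return its text output, else None."""
    cmd = hold_reason_command(job_id)
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    output = show_hold_reasons("5.0")
    print(output if output is not None else "condor_q not found on PATH")
```

The same listing can be had by typing `condor_q -hold` at the prompt; the wrapper is only handy if the check is to be scripted across several submit machines.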

Here's a submit script:

[nmoore@pilgrim condor_sub]$ cat submit_file

executable = estimate_pi.py
universe = vanilla
output = job.out
error = job.error
log = job.log
queue

And here's the Python program (note that all machines have python3 available in the path):

[nmoore@pilgrim condor_sub]$ cat estimate_pi.py

#!/usr/bin/python3
#
# Nathan Moore, Winona State
# 2017-Aug-4
#
# PROGRAM DESCRIPTION
#
# This example program estimates the value of pi via a random number generator.
# Two random numbers, x and y, are generated, each in the space [0,1). This
# means a point within a square of edge length 1.0 has been randomly generated.
# If the square is overlaid with a circle, centered at 0,0, the area of the box
# is 1.0^2 and the area of the circle inside the box is (pi*1.0^2)/4, because only
# one quarter of the circle is inside the square.
# Then, the ratio of random points inside the circle over numbers generated (inside
# the box) approaches the ratio of the quarter circle area over box area, pi/4
#
# Note, since this is a random process, uncertainty in num_inside goes as sqrt(num_inside)
# so convergence to a reasonable approximation to pi is quite slow (eg, if you want the
# method to be accurate to one in a hundred, you'll probably have to generate 100^2 points)

import math
import random

seed_value=209
limit=1000
random.seed(seed_value)

num_inside=0
for i in range(limit):
    x=random.random()
    y=random.random()
    r_sqr=x*x+y*y
    if(r_sqr<1.0) :
        num_inside+=1
    #print(x,y,r_sqr,num_inside)

print("# estimate of pi/4 is ",num_inside/limit)
est_pi=4*num_inside/limit
print("# which gives pi as ",est_pi)

# print out data line
print("# seed, num_trials, num_inside, pi estimate")
print(seed_value,",",limit,",",num_inside,",",est_pi,",")

# In[33]:

# write the results to file, include the random seed value in the filename
# open the file
filename="pi_results.seed."+str(seed_value)+".csv"
#print(filename)
f=open(filename,"w")

# write results to file
line="# estimate of pi/4 is %6.4f \n" % (num_inside/limit)
f.write(line)
est_pi=4*num_inside/limit
line="# which gives pi as %6.4f \n" % (est_pi)
f.write(line)
line="# seed, num_trials, num_inside, pi estimate,\n"
f.write(line)
line="%d,%d,%d,%10.8f\n" % (seed_value,limit,num_inside,est_pi)
f.write(line)
f.close()
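The slow convergence mentioned in the program's comments can be illustrated with a short sketch. The function names `estimate_pi` and `expected_std_error` are hypothetical and not part of the submitted script. The error formula assumes the points are independent, so the count inside the quarter circle is binomial with p = pi/4.

```python
import math
import random


def estimate_pi(num_points, seed):
    """Estimate pi from the fraction of random points that land inside the quarter circle."""
    rng = random.Random(seed)
    num_inside = 0
    for _ in range(num_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            num_inside += 1
    return 4 * num_inside / num_points


def expected_std_error(num_points):
    """Binomial standard error of the pi estimate, taking p = pi/4 (an assumption)."""
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / num_points)


if __name__ == "__main__":
    # Each 100x increase in points should shrink the expected error about 10x.
    for n in (100, 10000, 100000):
        est = estimate_pi(n, seed=209)
        print(n, est, abs(est - math.pi), expected_std_error(n))
```

Under these assumptions the expected error falls as 1/sqrt(N), which is why one more decimal digit of pi costs roughly a hundred times more points.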

Within the pool, everything is unclaimed and idle:

[root@toulouse ~]# condor_status

Name             OpSys  Arch    State      Activity  LoadAv  Mem   ActvtyTime

slot1@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:34:36
slot2@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot3@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot4@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot5@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot6@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot7@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
slot8@albatross  LINUX  X86_64  Unclaimed  Idle      0.000   2674  0+00:35:03
...
slot3@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04
slot4@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04
slot5@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04
slot6@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04
slot7@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04
slot8@wyandotte  LINUX  X86_64  Unclaimed  Idle      0.000   3988  0+00:30:04

              Machines  Owner  Claimed  Unclaimed  Matched  Preempting  Drain

X86_64/LINUX        56      0        0         56        0           0      0
Total               56      0        0         56        0           0      0
