htcondor.JobEventLog(path_to_logfile) is indeed what you want to use here:
In case it's not clear from the condor_watch_q code, the idea is to put an inner "for event in jel.events(stop_after=0)" loop inside an outer polling loop, then break out of the outer loop once you have accumulated the number of htcondor.JobEventType.JOB_TERMINATED events you expect to see (or after some timeout period).
The condor_watch_q code can be a little heavy, so I whipped up a simple function that I think accomplishes what you want to do:
import htcondor
import time

def wait_for_job(logfile, num_jobs, timeout=None):
    start = time.time()
    completed = 0
    jel = htcondor.JobEventLog(logfile)
    while True:
        for event in jel.events(stop_after=0):
            completed += int(event.type == htcondor.JobEventType.JOB_TERMINATED)
            if event.type in {  # catch some non-termination events that halt job progress
                htcondor.JobEventType.JOB_ABORTED,
                htcondor.JobEventType.JOB_HELD,
                htcondor.JobEventType.CLUSTER_REMOVE,
            }:
                raise RuntimeError("A job was aborted, held, or removed")
        if completed >= num_jobs:  # all expected jobs completed
            break
        if timeout is not None and (time.time() - start) > timeout:
            raise RuntimeError("Timed out waiting for jobs to complete")
        time.sleep(1)  # wait one second before polling again
This is certainly not perfect by any means (there are other events you might want to raise an exception on, or maybe you don't want to raise an exception at all), but hopefully it gets the idea across.
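If it helps to see the control flow in isolation, here is the same inner/outer loop pattern with a plain Python generator standing in for jel.events() and strings standing in for event types. Everything in this sketch (the fake_events helper, the string event names, the shortened poll interval) is purely illustrative and not part of the htcondor API:

```python
import time

def fake_events(batches):
    """Stand-in for repeated jel.events(stop_after=0) calls: each
    next() yields whatever events "arrived" since the last poll."""
    for batch in batches:
        yield batch

def wait_for_terminated(event_batches, num_jobs, timeout=None):
    start = time.time()
    completed = 0
    polls = fake_events(event_batches)
    while True:
        # inner loop: drain the events available on this poll
        for event in next(polls, []):
            completed += int(event == "JOB_TERMINATED")
        # outer loop: stop once enough jobs have finished
        if completed >= num_jobs:
            return completed
        if timeout is not None and (time.time() - start) > timeout:
            raise RuntimeError("Timed out waiting for jobs to complete")
        time.sleep(0.01)  # shortened poll interval, for the demo only
```

With three terminations spread across two polls (one in the first batch, two in the second), wait_for_terminated([["JOB_TERMINATED"], ["JOB_TERMINATED", "JOB_TERMINATED"]], 3) accumulates across polls and returns 3.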
Jason Patton