[HTCondor-users] Failed to connect to schedd
- Date: Sun, 07 Jan 2018 16:32:54 -0500
- From: Larry Martell <larry.martell@xxxxxxxxx>
- Subject: [HTCondor-users] Failed to connect to schedd
I am submitting jobs from python in a loop that has this:
    sub = htcondor.Submit(submit_dict)
    with schedd.transaction() as txn:
        id = sub.queue(txn)
I want to submit thousands of jobs, each one with a different
submit_dict. What happens is the first 24 get submitted, then I start
to get 'Failed to connect to schedd' from the call to
schedd.transaction().
I'll get that twice, then I can submit 12 jobs, then I get the error
once, then I can submit 6 jobs. It continues like this, a few errors,
a few successful submits.
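Since the error comes from schedd.transaction(), which the loop calls once per job, one thing that may be worth trying is to open far fewer transactions by queuing several jobs inside each one. The following is only a sketch under assumptions: `submit_in_batches`, `make_submit`, and `batch_size` are names made up for illustration, and it is not established that the number of transactions is what triggers the error.

```python
def submit_in_batches(schedd, make_submit, submit_dicts, batch_size=100):
    """Queue many jobs, opening one transaction per batch instead of per job.

    schedd       -- object exposing transaction(), e.g. an htcondor.Schedd
    make_submit  -- callable that builds a submit object, e.g. htcondor.Submit
    submit_dicts -- iterable of submit description dictionaries
    batch_size   -- hypothetical tuning knob: jobs queued per transaction

    Returns the values returned by queue(), in submission order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    submit_dicts = list(submit_dicts)
    cluster_ids = []
    for start in range(0, len(submit_dicts), batch_size):
        batch = submit_dicts[start:start + batch_size]
        batch_ids = []
        with schedd.transaction() as txn:
            for submit_dict in batch:
                batch_ids.append(make_submit(submit_dict).queue(txn))
        # Record the ids only after the transaction block exited cleanly.
        cluster_ids.extend(batch_ids)
    return cluster_ids


# Possible usage with the real bindings (not executed here):
#   import htcondor
#   schedd = htcondor.Schedd()
#   ids = submit_in_batches(schedd, htcondor.Submit, all_submit_dicts)
```

With 25 submit dictionaries and batch_size=10, this opens 3 transactions instead of 25.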
This is my MAX_JOBS_RUNNING setting on the master:
    condor_config_val MAX_JOBS_RUNNING
    MIN({23933, 10000})
And this is it on both execute hosts:
    condor_config_val MAX_JOBS_RUNNING
    MIN({128651, 10000})
condor_status shows 352 slots available.
I don't see any errors in the submit log. Does anyone know how I can fix
this or debug it further?
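For debugging, one option is to wrap each submit in a small retry helper that logs every failure and waits before trying again, which would at least show how often and in what pattern the error occurs. This is a sketch, not a fix: `call_with_retry` and its parameters are hypothetical, and it assumes the 'Failed to connect to schedd' error reaches Python as a RuntimeError, which is not confirmed here.

```python
import time


def call_with_retry(action, max_attempts=5, base_delay=1.0,
                    sleep=time.sleep, log=print):
    """Call action(), retrying with exponential backoff on RuntimeError.

    action       -- zero-argument callable performing one submit
    max_attempts -- total number of tries before giving up
    base_delay   -- seconds to wait after the first failure; doubles each time
    sleep, log   -- injectable for testing; default to time.sleep and print
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except RuntimeError as exc:  # assumed exception type, see lead-in
            log("attempt %d/%d failed: %s" % (attempt, max_attempts, exc))
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Possible usage around the submit from the loop (not executed here):
#   def submit_one():
#       with schedd.transaction() as txn:
#           return sub.queue(txn)
#   cluster_id = call_with_retry(submit_one)
```

The logged attempt counts can then be compared against the schedd's own log on the submit host to see whether the failures line up with anything on the server side.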