Mailing List Archives
Re: [HTCondor-users] Intermittent failure submitting via Python bindings
- Date: Mon, 9 Nov 2020 15:29:11 +0000
- From: John M Knoeller <johnkn@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Intermittent failure submitting via Python bindings
Is there a message in the SchedLog corresponding to that failure?
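To follow up on the SchedLog question, a minimal sketch like the one below could pull out the SchedLog lines that mention the failing cluster. It assumes that the log path is already known (for example from `condor_config_val SCHEDD_LOG`) and that relevant lines refer to jobs as `<cluster>.<proc>`. The function name `schedlog_lines_for_cluster` is hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator


def schedlog_lines_for_cluster(log_path: str, cluster_id: int) -> Iterator[str]:
    """Yield lines from a SchedLog file that mention the given cluster ID.

    Assumption: the schedd writes job IDs as "<cluster>.<proc>"
    (e.g. "45263.-1" or "45263.0"); lines that refer to the job in
    some other form will not be matched.
    """
    # Match e.g. "45263.-1" or "45263.0". The lookbehind rejects a
    # preceding digit, so "145263.0" is not mistaken for cluster 45263.
    pattern = re.compile(rf"(?<!\d){cluster_id}\.-?\d+")
    with Path(log_path).open(errors="replace") as log:
        for line in log:
            if pattern.search(line):
                yield line.rstrip("\n")
```

Usage (the path is only an example): `for line in schedlog_lines_for_cluster("/var/log/condor/SchedLog", 45263): print(line)`. If the log has rotated since the failure, the same call can be run against the rotated file as well.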
-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Michael Pelletier via HTCondor-users
Sent: Monday, November 9, 2020 9:20 AM
To: HTCondor-Users Mail List (htcondor-users@xxxxxxxxxxx) <htcondor-users@xxxxxxxxxxx>
Cc: Michael Pelletier <michael.v.pelletier@xxxxxxxxxxxx>
Subject: [HTCondor-users] Intermittent failure submitting via Python bindings
A user here is getting intermittent job-submission failures through the Python bindings, which raise the following errors:
=====
  File "/user/1148605/sandbox/hpc_interface/Resources/htcondor_compute_resource.py", line 125, in run
    self.cluster_ids.append(sub.queue(txn, jobs, ad_results))
  File "/scratch/ml/sandbox/1148605/mosp_htcondor/lib/python3.8/site-packages/htcondor/_lock.py", line 69, in wrapper
    rv = func(*args, **kwargs)
RuntimeError: job 45263.-1 failed to set CurrentHosts=0 (110)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_python.py", line 52, in <module>
    print(hpc.run(run_dict=submit_dict))
  File "/user/1148605/sandbox/hpc_interface/hpc_interface.py", line 32, in run
    return self.compute_resource.run(run, self.run_number, jobs)
  File "/user/1148605/sandbox/hpc_interface/Resources/htcondor_compute_resource.py", line 125, in run
    self.cluster_ids.append(sub.queue(txn, jobs, ad_results))
  File "/scratch/ml/sandbox/1148605/mosp_htcondor/lib/python3.8/site-packages/htcondor/_lock.py", line 99, in __exit__
    return self.cm.__exit__(*args, **kwargs)
  File "/scratch/ml/sandbox/1148605/mosp_htcondor/lib/python3.8/site-packages/htcondor/_lock.py", line 69, in wrapper
    rv = func(*args, **kwargs)
RuntimeError: Failed to abort transaction.
terminate called after throwing an instance of 'boost::python::error_already_set'
=====
As you can see, it obtains a cluster ID (45263, from the job ID 45263.-1 above), so it has gotten past a certain point in the processing, but I'm having trouble pinning down the cause of the failure. Why did it fail to set CurrentHosts?
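When collecting several of these intermittent failures, it may help to pull the fields out of the error text so they can be matched against SchedLog entries. The sketch below assumes the message always has the shape `job <cluster>.<proc> failed to set <Attr>=<value> (<code>)`, as in the traceback. It relies on HTCondor using ProcId -1 for the cluster ad, which is why the failing write in `45263.-1` targeted cluster-level attributes. What the trailing `(110)` code means is not stated in the thread, so the parser only extracts it. The names `SetAttrFailure` and `parse_set_attr_failure` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SetAttrFailure(NamedTuple):
    cluster: int
    proc: int
    attribute: str
    value: str
    code: int  # trailing number in parentheses; meaning not assumed here

    @property
    def is_cluster_ad(self) -> bool:
        # ProcId -1 refers to the cluster ad rather than an individual job.
        return self.proc == -1


# Assumed message shape: "job 45263.-1 failed to set CurrentHosts=0 (110)"
_PATTERN = re.compile(
    r"job (?P<cluster>\d+)\.(?P<proc>-?\d+) failed to set "
    r"(?P<attribute>\w+)=(?P<value>.*?) \((?P<code>-?\d+)\)"
)


def parse_set_attr_failure(message: str) -> Optional[SetAttrFailure]:
    """Return the parsed fields, or None if the text does not match."""
    m = _PATTERN.search(message)
    if m is None:
        return None
    return SetAttrFailure(
        cluster=int(m["cluster"]),
        proc=int(m["proc"]),
        attribute=m["attribute"],
        value=m["value"],
        code=int(m["code"]),
    )
```

Applied to `str(err)` in an `except RuntimeError as err:` handler, this returns cluster 45263, proc -1, attribute `CurrentHosts`, value `0`, and code 110 for the error shown above.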
Thanks for any suggestions
Michael V Pelletier
Principal Engineer
C: +1 339.293.9149
michael.v.pelletier@xxxxxxx
Raytheon Technologies
Information Technology
50 Apple Hill Drive
Tewksbury, MA 01876-1198
RTX.com | LinkedIn | Twitter | Instagram