Re: [HTCondor-users] job status from python
- Date: Thu, 11 Jan 2018 22:07:31 -0500
- From: Larry Martell <larry.martell@xxxxxxxxx>
- Subject: Re: [HTCondor-users] job status from python
Thanks so much. This is extremely helpful to me. I have code for all of
this implemented and it's almost working. I have just one issue: when
I try to remove one job from the queue, they all get removed.
This is my queue:
$ condor_q -all
-- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:03:37
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
prod_user CMD: compute_radiology.py 1/11 21:32 _ _ _ 87 1739.0 ... 1825.0
87 jobs; 87 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Then I run this:
schedd.act(htcondor.JobAction.Remove, '1763')
which returns:
[
TotalJobAds = 87;
TotalPermissionDenied = 0;
TotalAlreadyDone = 0;
TotalNotFound = 0;
TotalSuccess = 87;
TotalChangedAds = 1;
TotalBadStatus = 0;
TotalError = 0
]
and then the queue is empty:
$ condor_q -all
-- Schedd: bach.elucid.local : <192.168.10.2:9618?... @ 01/11/18 22:04:32
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE HOLD TOTAL JOB_IDS
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
How can I just remove the one job from the queue?
Thanks!
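
A possible explanation, going by how the Python bindings document
Schedd.act(): the second argument is either a list of job IDs in
"cluster.proc" form or a single string that is read as a constraint
expression. If that is right, the bare string '1763' would be treated as a
constraint that holds for every job, which would fit the TotalSuccess = 87
result above. The sketch below builds an explicit job spec instead. The
helper names (job_id, cluster_constraint, remove_jobs) are hypothetical and
not part of htcondor; only schedd.act() and htcondor.JobAction.Remove come
from the thread.

```python
def job_id(cluster_id, proc_id=0):
    """Return a job ID string in 'cluster.proc' form, e.g. '1763.0'."""
    return "%d.%d" % (int(cluster_id), int(proc_id))


def cluster_constraint(cluster_id):
    """Return a constraint expression matching every proc of one cluster."""
    return "ClusterId == %d" % int(cluster_id)


def remove_jobs(schedd, remove_action, cluster_ids):
    """Ask the schedd to remove proc 0 of each listed cluster.

    `schedd` is assumed to behave like htcondor.Schedd() and
    `remove_action` is assumed to be htcondor.JobAction.Remove.
    """
    # Pass a list of job IDs, never a bare string, so the argument
    # is not interpreted as a constraint expression.
    specs = [job_id(c) for c in cluster_ids]
    return schedd.act(remove_action, specs)


# Intended use (assumes the htcondor bindings and a reachable schedd):
#   import htcondor
#   schedd = htcondor.Schedd()
#   remove_jobs(schedd, htcondor.JobAction.Remove, [1763])
# or, with an explicit constraint string:
#   schedd.act(htcondor.JobAction.Remove, cluster_constraint(1763))
```

With either form, the returned ad should report how many jobs were touched,
so TotalSuccess can be checked against the number of jobs you meant to remove.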
On Tue, Jan 9, 2018 at 9:34 AM, Jason Patton <jpatton@xxxxxxxxxxx> wrote:
> For looking up jobs that have finished, there is
> htcondor.Schedd().history(expression, projection), where you could
> query for something like history(expression = 'ClusterId == 123',
> projection = []) to get the ClassAd of job 123 *if* it has cleared the
> queue. However, querying the history is ***very slow***.
>
> Two better options:
> 1) Parse and watch the job log file, if your script can get to it. The
> job log file will update when the job has started running, update
> occasionally with resource usage, and update with the exit code when
> it has finished.
>
> 2) Leave the job in the queue when it's completed and have your script
> remove it:
>
> In your Submit objects, set { 'leave_in_queue': '(JobStatus == 4)' },
> which means when a job has completed, leave it in the queue. See
> https://research.cs.wisc.edu/htcondor/manual/current/12_Appendix_A.html
> to see what the value of JobStatus means.
>
> Query for jobs' ClassAds using htcondor.Schedd().xquery().
>
> If JobStatus == 4 and/or if ExitCode is defined, then you know the job
> is done. Remove it from the queue by sending a
> htcondor.Schedd().act(htcondor.JobAction.Remove, str(ClusterId)). (You
> can also send a list of ClusterIds.)
>
>
> Constantly querying the Schedd probably won't scale well with the size
> of the queue, so if you can use the job log, that's probably the
> better of the two solutions.
>
> Jason
>
> On Jan 8, 2018 5:24 PM, "Larry Martell" <larry.martell@xxxxxxxxx> wrote:
>>
>> Is there any python API for checking the status of jobs?
>>
>> On Sun, Jan 7, 2018 at 3:47 PM Larry Martell <larry.martell@xxxxxxxxx> wrote:
>>>
>>> I am submitting jobs like this:
>>>
>>> sub = htcondor.Submit(submit_dict)
>>> with schedd.transaction() as txn:
>>>     id = sub.queue(txn)
>>>
>>> Now I want to be able to tell if the job has completed or not, and
>>> when it has completed, if it succeeded or failed. How can I do that?
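
The JobStatus / ExitCode check suggested earlier in the thread can be
sketched as a small helper. It assumes each job ad supports dict-style
.get(), that the job was left in the queue with leave_in_queue, and that
exit code 0 means success (a convention of the job, not something HTCondor
enforces). JOB_STATUS_NAMES and classify_job are hypothetical names used
only for illustration.

```python
# Standard JobStatus codes from the HTCondor job ClassAd attributes.
JOB_STATUS_NAMES = {
    1: "Idle",
    2: "Running",
    3: "Removed",
    4: "Completed",
    5: "Held",
    6: "Transferring Output",
    7: "Suspended",
}


def classify_job(ad):
    """Return 'Succeeded', 'Failed', or the current status name.

    `ad` is any mapping with .get(), such as a dict of job attributes.
    """
    status = ad.get("JobStatus")
    if status != 4:
        # Not completed yet (or removed/held): report the status as-is.
        return JOB_STATUS_NAMES.get(status, "Unknown")
    if ad.get("ExitBySignal", False):
        # Killed by a signal, so there is no meaningful exit code.
        return "Failed"
    # Assumption: a zero exit code means the job succeeded.
    return "Succeeded" if ad.get("ExitCode") == 0 else "Failed"


# Intended use (assumes the htcondor bindings and a reachable schedd):
#   import htcondor
#   attrs = ["ClusterId", "JobStatus", "ExitCode", "ExitBySignal"]
#   for ad in htcondor.Schedd().xquery("ClusterId == 1763", attrs):
#       print(ad.get("ClusterId"), classify_job(ad))
```

Requesting only the attributes needed keeps each query small, which matters
given the scaling concern raised about polling the schedd.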