
[Condor-users] capturing job classad in DAG POST script

Date: Fri, 26 Mar 2010 12:07:16 -0400
From: Ian Stokes-Rees <ijstokes@xxxxxxxxxxxxxxxxxxx>
Subject: [Condor-users] capturing job classad in DAG POST script

I would like to capture the classad for a job that has just completed when my DAG POST script runs. The problem I have is that the classad is usually no longer available via condor_q -l $ClusterId. Is my only solution to add a "sleep X" statement, where X is suitably long for "condor_history -l $ClusterId" to work?

And in the "condor_history" classad, how much of the information about where the job ran will still be available? FWIW, I'm running jobs in the "grid/gt2" universe as part of Open Science Grid. What I'm looking for, in particular, are details about where failed jobs were trying to run. I could also pull this info out of the job log files, but because I run large DAGs, I have a single log file for the DAG and a single shared log file for all DAG node jobs. It is difficult to quickly pull failure information out of these, and it would be much nicer if my POST script could capture this information and record it to a "failed job log". Using the classad is my idea for how to capture this information.
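
To make the "failed job log" idea concrete, here is a minimal sketch of how a POST script might turn the long-form output of "condor_q -l" or "condor_history -l" into one log line. It assumes the output is a series of "Name = Value" lines and that each cluster holds a single job. The function names and the chosen fields are illustrative only; GridResource and GridJobId are attributes of grid-universe jobs, but whether they are still populated in the history ad is exactly the open question here.

```python
def parse_classad(text):
    """Parse long-form ClassAd text ("Name = Value" per line) into a dict.

    Values are kept as raw strings, including any surrounding quotes.
    Assumes a single ad; if several ads are present, later values win.
    """
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:
            ad[name.strip()] = value.strip()
    return ad


def failure_record(ad, fields=("ClusterId", "GridResource", "GridJobId", "ExitCode")):
    """Build one tab-separated log line; "?" marks attributes missing from the ad."""
    return "\t".join(ad.get(field, "?") for field in fields)


def append_failure(ad, path="failed_jobs.log"):
    """Append the record to a log file (the file name is a placeholder)."""
    with open(path, "a") as log:
        log.write(failure_record(ad) + "\n")
```

A POST script would call parse_classad() on the captured command output and append_failure() only when the node's return value indicates a failure.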

If "sleep X" is my only option, what is a reasonable value of X for a system with perhaps 6000 queued jobs and jobs completing at a rate of about one every 10 seconds?
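
One alternative to a single fixed "sleep X" would be to poll: try "condor_q -l", then "condor_history -l", and retry with a short delay until one of them returns an ad. The sketch below is only an illustration under stated assumptions: the cluster id is assumed to be passed to the POST script as an argument, both commands are assumed to print nothing when no ad matches, and the attempt count and delay are arbitrary placeholders, not recommended values.

```python
import subprocess
import time


def _run_command(cmd):
    """Run a command and return its stdout, or "" if it fails or is not installed."""
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.DEVNULL,
                                universal_newlines=True)
    except OSError:
        return ""
    return result.stdout if result.returncode == 0 else ""


def fetch_classad(cluster_id, run=_run_command, attempts=10, delay=5.0,
                  sleep=time.sleep):
    """Poll condor_q, then condor_history, until one prints a non-empty ad.

    Returns the raw long-form text, or None if every attempt came back empty.
    """
    commands = (["condor_q", "-l", str(cluster_id)],
                ["condor_history", "-l", str(cluster_id)])
    for attempt in range(attempts):
        for cmd in commands:
            output = run(cmd)
            if output.strip():
                return output
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next round of queries
    return None
```

The worst-case wait is roughly (attempts - 1) * delay seconds plus the time spent in the commands themselves, so the script gives up after a bounded time instead of hanging the DAG node.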

Thanks,

Ian

-- 
Ian Stokes-Rees, PhD                       W: http://hkl.hms.harvard.edu
ijstokes@xxxxxxxxxxxxxxxxxxx               T: +1 617 432-5608 x75
NEBioGrid, Harvard Medical School          C: +1 617 331-5993

