
Re: [HTCondor-users] Job Realtime output file



On 03/11/2013 04:34 PM, Derek Weitzel wrote:
Hi Guillermo,

Bosco does come with a built-in Condor pilot facility which can do streaming of remote output.  It does add a few more requirements to the setup, such as port 11000 open on the bosco submit host...

Anyways, it comes in the bosco package by default, so you should already have it mostly set up.  As long as you meet the port requirements, it should 'just work' (famous last words?).  Here's a link to a glidein job submission file:
https://twiki.grid.iu.edu/bin/view/CampusGrids/BoscoInstall#5_2_2_Glidein_Job_submission_exa


-Derek
Already tried and reported. Glidein jobs are not working for my remote SGE cluster with grid universe.

- Guillermo



On Mar 11, 2013, at 8:19 AM, Francesco Prelz <Francesco.Prelz@xxxxxxxxxx> wrote:

On Sat, 9 Mar 2013, Guillermo Marco Puche wrote:

I know those directives are SGE directives. From my point of view, if SGE handles the job, it must also be able to handle its own error and output logs.
The trouble here is that SGE is being handed, for a number of hard reasons, a Russian doll of scripts to execute. Your job is the smallest doll, while the -o and -e directives (and yes, you are overriding the directives set by default by 'bosco') apply to the outermost doll. It's very likely that stdout and stderr are already being diverted at inner layers. If you'd really like to see streaming stdout from your job, your best option is probably to set up some form of remote I/O yourself, at least until we have some form of out-of-the-box Condor 'standard universe' for 'grid' or 'vanilla' universe jobs (which would indeed come in handy for many other applications).
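
To make the Russian-doll point concrete, here is a minimal sketch of how a redirection at an inner layer leaves nothing for the outermost layer to capture. The three functions are purely illustrative stand-ins; they are not the actual wrapper scripts that bosco or SGE generate.

```shell
#!/bin/sh
# Illustration only: these functions stand in for the nested wrapper
# scripts; they are NOT the real bosco or SGE wrappers.
workdir=$(mktemp -d)

payload() {            # innermost doll: the user's job
    echo "hello from the payload"
}

inner_wrapper() {      # an inner layer that diverts stdout/stderr to its own file
    payload > "$workdir/inner.out" 2>&1
}

outer_wrapper() {      # outermost doll: the only layer an -o / -e setting affects
    inner_wrapper
}

# What an '-o' style redirection applied to the outermost layer would capture:
outer_wrapper > "$workdir/outer.out" 2>&1

echo "outer captured $(wc -c < "$workdir/outer.out" | tr -d ' ') bytes"
echo "inner captured: $(cat "$workdir/inner.out")"
```

The outer file stays empty because the payload's output was already diverted one level down, which is why changing -o and -e on the outermost script does not help.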

If you have at least outbound network connectivity from the worker nodes to the submit node you could try using 'chirp' (a standalone incarnation of the Aitch-Tee-Condor Remote I/O protocol, which may eventually be "re-"integrated into the 'grid' universe as the remote I/O method of choice).

In its simplest form:

0) Grab and install 'cctools', and make it available on the submit
  and worker nodes.
  http://www.cse.nd.edu/~ccl/software/download.shtml
  (the site seems to be down right now)

1) Start chirp_server on the submit node (will bind on port
  9094 by default, use *no* authentication/authorisation and
  write files in the current directory).

2) Run your payload on the worker nodes with
  ./payload | chirp_put -t -1 -b 4096 - submit_node.domain my_job_output.$$

You should then be getting a streaming update (with 4kB buffering, which is pretty much the minimum you can get by default from fstreams) of the stdout of your job(s) as 'my_job_output.script_PID' on submit_node.domain, in the directory from which you started chirp_server.
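
As a rough sketch of how step 2 might be packaged into a job wrapper: the function name stream_stdout, the SUBMIT_HOST variable and the local-file fallback are assumptions made for illustration, and the chirp_put options are simply the ones from step 2, not independently verified here.

```shell
#!/bin/sh
# Sketch of a job wrapper. SUBMIT_HOST and stream_stdout are hypothetical
# names; adjust them to your own setup.
SUBMIT_HOST=${SUBMIT_HOST:-submit_node.domain}

# stream_stdout REMOTE_NAME COMMAND [ARGS...]
# Runs COMMAND and pipes its stdout into chirp_put when chirp_put can be
# found; otherwise writes to a local file with the same name, so the job
# does not lose its output.
stream_stdout() {
    remote_name=$1
    shift
    if command -v chirp_put >/dev/null 2>&1; then
        # chirp_put options taken unchanged from step 2 (4 kB buffer).
        "$@" | chirp_put -t -1 -b 4096 - "$SUBMIT_HOST" "$remote_name"
    else
        "$@" > "./$remote_name"
    fi
}

# Example call from the job script (payload name as in step 2):
#   stream_stdout "my_job_output.$$" ./payload
```

Keeping a local fallback is a design choice, not something chirp requires; it only avoids discarding output on worker nodes where chirp_put is missing.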

There are countless variations of this scheme (add authentication/authorisation, send the 'chirp_put' executable along with the job if you cannot install it on the worker nodes, use a different naming scheme, run the job via 'parrot', etc.) but it should serve your basic need in any environment.

Does this still make sense?

Francesco Prelz
INFN-MI
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/

      
