[HTCondor-devel] Advice on appropriate design / use of HT Condor


Date: Mon, 19 Oct 2020 21:43:19 +0000 (UTC)
From: fordesmith@xxxxxxxxx
Subject: [HTCondor-devel] Advice on appropriate design / use of HT Condor
Hi there,

This is my first post. 

I've been experimenting with a proof of concept (POC) of an HTCondor (v8.8.1) cluster on GCP (CentOS 7) to run a C++ application that delivers a Monte Carlo simulation framework for contemporary financial risk analytics and value adjustments.

I have set up the cluster and can run simple tests on it. The cluster nodes are built from a base machine image that already contains the relevant compiled code (since building it takes 1-2 hours).

Currently, the legacy application conducting this analysis can take up to 8 hours. This POC aims to dramatically reduce the runtime and also produce additional analyses.

The current data/job flow I have been using doesn't work: 
* Submit job (and transfer credentials)
* Download analysis specification, product portfolio and market input files from Google Cloud Storage per counterparty (via gsutil)
* Run the C++ app with the initial input files
* Write outputs (e.g. Monte Carlo simulation outputs) to a local dir
* Upload the outputs back to Google Cloud Storage per counterparty
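
For concreteness, the flow above could be driven by a small wrapper script that HTCondor runs as the job executable. This is only a minimal sketch, not my actual setup: the bucket layout (`gs://<bucket>/<counterparty>/inputs`), the `--input`/`--output` flags of the app, and the names `build_steps` and `run_job` are all assumptions made for illustration. It also assumes gsutil is on the PATH and can find credentials for whichever user the job runs as.

```python
import subprocess
from pathlib import Path


def build_steps(bucket, counterparty, app, workdir):
    """Return the download, run and upload commands as argument lists.

    The bucket layout and the app's flags are hypothetical.
    """
    inputs = Path(workdir) / "inputs"
    outputs = Path(workdir) / "outputs"
    remote = f"gs://{bucket}/{counterparty}"
    return [
        # Download analysis specification, portfolio and market inputs.
        ["gsutil", "-m", "cp", "-r", f"{remote}/inputs/*", str(inputs)],
        # Run the simulation (assumed command-line interface).
        [app, "--input", str(inputs), "--output", str(outputs)],
        # Upload the results for this counterparty.
        ["gsutil", "-m", "cp", "-r", str(outputs), f"{remote}/"],
    ]


def run_job(bucket, counterparty, app, workdir=".", runner=subprocess.run):
    """Create the local directories, then run each step, stopping on failure."""
    for name in ("inputs", "outputs"):
        (Path(workdir) / name).mkdir(parents=True, exist_ok=True)
    for cmd in build_steps(bucket, counterparty, app, workdir):
        # check=True raises CalledProcessError on a non-zero exit code.
        runner(cmd, check=True)
```

Because the directories are created relative to the working directory, the script should only need write access to the job's scratch directory, which might sidestep some of the permission problems.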

I have made sure I can run the app on the condor-compute node if I SSH directly to it.

Some questions ... perhaps you can help me understand how HTCondor works?
* Can I specify under which user the jobs run? Currently it's running as the user "nobody" and permissions are at least one of the problems. Can I run as another user with the correct permissions? I haven't been able to find information on this.
* Does HTCondor allow what I am trying to do: create subdirectories to pull inputs into, write outputs to a directory, then upload to GCP? Most of the examples I've read require passing all of the files at the time of job submission.
* Finally, I tried to debug in interactive mode as per a tutorial but received the notice "this account is not available". I couldn't find information on this.

Overall, I suspect both a permission issue with the "nobody" user and perhaps an environment variable issue.

Your help is most appreciated.

Regards,

Forde



