Hi Peter,

just a random guess (out of paranoia), but can you check the security
contexts (and maybe other attributes) of the binaries etc.?

  ls -lZ /path/to/foo
  lsattr /path/to/foo
  getfattr -d /path/to/foo

Just in case something differs between execution directly as the user
vs. execution after user switching.

Cheers,
  Thomas

On 27/10/2021 14.06, Jason Patton wrote:
> Peter,
>
> Is HTCondor able to create the output and error files specified in your
> job, and are you able to modify the runscript on the (or a targeted)
> execute host to print some information to stdout or stderr? It could be
> useful to have the runscript print out the environment at the line
> before the solver runs and compare for both interactive and batch modes.
> Also, consider having the runscript print out each command to see if the
> script exits before it starts running the solver.
>
> Jason
>
> On 10/27/21 3:25 AM, Peter Ellevseth wrote:
>> Christoph
>>
>> The runscript uses only absolute paths.
>>
>> We just got a new version of this code, and I get this problem with
>> the new version but not with the old version. I checked ldd for the
>> binaries of both versions and get the same result.
>>
>> I have discussed it with the supplier of the CFD code and they didn't
>> have any good suggestions yet.
>>
>> P
>>
>> *From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> *On Behalf
>> Of *Beyer, Christoph
>> *Sent:* Wednesday, 27 October 2021 08:24
>> *To:* htcondor-users <htcondor-users@xxxxxxxxxxx>
>> *Subject:* Re: [HTCondor-users] Job not starting correctly
>>
>> Hi,
>>
>> make sure all the paths you need are set in the bash script or use
>> absolute paths if in doubt. The interactive login uses ssh mechanisms
>> and therefore sources your environment, which is not necessarily the
>> case in a regular condor job.
>>
>> Try ldd <binary> to check if the libraries the binary uses are hidden
>> somewhere and put all these paths in your bash script (LD_LIBRARY_PATH
>> etc) ...
>>
>> best
>>
>> christoph
>>
>>
>> --
>> Christoph Beyer
>> DESY Hamburg
>> IT-Department
>>
>> Notkestr. 85
>> Building 02b, Room 009
>> 22607 Hamburg
>>
>> phone:+49-(0)40-8998-2317
>> mail: christoph.beyer@xxxxxxx <mailto:christoph.beyer@xxxxxxx>
>>
>> ------------------------------------------------------------------------
>>
>>
>> *From: *"Peter Ellevseth" <Peter.Ellevseth@xxxxxxxxxx
>> <mailto:Peter.Ellevseth@xxxxxxxxxx>>
>> *To: *"htcondor-users" <htcondor-users@xxxxxxxxxxx
>> <mailto:htcondor-users@xxxxxxxxxxx>>
>> *Sent: *Tuesday, 26 October 2021 20:26:08
>> *Subject: *Re: [HTCondor-users] Job not starting correctly
>>
>> Jason
>>
>> We have a shared file system between all nodes. When I run
>> condor_submit -interactive I get a shell in the same folder as I was
>> in previously, but from the "view" of the execute node. I can then
>> execute simply by "./runscript".
>>
>> Yes, I get the normal log/out/error files.
>>
>> I have checked the env and there is nothing there that tells me why
>> the job won't start.
>>
>> I can also ssh to one of my startd machines and start the job manually
>> with the runscript.
>>
>> At a loss for ideas here now.
>>
>> P
>>
>> *From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx
>> <mailto:htcondor-users-bounces@xxxxxxxxxxx>> *On Behalf Of *Jason Patton
>> *Sent:* Tuesday, 4 May 2021 14:43
>> *To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx
>> <mailto:htcondor-users@xxxxxxxxxxx>>
>> *Subject:* Re: [HTCondor-users] Job not starting correctly
>>
>> Hi Peter,
>>
>> You say that when you submit an interactive job, you run the script by
>> doing "./runscript". Do your jobs ever use condor file transfer or is
>> your pool set up to assume a shared file system?
>>
>> When you submit the job normally, do you still get back the output
>> (stdout) and error (stderr) files? It might be useful to print out the
>> environment at the very beginning of the script and compare between a
>> normal job and an interactive job.
>>
>> Jason Patton
>>
>> On Mon, May 3, 2021 at 5:04 PM Peter Ellevseth
>> <Peter.Ellevseth@xxxxxxxxxx <mailto:Peter.Ellevseth@xxxxxxxxxx>> wrote:
>>
>>     Gents
>>
>>     We are running a commercial CFD code via HTCondor. We have been
>>     doing it for years without any issues. I installed a new version of
>>     that software and want to run it via HTCondor as usual. I do this by
>>     telling condor to run a locally installed bash script on the execute
>>     node, which in turn starts the CFD solver. I have to do it this way
>>     to source some files needed by the solver to start (license etc).
>>
>>     However, the new version is refusing to start. From the
>>     StarterLog.slotX I see the job immediately stops with
>>
>>     05/03/21 23:56:33 (pid:4135578) Create_Process succeeded, pid=4135579
>>
>>     05/03/21 23:56:33 (pid:4135578) Process exited, pid=4135579, status=139
>>
>>     05/03/21 23:56:33 (pid:4135578) Got SIGQUIT. Performing fast shutdown.
>>
>>     If I ssh in to one of the execute nodes I can start it and it
>>     runs as normal.
>>
>>     If I do condor_submit -interactive my_submit_file, I am able to
>>     run the script with ./runscript just fine.
>>
>>     Then why won't it start when I submit the file normally?
>>
>>     Peter
>>
>>     _______________________________________________
>>     HTCondor-users mailing list
>>     To unsubscribe, send a message to
>>     htcondor-users-request@xxxxxxxxxxx
>>     <mailto:htcondor-users-request@xxxxxxxxxxx> with a
>>     subject: Unsubscribe
>>     You can also unsubscribe by visiting
>>     https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>     <https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users>
>>
>>     The archives can be found at:
>>     https://lists.cs.wisc.edu/archive/htcondor-users/
>>     <https://lists.cs.wisc.edu/archive/htcondor-users/>
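
For reference, the debugging steps suggested in this thread (print the environment from the runscript, trace each command, and read the status=139 value) could be sketched roughly as below. This is only a sketch under assumptions: the helper names dump_env and decode_status are made up for illustration and are not part of HTCondor, and decode_status assumes the value follows the usual shell convention where anything above 128 means "terminated by signal (status - 128)". If the StarterLog value is instead a raw wait status, its low 7 bits (139 & 127) give the same signal number.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of HTCondor.

# Print who/where we are plus the sorted environment to stderr, so the
# batch job's error file can be diffed against an interactive run.
dump_env() {
    echo "=== env on $(uname -n) as $(id -un), cwd $(pwd) ===" >&2
    env | sort >&2
}

# Interpret a status value: above 128 conventionally means
# "terminated by signal (status - 128)".
decode_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(($1 - 128))"
    else
        echo "exited with code $1"
    fi
}

dump_env
# set -x    # uncomment in the runscript to trace each command to stderr
decode_status 139    # the value from the StarterLog excerpt
```

On Linux, signal 11 is SIGSEGV, so the StarterLog excerpt most likely points to a segmentation fault rather than the runscript exiting on its own. Saving the stderr of a batch run and of an interactive run and diffing the two environment dumps is one way to follow up on that.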