Dear Condor users, I am a new user of HTCondor and, after going through a lot of tutorials, I still cannot understand the problem I am currently facing. I am using the following command : condor_submit jobs_desc_test_condor.cfg with this Condor version :
The config file is very simple (the default Universe is Vanilla, from what I understand) : Executable = $(Chunk)/./batchScript.sh
My Python work environment builds the necessary directories test_condor/*_Chunk*, and a batchScript.sh is placed in each of these directories. This batchScript.sh mainly does the proper setup, builds the list of input files to be read by an executable that generates some output log files, and retrieves the output files. I am confident that the executable works fine both interactively and on the batch system (I have even tried to run the remote command locally and it runs nicely). This executable can have a lot of input files, which is why I split the job into Chunks to speed up the process. For my test I use 10 Chunks.
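To make the splitting concrete, here is a minimal sketch of how a list of input files could be divided into 10 Chunks. It assumes a simple round-robin assignment; the function name `split_into_chunks` and the file names are hypothetical and only illustrate the idea, they are not taken from the actual work environment.

```python
def split_into_chunks(files, n_chunks):
    """Distribute files over n_chunks lists in round-robin order.

    Hypothetical helper: the real environment may split differently.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    # Chunk i receives files i, i + n_chunks, i + 2 * n_chunks, ...
    return [files[i::n_chunks] for i in range(n_chunks)]


# Placeholder input file names, for illustration only.
input_files = [f"input_{i:03d}.root" for i in range(25)]
chunks = split_into_chunks(input_files, 10)

for index, chunk in enumerate(chunks):
    print(f"Chunk {index}: {len(chunk)} files")
```

Each resulting list would then be written into the directory of the corresponding Chunk, where batchScript.sh picks it up.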
What I am seeing is that if I run a batch job with the command : condor_submit jobs_desc_tttt_condor.cfg
I never get all 10 Chunks (sub-jobs) to succeed. And if I rerun this exact command, a different set of Chunks succeeds ... Every Chunk can succeed, but never all of them at the same time. Here is the list of succeeded Chunks for each test.
test -> succeeded Chunks :
1 -> 0, 2, 3, 7
2 -> 1, 6, 7
3 -> 0, 1, 3, 4, 5, 6, 8, 9
4 -> 1, 4, 8
5 -> 3, 5
6 -> 1, 2, 4, 9
7 -> 8
So I can see that each Chunk has the possibility to succeed !! I therefore conclude that my executable and the input files are fine.
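The claim that every Chunk succeeded at least once can be checked with a short script. This is only a sketch that re-enters the results listed above by hand; the variable and function names are chosen for illustration.

```python
# Succeeded Chunks per test, copied from the list above.
succeeded = {
    1: [0, 2, 3, 7],
    2: [1, 6, 7],
    3: [0, 1, 3, 4, 5, 6, 8, 9],
    4: [1, 4, 8],
    5: [3, 5],
    6: [1, 2, 4, 9],
    7: [8],
}
N_CHUNKS = 10


def success_counts(results, n_chunks):
    """Count in how many tests each Chunk succeeded."""
    counts = {chunk: 0 for chunk in range(n_chunks)}
    for chunks in results.values():
        for chunk in chunks:
            counts[chunk] += 1
    return counts


counts = success_counts(succeeded, N_CHUNKS)
never_succeeded = [chunk for chunk, n in counts.items() if n == 0]

for chunk, n in counts.items():
    print(f"Chunk {chunk}: succeeded in {n} of {len(succeeded)} tests")
print("Chunks that never succeeded:", never_succeeded)
```

With the numbers above, the list of Chunks that never succeeded is empty, and no Chunk succeeds in every test.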
Now I was wondering whether there might be a problem with time or CPU limitations, so I have tried to play with : RequestCpus=4 and/or JobFlavour = "longlunch" or "microcentury" or "espresso", but with none of these combinations do all the Chunks finish successfully. (I know that each Chunk can run locally in 2 minutes.) And when I use longlunch, the jobs are stuck in idle for a very long time (more than 1 hour).
I cannot believe that HTCondor is unable to run such simple tasks reliably. So are there any tips I am missing to get all my Chunks to finish successfully ?