Hi Kent,

out of curiosity: Why did the job fail when it had a *higher* limit of file descriptors?

Cheers,
Max

> On 16.02.2017 at 21:59, R. Kent Wenger <wenger@xxxxxxxxxxx> wrote:
>
> "I have a job that runs on the command line; but it crashes when run
> under HTCondor." -- probably many of us have faced a problem like
> this.
>
> We recently worked with a user who had a job that exhibited this
> behavior (it segfaulted when run under HTCondor). It took us a while
> to figure out what the cause was -- environment variables under
> HTCondor differed only trivially from the command line (the job
> was using "getenv = true"), and the command line arguments were exactly
> the same.
>
> We eventually figured out that the job was crashing because the file
> descriptor limit when run under HTCondor was higher than when it was
> run from the command line(!). This was a bit of a surprise, and clearly
> indicates problems in the code of the program; but it also points up
> an important, and somewhat non-obvious, way in which running a job under
> HTCondor differs from running it on the command line.
>
> (HTCondor jobs inherit their limits from the HTCondor daemon that
> spawns them. In the case of the file descriptor limit, some HTCondor
> daemons need higher limits than most user jobs typically need.
> We are considering changing this in the future, but this is the
> current situation.)
>
> At any rate, system limits are something to keep in mind when debugging
> this type of problem.
>
> Another thing that is likely to be different between running on the
> command line and running under HTCondor is the umask setting (controlling
> the permissions of files created by the job). This is one more thing
> to check if you are having problems with jobs not working correctly
> under HTCondor.
>
> Here's an example of a job that prints out the limits, changes the
> stack size limit, and prints out the limits again.
>
> # File: change_limits.csh
> #! /bin/csh
> limit
> echo ""
> echo "Changing stacksize"
> limit stacksize 4096
> echo ""
> limit
>
> # File: change_limits.sub
> universe = vanilla
> executable = change_limits.csh
> output = change_limits.out
> queue
>
> # File: change_limits.out
> cputime      unlimited
> filesize     unlimited
> datasize     unlimited
> stacksize    unlimited
> coredumpsize unlimited
> memoryuse    unlimited
> vmemoryuse   unlimited
> descriptors  1024
> memorylocked 64 kbytes
> maxproc      1024
>
> Changing stacksize
>
> cputime      unlimited
> filesize     unlimited
> datasize     unlimited
> stacksize    4096 kbytes
> coredumpsize unlimited
> memoryuse    unlimited
> vmemoryuse   unlimited
> descriptors  1024
> memorylocked 64 kbytes
> maxproc      1024
>
> Note that the limits on your process under HTCondor will depend on your
> HTCondor configuration. Also, the limits may vary according to which
> universe your job runs under.
>
> This information is also posted on the HTCondor wiki for future
> reference:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=JobFailsUnderCondor
>
> --
> R. Kent Wenger (wenger@xxxxxxxxxxx, 608-262-6627,
> http://www.cs.wisc.edu/~wenger/)
> Computer Sciences Department
> University of Wisconsin-Madison
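
For jobs that are not csh scripts, the same kind of check can be sketched in Python. This is only an illustration under assumptions, not something from the HTCondor documentation: the helper names (current_limits, current_umask, cap_soft_limit) are made up for this example, and the ceiling of 1024 descriptors is simply a value one might see in an interactive shell. It uses only the standard "resource" and "os" modules, so it should run on Linux and similar Unix systems. Running it once from the command line and once as the job's executable lets you diff the two reports.

```python
#!/usr/bin/env python3
"""Report resource limits and umask, then lower the soft descriptor limit."""
import os
import resource

# Labels follow the csh "limit" output quoted above; the constants are
# standard members of Python's resource module.
LIMITS = {
    "cputime": resource.RLIMIT_CPU,
    "filesize": resource.RLIMIT_FSIZE,
    "datasize": resource.RLIMIT_DATA,
    "stacksize": resource.RLIMIT_STACK,
    "coredumpsize": resource.RLIMIT_CORE,
    "descriptors": resource.RLIMIT_NOFILE,
}


def fmt(value):
    """Render a limit value the way csh does for unlimited resources."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for every limit listed in LIMITS."""
    return {name: resource.getrlimit(res) for name, res in LIMITS.items()}


def current_umask():
    """Read the umask; os.umask() can only set it, so restore it at once."""
    old = os.umask(0)
    os.umask(old)
    return old


def cap_soft_limit(res, ceiling):
    """Lower the soft limit to `ceiling` if it is higher; never raise it."""
    soft, hard = resource.getrlimit(res)
    if soft == resource.RLIM_INFINITY or soft > ceiling:
        resource.setrlimit(res, (ceiling, hard))
    return resource.getrlimit(res)


def report():
    """Print one line per limit, plus the umask in octal."""
    for name, (soft, hard) in current_limits().items():
        print(f"{name:<13} soft={fmt(soft)} hard={fmt(hard)}")
    print(f"{'umask':<13} {current_umask():04o}")


if __name__ == "__main__":
    report()
    # Assumed ceiling: 1024 is a hypothetical command-line value.
    cap_soft_limit(resource.RLIMIT_NOFILE, 1024)
    print("\nAfter capping descriptors\n")
    report()
```

If the two reports differ in the descriptors line, capping the soft limit at the start of the job (as cap_soft_limit does here) is one way to test whether the higher limit is what triggers the crash. It does not fix the underlying bug in the program.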