On Mon, Feb 14, 2005 at 05:11:32PM -0600, Daniel Forrest wrote:
> Following up on my own post...
>
> >> I have hit a problem with Condor and PATH_MAX.
> >>
> >> When opening a file, if the path length is >= 239 and <= 243 or ==
> >> 246 then it exits with signal 11. If the path length is >= 244 and
> >> <= 252 and != 246 then it goes into an infinite loop where the
> >> ShadowLog shows over and over again "Requesting Primary Starter",
> >> but there is never any indication of why the shadow exits.
> >>
> >> I am guessing there is some kind of buffer overrun which is causing
> >> various kinds of problems depending on how much is overwritten. My
> >> understanding is that Condor supports POSIX's PATH_MAX of 256.
> >>
> >> This is Condor 6.6.2. Is this a known problem?
>
> Looking through the release notes for 6.6.6 I see:
>
> Fixed a problem where the condor_starter could crash if the job it
> was running used Condor's file transfer mechanism and the full path
> names to the job's files became longer than a few hundred characters.
>
> So I updated to 6.6.8 on Friday and relinked my executables.
>
> Now if the path length is >= 242 and <= 246 it exits with signal 11.
> If the path length is >= 247 it goes into the infinite loop. So the
> behavior has changed, but it isn't fixed.
>
I know it's a problem, but I don't know the exact details.
What's happening is that, internally, the remote syscall library rewrites
some pathnames into slightly different URLs: instead of opening
'/tmp/foo', the syscall library opens something like 'remote:/tmp/foo'. The
'remote:' prefix counts against your POSIX_PATH_MAX.
Where the details get sketchy for me is why we don't internally allocate
something like a CONDOR_POSIX_PATH_MAX that is bigger than POSIX_PATH_MAX.
I'll ask around.
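
To make the arithmetic concrete, here is a minimal sketch of the prefix
problem. It is not Condor's code: the SKETCH_* names and make_remote_url()
are hypothetical, and 256 is just the limit mentioned earlier in the thread.
It only shows that a path which fits on its own can stop fitting once
'remote:' is prepended, and that a buffer sized for limit plus prefix
avoids that.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names for illustration only; not Condor identifiers. */
#define SKETCH_PATH_MAX 256            /* limit from the thread, includes the NUL */
#define SKETCH_PREFIX "remote:"        /* 7 characters prepended to the path */
/* Room for the longest legal path plus the prefix plus one NUL. */
#define SKETCH_INTERNAL_MAX (SKETCH_PATH_MAX + sizeof(SKETCH_PREFIX) - 1)

/*
 * Write "remote:<path>" into out.
 * Returns 0 on success, -1 if the result would not fit in outsize bytes.
 * snprintf never writes past outsize, so a too-long path is reported
 * instead of overrunning the buffer.
 */
int make_remote_url(const char *path, char *out, size_t outsize)
{
    assert(path != NULL && out != NULL);
    int n = snprintf(out, outsize, "%s%s", SKETCH_PREFIX, path);
    return (n < 0 || (size_t)n >= outsize) ? -1 : 0;
}
```

With a 250-character path, make_remote_url() fails for a
SKETCH_PATH_MAX-sized buffer (250 + 7 = 257 characters, plus the NUL) and
succeeds for a SKETCH_INTERNAL_MAX-sized one. Whether an unchecked copy
along these lines is what actually causes the signal 11 is a guess on my
part.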
-Erik