[DynInst_API:] [PATCH] Improve script #! parsing


Date: Wed, 31 Oct 2012 19:25:46 -0700
From: Josh Stone <jistone@xxxxxxxxxx>
Subject: [DynInst_API:] [PATCH] Improve script #! parsing

Hi,

I found a few issues in the way Dyninst is parsing #! lines, and wrote
the attached patch to hopefully make it more robust.  Please let me know
if it needs any adjustment.

It's also possible for such scripts to reference yet another #! script
for the interpreter (Linux permits this, but only a few levels deep;
BINPRM_BUF_SIZE=128 limits the length of the #! line, not the nesting
depth).  I didn't write that recursion yet, as I doubt it's very common,
but that might be a good followup.

Thanks,
Josh
From e44d4973997c911928b6d5f23fc19fce588884c1 Mon Sep 17 00:00:00 2001
From: Josh Stone <jistone@xxxxxxxxxx>
Date: Wed, 31 Oct 2012 19:22:33 -0700
Subject: [PATCH] Improve script #! parsing

This fixes a few shortcomings in BPatch's buildPath():

- The original path and argv[0] may not necessarily be the same, but the
  former should replace the latter in the new argv list.

- The #! line may optionally include a single argument for the
  interpreter, often used like "#!/usr/bin/env python".

- The NULL to terminate the new argv was clobbering the last argument
  from the original argv.

I modeled the exact #!-parsing details after Linux's fs/binfmt_script.c.
---
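For readers who want to try the #!-line rules outside of Dyninst, here is a
minimal standalone sketch of the parsing described above: skip spaces/tabs
after "#!", take the interpreter up to the next space/tab, and treat
everything after that (trimmed of trailing spaces/tabs) as one argument.
The names Shebang and parseShebang are hypothetical and do not exist in
BPatch.C, and the sketch assumes the line has already been read without its
trailing newline.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for illustration; not part of Dyninst.
struct Shebang {
    bool valid = false;   // true if the line is "#!" followed by an interpreter
    std::string interp;   // interpreter path
    std::string arg;      // optional single argument (may contain inner blanks)
};

// Parse the first line of a script, given without its trailing newline.
Shebang parseShebang(const std::string &line) {
    const std::string::size_type npos = std::string::npos;
    const char *blanks = " \t";
    Shebang result;

    // Not a script line at all.
    if (line.compare(0, 2, "#!") != 0)
        return result;

    // Skip blanks between "#!" and the interpreter.
    std::string::size_type start = line.find_first_not_of(blanks, 2);
    if (start == npos)
        return result;                    // "#!" with nothing after it

    // The interpreter runs up to the next blank (or the end of the line).
    std::string::size_type end = line.find_first_of(blanks, start);
    result.interp = line.substr(start, end == npos ? npos : end - start);
    assert(!result.interp.empty());       // start points at a non-blank
    result.valid = true;
    if (end == npos)
        return result;                    // no argument present

    // Everything from the next non-blank to the last non-blank is ONE
    // argument, even if it contains blanks in the middle.
    start = line.find_first_not_of(blanks, end);
    if (start != npos) {
        end = line.find_last_not_of(blanks) + 1;
        result.arg = line.substr(start, end - start);
    }
    return result;
}
```

With this sketch, "#!/usr/bin/env python" yields the interpreter
"/usr/bin/env" and the argument "python", while "#! /bin/sh" yields
"/bin/sh" with an empty argument.
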
 dyninstAPI/src/BPatch.C | 59 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 47 insertions(+), 12 deletions(-)
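
The argv layout from the commit message can also be sketched on its own:
interpreter first, then the optional interpreter argument, then the script
path in place of the old argv[0], then the old argv[1..].  This is only an
illustration under assumptions: buildScriptArgv is a hypothetical name, and
it returns a std::vector<std::string> rather than the malloc'd char ** that
the patch builds, so there is no terminating NULL slot to get wrong.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper for illustration; not part of Dyninst.
// argv must be a NULL-terminated array, as passed to exec-style calls.
std::vector<std::string> buildScriptArgv(const std::string &interp,
                                         const std::string &interpArg,
                                         const std::string &path,
                                         const char *const *argv) {
    assert(argv != nullptr);
    std::vector<std::string> out;

    // The interpreter becomes the new argv[0].
    out.push_back(interp);

    // An interpreter argument, if present, becomes the new argv[1].
    if (!interpArg.empty())
        out.push_back(interpArg);

    // The script path *replaces* the old argv[0] ...
    out.push_back(path);

    // ... and the old argv[1..] follow unchanged.
    if (argv[0] != nullptr) {
        for (std::size_t i = 1; argv[i] != nullptr; ++i)
            out.push_back(argv[i]);
    }
    return out;
}
```

For example, running "/home/u/script.py" with an old argv of
{"myscript", "-v", "file.txt"} and a "#!/usr/bin/env python" line gives
{"/usr/bin/env", "python", "/home/u/script.py", "-v", "file.txt"}.  A caller
that needs a char ** for exec would copy these out and append the NULL
terminator after the last element.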

diff --git a/dyninstAPI/src/BPatch.C b/dyninstAPI/src/BPatch.C
index d6f417d..e3c9bef 100644
--- a/dyninstAPI/src/BPatch.C
+++ b/dyninstAPI/src/BPatch.C
@@ -1070,21 +1070,56 @@ static void buildPath(const char *path, const char **argv,
    }
 
    // A shell script, so reinterpret path/argv
-   std::string interp = line.substr(2);
-   pathToUse = (char *) malloc(interp.length()+1);
-   strncpy(pathToUse, interp.c_str(), interp.length()+1);
-   // I'd prefer an argc, but hey
-   int count = 0;
-   while(argv[count] != NULL) {
-      count++;
+
+   // Modeled after Linux's fs/binfmt_script.c
+   // #! lines have the interpreter and optionally a single argument,
+   // all separated by spaces and/or tabs.
+
+   size_t pos_start = line.find_first_not_of(" \t", 2);
+   if (pos_start == std::string::npos) {
+      file.close();
+      return;
+   }
+   size_t pos_end = line.find_first_of(" \t", pos_start);
+   std::string interp = line.substr(pos_start, pos_end - pos_start);
+   pathToUse = strdup(interp.c_str());
+
+   std::string interp_arg;
+   pos_start = line.find_first_not_of(" \t", pos_end);
+   if (pos_start != std::string::npos) {
+      // The argument goes all the way to the last non-space/tab,
+      // even if there are spaces/tabs in the middle somewhere.
+      pos_end = line.find_last_not_of(" \t") + 1;
+      interp_arg = line.substr(pos_start, pos_end - pos_start);
+   }
+
+   // Count the old and new argc values
+   int argc = 0;
+   while(argv[argc] != NULL) {
+      argc++;
+   }
+   int argcToUse = argc + 1;
+   if (!interp_arg.empty()) {
+      argcToUse++;
+   }
+   argvToUse = (char **) malloc((argcToUse+1) * sizeof(char *));
+
+   // The interpreter takes the new argv[0]
+   int argi = 0;
+   argvToUse[argi++] = strdup(pathToUse);
+
+   // If there's an interpreter argument, that's the new argv[1]
+   if (!interp_arg.empty()) {
+      argvToUse[argi++] = strdup(interp_arg.c_str());
    }
-   argvToUse = (char **) malloc((count+1) * sizeof(char *));
-   argvToUse[0] = strdup(pathToUse);
 
-   for (int tmp = 0; tmp < count; ++tmp) {
-      argvToUse[tmp+1] = strdup(argv[tmp]);
+   // Then comes path, *replacing* the old argv[0],
+   // and the old argv[1..] are filled in for the rest
+   argvToUse[argi++] = strdup(path);
+   for (int tmp = 1; tmp < argc; ++tmp) {
+      argvToUse[argi++] = strdup(argv[tmp]);
    }
-   argvToUse[count] = NULL;
+   argvToUse[argcToUse] = NULL;
    file.close();
 }
 
-- 
1.7.11.7
