The documentation seems to assume a particular environment without ever stating what it is. So let me describe mine explicitly; perhaps it will trigger other ideas.
1. Three nodes are involved: a submit host, which "spools" the job to the Condor server (running master, collector, negotiator, procd, schedd, and n shadows), and an execution node (running master, startd, and starter).
2. /etc/condor/condor_config on the execution node contains:

    # Job hooks
    STARTER_JOB_HOOK_KEYWORD = MYHOOK
    #MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit
    MYHOOK_HOOK_JOB_EXIT = /usr/local/bin/job_exit.sh

3. /usr/local/bin on the execution node contains:

    -rwxr-xr-x 1 root root 47 Jun 16 16:33 /usr/local/bin/job_exit.sh
    -rwxr-xr-x 1 root root 102 Jun 16 13:18 /usr/local/bin/munin-node-condor-job-exit
4. The job file contains:

    +HookKeyword="MYHOOK"

I read a RH bug report that had bash scripts for the hooks, all with ".sh" suffixes, instead of Perl scripts, so I tried that to make sure it was not the problem. It doesn't make any difference, so I'm still probing and hoping for other ideas.

Colin.
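A quick way to confirm that the settings in items 2 and 3 line up is a small shell function like the one below. It is only a sketch under stated assumptions: `check_hook_config` is a hypothetical helper name, it reads a single config file rather than the merged configuration Condor actually uses (`condor_config_val` reports the effective values), and it matches parameter names case-sensitively.

```shell
#!/bin/bash
# check_hook_config CONFIG_FILE  (hypothetical helper, not part of Condor)
# Verifies that the keyword named by STARTER_JOB_HOOK_KEYWORD has a matching
# <KEYWORD>_HOOK_JOB_EXIT entry that points at an executable file.
check_hook_config() {
    local config="$1" keyword hook

    # Take the last uncommented assignment; lines starting with '#' never match.
    keyword=$(sed -n 's/^[[:space:]]*STARTER_JOB_HOOK_KEYWORD[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$config" | tail -n 1)
    if [ -z "$keyword" ]; then
        echo "STARTER_JOB_HOOK_KEYWORD is not set in $config"
        return 1
    fi

    hook=$(sed -n "s/^[[:space:]]*${keyword}_HOOK_JOB_EXIT[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p" "$config" | tail -n 1)
    if [ -z "$hook" ]; then
        echo "${keyword}_HOOK_JOB_EXIT is not set in $config"
        return 1
    fi

    if [ ! -x "$hook" ]; then
        echo "$hook is missing or not executable"
        return 1
    fi

    echo "keyword=$keyword hook=$hook"
}

# Example (run on the execution node):
#   check_hook_config /etc/condor/condor_config
```

The keyword printed here should be the same string used in the job file's +HookKeyword line.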
Matthew Farrellee wrote:
Closer reading this morning... You need MYHOOK_HOOK_JOB_EXIT in your config, and you need to add +HookKeyword="MYHOOK" to your submit file.

Best,

matt

On 06/16/2011 06:57 PM, Colin Leavett-Brown wrote:

Hi Matthew,

I set ALL_DEBUG = FULL_DEBUG and transferred back from the execution node both StartLog and StarterLog; neither one has any indication that MYHOOK_HOOK_JOB_EXIT ran.

Colin.

Matthew Farrellee wrote:

On 06/16/2011 04:42 PM, Colin Leavett-Brown wrote:

Running Condor 7.6.1 under Scientific Linux 5.5, I am trying to run HOOK_JOB_EXIT at the conclusion of my job, but it appears that the hook is never run. I have created a simple job whose output file (x.out) accurately details my problem:

    [crlb@elephant condor]$ cat x.out
    1. The job script:
    #!/bin/bash
    echo 1. The job script:
    cat condor_exec.exe
    echo
    echo 2. Hook config:
    grep -i HOOK /etc/condor/condor_config
    echo
    echo 3. Permissions on the hook:
    ls -l /usr/local/bin/munin-node-condor-job-exit
    echo
    echo 4. The hook:
    cat /usr/local/bin/munin-node-condor-job-exit
    echo
    echo 5. Test run of the hook:
    /usr/local/bin/munin-node-condor-job-exit
    echo
    echo 6. Job exit should produce a second line of output from the hook:

    2. Hook config:
    # Job hooks
    HOOK_JOB_EXIT = /usr/local/bin/munin-node-condor-job-exit

    3. Permissions on the hook:
    -rwxr-xr-x 1 root root 102 Jun 16 13:18 /usr/local/bin/munin-node-condor-job-exit

    4. The hook:
    #!/usr/bin/perl
    open(DD, ">>x.out");
    print DD "Executing munin-node-condor-job-exit\n";
    close(DD);

    5. Test run of the hook:
    Executing munin-node-condor-job-exit

    6. Job exit should produce a second line of output from the hook:
    [crlb@elephant condor]$

But it doesn't! Any suggestions greatly appreciated.

You should look in the StartLog/StarterLog to see if there is any indication that your hook was run. I would expect it was, unless there's a permissions issue (maybe a world-writable directory in the script's path). The second invocation of the hook may not be writing to the file you're expecting.

Best,

matt
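On the point that the hook may be writing somewhere unexpected: the Perl hook opens "x.out" relative to whatever directory it happens to be started in. A minimal sketch of a hook that avoids that ambiguity is below. It assumes /tmp is writable by whichever user runs the hook; `HOOK_LOG` and `log_job_exit` are hypothetical names chosen for illustration, and the meaning of the first argument is not relied on, only recorded.

```shell
#!/bin/bash
# Sketch of a job-exit hook that appends to an absolute path, so the output
# location does not depend on the directory the hook is started in.
# HOOK_LOG is a hypothetical location; override it through the environment.
HOOK_LOG="${HOOK_LOG:-/tmp/job_exit_hook.log}"

log_job_exit() {
    # $1: first argument passed to the hook, recorded as-is ("unknown" if absent)
    local reason="${1:-unknown}"
    # Record time, argument, working directory, and numeric user id.
    printf '%s job_exit hook ran: arg=%s cwd=%s uid=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$reason" "$PWD" "$(id -u)" >> "$HOOK_LOG"
}

log_job_exit "$@"
```

If a line appears in the log after the job finishes, the hook is being run and the recorded cwd and uid show where and as whom; if nothing appears, the hook is most likely not being invoked at all.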
--
Colin Leavett-Brown
Department of Physics & Astronomy
University of Victoria
250-721-7728