Unfortunately, COD doesn't currently support transferring any files, including the x509 proxy file. Is it possible for you to rely upon a shared filesystem for this purpose? I haven't thought through what permissions would be necessary in order for this to work for gLExec.
--Dan

Renzo Borgatti wrote:
Thanks Dan.

I added both 'IWD' and 'User' without success. But I found a startd that does not crash and whose StartdLog is more verbose:

12/27 14:33:52 DaemonCore: Command received via TCP from condor@fcdfcaf445 from host <131.225.240.106:35843>
12/27 14:33:52 DaemonCore: received command 1000 (CA_AUTH_CMD), calling handler (command_classad_handler)
12/27 14:33:52 Serving request for CA_ACTIVATE_CLAIM by user 'condor'
12/27 14:33:52 vm2: State change: Suspending because a COD job is now running
12/27 14:33:52 vm2: Changing activity: Retiring -> Suspended
12/27 14:33:52 vm2: cannot use glexec to spawn starter: no proxy (is GLEXEC_STARTER set in the shadow?)
12/27 14:33:52 vm2: writeJobAd: Write_Pipe failed
12/27 14:33:52 vm2: ERROR: exec_starter returned 0

It looks like gLExec is also used to activate COD. I didn't mention before that gLExec is active in my configuration. The error seems to be related to the X509 proxy not being present. Is the mechanism to transport the X509 proxy the same as for universe=grid jobs? Is it possible to specify which X509 proxy the COD job should run under?

Thanks
Renzo

On Dec 27, 2006, at 12:29 PM, Dan Bradley wrote:

Hello,

I have a hunch that some of the ClassAd attributes that the COD manual claims are optional are actually required.

--Dan

Renzo Borgatti wrote:

Hi,

I have a problem activating claims using COD (Condor 6.9.0).
This is what I'm doing:

condor_cod request -addr "<131.225.212.148:39446>" -classad ci.out
Successfully sent CA_REQUEST_CLAIM to startd at <131.225.212.148:39446>
Result ClassAd written to ci.out
ID of new claim is: "<131.225.212.148:39446>#1167216341#4"

condor_cod activate -id "<131.225.212.148:39446>#1167216341#4" -classad ci.out -jobad TestCod
Attempt to send CA_ACTIVATE_CLAIM to startd <131.225.212.148:39446> failed
Reply ClassAd returned 'Failure' but does not have the ErrorString attribute

On the worker node, I can see the following two lines in the StartdLog right before crashing:

12/27 11:50:05 DaemonCore: Command received via TCP from condor@fcdfcaf444 from host <131.225.240.106:45123>
12/27 11:50:05 DaemonCore: received command 1000 (CA_AUTH_CMD), calling handler (command_classad_handler)

while in the MasterLog:

12/27 11:55:30 The STARTD (pid 15721) died due to signal 11
12/27 11:55:30 All daemons are gone. Exiting.
12/27 11:55:32 **** condor_master (condor_MASTER) EXITING WITH STATUS 0

TestCod is a file with the following 2 lines:

Cmd="/bin/ps"
Args="-aux"

Am I using condor_cod the right way? Is there a way to have more debugging information to understand what happened?
Thanks
Renzo

_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at either
https://lists.cs.wisc.edu/archive/condor-users/
http://www.opencondor.org/spaces/viewmailarchive.action?key=CONDOR
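
For anyone trying to reproduce the steps in this thread, here is a minimal Python sketch of the job ad and the activate command described above. It only renders the job ad text and assembles the command line; it does not run condor_cod. The Cmd and Args values and the condor_cod flags are taken from the original message. The IWD and User values are hypothetical placeholders (the thread says these attributes were added but does not give their values), and the helper names format_job_ad and activate_command are made up for illustration. This is not a fix for the gLExec proxy error reported above.

```python
import shlex


def format_job_ad(attrs):
    """Render attributes as one Name="value" line each, as in the TestCod file."""
    lines = []
    for name, value in attrs.items():
        text = str(value)
        if '"' in text:
            # Quote escaping is deliberately not handled in this sketch.
            raise ValueError(f"value for {name} contains a double quote")
        lines.append(f'{name}="{text}"')
    return "\n".join(lines) + "\n"


def activate_command(claim_id, result_file, jobad_file):
    """Build the argv for the activate step shown in the thread."""
    return [
        "condor_cod", "activate",
        "-id", claim_id,
        "-classad", result_file,
        "-jobad", jobad_file,
    ]


if __name__ == "__main__":
    job_ad = {
        "Cmd": "/bin/ps",   # from the original message
        "Args": "-aux",     # from the original message
        "IWD": "/tmp",      # hypothetical placeholder value
        "User": "renzo",    # hypothetical placeholder value
    }
    print(format_job_ad(job_ad), end="")
    claim = "<131.225.212.148:39446>#1167216341#4"
    print(shlex.join(activate_command(claim, "ci.out", "TestCod")))
```

Writing the rendered text to the file named after -jobad and running the printed command by hand corresponds to the activate step shown earlier in the thread.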