HTCondor Project List Archives




[Condor-devel] Bug Report: Esoteric Windows email problem.



OS: Windows
Condor Version: all

Problem: 
The condor_shadow silently fails to send email when its current working
directory is on a shared file system.  The user submitted jobs remotely
(SCHEDD_HOST was set), with file locations on a shared file system, and
never got email.  When condor_submit -s was used (telling condor to copy
everything to the spool directory), they _would_ get email...but they
didn't want to use -s.  There were no errors of any sort in the log
files, even with D_ALL.  They saw the line

Sending email via system(C:\Condor/bin/condor_mail.exe -s "[Condor] Condor Job 1757.0" -relay ...)

in the Shadow log file no matter what.
 
Bizarre, eh?

The problem is one of permissions.  In email_open_implementation() (in
email.c), condor creates a temporary file using

email_tmp_file = tempnam("\\tmp","condoremail");

and does this outside of the set_priv() block.  Since this is the
shadow, both this call and the subsequent fopen() of email_tmp_file are
done as the submitting user.  The MSDN page for tempnam:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore98/html/_crt__tempnam.2c_._wtempnam.2c_.tmpnam.2c_._wtmpnam.asp

says that the file is created in the first usable directory from this
list, checked in order:

- The value of the TMP environment variable, if it's valid
- \\tmp if it exists
- P_tmpdir (from stdio.h)
- The current working directory
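The search order above can be modeled as a small C function. This is a sketch of the list as paraphrased in this report, not the C runtime's actual code; pick_tempnam_dir() and is_existing_dir() are hypothetical names, and "exists" is approximated with stat().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

#ifndef S_ISDIR
#define S_ISDIR(m) (((m) & S_IFMT) == S_IFDIR)  /* MSVC lacks S_ISDIR */
#endif

/* Returns nonzero if `path` names an existing directory. */
static int is_existing_dir(const char *path)
{
    struct stat st;
    if (path == NULL || path[0] == '\0')
        return 0;
    if (stat(path, &st) != 0)
        return 0;
    return S_ISDIR(st.st_mode);
}

/* Hypothetical model of the tempnam() directory choice listed above:
     1. TMP, if it names a valid directory
     2. the dir argument ("\\tmp" in email.c), if it exists
     3. P_tmpdir, if it exists
     4. otherwise the current working directory
   Returns whichever candidate pointer would be used. */
const char *pick_tempnam_dir(const char *tmp_env, const char *dir_arg,
                             const char *p_tmpdir, const char *cwd)
{
    assert(cwd != NULL);
    if (is_existing_dir(tmp_env))
        return tmp_env;
    if (is_existing_dir(dir_arg))
        return dir_arg;
    if (is_existing_dir(p_tmpdir))
        return p_tmpdir;
    return cwd;  /* for the shadow, possibly a directory on a shared file system */
}
```

If TMP is unset and neither \tmp nor P_tmpdir exists, the function returns cwd, which is exactly the fall-through the report suspects.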

I'm guessing it fell through to the last one, the current working
directory.  So now there's a temporary file, created by the submitting
user, sitting on a shared file system.  We then write the email message
to it and eventually reach email_close().

In email_close(), we call:

priv = set_condor_priv();

and then essentially:

system ( "condor_mail.exe ... < email_tmp_file" );

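The problem is that at this point we're running as user condor (which is
LocalSystem on Windows), and hence we don't have permission to read the
temporary file that was created, as the submitting user, via tempnam()
on the shared file system.  The redirection fails, condor_mail never
gets the message, and nothing is logged: a silent failure.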
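One way this step could turn into a logged failure instead of a silent one is to probe the temporary file with the current (condor) identity before handing it to the shell, and to check the return value of system(). A minimal sketch, assuming a shell that understands "<" redirection; send_email_file() and the mail_status codes are hypothetical, not Condor's actual email.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. */
enum mail_status { MAIL_OK = 0, MAIL_TMP_UNREADABLE = 1, MAIL_CMD_FAILED = 2 };

/* Run "mail_cmd < tmp_path" through system(), but first confirm that the
   current identity (condor in the scenario above) can actually read
   tmp_path. Every failure is reported on stderr instead of being dropped. */
enum mail_status send_email_file(const char *mail_cmd, const char *tmp_path)
{
    char cmd[1024];
    FILE *probe;
    int n, rc;

    assert(mail_cmd != NULL && tmp_path != NULL);

    /* If we can't open the file, the shell's "<" redirection would fail
       the same way and the mailer would never see the message. */
    probe = fopen(tmp_path, "r");
    if (probe == NULL) {
        fprintf(stderr, "email: cannot read temp file %s\n", tmp_path);
        return MAIL_TMP_UNREADABLE;
    }
    fclose(probe);

    /* Equivalent of: condor_mail.exe ... < email_tmp_file */
    n = snprintf(cmd, sizeof cmd, "%s < \"%s\"", mail_cmd, tmp_path);
    if (n < 0 || (size_t)n >= sizeof cmd) {
        fprintf(stderr, "email: command line too long\n");
        return MAIL_CMD_FAILED;
    }

    rc = system(cmd);
    if (rc != 0) {
        fprintf(stderr, "email: \"%s\" returned %d\n", cmd, rc);
        return MAIL_CMD_FAILED;
    }
    return MAIL_OK;
}
```

The probe uses the same identity the child process will inherit, so a permission problem like the one described here shows up as MAIL_TMP_UNREADABLE plus a log line rather than disappearing.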

To solve this, I'd recommend switching to condor privs before tempnam()
and fopen() are called, and using $(LOG) or $(SPOOL) instead of \\tmp.
I believe (but haven't proved) that a workaround is to set the TMP
environment variable to a valid local directory.
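The first recommendation, sketched in C: switch to condor privs before creating and opening the file, and create it in a directory Condor controls. sketch_set_condor_priv() and sketch_set_priv() are hypothetical stubs that only record the active identity (Condor's real set_condor_priv() and set_priv() are not reproduced here); open_email_tmp() and the condoremail.N naming are illustrative; and the "wx" fopen() mode is the C11 exclusive-create mode.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for Condor's privilege switching. They only record
   which identity would be active; they do not change any real credentials. */
typedef enum { PRIV_USER, PRIV_CONDOR } priv_state;
static priv_state current_priv = PRIV_USER;

static priv_state sketch_set_condor_priv(void)
{
    priv_state old = current_priv;
    current_priv = PRIV_CONDOR;
    return old;
}

static void sketch_set_priv(priv_state p)
{
    current_priv = p;
}

/* Create the email temp file in a Condor-controlled directory (for example
   the value of $(SPOOL) or $(LOG)) while running as condor, so the later
   "condor_mail ... < file" step, which also runs as condor, can read it.
   On success the chosen path is written to path_out. */
FILE *open_email_tmp(const char *spool_dir, char *path_out, size_t path_len)
{
    static unsigned long counter = 0;  /* illustrative; names are predictable */
    FILE *fp = NULL;
    priv_state saved;

    assert(spool_dir != NULL && path_out != NULL);
    if (strlen(spool_dir) == 0)
        return NULL;

    saved = sketch_set_condor_priv();  /* create AND open as condor */
    for (int attempt = 0; attempt < 100 && fp == NULL; attempt++) {
        int n = snprintf(path_out, path_len, "%s/condoremail.%lu",
                         spool_dir, counter++);
        if (n < 0 || (size_t)n >= path_len)
            break;                     /* path would be truncated */
        fp = fopen(path_out, "wx");    /* C11: fail if the file already exists */
    }
    sketch_set_priv(saved);            /* always restore the caller's priv */
    return fp;
}
```

A caller would pass the configured $(SPOOL) or $(LOG) path as spool_dir. Because both the create and the later read happen as condor, the mismatch described above cannot occur; a production version would also want file names that are harder to guess than a counter.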


Disclaimer: I haven't verified any of the above myself; I've just been
staring at code.  However, it's the only explanation I can find that
fits the symptoms.

Mike Yoder
Principal Member of Technical Staff
Ask Mike: http://docs.optena.com
Direct  : +1.408.321.9000
Fax     : +1.408.321.9030
Mobile  : +1.408.497.7597
yoderm@xxxxxxxxxx

Optena Corporation
2860 Zanker Road, Suite 201
San Jose, CA 95134
http://www.optena.com