
Re: [Condor-users] schedd changes owner to a regular user and results in queue crash



To answer your question below: the schedd runs as the submitting user when it writes events to the user's log file (i.e., the file specified with "log=xxxxx" in the submit file).

The issue is likely file locking over NFS, assuming the log file sits on NFS.
You can tell Condor not to bother locking the user log by putting this into condor_config:
   ENABLE_USERLOG_LOCKING = FALSE
There is no downside to doing this, as long as you do not have multiple jobs writing to the same job log file.

If you investigate, my guess is that you will find a subset of nodes where file locking on NFS is consistently broken (until lockd is restarted or some such).
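
One way to look for such nodes is to try taking a lock on a scratch file in the same NFS directory from each machine. The following is only a minimal sketch: the function name can_lock is made up for illustration, and it assumes POSIX fcntl-style advisory locks, which may not match exactly what Condor does internally.

```python
import errno
import fcntl
import os


def can_lock(path):
    """Try to take and release an exclusive advisory lock on path.

    Returns (True, "") on success, or (False, reason) on failure.
    The name and return shape are illustrative, not part of Condor.
    """
    try:
        # Create the scratch file if it does not exist yet.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError as e:
        return False, "open failed: %s" % e.strerror
    try:
        # Non-blocking request, so a lock held by someone else is
        # reported as an error instead of waiting for it.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True, ""
    except OSError as e:
        name = errno.errorcode.get(e.errno, str(e.errno))
        return False, "lock failed: %s" % name
    finally:
        os.close(fd)
```

Calling can_lock() on a scratch file (not a live job log) next to the user log, once per suspect node, should show which machines fail. Note that with a broken lockd the lock call may hang rather than return an error, so a hang is itself a useful symptom.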


---
Todd Tannenbaum
Dept of Computer Sciences
University of Wisconsin-Madison
..Sent from a Palm Treo 680...

-----Original Message-----

From:  Junjun Mao <jmao@xxxxxxxxxxxxxxxxx>
Subj:  Re: [Condor-users] schedd changes owner to a regular user and results in queue crash
Date:  Tue Dec 12, 2006 9:14 am
Size:  679 bytes
To:  Condor-Users Mail List <condor-users@xxxxxxxxxxx>

It turns out the Condor schedd had been restarted earlier for another reason.
When the transaction log job_queue.log was replayed, the scheduler got
stuck on a regular user's job.

The question remains: in what situations does the schedd run as a user
other than condor?

Junjun