
Re: [condor-users] Bad Condor jobs killing GUI apps



I have also noticed something very similar, and this has happened to us on
6.6.0, 6.6.1 and 6.6.2 running on Windows XP Pro.

The w/s in question were all installed using the install program, and the user
is allowed to both run jobs on the w/s and submit from that w/s.

Since we are still evaluating Condor, the machine was set up to always run
Condor and process jobs.

The following has happened to several users...

The user has logged onto the w/s, started Word or something, and then gone to a
meeting (leaving themselves logged in). During the meeting Condor starts
processing jobs on the w/s.

When the user comes back and moves the mouse, they find that the w/s has
"crashed". All of the icons have vanished from the System Tray, and the w/s has
lost network connectivity.

This seems to be an intermittent fault, and like Andy, we are also unable to
reproduce the error.

My StarterLog file looks very similar to the one that Andy posted.

Regards,


Chris Tottle
ISG Windows Development (Team Leader)
INFOS
Cardiff University
39 - 41 Park Place
Cardiff
CF10 3BB

029 20875221

>>> agoar@xxxxxxxxxx 07/04/2004 15:48:20 >>>
We have an issue with running Condor that has proven very difficult to
debug. 

Our Condor pool consists of about 2000 desktop machines (all running
WindowsXP). Condor uses the machines when they are idle (mostly at
night). 

We have received occasional reports from users that when they come in in
the morning all (or almost all) of their GUI apps had been shut down.
Users were reporting that if they disable Condor on their machine (thus
removing the machine from the pool), then the problem would go away. At
first we thought the GUI apps shutting down had nothing to do with
Condor. But it's happened enough times, and we have finally seen the
behavior for ourselves, to be convinced there is a link. One of our
admins was standing by a PC in the pool, when all of a sudden all the
GUI apps shut down. He looked at the Condor log files, and verified that
a job had just finished running on the machine. The starter log file
contains the following lines just before the GUI apps started shutting
down:

	4/6 08:39:21 ******************************************************
	4/6 08:39:21 ** condor_starter (CONDOR_STARTER) STARTING UP
	4/6 08:39:21 ** $CondorVersion: 6.4.7 Jan 27 2003 $
	4/6 08:39:21 ** $CondorPlatform: INTEL-WINNT40 $
	4/6 08:39:21 ** PID = 3236
	4/6 08:39:21 ******************************************************
	4/6 08:39:21 DaemonCore: Command Socket at <10.104.41.216:3239>
	4/6 08:39:21 Submitting machine is "admin-srv50.micron.com"
	4/6 08:39:21 entering init_user_ids()...watch out.
	4/6 08:39:22 File transfer completed successfully.
	4/6 08:39:23 Starting a VANILLA universe job.
	4/6 08:39:23 Output file:
C:\Progra~1\Condor/execute\dir_3236\admin-srv50_tppprod_21097_EngExt.bat
out
	4/6 08:39:23 Error file:
C:\Progra~1\Condor/execute\dir_3236\admin-srv50_tppprod_21097_EngExt.bat
err
	4/6 08:39:23 About to exec C:\WINNT\System32\cmd.exe /Q /C condor_exec.bat
	4/6 08:39:23 Create_Process succeeded, pid=3320
	4/6 08:40:04 Job exited, pid=3320, status=0
	4/6 08:40:06 Got SIGQUIT.  Performing fast shutdown.
	4/6 08:40:06 ShutdownFast all jobs.
	4/6 08:40:06 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0

Can someone explain the "Got SIGQUIT." line? What's a fast shutdown? Is
this normal? Has anyone seen cases where the Condor starter daemon
finishing a job affects the interactive apps running on the same
machine?

So far, we have not been able to reproduce the issue at will (although
we are still trying). It does seem to be a specific job that causes this
every time.
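
For anyone trying to correlate reports across many machines, here is a
rough Python sketch that scans starter log lines for the sequence shown
above (a "Job exited" line followed by "Got SIGQUIT" within the same
starter run). The line layout is inferred only from the excerpt in this
message, and the function name and returned fields are made up for
illustration. It only finds the pattern; it does not show that the
shutdown caused the GUI apps to close.

```python
import re

# Layout assumed from the excerpt above: optional indent, "M/D HH:MM:SS", message.
LINE_RE = re.compile(r"^\s*(\d+/\d+ \d+:\d+:\d+)\s+(.*)$")
EXIT_RE = re.compile(r"Job exited, pid=(\d+), status=(\d+)")


def find_fast_shutdowns(lines):
    """Return one record per 'Job exited' that is followed by 'Got SIGQUIT'
    before the next starter start-up banner (hypothetical helper)."""
    events = []
    last_exit = None
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m:
            continue  # skip wrapped or timestamp-only lines
        stamp, msg = m.groups()
        if "STARTING UP" in msg:
            last_exit = None  # a new starter run begins
            continue
        exited = EXIT_RE.search(msg)
        if exited:
            last_exit = {
                "pid": int(exited.group(1)),
                "status": int(exited.group(2)),
                "exited_at": stamp,
            }
        elif "Got SIGQUIT" in msg and last_exit is not None:
            events.append(dict(last_exit, sigquit_at=stamp))
            last_exit = None
    return events


# Sample lines copied from the excerpt above.
SAMPLE = [
    "\t4/6 08:39:21 ** condor_starter (CONDOR_STARTER) STARTING UP",
    "\t4/6 08:39:23 Create_Process succeeded, pid=3320",
    "\t4/6 08:40:04 Job exited, pid=3320, status=0",
    "\t4/6 08:40:06 Got SIGQUIT.  Performing fast shutdown.",
    "\t4/6 08:40:06 ShutdownFast all jobs.",
]

if __name__ == "__main__":
    for event in find_fast_shutdowns(SAMPLE):
        print(event)
```

To run it against a real log, pass an open file object instead of SAMPLE;
the location of the StarterLog depends on the local installation. The
timestamps it returns can then be compared with the times users report
losing their applications.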

Thanks.

Andy Goar
Middleware Group
Micron Technology Inc.
email: agoar@xxxxxxxxxx 
Phone: (208)368-3254
Support: (208)368-4850
    "Three things are certain:  Death, taxes, and lost data.  Guess
which has occurred?"


