I never saw an answer to this question. Did one get proffered off the list? If so, could you please cross-post it? I too am curious about this delay, as I'm seeing it in my flock of Windows XP machines.
Thanks! Ian
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Marc Saric
Sent: August 31, 2004 10:26 AM
To: condor-users@xxxxxxxxxxx
Subject: [Condor-users] Condor job submission delayed

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi all,
I am experimenting with a small Condor cluster (Condor 6.6.6, mostly on Windows boxes, unfortunately), as you can see from my various beginner's mails popping up in the forum.
I have set up a bunch of Windows-machines (Win2k SP6 and WinXP Pro SP1) and a central Linux-Master-Server.
Submission of jobs works in principle (tested with the hello-world examples from http://www.liv.ac.uk/e-science/condor/hello.html), but sometimes I observe strange behaviour: certain jobs take a very long time before they are executed.
This happens while most of the machines are not busy and are listed as available (15 min without user activity and low CPU utilization).
"condor_status" gives something like:
saric@u-191-srv2:~/tmp> condor_status
Name          OpSys   Arch  State     Activity LoadAv Mem  ActvtyTime

u-191-srv2.pr LINUX   INTEL Unclaimed Idle      0.010 1004 0+01:52:13
u-099-cpc-esi WINNT50 INTEL Owner     Idle      0.240  512 0+01:16:34
vm1@u-099-csr WINNT50 INTEL Claimed   Busy      0.000 1024 0+00:10:56
vm2@u-099-csr WINNT50 INTEL Unclaimed Idle      0.000 1024 0+01:43:03
u-099-cbb1    WINNT51 INTEL Unclaimed Idle      0.000  511 0+01:46:27
u-099-cnb2    WINNT51 INTEL Owner     Idle      0.020  511 0+04:31:59
u-099-cpc-sek WINNT51 INTEL Owner     Idle      0.040  512 0+00:10:14
u-099-cpc1    WINNT51 INTEL Owner     Idle      0.000  512 0+00:06:20
u-099-cpc2    WINNT51 INTEL Owner     Idle      0.030  512 0+00:01:20
u-099-cpc3    WINNT51 INTEL Unclaimed Idle      0.000  512 0+00:06:21
u-099-cpc4    WINNT51 INTEL Owner     Idle     -0.010  512 0+04:57:30
u-099-cpc5    WINNT51 INTEL Unclaimed Idle      0.000  512 0+00:31:21
so there are at least 4 unclaimed machines in the pool which should match the requirements ((OpSys == "WINNT50") || (OpSys == "WINNT51")).
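To make the count above easy to re-check, here is a minimal Python sketch that applies the OpSys requirement to the rows of the condor_status listing. It is only an illustration: the helper names (parse_rows, meets_requirements) are made up for this example, the rows are pasted in by hand, and it does not reproduce Condor's matchmaking (in particular, the machines' own requirements are ignored).

```python
# Sketch only: filters rows copied from the condor_status listing by the
# job's OpSys requirement. This is NOT how Condor itself matches jobs; the
# machines' own START/requirements expressions are not evaluated here.

STATUS_ROWS = """\
u-191-srv2.pr LINUX   INTEL Unclaimed Idle  0.010 1004 0+01:52:13
u-099-cpc-esi WINNT50 INTEL Owner     Idle  0.240  512 0+01:16:34
vm1@u-099-csr WINNT50 INTEL Claimed   Busy  0.000 1024 0+00:10:56
vm2@u-099-csr WINNT50 INTEL Unclaimed Idle  0.000 1024 0+01:43:03
u-099-cbb1    WINNT51 INTEL Unclaimed Idle  0.000  511 0+01:46:27
u-099-cnb2    WINNT51 INTEL Owner     Idle  0.020  511 0+04:31:59
u-099-cpc-sek WINNT51 INTEL Owner     Idle  0.040  512 0+00:10:14
u-099-cpc1    WINNT51 INTEL Owner     Idle  0.000  512 0+00:06:20
u-099-cpc2    WINNT51 INTEL Owner     Idle  0.030  512 0+00:01:20
u-099-cpc3    WINNT51 INTEL Unclaimed Idle  0.000  512 0+00:06:21
u-099-cpc4    WINNT51 INTEL Owner     Idle -0.010  512 0+04:57:30
u-099-cpc5    WINNT51 INTEL Unclaimed Idle  0.000  512 0+00:31:21
"""


def parse_rows(text):
    """Split each whitespace-separated row into a dict (hypothetical helper)."""
    machines = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        name, opsys, arch, state, activity, load, mem, since = fields
        machines.append({
            "Name": name,
            "OpSys": opsys,
            "Arch": arch,
            "State": state,
            "Activity": activity,
            "LoadAv": float(load),
            "Mem": int(mem),
            "ActvtyTime": since,
        })
    return machines


def meets_requirements(machine):
    """Mirror of (OpSys == "WINNT50") || (OpSys == "WINNT51")."""
    return machine["OpSys"] in ("WINNT50", "WINNT51")


machines = parse_rows(STATUS_ROWS)
candidates = [
    m["Name"]
    for m in machines
    if m["State"] == "Unclaimed" and meets_requirements(m)
]
print(len(machines), "machines,", len(candidates), "unclaimed Windows candidates")
print(candidates)
```

With the listing as pasted, this picks out the unclaimed Windows machines and leaves out the unclaimed Linux master, which does not satisfy the OpSys requirement.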
Running "condor_q -analyze" takes quite a long time and gives back something like:
045.000: Run analysis summary. Of 12 machines,
      1 are rejected by your job's requirements
      6 reject your job because of their own requirements
      0 match, but are serving users with a better priority in the pool
      4 match, but reject the job for unknown reasons
      1 match, but will not currently preempt their existing job
      0 are available to run your job
I can't see why the 4 should reject the job for unknown reasons. Is there any place I could look to find out these unknown reasons (system log, local Condor log on the machines)?
Thanks in advance!
- -- Bye, Marc Saric
Dr. Marc Saric, Bioinformatik, Proteom Centrum Tübingen,
Auf der Morgenstelle 15, D-72076 Tübingen, Germany
Tel: +49 (0)7071 29 70557
marc.saric@xxxxxxxxxxxxxxxx
http://www.proteom-centrum-tuebingen.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFBNIqQBLD6PjSWyL4RAlKLAJ4l64RE870+vfqESQJL5Cz5oMSGjQCbBmA6
WLrzxNGTr1sGB3oJv4bDW48=
=nKWt
-----END PGP SIGNATURE-----
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx http://lists.cs.wisc.edu/mailman/listinfo/condor-users