
Re: [Condor-users] Problem Condor Job Stays Idle Because of target.CkptArch



       Last successful match: Tue Nov 20 22:36:21 2007

This indicates that the job is successfully getting matched to a machine. Something must be going wrong when Condor tries to run the job on that machine. Look for clues about what is going wrong in these places:
The "user log": /usr/local/globus-4.0.5//var/globus-condor.log
The ShadowLog (condor_config_val SHADOW_LOG)
The StartLog (condor_config_val STARTD_LOG)
The StarterLog (condor_config_val STARTER_LOG)
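As a rough sketch of how the list above could be worked through from a shell: the helper below asks condor_config_val for a log path and prints the end of that file. The function name show_log_tail and the CONDOR_CONFIG_VAL override variable are hypothetical names introduced for this example; only condor_config_val and the knob names come from the list above. The ShadowLog normally lives on the submit machine, while the StartLog and StarterLog live on the execute machine that was matched, so the helper needs to be run on the right host. On machines with several virtual machines (vm1, vm2, ...) the starter log may carry a per-VM suffix, in which case the path would need adjusting by hand.

```shell
#!/bin/sh
# Print the last lines of the log file that a Condor config knob points to.
# CONDOR_CONFIG_VAL is a hypothetical override (not a Condor setting) that lets
# the lookup command be swapped out; by default condor_config_val is used.
show_log_tail() {
    knob=$1            # e.g. SHADOW_LOG, STARTD_LOG, STARTER_LOG
    lines=${2:-50}     # number of trailing lines to show (default 50)

    # Ask Condor where the log is; give up if the lookup fails.
    path=$("${CONDOR_CONFIG_VAL:-condor_config_val}" "$knob") || return 1

    if [ ! -r "$path" ]; then
        echo "cannot read $knob log: $path" >&2
        return 1
    fi

    echo "== $knob: $path =="
    tail -n "$lines" "$path"
}

# Example use (assumes Condor is installed on this host):
#   show_log_tail SHADOW_LOG 50      # on the submit machine
#   show_log_tail STARTD_LOG 50      # on the matched execute machine
#   show_log_tail STARTER_LOG 50     # on the matched execute machine
# The user log path is already known, so it can be read directly:
#   tail -n 50 /usr/local/globus-4.0.5//var/globus-condor.log
```

Entries written around the time of the last successful match are the ones worth reading first.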

I hope that helps!

--Dan

Nitin Gavhane wrote:

Hello all,
I am submitting a job through Globus to Condor, but the job stays in the idle state. The job details are as follows.
================================================
*The job description generated by GRAM is as follows:*

[condor@niting-w2p etc]$ cat /tmp/condor_job_description
#
# description file for condor submission
#
Universe = standard
Notification = Never
Executable = /home/psegrid/NIP/nip
Requirements = OpSys == "LINUX"  && Arch == "INTEL"
Environment = GLOBUS_LOCATION=/usr/local/globus-4.0.5/;X509_CERT_DIR=/etc/grid-security/certificates;X509_USER_PROXY=;X509_USER_CERT=;X509_USER_KEY=;HOME=/home/psegrid;LOGNAME=psegrid;SCRATCH_DIRECTORY=/home/psegrid/.globus/scratch;JAVA_HOME=/usr/java/jdk1.6.0_03/jre;GLOBUS_GRAM_JOB_HANDLE=https://192.168.7.221:8443/wsrf/services/ManagedExecutableJobService?7f408200-9789-11dc-9f1a-b41f06e1e2ea;LD_LIBRARY_PATH=
Arguments =
InitialDir = /home/psegrid
Input = /dev/null
Log = /usr/local/globus-4.0.5//var/globus-condor.log
log_xml = True
#Extra attributes specified by client

Output = /home/psegrid/stdout
Error = /home/psegrid/stderr
queue 1
=======================================================================
*[psegrid@niting-w2p NIP]$ condor_q -better-analyze*


-- Submitter: niting-w2p.corp.cdac.in : <192.168.7.221:42993> : niting-w2p.corp.cdac.in
---
005.000:  Run analysis summary.  Of 7 machines,
     4 are rejected by your job's requirements
     0 reject your job because of their own requirements
     0 match but are serving users with a better priority in the pool
     3 match but reject the job for unknown reasons
     0 match but will not currently preempt their existing job
     0 are available to run your job
       Last successful match: Tue Nov 20 22:36:21 2007

The Requirements expression for your job is:

( target.OpSys == "LINUX" && target.Arch == "INTEL" ) &&
( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) ) && ( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize )

   Condition                         Machines Matched    Suggestion
   ---------                         ----------------    ----------
1  target.Arch == "INTEL"            3
2  target.OpSys == "LINUX"           7
3  ( ( target.CkptArch == target.Arch ) || ( target.CkptArch is undefined ) )
                                     7
4  ( ( target.CkptOpSys == target.OpSys ) || ( target.CkptOpSys is undefined ) )
                                     7
5  ( target.Disk >= 20000 )          7
6  ( ( 1024 * target.Memory ) >= 20000 )
                                     7


==========================================================
*[psegrid@niting-w2p NIP]$ condor_status*

Name          OpSys       Arch   State      Activity   LoadAv Mem  ActvtyTime

vm1@niting-w2 LINUX       INTEL  Unclaimed  Idle       0.000   469  0+00:05:26
vm2@niting-w2 LINUX       INTEL  Unclaimed  Idle       0.140   469  0+00:26:42
sskadam-w2p.c LINUX       INTEL  Unclaimed  Idle       0.000   248  0+00:44:38
vm1@psewebs-w LINUX       X86_64 Unclaimed  Idle       0.400   753  0+00:30:04
vm2@psewebs-w LINUX       X86_64 Unclaimed  Idle       0.000   753  0+00:30:05
vm3@psewebs-w LINUX       X86_64 Unclaimed  Idle       0.000   753  0+00:30:06
vm4@psewebs-w LINUX       X86_64 Unclaimed  Idle       0.000   753  0+00:30:27

                    Total Owner Claimed Unclaimed Matched Preempting Backfill

        INTEL/LINUX     3     0       0         3       0          0        0
       X86_64/LINUX     4     0       0         4       0          0        0

              Total     7     0       0         7       0          0        0
==============================================================
*The daemon details for all three machines are as follows:*

[condor@niting-w2p etc]$ ./test.sh
current file: condor_config
##  checkpoint server isn't available or USE_CKPT_SERVER is set to
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
##  checkpoint server?  If False, the CKPT_SERVER_HOST set on
##  the submit machine is used.  Otherwise, the CKPT_SERVER_HOST set
STARTER_CHOOSES_CKPT_SERVER = True
#WALL_CLOCK_CKPT_INTERVAL = 3600
##  setting is only used if USE_CKPT_SERVER (from above) is True.
#COMPRESS_PERIODIC_CKPT = False
#COMPRESS_VACATE_CKPT = False
#SLOW_CKPT_SPEED = 0
DAEMON_LIST                     = MASTER, STARTD, SCHEDD
#DC_DAEMON_LIST = \
=============
current file: psewebs-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
DAEMON_LIST   = MASTER, COLLECTOR, NEGOTIATOR, STARTD, SCHEDD
=============
current file: niting-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
=============
current file: sskadam-w2p.local
USE_CKPT_SERVER = True
CKPT_SERVER_HOST = psewebs-w2p.corp.cdac.in
DAEMON_LIST = MASTER, STARTD, SCHEDD
===============================

Please tell me what is wrong with the job submission.
Thank you.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nitin M. Gavhane
MS in Advanced Software Technologies
International Institute of Information Technology
P-14,Hinjewadi,Pune, India.
---------------------------------------------------------------------------------------------------------------------------