Hi,
One of our jobs went into the X state as soon as it was released with
condor_release. The job had first been put on hold with condor_hold,
apparently before the shadow had exited. To start the job again we issued
condor_release. condor_release reported success, but condor_q now shows
the job in the X state.
The job was submitted through the SOAP API. We are using version 7.2.3.
I think the logs below will help to find what sent the job to the X state.
In Schedd log:
8/9 21:03:03 (pid:5215) abort_job_myself: 514.0 action:Hold
log_hold:true notify:true
8/9 21:03:03 (pid:5215) Found shadow record for job 514.0, host =
<192.168.10.92:9620>
8/9 21:03:14 (pid:5215) No HoldReasonSubCode found for job 514.0
8/9 21:03:16 (pid:5215) Writing record to user
logfile=/mail/condor/log/VM_514_0.log owner=idealgrid
8/9 21:03:19 (pid:5215) FileLock object is updating timestamp on:
/mail/condor/log/VM_514_0.log
8/9 21:03:19 (pid:5215) FileLock::obtain(1) - @1249831999.611700 lock on
/mail/condor/log/VM_514_0.log now WRITE
8/9 21:03:21 (pid:5215) FileLock::obtain(2) - @1249832001.150186 lock on
/mail/condor/log/VM_514_0.log now UNLOCKED
8/9 21:03:22 (pid:5215) Shadow pid 6457 for job 514.0 exited with status 102
8/9 21:03:22 (pid:5215) Deleting shadow rec for PID 6457, job (514.0)
8/9 21:03:22 (pid:5215) Writing record to user
logfile=/mail/condor/log/VM_514_0.log owner=idealgrid
8/9 21:03:22 (pid:5215) FileLock object is updating timestamp on:
/mail/condor/log/VM_514_0.log
8/9 21:03:22 (pid:5215) FileLock::obtain(1) - @1249832002.754296 lock on
/mail/condor/log/VM_514_0.log now WRITE
8/9 21:03:24 (pid:5215) FileLock::obtain(2) - @1249832004.133541 lock on
/mail/condor/log/VM_514_0.log now UNLOCKED
8/9 21:03:24 (pid:5215) Job 514.0 is finished
8/9 21:03:24 (pid:5215) Job cleanup for 514.0 will not block, calling
jobIsFinished() directly
8/9 21:03:24 (pid:5215) jobIsFinished() completed, calling
DestroyProc(514.0)
In ShadowLog:
8/9 21:03:03 (514.0) (6457): In handleJobRemoval(), sig 10
8/9 21:03:03 (514.0) (6457): setting exit reason on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx to 102
8/9 21:03:03 (514.0) (6457): Resource slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
changing state from EXECUTING to FINISHED
8/9 21:03:03 (514.0) (6457): Entering DCStartd::deactivateClaim(forceful)
8/9 21:03:04 (514.0) (6457): DCStartd::deactivateClaim: successfully
sent command
8/9 21:03:04 (514.0) (6457): Killed starter (fast) at <192.168.10.92:9620>
8/9 21:03:16 (514.0) (6457): Inside RemoteResource::updateFromStarter()
8/9 21:03:19 (514.0) (6457): Inside RemoteResource::resourceExit()
8/9 21:03:19 (514.0) (6457): setting exit reason on
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx to 107
8/9 21:03:19 (514.0) (6457): Resource slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
changing state from FINISHED to FINISHED
8/9 21:03:19 (514.0) (6457): Job 514.0 is being evicted from
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
8/9 21:03:19 (514.0) (6457): FileLock::obtain(1) - @1249831999.591092
lock on /mail/condor/log/VM_514_0.log now WRITE
8/9 21:03:19 (514.0) (6457): FileLock::obtain(2) - @1249831999.610029
lock on /mail/condor/log/VM_514_0.log now UNLOCKED
8/9 21:03:21 (514.0) (6457): Updating Job Queue:
SetAttribute(LastJobLeaseRenewal = 1249831999)
8/9 21:03:21 (514.0) (6457): Updating Job Queue:
SetAttribute(RemoteSysCpu = 4.000000)
8/9 21:03:21 (514.0) (6457): Updating Job Queue:
SetAttribute(RemoteUserCpu = 3435.000000)
8/9 21:03:21 (514.0) (6457): Updating Job Queue:
SetAttribute(LastVacateTime = 1249831999)
8/9 21:03:21 (514.0) (6457): Updating Job Queue: SetAttribute(BytesSent
= 0.000000)
8/9 21:03:21 (514.0) (6457): Updating Job Queue: SetAttribute(BytesRecvd
= 9785.000000)
8/9 21:03:22 (514.0) (6457): **** condor_shadow (condor_SHADOW) pid 6457
EXITING WITH STATUS 102
In StarterLog:
8/9 21:03:04 ProcAPI::buildFamily() Found daddypid on the system: 11157
8/9 21:03:08 Got SIGQUIT. Performing fast shutdown.
8/9 21:03:08 ShutdownFast all jobs.
8/9 21:03:08 Inside VMProc::ShutdownFast()
8/9 21:03:08 Inside VMProc::StopVM
8/9 21:03:08 VMGAHP[11157] <- 'CONDOR_VM_STOP 243 1'
8/9 21:03:09 VMGAHP[11157] -> 'S'
8/9 21:03:10 VMGAHP[11157] <- 'RESULTS'
8/9 21:03:11 VMGAHP[11157] -> 'R'
8/9 21:03:11 VMGAHP[11157] -> 'S' '1'
8/9 21:03:11 VMGAHP[11157] -> '243' '0' 'NULL'
8/9 21:03:11 PID for VM is changed from [23754] to [0]
8/9 21:03:12 Inside VM_GAHP_SERVER::cleanup()
8/9 21:03:12 VMGAHP[11157] <- 'QUIT'
8/9 21:03:17 VMGAHP[11157] -> 'S'
8/9 21:03:18 VMGahpServer::killVM() failed!
8/9 21:03:18 End of VM_GAHP_SERVER::cleanup
8/9 21:03:19 Inside VMProc::cleanup()
8/9 21:03:19 ProcAPI::buildFamily() Found daddypid on the system: 11157
In UserLog:
001 (514.000.000) 08/09 15:39:59 Job executing on host: <192.168.10.92:9620>
...
004 (514.000.000) 08/09 21:03:19 Job was evicted.
(0) Job was not checkpointed.
Usr 0 00:57:15, Sys 0 00:00:04 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
0 - Run Bytes Sent By Job
1957 - Run Bytes Received By Job
...
013 (514.000.000) 08/09 21:03:19 Job was released.
via condor_release (by user daemon)
...
009 (514.000.000) 08/09 21:03:22 Job was aborted by the user.
...
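To make the order of events in the user log easier to check, here is a minimal sketch that pulls the event number, job id, and timestamp out of each event header line. The regular expression and the name parse_events are my own assumptions based on the excerpt above; they are not part of Condor, and the header format may differ in other versions.

```python
import re

# Assumed header shape, taken from the excerpt above, e.g.:
#   004 (514.000.000) 08/09 21:03:19 Job was evicted.
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$"
)


def parse_events(text):
    """Return one (code, 'cluster.proc', date, time, message) tuple per event header."""
    events = []
    for line in text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            code, cluster, proc, _subproc, date, time, msg = m.groups()
            events.append((code, f"{int(cluster)}.{int(proc)}", date, time, msg))
    return events


# Sample input copied from the user log excerpt above (detail lines shortened).
SAMPLE = """\
001 (514.000.000) 08/09 15:39:59 Job executing on host: <192.168.10.92:9620>
...
004 (514.000.000) 08/09 21:03:19 Job was evicted.
(0) Job was not checkpointed.
...
013 (514.000.000) 08/09 21:03:19 Job was released.
via condor_release (by user daemon)
...
009 (514.000.000) 08/09 21:03:22 Job was aborted by the user.
...
"""

for event in parse_events(SAMPLE):
    print(event)
```

Run on the excerpt, this lists the events in the order executing (001), evicted (004), released (013), aborted (009), which is the sequence I am asking about: the release is logged at 21:03:19 and the abort three seconds later.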
Thanks,
Johnson