Re: [Condor-users] SOAP API... Jobs don't run
- Date: Tue, 26 Jul 2005 17:27:46 +0200
- From: "Cargnelli, Matthieu" <Matthieu.Cargnelli@xxxxxxxx>
- Subject: Re: [Condor-users] SOAP API... Jobs don't run
Cargnelli, Matthieu wrote:
>Hi all,
>
>I'm trying to work with the SOAP API for condor. I use the
>SOAPScheddApiHelper from
>http://www.cs.wisc.edu/condor/birdbath/SOAPScheddApiHelper.java
>When I send a job, it enqueues correctly but never runs. If I try to
>delete a job, it never seems to be removed from the queue completely.
>
>
Hi again,
My original problem has evolved a little. My pool is still a
single machine (my own), so everything should be as simple as possible.
Now, when I submit a job that transfers two files, the files appear in
the spool directory, as they should, but the logs look odd:
SchedLog:7/26 16:36:39 Activity on stashed negotiator socket
SchedLog:7/26 16:36:39 Negotiating for owner:
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:39 Checking consistency running and runnable jobs
SchedLog:7/26 16:36:39 Tables are consistent
SchedLog:7/26 16:36:39 Out of jobs - 1 jobs matched, 0 jobs idle, flock
level = 0
SchedLog:7/26 16:36:39 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:39 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:39 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:39 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:43 Starting add_shadow_birthdate(20.0)
SchedLog:7/26 16:36:43 Started shadow for job 20.0 on
"<10.251.147.33:56110>", (shadow pid = 4937)
SchedLog:7/26 16:36:44 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:44 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:44 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:44 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:36:44 Shadow pid 4937 for job 20.0 exited with status 100
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 contraint ((NiceUser == FALSE) &&
((AccountingGroup =?= "condor") || (AccountingGroup =?= UNDEFINED &&
Owner =?= "condor"))) does not
evaluate to bool
SchedLog:7/26 16:36:44 match (<10.251.147.33:56110>#1122044962#540) out
of jobs (cluster id 20); relinquishing
SchedLog:7/26 16:36:44 Sent RELEASE_CLAIM to startd on
<10.251.147.33:56110>
SchedLog:7/26 16:36:44 Match record (<10.251.147.33:56110>, 20, -1) deleted
SchedLog:7/26 16:36:44 DaemonCore: Command received via TCP from host
<10.251.147.33:34130>
SchedLog:7/26 16:36:44 DaemonCore: received command 443
(VACATE_SERVICE), calling handler (vacate_service)
SchedLog:7/26 16:36:44 Got VACATE_SERVICE from <10.251.147.33:34130>
SchedLog:7/26 16:41:44 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:41:44 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:41:44 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:41:44 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:46:44 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:46:44 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:46:44 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:46:44 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:51:44 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:51:44 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:51:44 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:51:44 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:56:44 Sent ad to central manager for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:56:44 Sent ad to 1 collectors for
nobody@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:56:44 Sent ad to central manager for
condor@xxxxxxxxxxxxxxxxxxxxxxx
SchedLog:7/26 16:56:44 Sent ad to 1 collectors for
condor@xxxxxxxxxxxxxxxxxxxxxxx
StartLog:7/26 16:36:43 DaemonCore: Command received via TCP from host
<10.251.147.33:34114>
StartLog:7/26 16:36:43 DaemonCore: received command 444
(ACTIVATE_CLAIM), calling handler (command_activate_claim)
StartLog:7/26 16:36:43 vm1: Got activate_claim request from shadow
(<10.251.147.33:34114>)
StartLog:7/26 16:36:43 vm1: Remote job ID is 20.0
StartLog:7/26 16:36:43 vm1: Got universe "VANILLA" (5) from request classad
StartLog:7/26 16:36:43 vm1: State change: claim-activation protocol
successful
StartLog:7/26 16:36:43 vm1: Changing activity: Idle -> Busy
StartLog:7/26 16:36:44 DaemonCore: Command received via TCP from host
<10.251.147.33:34127>
StartLog:7/26 16:36:44 DaemonCore: received command 404
(DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
StartLog:7/26 16:36:44 vm1: Called deactivate_claim_forcibly()
StartLog:7/26 16:36:44 Starter pid 4938 exited with status 0
StartLog:7/26 16:36:44 vm1: State change: starter exited
StartLog:7/26 16:36:44 vm1: Changing activity: Busy -> Idle
StartLog:7/26 16:36:44 DaemonCore: Command received via UDP from host
<10.251.147.33:33650>
StartLog:7/26 16:36:44 DaemonCore: received command 443 (RELEASE_CLAIM),
calling handler (command_release_claim)
StartLog:7/26 16:36:44 vm1: State change: received RELEASE_CLAIM command
StartLog:7/26 16:36:44 vm1: Changing state and activity: Claimed/Idle ->
Preempting/Vacating
StartLog:7/26 16:36:44 vm1: State change: No preempting claim, returning
to owner
StartLog:7/26 16:36:44 vm1: Changing state and activity:
Preempting/Vacating -> Owner/Idle
StartLog:7/26 16:36:44 vm1: State change: IS_OWNER is false
StartLog:7/26 16:36:44 vm1: Changing state: Owner -> Unclaimed
StartLog:7/26 16:36:44 DaemonCore: Command received via UDP from host
<10.251.147.33:33650>
StartLog:7/26 16:36:44 DaemonCore: received command 443 (RELEASE_CLAIM),
calling handler (command_release_claim)
StartLog:7/26 16:36:44 Error: can't find resource with ClaimId
(<10.251.147.33:56110>#1122044962#540)
StarterLog.vm1:7/26 16:36:43 Using config file:
/opt/condor-6.7.8/etc/condor_config
StarterLog.vm1:7/26 16:36:43 Using local config files:
/opt/condor-6.7.8/local.patrouille/condor_config.local
StarterLog.vm1:7/26 16:36:43 DaemonCore: Command Socket at
<10.251.147.33:34115>
StarterLog.vm1:7/26 16:36:43 Done setting resource limits
StarterLog.vm1:7/26 16:36:43 Communicating with shadow
<10.251.147.33:34113>
StarterLog.vm1:7/26 16:36:43 Submitting machine is
"patrouille.grideads.net"
StarterLog.vm1:7/26 16:36:43 File transfer completed successfully.
StarterLog.vm1:7/26 16:36:44 Starting a VANILLA universe job with ID: 20.0
StarterLog.vm1:7/26 16:36:44 IWD:
/opt/condor-6.7.8/local.patrouille/execute/dir_4938
StarterLog.vm1:7/26 16:36:44 About to exec
/opt/condor-6.7.8/local.patrouille/execute/dir_4938/condor_exec.exe toto
TRUE
StarterLog.vm1:7/26 16:36:44 Create_Process succeeded, pid=4940
StarterLog.vm1:7/26 16:36:44 Process exited, pid=4940, status=0
StarterLog.vm1:7/26 16:36:44 Got SIGQUIT. Performing fast shutdown.
StarterLog.vm1:7/26 16:36:44 ShutdownFast all jobs.
StarterLog.vm1:7/26 16:36:44 **** condor_starter (condor_STARTER)
EXITING WITH STATUS 0
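The three logs above are pasted one after another, so the order of events across daemons is hard to follow. As a reading aid, here is a minimal Python sketch that rejoins the wrapped lines and sorts all entries into a single timeline. It assumes every entry starts with the "LogName:M/D HH:MM:SS" prefix seen in this excerpt; the names ENTRY and merge_logs are hypothetical and not part of Condor.

```python
import re

# Matches prefixed entries such as "SchedLog:7/26 16:36:39 message".
# Assumption: this prefix format holds for every entry in the excerpt.
ENTRY = re.compile(r"^([\w.]+):(\d+)/(\d+) (\d+):(\d+):(\d+) (.*)$")


def merge_logs(lines):
    """Rejoin wrapped lines, then sort entries by timestamp.

    Returns a list of (timestamp, log_name, message) tuples, where
    timestamp is (month, day, hour, minute, second). The sort is
    stable, so entries sharing a second keep their original order.
    """
    entries = []
    for line in lines:
        match = ENTRY.match(line)
        if match:
            stamp = tuple(int(g) for g in match.group(2, 3, 4, 5, 6))
            entries.append([stamp, match.group(1), match.group(7)])
        elif entries and line.strip():
            # A line without a prefix continues the previous entry.
            entries[-1][2] += " " + line.strip()
    entries.sort(key=lambda entry: entry[0])
    return [tuple(entry) for entry in entries]


if __name__ == "__main__":
    # A few lines copied from the logs above, deliberately out of order.
    sample = [
        "StartLog:7/26 16:36:44 vm1: Changing activity: Busy -> Idle",
        "SchedLog:7/26 16:36:43 Started shadow for job 20.0 on",
        '"<10.251.147.33:56110>", (shadow pid = 4937)',
    ]
    for stamp, name, message in merge_logs(sample):
        print("%d/%d %02d:%02d:%02d" % stamp, name, message)
```

Timestamps in these logs have one-second resolution, so the ordering of entries within the same second (most of the 16:36:44 activity) cannot be recovered this way.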
Has anyone seen this kind of problem before?
Regards,
Matthieu Cargnelli