I am new to Condor. I have successfully set up a personal Condor
(version 7.0.0) and have submitted and run some simple Java and C
jobs from the command line. I then tried to submit jobs through a
SOAP client written in Java, following the IBM tutorial article.
Condor seems to receive the job but always leaves it in the "idle"
state.
Here is the Java code I used to submit a job:
files[0] = "/workspace/condor/jobs/submit.java";
WebServicesHelper.submitJobHelper(schedd, "aa0586",
UniverseType.JAVA, "java", "Simple 4 10", null, files);
submit.java is the submit description file, which works fine with the
command "condor_submit submit.java". Its content is shown below:
Universe = java
Executable = Simple.class
Arguments = Simple 4 10
Log = simple.log
Output = simple.out
Error = simple.error
Queue
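To make the comparison with the SOAP call easier, here is a minimal Java sketch that reads "key = value" lines from a submit description like the one above into a map. It is only an illustration: the class name SubmitDescriptionSketch and its parse method are hypothetical and not part of Condor, and it handles only the simple syntax shown here (no macros or line continuations). It shows that the command-line submission names the executable as Simple.class, which is one candidate for what a SOAP client would have to supply explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of Condor.
class SubmitDescriptionSketch {

    // Collects "key = value" lines into a map with lowercased keys.
    // Blank lines, '#' comments, and bare commands such as "Queue" are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> attrs = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || trimmed.startsWith("#") || eq < 0) {
                continue;
            }
            String key = trimmed.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = trimmed.substring(eq + 1).trim();
            attrs.put(key, value);
        }
        return attrs;
    }

    public static void main(String[] args) {
        String submit = String.join("\n",
            "Universe = java",
            "Executable = Simple.class",
            "Arguments = Simple 4 10",
            "Log = simple.log",
            "Output = simple.out",
            "Error = simple.error",
            "Queue");
        Map<String, String> attrs = parse(submit);
        // These are the values the command-line submission relies on.
        System.out.println("executable: " + attrs.get("executable"));
        System.out.println("arguments:  " + attrs.get("arguments"));
    }
}
```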
Can anyone tell me how I should pass parameters to
WebServicesHelper.submitJobHelper()? I believe this source code is
provided by the Condor group, with a method signature like:
public static void submitJobHelper(CondorScheddPortType schedd,
    String owner, UniverseType type, String cmd, String args,
    String requirements, String[] files)
    throws JobSubmissionException, SendFileException,
           java.io.IOException, java.rmi.RemoteException { }
I have also included the queue status and log files below for analysis.
Thanks and regards,
Zhifeng
-- Submitter: localhost.localdomain : : localhost.localdomain
 ID   OWNER/NODENAME  SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
 9.0  aa0586          3/19 21:58   0+00:00:00  I   0    0.0   java Simple 4 10

1 jobs; 1 idle, 0 running, 0 held
In Negotiator.log, it seems that the negotiation is aborted in the
middle:
3/19 22:05:33 ---------- Started Negotiation Cycle ----------
3/19 22:05:33 Phase 1: Obtaining ads from collector ...
3/19 22:05:33 Getting all public ads ...
3/19 22:05:33 Sorting 6 ads ...
3/19 22:05:33 Getting startd private ads ...
3/19 22:05:33 Got ads: 6 public and 2 private
3/19 22:05:33 Public ads include 1 submitter, 2 startd
3/19 22:05:33 Phase 2: Performing accounting ...
3/19 22:05:33 Phase 3: Sorting submitter ads by priority ...
3/19 22:05:33 Phase 4.1: Negotiating with schedds ...
3/19 22:05:33 Negotiating with aa0586@localdomain at
3/19 22:05:33 0 seconds so far
3/19 22:05:33 Request 00009.00000:
3/19 22:05:33 Matched 9.0 aa0586@localdomain preempting none slot1@xxxxxxxxxxxxxxxxxxxxx
3/19 22:05:33 Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
3/19 22:05:33 Got NO_MORE_JOBS; done negotiating
3/19 22:05:33 ---------- Finished Negotiation Cycle ----------
And starter.log indicates a signal error:
3/19 22:05:33 slot1: match_info called
3/19 22:05:33 slot1: Received match #1205981602#4#...
3/19 22:05:33 slot1: State change: match notification protocol successful
3/19 22:05:33 slot1: Changing state: Unclaimed -> Matched
3/19 22:05:33 slot1: Request accepted.
3/19 22:05:33 slot1: Remote owner is aa0586@localdomain
3/19 22:05:33 slot1: State change: claiming protocol successful
3/19 22:05:33 slot1: Changing state: Matched -> Claimed
3/19 22:05:35 slot1: Got activate_claim request from shadow ()
3/19 22:05:36 slot1: Remote job ID is 9.0
3/19 22:05:36 slot1: Got universe "JAVA" (10) from request classad
3/19 22:05:36 slot1: State change: claim-activation protocol successful
3/19 22:05:36 slot1: Changing activity: Idle -> Busy
3/19 22:05:36 slot1: Called deactivate_claim_forcibly()
3/19 22:05:36 attempt to connect to failed: Connection refused (connect errno = 111).
3/19 22:05:36 Send_Signal: ERROR sending signal 3 (SIGQUIT) to pid 3517 (still alive)
3/19 22:05:36 slot1: Error sending signal to starter, errno = 25 (Inappropriate ioctl for device)
3/19 22:05:37 Starter pid 3517 exited with status 4
3/19 22:05:37 slot1: State change: starter exited
3/19 22:05:37 slot1: Changing activity: Busy -> Idle
3/19 22:05:37 slot1: State change: received RELEASE_CLAIM command
3/19 22:05:37 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
3/19 22:05:37 slot1: State change: No preempting claim, returning to owner
3/19 22:05:37 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
3/19 22:05:37 slot1: State change: IS_OWNER is false
3/19 22:05:37 slot1: Changing state: Owner -> Unclaimed
And the shadow log looks like:
3/19 22:05:35 ******************************************************
3/19 22:05:35 ** condor_shadow (CONDOR_SHADOW) STARTING UP
3/19 22:05:35 ** /usr/local/condor/sbin/condor_shadow
3/19 22:05:35 ** $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
3/19 22:05:35 ** $CondorPlatform: I386-LINUX_RHEL3 $
3/19 22:05:35 ** PID = 3516
3/19 22:05:35 ** Log last touched 3/19 21:55:41
3/19 22:05:35 ******************************************************
3/19 22:05:35 Using config source: /usr/local/condor/etc/condor_config
3/19 22:05:35 Using local config sources:
3/19 22:05:35 /home/aa0586/pool/condor_config.local
3/19 22:05:35 DaemonCore: Command Socket at
3/19 22:05:35 Initializing a JAVA shadow for job 9.0
3/19 22:05:36 (9.0) (3516): Request to run on was ACCEPTED
3/19 22:05:36 (9.0) (3516): ReliSock::put_file_with_permissions(): Failed to stat file '/home/aa0586/pool/spool/cluster9.proc0.subproc0/java': No such file or directory (errno: 2, si_error: 1)
3/19 22:05:36 (9.0) (3516): DoUpload: (Condor error code 13, subcode 2) SHADOW at 192.168.0.20 failed to send file(s) to : error reading from /home/aa0586/pool/spool/cluster9.proc0.subproc0/java: (errno 2) No such file or directory; STARTER failed to receive file(s) from
3/19 22:05:36 (9.0) (3516): Job 9.0 going into Hold state (code 13,2): Error from starter on slot1@xxxxxxxxxxxxxxxxxxxxx: STARTER failed to receive file(s) from
3/19 22:05:36 (9.0) (3516): ZKM: setting default map to (null)
3/19 22:05:36 (9.0) (3516): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112
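
The shadow log names the file it could not find in the spool directory. As a rough aid for reading that line, the Java sketch below pulls the path out of a "Failed to stat file" message and compares its last component with the cmd string passed to submitJobHelper. The class and method names are hypothetical and not part of Condor. A match would suggest, though it does not prove, that the cmd argument is being treated as the name of a file expected in the job's spool directory.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Condor.
class ShadowLogSketch {

    // Matches the quoted path in a "Failed to stat file '...'" message.
    private static final Pattern STAT_FAILURE =
        Pattern.compile("Failed\\s+to\\s+stat\\s+file\\s+'([^']+)'");

    // Returns the path the shadow could not stat, or null if no such message is present.
    static String missingFile(String logText) {
        Matcher m = STAT_FAILURE.matcher(logText);
        return m.find() ? m.group(1) : null;
    }

    // Returns the last component of a '/'-separated path.
    static String baseName(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // Log line copied from the shadow log above.
        String line = "3/19 22:05:36 (9.0) (3516): ReliSock::put_file_with_permissions(): "
            + "Failed to stat file '/home/aa0586/pool/spool/cluster9.proc0.subproc0/java': "
            + "No such file or directory (errno: 2, si_error: 1)";
        String cmd = "java"; // the cmd argument used in the submitJobHelper call

        String path = missingFile(line);
        System.out.println("missing file: " + path);
        System.out.println("last component equals cmd: "
            + (path != null && baseName(path).equals(cmd)));
    }
}
```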