Thank you very much, Simon.
I have tried changing the ACL, as I am not running Condor as root, but
the result is always the same, and I have this in the StarterLog:
******************************************************
8/7 10:41:49 Using config source: /home/condor/condor_config
8/7 10:41:49 Using local config sources:
8/7 10:41:49 /home/condor/hosts/balsa/condor_config.local
8/7 10:41:49 DaemonCore: Command Socket at <143.234.88.55:63601>
8/7 10:41:49 Done setting resource limits
8/7 10:41:49 Communicating with shadow <143.234.88.55:63599>
8/7 10:41:49 Submitting machine is "balsa.macaulay.ac.uk"
8/7 10:41:50 File transfer completed successfully.
8/7 10:41:51 Starting a VANILLA universe job with ID: 176.0
8/7 10:41:51 IWD: /home/condor/hosts/balsa/execute/dir_24354
8/7 10:41:51 Output file: /home/condor/hosts/balsa/execute/dir_24354/condor_output
8/7 10:41:51 Error file: /home/condor/hosts/balsa/execute/dir_24354/condor_error
8/7 10:41:51 About to exec /home/condor/hosts/balsa/execute/dir_24354/condor_exec.exe Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral0
8/7 10:41:51 Create_Process succeeded, pid=24357
8/7 10:42:03 Process exited, pid=24357, status=134
8/7 10:42:03 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe). Assuming failure.
8/7 10:42:03 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:03 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:03 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:03 File transfer failed, forcing disconnect.
8/7 10:42:03 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:03 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:03 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:03 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe). Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:04 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe). Assuming failure.
8/7 10:42:04 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:04 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:04 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:04 File transfer failed, forcing disconnect.
8/7 10:42:04 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:04 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:04 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:04 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe). Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 File transfer failed, forcing disconnect.
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:05 Accepted request to reconnect from <0.0.0.0:0>
8/7 10:42:05 Ignoring old shadow <143.234.88.55:63599>
8/7 10:42:05 Communicating with shadow <143.234.88.55:63599>
8/7 10:42:05 condor_write(): send() 65536 bytes to unknown source returned -1, timeout=30, errno=32 (Broken pipe). Assuming failure.
8/7 10:42:05 ReliSock::put_bytes_nobuffer: Send failed.
8/7 10:42:05 ReliSock::put_file: failed to put 65536 bytes (put_bytes_nobuffer() returned -1)
8/7 10:42:05 DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:63599>: error sending /home/condor/hosts/balsa/execute/dir_24354/core.176.0; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
8/7 10:42:05 JIC::allJobsDone() failed, waiting for job lease to expire or for a reconnect attempt
8/7 10:42:18 Got SIGQUIT. Performing fast shutdown.
8/7 10:42:18 ShutdownFast all jobs.
8/7 10:42:18 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
I can see the results in the condor_output file, but the job
restarts.
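
One possible reading of "status=134" in the StarterLog above, assuming the starter prints the raw Unix wait status (this is an assumption, not something the log confirms): with the conventional bit layout, the low 7 bits hold the terminating signal and the next bit is the core-dump flag. The helper name `decode_status` is made up for this sketch.

```python
def decode_status(status: int) -> dict:
    """Decode a raw Unix wait status using the conventional bit layout.

    Assumption: the value is a raw wait status, not a shell-style exit code.
    Stopped/continued states are ignored in this sketch.
    """
    sig = status & 0x7F                # low 7 bits: terminating signal (0 = normal exit)
    core = bool(status & 0x80)         # next bit: core-dump flag
    exit_code = (status >> 8) & 0xFF   # high byte: exit code, meaningful only if sig == 0
    return {
        "signal": sig,
        "core_dumped": core,
        "exit_code": exit_code if sig == 0 else None,
    }


print(decode_status(134))
```

Under that assumption, 134 decodes to signal 6 with the core-dump flag set. Signal 6 is SIGABRT on Solaris and Linux, and a core dump would match the core.176.0 file that the starter tries to send back. Read as a shell-style exit code instead, 134 is 128 + 6, which also points at signal 6.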
"Simon Hammond" <simon.hammond@xxxxxxxxx> 07/08/2007 09:53 >>>
I guess you are not running Condor as root?
If so, you can use ACLs to give the user that Condor runs as access to
the file,
e.g. setfacl -m u:condor:rwx ./myfile.txt
This will enable just the Condor user to read/write the file. You may
need to adjust the mask to get this to work correctly.
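
To illustrate why the mask matters: with POSIX-style ACLs, the rights a named user actually gets are the entry's permissions limited by the mask. A minimal sketch of that rule follows; the function name `effective_perms` is hypothetical and the code only models the rule, it does not read real ACLs.

```python
def effective_perms(entry: str, mask: str) -> str:
    """Return the permissions of an ACL entry that survive the mask.

    Both arguments are three-character strings such as 'rwx' or 'r--'.
    A permission is effective only if it is set in the entry AND in the mask.
    """
    if len(entry) != 3 or len(mask) != 3:
        raise ValueError("expected three-character permission strings")
    return "".join(e if e == m and e != "-" else "-" for e, m in zip(entry, mask))


# An entry of rwx behind a mask of r-- leaves only read access.
print(effective_perms("rwx", "r--"))
```

So an entry such as u:condor:rwx can still end up read-only if the file's mask is r--, which is why the mask may need widening as well.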
On 07/08/07, Sophie Prieur <s.prieur@xxxxxxxxxxxxxx> wrote:
Hi everybody,
I have a problem when I submit a job. I get this in the ShadowLog:
ReliSock::get_file_with_permissions(): Failed to chmod file '/home/sp5978/simul2/condorRes/neutral/out1/condor_output': Not owner (errno: 1)
and this in the StarterLog:
DoUpload: STARTER at 143.234.88.55 failed to send file(s) to <143.234.88.55:51883>; SHADOW at 143.234.88.55 failed to receive file /home/sp5978/simul2/condorRes/neutral/out0/condor_output
This is the submit file:
Universe = vanilla
Executable = /software/guiswarm/swarm-2.2/bin/javaswarm
Log = condor_log
Error = condor_error
Output = condor_output
getenv = true
requirements = ((( OpSys == "SOLARIS29" ) && ( Arch == "SUN4u" )) || (( OpSys == "SOLARIS28" ) && ( Arch == "SUN4u" )))
transfer_input_files = /home/sp5978/simul2/bin/Simul.class,
/home/sp5978/simul2/bin/BatchSwarm.class,
/home/sp5978/simul2/bin/Beq0.class,
/home/sp5978/simul2/bin/Cell.class,
/home/sp5978/simul2/bin/DNA.class,
/home/sp5978/simul2/bin/DsupK.class,
/home/sp5978/simul2/bin/incorrectValue.class,
/home/sp5978/simul2/bin/Individual.class,
/home/sp5978/simul2/bin/Map.class,
/home/sp5978/simul2/bin/missingValue.class,
/home/sp5978/simul2/bin/ModelSwarm.class,
/home/sp5978/simul2/bin/noEnoughValues.class,
/home/sp5978/simul2/bin/ObserverSwarm.class,
/home/sp5978/simul2/bin/Param.class,
/home/sp5978/simul2/bin/Plant.class,
/home/sp5978/simul2/bin/Project.class,
/home/sp5978/simul2/bin/Seed.class,
/home/sp5978/simul2/bin/Specie.class,
/home/sp5978/simul2/bin/SwarmUtils.class
transfer_files = ALWAYS
InitialDir = /home/sp5978/simul2/condorRes/neutral/out0
Arguments = Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral0
Queue
InitialDir = /home/sp5978/simul2/condorRes/neutral/out1
Arguments = Simul --batch -cfg /home/sp5978/simul2/configFiles/neutral/config-neutral1
Queue
And these are the permissions on condor_output:
bash-2.03$ ls -l ../condorRes/neutral/out0
total 377
-rw-rw-r-- 1 sp5978 staff 0 Aug 6 11:18 condor_error
-rw-rw-r-- 1 sp5978 staff 370097 Aug 7 09:30 condor_log
-rwxrwxrwx 1 sp5978 staff 400 Aug 7 09:29 condor_output
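
A note on the "Failed to chmod ... Not owner (errno: 1)" message, as a sketch under assumptions: on POSIX systems, chmod is normally allowed only for the file's owner or a privileged user, regardless of the mode bits. The listing above shows condor_output owned by sp5978, so if the shadow runs as a different unprivileged user (an assumption here, for example a "condor" account), the chmod would be refused even with -rwxrwxrwx. The helper name `may_chmod` is hypothetical and simplifies "privileged" to uid 0.

```python
import os
import tempfile


def may_chmod(path: str, euid: int) -> bool:
    """Simplified POSIX rule: only the owner or uid 0 may chmod a file.

    The permission bits on the file itself play no part in this check.
    """
    return euid == 0 or os.stat(path).st_uid == euid


with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o777)                  # world-writable, like condor_output
    owner = os.stat(f.name).st_uid
    print(may_chmod(f.name, owner))          # the owner is allowed
    print(may_chmod(f.name, owner + 1))      # another unprivileged uid is not
```

If this is what is happening, widening the mode or adding an ACL entry would not be expected to make the chmod succeed, since neither changes who owns the file.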
The job runs and I can see the results in the condor_output files, but
it doesn't finish: it remains in the queue and restarts after a
while.
Can someone help me?
Thanks in advance
Sophie
--
Please note that the views expressed in this e-mail are those of
the
sender and do not necessarily represent the views of the Macaulay
Institute. This email and any attachments are confidential and are
intended solely for the use of the recipient(s) to whom they are
addressed. If you are not the intended recipient, you should not
read,
copy, disclose or rely on any information contained in this e-
mail, and
we would ask you to contact the sender immediately and delete the
email
from your system. Thank you.
Macaulay Institute and Associated Companies, Macaulay Drive,
Craigiebuckler, Aberdeen, AB15 8QH.
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/