What happens:
- it goes to the queue
- it is removed from the queue
- it does not run (the log files are empty and the file in tmp is not
created); it seems to fall into some black hole... :(
The only place I could find that shows any error is the ShadowLog
file, which gives me:
12/01/17 18:51:25 (?.?) (74915):******* Standard Shadow starting up *******
12/01/17 18:51:25 (?.?) (74915):** $CondorVersion: 8.4.12 Jul 06 2017
BuildID: 409562 $
12/01/17 18:51:25 (?.?) (74915):** $CondorPlatform: x86_64_Ubuntu14 $
12/01/17 18:51:25 (?.?) (74915):*******************************************
12/01/17 18:51:25 (?.?) (74915):uid=0, euid=122, gid=0, egid=131
12/01/17 18:51:25 (?.?) (74915):Hostname =
"<xxx.xxx.xxx.xxx:17345?addrs=xxx.xxx.xxx.xxx-17345>", Job = 37.0
12/01/17 18:51:25 (37.0) (74915):Requesting Primary Starter
12/01/17 18:51:25 (37.0) (74915):Shadow: Request to run a job was ACCEPTED
12/01/17 18:51:25 (37.0) (74915):Shadow: RSC_SOCK connected, fd = 17
12/01/17 18:51:25 (37.0) (74915):Shadow: CLIENT_LOG connected, fd = 18
12/01/17 18:51:25 (37.0) (74915):My_Filesystem_Domain = "my domain"
12/01/17 18:51:25 (37.0) (74915):My_UID_Domain = "my domain"
12/01/17 18:51:25 (37.0) (74915):*Can't get address for checkpoint
server host (NULL): No such file or directory*
12/01/17 18:51:25 (37.0) (74915):    Entering pseudo_get_file_stream
12/01/17 18:51:25 (37.0) (74915):    file =
"/var/lib/condor/spool/37/cluster37.ickpt.subproc0"
12/01/17 18:51:25 (37.0) (74915):Created TCP listen socket
<xxx.xxx.xxx.xxx:41412>
12/01/17 18:51:25 (37.0) (74915):Shadow: Job 37.0 exited, termsig = 0,
coredump = 128, retcode = 0
12/01/17 18:51:25 (37.0) (74915):user_time = 1 ticks
12/01/17 18:51:25 (37.0) (74915):sys_time = 0 ticks
12/01/17 18:51:25 (37.0) (74915):*Shadow: Cannot notify user( Condor Job
37.0, tavares, w )*
12/01/17 18:51:25 (37.0) (74915):Static Policy: removing job because
OnExitRemove has become true
12/01/17 18:51:25 (37.0) (74915):********** Shadow Exiting(102) **********
Just to keep it simple, I'd rather avoid using the checkpoint
server. Is that possible?
I'm a little clueless now... can you give me any help with that?
Thank you!!!!
Roberto