We do specify the output and error logs at submission time, but we didn't find any information in these files.
Are these not sufficient? Or do we need to specify "stream_error" in the submit file?
cat /var/log/condor/StarterLog.slot34 | grep -A25 60990.13
02/28/25 15:05:14 (pid:543127) Job 60990.13 set to execute immediately
02/28/25 15:05:15 (pid:543127) Starting a VANILLA universe job with ID: 60990.13
02/28/25 15:05:15 (pid:543127) Checking to see if htcondor is a writeable cgroup
02/28/25 15:05:15 (pid:543127)     Cgroup memory/htcondor is useable
02/28/25 15:05:15 (pid:543127)     Cgroup cpu,cpuacct/htcondor is useable
02/28/25 15:05:15 (pid:543127)     Cgroup freezer/htcondor is useable
02/28/25 15:05:15 (pid:543127) Current mount, /, is shared.
02/28/25 15:05:15 (pid:543127) Current mount, /, is shared.
02/28/25 15:05:15 (pid:543127) IWD: /sims-logs/pol/YN/208
02/28/25 15:05:15 (pid:543127) Output file: /sims-logs/pol/YN/208/logs/20250205.stdout
02/28/25 15:05:15 (pid:543127) Error file: /sims-logs/pol/YN/208/logs/20250205.stderr
02/28/25 15:05:15 (pid:543127) Renice expr "0" evaluated to 0
02/28/25 15:05:15 (pid:543127) Running job as user pol
02/28/25 15:05:15 (pid:543127) About to exec /sims-logs/pol/YN/208/bin/bin.20250228.2/eye /sims-logs/pol/YN/208/logs/20250228.150859.cfg 20250205 &
02/28/25 15:05:15 (pid:543127)     Cgroup memory/htcondor is useable
02/28/25 15:05:15 (pid:543127)     Cgroup cpu,cpuacct/htcondor is useable
02/28/25 15:05:15 (pid:543127)     Cgroup freezer/htcondor is useable
02/28/25 15:05:15 (pid:543157) Moved process 543157 to cgroup /sys/fs/cgroup/memory/htcondor/condor_var_lib_condor_execute_slot34@ms-s5
02/28/25 15:05:15 (pid:543157) Moved process 543157 to cgroup /sys/fs/cgroup/cpu,cpuacct/htcondor/condor_var_lib_condor_execute_slot34@ms-s5
02/28/25 15:05:15 (pid:543157) Moved process 543157 to cgroup /sys/fs/cgroup/freezer/htcondor/condor_var_lib_condor_execute_slot34@ms-s5
02/28/25 15:05:15 (pid:543157) Moved process 543157 to cgroup /sys/fs/cgroup/devices/htcondor/condor_var_lib_condor_execute_slot34@ms-s5
02/28/25 15:05:15 (pid:543127) Create_Process succeeded, pid=543157
02/28/25 15:07:03 (pid:543127) Process exited, pid=543157, signal=9
02/28/25 15:07:03 (pid:543127) Failed to write ToE tag to .job.ad file (13): Permission denied
02/28/25 15:07:03 (pid:543127) All jobs have exited... starter exiting
02/28/25 15:07:03 (pid:543127) **** condor_starter (condor_STARTER) pid 543127 EXITING WITH STATUS 0
02/28/25 15:36:08 (pid:543847) ******************************************************
Based on the details above, can you please help us figure out why the job was killed with signal 9?
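
In case it helps with triage, here is a minimal Python sketch of how "Process exited ... signal=N" entries could be pulled out of StarterLog text and mapped to a signal name. The function name, the regex, and the sample string are illustrative assumptions and not part of HTCondor; the sketch also assumes each log entry sits on its own line and that it runs on a POSIX system.

```python
import re
import signal

# Matches entries such as: "Process exited, pid=543157, signal=9"
EXIT_RE = re.compile(r"Process exited, pid=(\d+), signal=(\d+)\b")


def find_signal_exits(log_text):
    """Return a list of (pid, signal number, signal name) tuples.

    Hypothetical helper: scans starter-log text for jobs that exited
    on a signal and looks up the symbolic name of each signal.
    """
    results = []
    for match in EXIT_RE.finditer(log_text):
        pid = int(match.group(1))
        signum = int(match.group(2))
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform
            name = "UNKNOWN"
        results.append((pid, signum, name))
    return results


if __name__ == "__main__":
    sample = (
        "02/28/25 15:05:15 (pid:543127) Create_Process succeeded, pid=543157\n"
        "02/28/25 15:07:03 (pid:543127) Process exited, pid=543157, signal=9\n"
    )
    for pid, signum, name in find_signal_exits(sample):
        print(f"pid {pid} exited on signal {signum} ({name})")
```

Signal 9 is SIGKILL, which the process cannot catch or handle, so the job itself gets no chance to write anything to its stdout or stderr files when it is killed.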