[HTCondor-users] condor_dagman not creating jobs
Hello,
I've run into an issue where condor_dagman is unable to
submit jobs because condor_submit segfaults.
.condor_dagman.out contains:
10/27/21 12:52:35 ERROR: submit attempt failed
10/27/21 12:52:35 submit command was: /usr/bin/condor_submit -a dag_node_name' '=' 'job2 -a submit_event_notes' '=' 'DAG' 'Node:' 'job2 -a dagman_log' '=' '/mnt/scratch/tyuan/refit/./refit.prob.dag.nodes.log -a +DAGManNodesMask' '=' '"0,1,2,4,5,7,9,10,11,12,13,16,17,24,27,35,36" -a JOB=job2 -a OUTPUT_DIR' '=' '/data/user/tyuan/studies/tablemaker/refits/prob -a INPUT_DIR' '=' '/data/user/chill/photo-table -a FILE_NAME' '=' 'cascade_halftable_spice_3.2.1_flat_z0_zen100_azi180_nevents40000_0_range.fits -a DAG_STATUS' '=' '2 -a FAILED_COUNT' '=' '1 -a notification' '=' 'never -a +DAGParentNodeNames' '=' '"" refit.prob.sub
10/27/21 12:52:35 Job submit try 1/6 failed, will try again in >= 1 second.
dmesg contains:
[2335469.858471] condor_submit[2260162]: segfault at a ip 00007efd3f70e2cb sp 00007ffd24306b40 error 4 in libglobus_gsi_credential.so.1.6.14[7efd3f707000+9000]
[2335469.864387] Code: 00 48 c7 44 24 08 00 00 00 00 48 85 ff 74 07 e8 9b 93 ff ff 89 c5 4d 85 ff 74 3f 4c 8d 6c 24 08 49 8b 07 4c 89 ee 48 8b 40 20 <48> 8b 78 08 e8 bc 92 ff ff 85 c0 75 78 48 8b 03 48 8b 54 24 08 48
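In case it helps with debugging, the dmesg line can be reduced to an offset inside the library by subtracting the mapping base from the instruction pointer. A minimal Python sketch, assuming the usual x86 Linux "segfault at ADDR ip IP sp SP error N in OBJ[BASE+SIZE]" layout; the helper name fault_offset is hypothetical and only for illustration:

```python
import re

# Assumed layout of the kernel's x86 segfault message; all numbers are hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (object name, offset of faulting instruction, faulting address)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    # Offset of the faulting instruction relative to the start of the mapping.
    return m.group("obj"), ip - base, int(m.group("addr"), 16)


line = (
    "condor_submit[2260162]: segfault at a ip 00007efd3f70e2cb "
    "sp 00007ffd24306b40 error 4 in "
    "libglobus_gsi_credential.so.1.6.14[7efd3f707000+9000]"
)
obj, offset, addr = fault_offset(line)
print(obj, hex(offset), hex(addr))
```

For the line above this gives offset 0x72cb into libglobus_gsi_credential.so.1.6.14, with the faulting address 0xa. If debug symbols for the library are installed, that offset could probably be passed to a tool such as addr2line (with -e pointing at the library file) to get a function name.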
We are running HTCondor version 9.0.6 on CentOS 8.
My simple test DAGs run fine, so it doesn't always
fail. Since the crash is inside libglobus_gsi_credential,
perhaps it has something to do with sending x509
proxies with the jobs?
Any help would be appreciated.
Vlad