Thanks guys for the suggestions.
We are already looking into late materialization. Upgrading to a newer version such as 23.x will take some time.
We noticed this issue can happen with smaller batches as well.
Example: in this case we have approximately 3500 jobs in the queue while new batches are being submitted. I see a batch of 20 jobs being submitted to the queue and causing the slowness. Roughly 20-30 jobs are submitted to the queue per minute, which is not a huge number. Even if we move to late materialization, we can't go below this rate; otherwise job completions would be delayed for a long time.
I see the following entries in the log file, followed by silence, when the condor_q output starts hanging.
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.0: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.1: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.2: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.3: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.4: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.5: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.6: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.7: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.8: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.9: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.10: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.11: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.12: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.13: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.14: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.15: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.16: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.17: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.18: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.19: 1 considered, 1 applied (SetTestTeam)
06/07/24 13:06:53 (pid:54976) job_transforms for 147247.20: 1 considered, 1 applied (SetTestTeam)    <<<<< 3 min of logs missing from the log file
06/07/24 13:09:43 (pid:54976) condor_write(): Socket closed when trying to write 47 bytes to <10.xx.xx.xx:28479>, fd is 25
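
To locate similar silent periods elsewhere in the log, a small script can flag consecutive entries whose timestamps are far apart. This is only a rough sketch: the function name find_log_gaps and the 60-second threshold are made up for illustration, and it assumes every entry starts with a 17-character MM/DD/YY HH:MM:SS timestamp as in the excerpt above.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%m/%d/%y %H:%M:%S"   # assumed timestamp layout, as in the excerpt
TS_LENGTH = 17                    # len("06/07/24 13:06:53")

def find_log_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous_ts, next_ts, gap) for each pair of consecutive
    timestamped lines that are further apart than `threshold`."""
    gaps = []
    prev = None
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            # Skip lines that do not start with a timestamp.
            continue
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps

# The last two entries from the excerpt above.
sample = [
    "06/07/24 13:06:53 (pid:54976) job_transforms for 147247.20: 1 considered, 1 applied (SetTestTeam)",
    "06/07/24 13:09:43 (pid:54976) condor_write(): Socket closed when trying to write 47 bytes to <10.xx.xx.xx:28479>, fd is 25",
]

for before, after, gap in find_log_gaps(sample):
    print(f"silence of {gap} between {before} and {after}")
```

To scan a whole log, pass an open file object instead of the sample list; the path to the log file depends on the local configuration.
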
Our transform is very simple: it only adds a tag to the job.
Thanks & Regards,
Vikrant Aggarwal