These messages look to me like the consequence of an earlier failure. Is there no indication of a problem in either log before 15:28:17?
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Antonio Delgado Peris via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, December 9, 2025 12:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Antonio Delgado Peris <Antonio.Delgado.Peris@xxxxxxx>
Subject: [HTCondor-users] Download called during active transfer

Hi,
We’re running HTCondor 24.0.7 on the schedds (24.0.3 on the workers), and we’re seeing some user jobs being restarted many times because the shadow exits on an exception like the following:
12/09/25 15:28:18 (pid:3543188) (13437533.0) (3543188): ERROR "FileTransfer::Download called during active transfer!" at line 2044 in file /var/lib/condor/execute/slot1/dir_2606725/userdir/build-qRBc1D/BUILD/condor-24.0.7/src/condor_utils/file_transfer.cpp
12/09/25 15:28:18 (pid:3543188) (13437533.0) (3543188): Daemon exiting before all child processes gone; killing 3543192
This is matched by the following on the starter:
12/09/25 15:28:17 (pid:2029580) File transfer failed (status=0).
12/09/25 15:28:17 (pid:2029580) Failed to transfer files: reason unknown.
12/09/25 15:28:17 (pid:2029580) Skipping execution of Job 13437533.0 because of setup failure.
12/09/25 15:28:18 (pid:2029580) condor_read(): Socket closed abnormally when trying to read 5 bytes from daemon at <188.185.121.235:9618>, errno=104 Connection reset by peer
12/09/25 15:28:18 (pid:2029580) Failed to receive GoAhead message from 188.185.121.235.
12/09/25 15:28:18 (pid:2029580) DoUpload: exiting at 5220
The jobs are retried, and eventually they either succeed or hit the maximum number of retries and are put on hold.
I’ve been seeing this recently for particular schedds and users. I don’t know whether it’s simply caused by errors accessing the job’s input files (on AFS), but the error seems to indicate that a file transfer was already active when a new one was initiated, which immediately causes the shadow to exit. If that’s really the case, how could that happen? Has anybody seen something similar?
Thank you!
Cheers, Antonio
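
On the question of whether either log shows a problem before 15:28:17, one way to check is to filter each full log by daemon pid and timestamp. The following is only a rough Python sketch: it assumes the log prefix looks like the excerpts quoted in this thread (date, time, "(pid:N)"), and the name entries_before is made up for illustration and is not part of HTCondor. The sample lines are copied from the starter excerpt above; in practice the complete shadow and starter logs would be read instead.

```python
import re
from datetime import datetime

# Assumed prefix, taken from the excerpts in this thread:
# "MM/DD/YY HH:MM:SS (pid:NNN) message"
LOG_LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(pid:(\d+)\) (.*)$")


def entries_before(lines, pid, cutoff):
    """Return (timestamp, message) pairs logged by `pid` strictly before `cutoff`."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines without the assumed prefix
        stamp = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        if int(match.group(2)) == pid and stamp < cutoff:
            found.append((stamp, match.group(3)))
    return found


# Sample lines copied from the starter log excerpt in this thread.
sample = [
    "12/09/25 15:28:17 (pid:2029580) File transfer failed (status=0).",
    "12/09/25 15:28:17 (pid:2029580) Failed to transfer files: reason unknown.",
    "12/09/25 15:28:18 (pid:2029580) Failed to receive GoAhead message from 188.185.121.235.",
]

# Anything the starter logged before the first reported failure.
cutoff = datetime(2025, 12, 9, 15, 28, 17)
for stamp, message in entries_before(sample, 2029580, cutoff):
    print(stamp.time(), message)
```

With only the quoted excerpt as input, nothing is earlier than the cutoff, so the loop prints nothing. Any earlier failure would have to come from the parts of the logs not included in the original message.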