
Re: [HTCondor-users] Download called during active transfer



It looks to me like the failure of the input transfer, combined with the large number of input files, is causing the Starter to try to begin the output transfer (to report the failure info) before the Shadow has finished cleaning up the child process it was using for the input transfer.

That is what produces the message about trying to do a transfer while a transfer is already in progress, but the message is a consequence of the earlier failure, not the root cause.

This is a race condition that we need to fix. We fixed at least one similar race a few days ago, and that fix will be released in early January. I'm not sure whether the race we fixed is exactly the one you are seeing here, but it is similar.
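
[Editor's note: for illustration, here is a minimal sketch of the kind of guard that raises this exception. The class and member names are invented for the example; this is not HTCondor's actual code. The idea is that the transfer object records the pid of the child doing a transfer, refuses to start a new transfer while that pid is still set, and the refusal takes the whole daemon down:]

    #include <cstdio>
    #include <cstdlib>
    #include <sys/types.h>

    class FileTransferSketch {
    public:
        void Download() {
            if (active_transfer_pid != 0) {
                // Analogous to the EXCEPT() at file_transfer.cpp:2044:
                // the previous transfer's child has not been reaped yet.
                std::fprintf(stderr,
                    "FileTransfer::Download called during active transfer!\n");
                std::abort();   // the daemon exits, killing its children
            }
            active_transfer_pid = ForkTransferChild();
        }

        // Called from the reaper when the transfer child exits; only
        // after this may a new transfer begin.
        void ChildReaped(pid_t pid) {
            if (pid == active_transfer_pid)
                active_transfer_pid = 0;
        }

    private:
        pid_t ForkTransferChild() { return 4242; }  // stand-in for fork()
        pid_t active_transfer_pid = 0;  // nonzero while a child is live
    };

    int main() {
        FileTransferSketch ft;
        ft.Download();  // input transfer starts (forks a child)
        // ...input transfer fails, but the child is not reaped yet...
        ft.Download();  // transfer of the failure info: aborts here
    }

[That sequence would also explain the "Daemon exiting before all child processes gone; killing ..." line in the shadow log below.]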

-tj


From: Antonio Delgado Peris <Antonio.Delgado.Peris@xxxxxxx>
Sent: Tuesday, December 9, 2025 1:49 PM
To: John M Knoeller <johnkn@xxxxxxxxxxx>; HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: RE: Download called during active transfer

Thanks for the answer!

 

Well, there are many earlier messages, because this was the Nth attempt, but they all look the same; there are no different error messages. It may well be just a problem accessing the storage, but the error message seemed confusing to me.

 

Now, I just noticed that this particular job eventually succeeded not much later, and if I read the XferStatsLog info correctly, the job had about 40,000 input files:

 

12/09/25 15:29:58 (pid:3543472) (13437533.0) (3543468): File Transfer Upload: JobId: 13437533.0 files: 40542 bytes: 277927438 seconds: 43.09 dest: 188.185.196.91 rto: 201000 ato: 40000 snd_mss: 1448 rcv_mss: 755 unacked: 0 sacked: 0 lost: 0 retrans: 0 fackets: 0 pmtu: 1500 rcv_ssthresh: 31820 rtt: 324 snd_ssthresh: 136 snd_cwnd: 195 advmss: 1448 reordering: 68 rcv_rtt: 0 rcv_space: 14600 total_retrans: 55

 

[...]

 

12/09/25 15:40:07 (pid:3545504) (13437533.0) (3543468): File Transfer Download: JobId: 13437533.0 files: 9 bytes: 133301 seconds: 0.04 dest: 188.185.196.91 rto: 201000 ato: 40000 snd_mss: 1448 rcv_mss: 1448 unacked: 1 sacked: 0 lost: 0 retrans: 0 fackets: 0 pmtu: 1500 rcv_ssthresh: 66490 rtt: 225 snd_ssthresh: 2147483647 snd_cwnd: 10 advmss: 1448 reordering: 3 rcv_rtt: 300 rcv_space: 72117 total_retrans: 0
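
[Editor's note: as an aside, these stats lines are plain "key: value" pairs, so pulling numbers out of them is easy. A throwaway sketch, assuming only the layout shown above; this is not an HTCondor tool:]

    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>

    int main() {
        // The upload line from above, trimmed to its key/value part.
        std::string line =
            "JobId: 13437533.0 files: 40542 bytes: 277927438 seconds: 43.09";
        std::map<std::string, std::string> kv;
        std::istringstream in(line);
        std::string key, value;
        while (in >> key >> value)          // tokens alternate "key:" "value"
            kv[key.substr(0, key.size() - 1)] = value;

        double files = std::stod(kv["files"]);
        double bytes = std::stod(kv["bytes"]);
        double secs  = std::stod(kv["seconds"]);
        std::cout << files / secs << " files/s, "
                  << bytes / secs / 1e6 << " MB/s\n";
    }

[For the upload above this gives roughly 940 files/s and about 6.4 MB/s, with an average file size under 7 KB, which suggests the transfer time is dominated by per-file overhead rather than bandwidth.]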

 

 

There were also many such jobs from the same user running at the same time, and they probably access similar or even the same files. That volume of file transfers is probably a good candidate for one failure or another...

 

Cheers,

   Antonio

 

 

From: John M Knoeller <johnkn@xxxxxxxxxxx>
Sent: Tuesday, December 9, 2025 8:29 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Antonio Delgado Peris <Antonio.Delgado.Peris@xxxxxxx>
Subject: Re: Download called during active transfer

 

These messages look to me like the consequence of an earlier failure.  Is there no indication of a problem in either log before 15:28:17?

 

-tj

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Antonio Delgado Peris via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Tuesday, December 9, 2025 12:01 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Antonio Delgado Peris <Antonio.Delgado.Peris@xxxxxxx>
Subject: [HTCondor-users] Download called during active transfer

 

Hi,

 

We’re running 24.0.7 on the schedds (24.0.3 on the workers), and we’re seeing some user jobs being restarted many times because the shadow exits on an exception like the following:

 

12/09/25 15:28:18 (pid:3543188) (13437533.0) (3543188): ERROR "FileTransfer::Download called during active transfer!" at line 2044 in file /var/lib/condor/execute/slot1/dir_2606725/userdir/build-qRBc1D/BUILD/condor-24.0.7/src/condor_utils/file_transfer.cpp

12/09/25 15:28:18 (pid:3543188) (13437533.0) (3543188): Daemon exiting before all child processes gone; killing 3543192

 

This is matched by the following on the starter:

 

12/09/25 15:28:17 (pid:2029580) File transfer failed (status=0).

12/09/25 15:28:17 (pid:2029580) Failed to transfer files:  reason unknown.

12/09/25 15:28:17 (pid:2029580) Skipping execution of Job 13437533.0 because of setup failure.

12/09/25 15:28:18 (pid:2029580) condor_read(): Socket closed abnormally when trying to read 5 bytes from daemon at <188.185.121.235:9618>, errno=104 Connection reset by peer

12/09/25 15:28:18 (pid:2029580) Failed to receive GoAhead message from 188.185.121.235.

12/09/25 15:28:18 (pid:2029580) DoUpload: exiting at 5220

 

The jobs are retried, and eventually they either succeed or hit the maximum number of retries and are put on hold.

 

I’ve been seeing this recently for particular schedds and users. I don’t know whether it’s just caused by errors when accessing the job’s input files (on AFS), but the error seems to indicate that a file transfer was already active when a new one was initiated, and that this immediately causes the shadow to exit. If that’s really the case, how could it happen? Has anybody seen something similar?

 

Thank you!

 

Cheers,

    Antonio