On 9/1/06, Christopher Mellen <Chris.Mellen@xxxxxxxxxx> wrote:
> I'm trying to upload a very large input file, in excess of 4 GB in size, using Condor 6.6.11 on a cluster of XP machines. The transfer continually fails with a 'failed to do upload' message (or similar) reported in the job logs. Looking at the ShadowLog on the submit machine I see the following errors reported:
>
> 9/1 13:40:38 Initializing a VANILLA shadow
> 9/1 13:40:38 (30740.0) (19384): Request to run on <xxx.xxx.xxx.176:4284> was ACCEPTED
> 9/1 13:41:35 (30742.0) (13248): ReliSock: put_file: TransmitFile() failed, errno=10022
> 9/1 13:41:35 (30742.0) (13248): ERROR "DoUpload: Failed to send file E:\Temp_2\\XXX_depthprocessed_ts.txt , exiting at 1399 " at line 1398 in file ..\src\condor_c++_util\file_transfer.C
>
> In the above it is the XXX_depthprocessed_ts.txt file that is > 4 GB. Hence the problem seems to be in src\condor_c++_util\file_transfer.C.
>
> Is this a Condor-related fault or a fault in the underlying winsock file transfer mechanism?
>
> Any ideas much appreciated ....
There is a bug in file transfer on Windows in the 6.6 series when transferring files larger than 2 GB. In 6.6.11 this was mitigated by allowing the sum total of the files transferred to exceed 2 GB, but (IIRC) no individual file can be over 2 GB. (This is all totally fixed in the 6.8 series.)

I am not 100% sure about this, though, because the bug I found would actually *work* on files from 4 to 6 GB, since it was an int overflow bug.

I suggest you split the input file into several files of about 1 GB each and try transferring it that way. If this works you

* know it's a bug in 6.6.11 and you can move to 6.8, where this should be fixed
* have a workaround :) in the meantime

If you are submitting a text file this big, have you considered compressing it before transfer and reading it through an automatic decompression stream? You may not be able to change your program on this front, but if you can, many decent libraries exist to do this.

Matt
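
For what it's worth, here is a rough sketch in Python of the two workarounds suggested above: splitting the input into pieces well under 2 GB, and compressing it so the job can read it through a decompression stream. None of this comes from Condor itself. The function names, the ".partNNN" naming scheme and the 1 GiB piece size are assumptions made up for illustration, and the split is by bytes, so a line may be cut across two pieces and the job has to join the pieces before reading them.

```python
import gzip
import shutil

CHUNK_BYTES = 1 << 30    # assumed piece size: 1 GiB, well under the 2 GB limit
BUFFER_BYTES = 1 << 20   # copy 1 MiB at a time so memory use stays small


def split_file(path, chunk_bytes=CHUNK_BYTES, buffer_bytes=BUFFER_BYTES):
    """Split `path` into path.part000, path.part001, ... (hypothetical naming).

    Each piece holds at most `chunk_bytes` bytes. Returns the piece names.
    """
    parts = []
    with open(path, "rb") as src:
        while True:
            block = src.read(min(buffer_bytes, chunk_bytes))
            if not block:
                break  # nothing left to put in a new piece
            part_name = "%s.part%03d" % (path, len(parts))
            with open(part_name, "wb") as dst:
                written = 0
                while block:
                    dst.write(block)
                    written += len(block)
                    remaining = chunk_bytes - written
                    if remaining <= 0:
                        break  # this piece is full
                    block = src.read(min(buffer_bytes, remaining))
            parts.append(part_name)
    return parts


def join_files(parts, out_path):
    """Concatenate the pieces, in the order given, back into one file."""
    with open(out_path, "wb") as dst:
        for part_name in parts:
            with open(part_name, "rb") as src:
                shutil.copyfileobj(src, dst, BUFFER_BYTES)


def compress_file(path):
    """Write a gzip copy of `path` next to it and return the new name."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst, BUFFER_BYTES)
    return gz_path


def read_lines(gz_path, encoding="ascii"):
    """Yield text lines from a gzip file without unpacking it on disk."""
    with gzip.open(gz_path, "rt", encoding=encoding) as stream:
        for line in stream:
            yield line
```

On the submit side you would call split_file("XXX_depthprocessed_ts.txt") and list the resulting pieces as separate input files; on the execute side the job calls join_files() on them before doing anything else. Whether compression alone gets a file of this size under 2 GB depends entirely on the data, so the two approaches may need to be combined.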