Re: [HTCondor-users] Globus error 129 with large files
- Date: Fri, 11 Mar 2016 22:33:50 +0100
- From: Emir Imamagic <eimamagi@xxxxxxx>
- Subject: Re: [HTCondor-users] Globus error 129 with large files
On 11.3.2016. 21:39, Brian Bockelman wrote:
Is it possible that the threshold between working and not working is either 2.1GB (about 2^31) or 4.2GB (about 2^32)? That would help narrow down the potential sources of error.
I performed the following tests:
The 2^31 test worked:
#!/bin/sh
dd if=/dev/zero of=./testmonkey bs=1M count=2048
The 2^32 test also worked:
#!/bin/sh
dd if=/dev/zero of=./testmonkey bs=1M count=4096
In both cases the generated file was successfully transferred back to the
Condor-G submit machine.
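For reference, dd with bs=1M writes count * 2^20 bytes, so the tests above land exactly on the power-of-two boundaries Brian asked about; a quick shell sketch of the arithmetic:
#!/bin/sh
# bs=1M count=N writes N * 2^20 bytes, so each test lands exactly on a power of two
echo $((2048 * 1024 * 1024))   # 2147483648 = 2^31
echo $((4096 * 1024 * 1024))   # 4294967296 = 2^32
echo $((8192 * 1024 * 1024))   # 8589934592 = 2^33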
It seems to me that the problems start at 2^33:
#!/bin/sh
dd if=/dev/zero of=./testmonkey bs=1M count=8192
The job ended up successful, but only 719 MB was transferred back.
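For anyone trying to reproduce this, a minimal sketch of the size check on the submit side; the path (./testmonkey in the job's directory) is just an assumption, adjust to wherever the file comes back:
#!/bin/sh
# Compare the size of the returned file against the expected 2^33 bytes.
# The path ./testmonkey on the submit machine is an assumption.
expected=8589934592
actual=$(stat -c %s ./testmonkey)
echo "expected=$expected actual=$actual missing=$((expected - actual)) bytes"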
With 2^34 things get more complicated. The job ended and the transfer back
started, but then gahp_server on the UI side started devouring memory until
the OOM killer killed it:
Mar 11 22:31:25 ui2 kernel: Out of memory: Kill process 1150031
(gahp_server) score 911 or sacrifice child
Mar 11 22:31:25 ui2 kernel: Killed process 1150031, UID 500,
(gahp_server) total-vm:9569828kB, anon-rss:7495604kB, file-rss:544kB
The interesting bit is that the job did not end up in the H state; instead
Condor revived gahp_server and the OOM killer killed it again. This continued
until I deleted the job.
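In case it helps with debugging, a rough sketch for watching gahp_server memory on the submit/UI host while the transfer runs (assumes procps ps and the standard CentOS 6 syslog location):
#!/bin/sh
# Log gahp_server memory usage every 10 seconds while the transfer runs
# (rss/vsz are reported in kB by ps).
while true; do
    ps -C gahp_server -o pid=,rss=,vsz= | \
        awk -v ts="$(date '+%F %T')" '{print ts, "pid=" $1, "rss_kB=" $2, "vsz_kB=" $3}'
    sleep 10
done
Afterwards the kills show up in the syslog: grep -i 'out of memory' /var/log/messages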
I failed to mention that we're running CentOS 6 on both the CE and the submit machine.
Hope this helps.
--
Emir Imamagic
SRCE - University of Zagreb University Computing Centre, www.srce.unizg.hr
Emir.Imamagic@xxxxxxx, tel: +385 1 616 5809, fax: +385 1 616 5559