Bump.

Is it possible to get this looked at/changed? I.e. could the storage max size refer to individual file sizes rather than the folder/directory size?

We’re now at the stage of having to set each of our 5 central managers to keep viewhist data, querying them independently and collating the data, rather than just using our 1 condorview server that they all report to. Even then, one of the largest pools (4400 slots) still loses data less than 1 month old. The other pools (2380, 1650, 1440, 1200 slots) are OK for the moment, in that they retain viewhist data for at least 1 month before rolling over.

Obviously we could split the largest pool into 2, but that seems a bit too kludgy and would create deployment problems for us.

Thanks

Cheers

Greg

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx

Hi Steve

It’s probably worth noting that we have exacerbated the issue by changing the interval at which the startds send updates to the collector from the default of 300 secs (5 mins) to 30 secs (in fact we’ve probably changed all 300 sec intervals in the config files to 30 secs).

We have the size currently set to 2,000,000,000 (2 GB), but larger values give errors due to integer overflow (the default size is 10,000,000 (10 MB)).

Cheers

Greg

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Steven C Timm

Hi,

In earlier condor versions the viewhist files used to get much bigger than that (I’ve seen one grow to 2 GB). But now Greg is right: condor 7.6 and greater does limit the size of all of the viewhist files, even though some of them never grow to the full size.

It seems that someone has changed the definition of POOL_HISTORY_MAX_STORAGE: it used to be a size in kilobytes, and now it is in bytes, as with most other condor variables.

I have 6000 cores in my pool and my condor_stats still goes back a full year, with POOL_HISTORY_MAX_STORAGE currently set to 500,000,000.
I would think that Greg should be able to boost the value further and get more data.

If I had actually read the release notes correctly, would I have seen these changes mentioned?

Steve

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx

Any chance this can get looked at so that we can store stats for > 1 month?

It appears that POOL_HISTORY_MAX_STORAGE applies to the whole directory, and assumes that there will be 27 viewhist* files, so that if all files reach max size then no one file can be > 66.7 MB in size?

Thanks

Cheers

Greg

P.S. The silence was deafening from my previous post :) (below), so should I be sending this to condor-admin instead?

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Greg.Hitchen@xxxxxxxx

We keep pool history info in our Condor setup on our Condor ViewServer machine. This is a standalone machine that collects info from 5 separate Central Managers.

For historical reasons we had forced condor to represent each machine as one “resource”, i.e. NUM_CPUS=1. We have recently enabled core detection and now have a total of ~10,000 cores across all 5 pools.

In recently using condor_stats to produce some monthly stats it’s become obvious that information is being lost, i.e. it appears we don’t have info going back far enough (> 1 month).

Just increasing POOL_HISTORY_MAX_STORAGE doesn’t work (currently set at 2,000,000,000 = 2 GB, increased to 20000000000 = 20 GB), as we get the following error message in CollectorLog:

04/04/12 13:21:05 ERROR "POOL_HISTORY_MAX_STORAGE in the condor configuration is out of bounds for an integer (20000000000). Please set it to an integer in the range -2147483648 to 2147483647 (default 10000000)." at line 1693 in file /home/condor/execute/dir_30458/userdir/src/condor_utils/condor_config.cpp

From what I can see our viewhistory directory is only 737 MB in size (see below).
There does seem to be some forced file rotation though at individual file sizes of ~66.7 MB.

Can anyone confirm what’s meant to happen with these storage limits and file sizes and rotations?

Thanks

Cheers

Greg

>ll
total 736780
-rw-r--r-- 1 condor condor 31434648 Apr  4 15:49 viewhist0.0.new
-rw-r--r-- 1 condor condor 66666695 Aug  9  2011 viewhist0.0.old
-rw-r--r-- 1 condor condor 41337118 Apr  4 15:37 viewhist0.1.new
-rw-r--r-- 1 condor condor   333359 Apr  6  2006 viewhist0.1.old
-rw-r--r-- 1 condor condor 10537682 Apr  4 15:21 viewhist0.2.new
-rw-r--r-- 1 condor condor   333380 Jan 19  2006 viewhist0.2.old
-rw-r--r-- 1 condor condor 42825820 Apr  4 15:49 viewhist1.0.new
-rw-r--r-- 1 condor condor 67153008 Apr  4 09:36 viewhist1.0.old
-rw-r--r-- 1 condor condor 27489661 Apr  4 15:37 viewhist1.1.new
-rw-r--r-- 1 condor condor 66981236 Apr  3 23:24 viewhist1.1.old
-rw-r--r-- 1 condor condor 35099274 Apr  4 15:21 viewhist1.2.new
-rw-r--r-- 1 condor condor 66884552 Mar 31 17:43 viewhist1.2.old
-rw-r--r-- 1 condor condor  1208195 Apr  4 15:49 viewhist2.0.new
-rw-r--r-- 1 condor condor 66666869 Mar 27 18:51 viewhist2.0.old
-rw-r--r-- 1 condor condor  1889444 Apr  4 15:37 viewhist2.1.new
-rw-r--r-- 1 condor condor 66666889 Feb 15 05:28 viewhist2.1.old
-rw-r--r-- 1 condor condor 17508889 Apr  4 15:21 viewhist2.2.new
-rw-r--r-- 1 condor condor   333437 Mar  7  2006 viewhist2.2.old
-rw-r--r-- 1 condor condor 41038505 Apr  4 15:49 viewhist3.0.new
-rw-r--r-- 1 condor condor 66666970 Nov  4  2010 viewhist3.0.old
-rw-r--r-- 1 condor condor 27010376 Apr  4 15:37 viewhist3.1.new
-rw-r--r-- 1 condor condor   333372 Mar 15  2006 viewhist3.1.old
-rw-r--r-- 1 condor condor  6818397 Apr  4 15:21 viewhist3.2.new
-rw-r--r-- 1 condor condor   333371 Mar 13  2006 viewhist3.2.old
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.0.new
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.1.new
-rw-r--r-- 1 condor condor        0 Sep 12  2005 viewhist4.2.new
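
As a rough check of the figures quoted in this thread, the sketch below divides the storage limit by the sizes of the rotated .old files in the listing and tests the rejected value against the integer range given in the CollectorLog error. It is only arithmetic on the numbers posted here, not taken from the HTCondor source; the function names are made up for illustration, and it assumes the large .old files rotated under a limit of 2,000,000,000 and the small ones under the default of 10,000,000.

```python
# Rough check of the figures quoted in this thread.
# Hypothetical helper names; nothing here comes from the HTCondor source.

INT32_MAX = 2**31 - 1  # 2147483647, the upper bound named in the error message


def fits_int32(value):
    """True if value lies in the range the CollectorLog error allows."""
    return -2**31 <= value <= INT32_MAX


def implied_file_count(max_storage, rotated_size):
    """Number of files the limit appears to be divided across (rounded)."""
    return round(max_storage / rotated_size)


def per_file_cap(max_storage, n_files):
    """Per-file size in bytes if the limit is split evenly over n_files."""
    return max_storage // n_files


# Sizes in bytes of the .old files from the listing.
# Assumption: the large ones rotated under 2,000,000,000 and the
# small ones under the default 10,000,000.
rotated_large = [66666695, 67153008, 66981236, 66884552,
                 66666869, 66666889, 66666970]
rotated_default = [333359, 333380, 333437, 333372, 333371]

counts = {implied_file_count(2_000_000_000, s) for s in rotated_large}
counts |= {implied_file_count(10_000_000, s) for s in rotated_default}

print("implied file counts:", counts)
print("20000000000 fits in int32:", fits_int32(20_000_000_000))
print("per-file cap at INT32_MAX over 30 files:", per_file_cap(INT32_MAX, 30))
```

On those assumptions every rotated file points to the limit being split across 30 files rather than 27, which would fit the three absent viewhist4.*.old files being counted as well; that is an inference from the file sizes only. It would also mean that the largest value the config parser accepts raises the per-file cap only to about 71.6 MB, so increasing POOL_HISTORY_MAX_STORAGE beyond 2,000,000,000 would buy very little extra history.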