HTCondor Project List Archives




[Condor-devel] View Collector & condor_stats



Hello,

I'm investigating a rust ticket where the user reports this output:

$ condor_stats -pool <myCM> -lasthours 1 -resgrouplist
INTEL/WINNT61
Total
INTEL/WINNT51
INTEL/WINNT52
X86_64/LINUX

They are expecting this output instead:

$ condor_stats -pool <myCM> -lasthours 1 -resgrouplist
Total
INTEL/WINNT61
INTEL/WINNT51
INTEL/WINNT52
X86_64/LINUX

So I'm looking into it. I hypothesize that the entries are being printed out
in hash table order and we've gotten "lucky" for, uh, since the beginning of
time that Total came out in the position it did in the viewserver files.
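To illustrate the hypothesis, here is a minimal sketch of how the listing could be made independent of hash table order. It is not taken from view_server.cpp: the helper name ordered_group_list and the use of std::unordered_map are stand-ins for illustration only, and sorting the non-"Total" names is just one possible deterministic order.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not HTCondor code: returns the group names with
// "Total" first (if present) and the remaining names sorted, so the
// result does not depend on the hash table's iteration order.
std::vector<std::string> ordered_group_list(
    const std::unordered_map<std::string, double>& groups)
{
    std::vector<std::string> names;
    bool has_total = false;
    for (const auto& entry : groups) {
        if (entry.first == "Total") {
            has_total = true;          // hold "Total" back for the front slot
        } else {
            names.push_back(entry.first);
        }
    }
    std::sort(names.begin(), names.end());
    if (has_total) {
        names.insert(names.begin(), "Total");
    }
    return names;
}
```

Iterating the map directly would give whatever order the hash function and bucket count happen to produce, which is consistent with "Total" moving when a new group such as INTEL/WINNT61 shows up.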

However....

While stumbling through the control flow in view_server.cpp, seeing where
"Total" is altered and how it gets written out, I found the following
function detailed below (from V7_4_2-branch, where the problem was
reported).

I've marked the line I'm curious about with an asterisk in column 1.

Why does st have 1 subtracted from it? Wouldn't that be updating the
wrong totals bucket thereby invalidating many years of totals in the
view collector? 
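One reading under which the subtraction would be harmless: if ViewStates reserves 0 for the undefined state and numbers the real states from 1, then st - 1 is just a conversion to a zero-based array index. I have not confirmed the actual values in view_server.h, so the enum below is a made-up stand-in, and bucket_index is a hypothetical helper, not code from the branch.

```cpp
#include <cassert>

// Hypothetical stand-in for ViewStates. The numeric values are an
// assumption for illustration; the real ones live in view_server.h.
enum DemoViewState {
    DEMO_STATE_UNDEFINED = 0,
    DEMO_STATE_UNCLAIMED = 1,
    DEMO_STATE_MATCHED   = 2,
    DEMO_STATE_CLAIMED   = 3,
    DEMO_STATE_MAX       = DEMO_STATE_CLAIMED
};

// One bucket per real state: indices 0 .. DEMO_STATE_MAX - 1.
const int DEMO_NUM_BUCKETS = DEMO_STATE_MAX;

// Maps a 1-based state to a 0-based bucket index.
// Returns -1 when the state has no bucket (undefined or out of range).
int bucket_index(DemoViewState st)
{
    int idx = static_cast<int>(st) - 1;
    if (idx < 0 || idx >= DEMO_NUM_BUCKETS) {
        return -1;
    }
    return idx;
}

// Increments the bucket for st; returns false if st has no bucket.
bool count_state(double* data, DemoViewState st)
{
    int idx = bucket_index(st);
    if (idx < 0) {
        return false;
    }
    assert(idx < DEMO_NUM_BUCKETS);
    data[idx] += 1.0;
    return true;
}
```

If the real enum instead starts its first valid state at 0, the same subtraction would shift every count down one bucket and produce index -1 for the first state. The sketch rejects a negative index explicitly, whereas the quoted function only checks the upper bound.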

Thanks.

-pete

//---------------------------------------------------------------------
// Scan function for the startd data
//---------------------------------------------------------------------

int ViewServer::StartdScanFunc(ClassAd* cad)
{
    char Name[200] = "";
    char StateDesc[50];
    float LoadAvg;
    int KbdIdle;
    
    // Get Data From Class Ad

    if ( !cad->LookupString(ATTR_NAME,Name) ) return 1;
    if ( !cad->LookupInteger(ATTR_KEYBOARD_IDLE,KbdIdle) ) KbdIdle=0;
    if ( !cad->LookupFloat(ATTR_LOAD_AVG,LoadAvg) ) LoadAvg=0;
    if ( !cad->LookupString(ATTR_STATE,StateDesc) ) strcpy(StateDesc,"");
    State StateEnum=string_to_state( StateDesc );

    // This block should be kept in sync with view_server.h and
    // condor_state.h.
    ViewStates st = VIEW_STATE_UNDEFINED;
    switch(StateEnum) {
    case owner_state:
        st=VIEW_STATE_OWNER;
        break;
    case preempting_state:
        st=VIEW_STATE_PREEMPTING;
        break;
    case claimed_state:
        st=VIEW_STATE_CLAIMED;
        break;
    case matched_state:
        st=VIEW_STATE_MATCHED;
        break;
    case unclaimed_state:
        st=VIEW_STATE_UNCLAIMED;
        break;
    case shutdown_state:
        st=VIEW_STATE_SHUTDOWN;
        break;
    case delete_state:
        st=VIEW_STATE_DELETE;
        break;
    case backfill_state:
        st=VIEW_STATE_BACKFILL;
        break;
    default:
        dprintf( D_ALWAYS,
                 "WARNING: Unknown machine state %d from '%s' (ignoring)",
                 (int)StateEnum, Name );
        return 1;
    }

    // Get Group Name

    char tmp[200];
    if (cad->LookupString(ATTR_ARCH,tmp)<0) strcpy(tmp,"Unknown");
    MyString GroupName=MyString(tmp)+"/";
    if (cad->LookupString(ATTR_OPSYS,tmp)<0) strcpy(tmp,"Unknown");
    GroupName+=tmp;

    // Add to group Totals
    // NRL: I'm not sure exactly what this block of code does, but
    // now it at least does it safely.  It's obviously updating
    // the GroupHash <group name> and "Total" chunks.

*   int group_index = (int)st - 1;
    if( group_index > (int)VIEW_STATE_MAX_OFFSET ) {
        EXCEPT( "Invalid group_index = %d (max %d)",
                group_index, (int)VIEW_STATE_MAX_OFFSET );
    }
    GeneralRecord* GenRec=GetAccData(GroupHash,"Total");
    ASSERT( GenRec );
    GenRec->Data[group_index] += 1.0;

    GenRec=GetAccData(GroupHash,GroupName);
    ASSERT( GenRec );
    GenRec->Data[group_index] += 1.0;
    
    // Add to accumulated data

    int NumSamples;
    for (int j=0; j<HistoryLevels; j++) {
        GenRec=GetAccData(DataSet[StartdData][j].AccData, Name);
        ASSERT( GenRec );

        NumSamples=DataSet[StartdData][j].NumSamples;
        GenRec->Data[0]=(GenRec->Data[0]*NumSamples+KbdIdle)/(NumSamples+1);
        GenRec->Data[1]=(GenRec->Data[1]*NumSamples+LoadAvg)/(NumSamples+1);
        GenRec->Data[2]=st;
    }

    return 1;
}