
Re: [HTCondor-users] Using htcondor.EventLog to study Condor File Transfer



Hi Joe,

Unfortunately, the file transfer event carries only a string saying which type of transfer occurred and what happened (i.e., input started/finished or output started/finished). The good news is that you can parse that message string from the event. It is a pity that it has to be done this way, but it does work. I recently updated condor_watch_q to check file transfer events so that it can tell users which jobs are transferring input or output. The snippet below (pulled from the source code) shows what we currently do:

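    # JobStatus is condor_watch_q's own status enum, not part of the bindings.
    if event.type == htcondor.JobEventType.FILE_TRANSFER:
        new_status = None
        # The event's text is the only place the transfer type is recorded.
        msg = str(event).lower()
        if "started" in msg:
            if "input" in msg:
                new_status = JobStatus.TRANSFERRING_INPUT
            elif "output" in msg:
                new_status = JobStatus.TRANSFERRING_OUTPUT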

Note that this tool only cares about when each transfer starts, because other events later move our counters to the appropriate states.
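
Since the original question was also about timing, here is a minimal sketch of how the same string matching could be extended to tell "started" from "finished" and to pair the two into a duration. It is not taken from condor_watch_q. The helper names classify_transfer and transfer_durations and the "job.log" path are made up for illustration, and it assumes the event text contains the same wording as the log lines quoted below.

```python
from typing import Dict, Iterable, Optional, Tuple


def classify_transfer(msg: str) -> Optional[Tuple[str, str]]:
    """Return (direction, phase) parsed from a file transfer event's text.

    direction is "input" or "output"; phase is "started" or "finished".
    Returns None when the text matches neither pattern (e.g. queued events).
    """
    text = msg.lower()
    if "input" in text:
        direction = "input"
    elif "output" in text:
        direction = "output"
    else:
        return None
    if "started" in text:
        phase = "started"
    elif "finished" in text:
        phase = "finished"
    else:
        return None
    return direction, phase


def transfer_durations(
    events: Iterable[Tuple[str, float, str]],
) -> Dict[Tuple[str, str], float]:
    """Pair started/finished events and return elapsed seconds.

    events yields (job_id, unix_timestamp, message) tuples in log order.
    The result maps (job_id, direction) to the duration of the most recent
    completed transfer; a repeated "started" simply restarts the clock.
    """
    started: Dict[Tuple[str, str], float] = {}
    durations: Dict[Tuple[str, str], float] = {}
    for job_id, timestamp, msg in events:
        parsed = classify_transfer(msg)
        if parsed is None:
            continue
        direction, phase = parsed
        key = (job_id, direction)
        if phase == "started":
            started[key] = timestamp
        elif key in started:
            durations[key] = timestamp - started.pop(key)
    return durations


# Assumed wiring to the Python bindings (hypothetical log path "job.log"):
#
#   import htcondor
#   rows = (
#       (f"{e.cluster}.{e.proc}", e.timestamp, str(e))
#       for e in htcondor.JobEventLog("job.log").events(stop_after=0)
#       if e.type == htcondor.JobEventType.FILE_TRANSFER
#   )
#   print(transfer_durations(rows))
```

The parsing is kept separate from the bindings so it can be tried on plain strings first. This sketch does not attempt to detect transfer errors or retries.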

Hope this helps,
Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Joseph Areeda <newsreply@xxxxxxxxxx>
Sent: Thursday, April 10, 2025 6:16 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Using htcondor.EventLog to study Condor File Transfer
 

I am trying to analyze job logs that use Condor File Transfer for containers. I am specifically interested in timing and error detection.

These are the log entries I want to examine with the htcondor2.JobEvent class:

000 (45036388.002.000) 2025-04-10 10:16:18 Job submitted from host: <10.14.0.39:9618...
...
040 (45036388.002.000) 2025-04-10 10:17:15 Started transferring input files
        Transferring to host: <10.14.9.111:9618?addrs=10.14.9.111-9618&alias=node2111.cluster.ldas.cit&noUDP&sock=slot1_6_12567>
...
040 (45036388.002.000) 2025-04-10 10:17:19 Finished transferring input files

My question: for event type 40, how do I tell the difference between Started and Finished? I cannot find anything in the event object that distinguishes them.
Maybe it is possible to get the text of the line that created the event?

How are errors and retries reported?

Any suggestions will be appreciated.
Joe