CHTC outage concluded on March 10; Unrelated HTC Squid proxy issue (jobs held)


Date: Fri, 11 Mar 2022 15:35:53 +0000
From: chtc-users@xxxxxxxxxxx
Subject: CHTC outage concluded on March 10; Unrelated HTC Squid proxy issue (jobs held)

Greetings CHTC Users,

 

All CHTC services (HPC cluster, all HTC execute nodes) were brought back up on Thursday (3/10) after server room maintenance was completed.

 

Separately, we're working on a new issue impacting HTC jobs that transfer files via Squid. As a result, *some* Squid-dependent jobs are going on hold for a reason like the following (visible in job log files or with "condor_q -hold"):

Error from <SLOTNAME>: FILETRANSFER:1:non-zero exit (1) from /usr/libexec/condor/curl_plugin. Error: Aborted due to lack of progress using http_proxy=http://squid-cs-b240.chtc.wisc.edu:3128 (http://proxy.chtc.wisc.edu/SQUID/chtc/R361.tar.gz)

 

Users can view jobs held for this specific reason by using the following command:
$ condor_q -constraint 'HoldReasonCode == 12 && HoldReasonSubCode == 256'
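
For a more compact listing, the same constraint can be combined with condor_q's autoformat option. The following is only a sketch, not an official CHTC tool: the helper name list_squid_held and the choice of attributes to print (ClusterId, ProcId, HoldReason) are assumptions made for illustration. The constraint itself is the one given above.

```shell
# Constraint taken from the command above: hold code 12 with sub-code 256.
SQUID_HOLD_CONSTRAINT='HoldReasonCode == 12 && HoldReasonSubCode == 256'

# list_squid_held is a hypothetical helper name, not a CHTC-provided command.
list_squid_held() {
    # -af (autoformat) prints the listed job attributes, one job per line.
    condor_q -constraint "$SQUID_HOLD_CONSTRAINT" -af ClusterId ProcId HoldReason
}

# Only run the query where condor_q is available (for example, on a submit server).
if command -v condor_q >/dev/null 2>&1; then
    list_squid_held
fi
```

Each output line should show a job's cluster and process ID followed by its full hold reason, which makes it easier to confirm that the message mentions the Squid proxy.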

 

Once we have resolved this issue, we will follow up with instructions on how users can release just the impacted jobs to run again. Until then, please leave them held in the queue.

 

As always, please let us know if you notice any other issues by emailing chtc@xxxxxxxxxxxx

 

Best,

Your CHTC Team

 
