Greetings CHTC users,
We have two quick announcements for your Wednesday morning.
- For high performance computing (HPC) cluster users:
  - Yesterday's cluster maintenance took a bit longer than expected but was completed successfully! As of 7:30 last night, the cluster was back to normal operation, and jobs should be running again.
- For high throughput computing (HTC) system users:
  - Due to a cooling issue in one of our server rooms, a subset of execute servers in our HTC system has been down since last night.
  - Impact to users: jobs that were running on the affected servers were interrupted but stayed in the queue and will be automatically re-run. The loss of this server room reduces our overall capacity somewhat, so you may see fewer jobs running in general.
  - Users who use SQUID for file transfer should check for any jobs held with a message like "Error: Aborted due to lack of progress using http_proxy=http://squid-cs-b240.chtc.wisc.edu:3128". These jobs can be safely released.
  - There are no significant changes to the overall operation of the HTC system; users should continue to submit jobs as normal.
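
For anyone with many held jobs, here is a minimal sketch of one way to pick out the SQUID-related ones. It assumes you have already collected each held job's ID and hold reason, for example from `condor_q -hold`, and that you release jobs with `condor_release`. The helper names and the sample records are hypothetical and only for illustration; please check the hold reasons yourself before releasing anything.

```python
# Hypothetical helpers, not an official CHTC tool.
# Substrings taken from the hold message quoted in this email.
SQUID_MARKERS = ("Aborted due to lack of progress", "http_proxy=http://squid")


def squid_held_jobs(held):
    """Return IDs of held jobs whose hold reason matches the SQUID message.

    `held` is an iterable of (job_id, hold_reason) pairs, e.g. gathered
    by hand from the output of `condor_q -hold`.
    """
    return [
        job_id
        for job_id, reason in held
        if all(marker in reason for marker in SQUID_MARKERS)
    ]


def release_command(job_ids):
    """Build the condor_release argument list for the given job IDs."""
    return ["condor_release", *job_ids] if job_ids else []


# Illustrative records only; the second hold reason is made up.
held = [
    ("1234.0", "Error: Aborted due to lack of progress using "
               "http_proxy=http://squid-cs-b240.chtc.wisc.edu:3128"),
    ("1234.1", "Some unrelated hold reason"),
]

to_release = squid_held_jobs(held)
print(" ".join(release_command(to_release)))
```

The sketch only prints the command rather than running it, so you can review the job list first. Jobs held for other reasons are left alone on purpose.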
Thanks for your patience with the many emails this week. Some of this was planned, but we have obviously experienced some unexpected issues as well. As always, contact us at chtc@xxxxxxxxxxx with any questions or concerns.
Cheers,
Your CHTC team