Mailing List Archives
Re: [HTCondor-users] HTCondor file-transfer vs networked storage
- Date: Mon, 22 Aug 2022 21:47:49 +0000
- From: "Bockelman, Brian" <BBockelman@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] HTCondor file-transfer vs networked storage
Hi Matt,
I guess I wouldn't emphasize performance and throughput but rather reliability and predictability.
That is, when you use HTCondor's file-transfer mechanism (even if it's just a plugin that stages files from the shared filesystem):
1) You know that condor won't start the job until the files are on local disk, which decreases the likelihood that a transient filesystem issue will fail the job when it's 99% complete.
2) When a job fails, condor can tell you whether data staging was the underlying problem, and you can provide a policy to be executed in that situation.
Some users may value (1) much more highly than performance.
The reverse is also true -- some users might need absolute performance and run so few jobs that reliability is not relevant. It's all about tradeoffs and value systems in the end...
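To make point (1) concrete, here is a back-of-envelope sketch. It is not from the thread: it assumes, purely for illustration, that transient filesystem errors arrive as a Poisson process with a constant hypothetical rate, and that a job working directly off shared storage is exposed for its whole runtime, while a staged job is exposed only during input staging (output transfer is ignored). The function names and all numbers are hypothetical.

```python
import math


def p_transient_failure(rate_per_hour: float, exposure_hours: float) -> float:
    """Probability that at least one transient filesystem error hits a job
    during `exposure_hours`, assuming (hypothetically) that errors arrive as
    a Poisson process with constant `rate_per_hour`."""
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)


def expected_lost_compute(rate_per_hour: float, runtime_hours: float) -> float:
    """Expected compute hours discarded when the first error during the run
    kills the job: the integral of t * lam * exp(-lam * t) from 0 to T."""
    lam, t_run = rate_per_hour, runtime_hours
    if lam == 0.0:
        return 0.0
    return (1.0 - math.exp(-lam * t_run) * (1.0 + lam * t_run)) / lam


if __name__ == "__main__":
    rate = 0.01     # hypothetical: one transient error per 100 hours of exposure
    runtime = 24.0  # hypothetical job length in hours
    staging = 0.1   # hypothetical time to stage inputs before the job starts

    # Working off shared storage: exposed for the whole run, and a failure
    # throws away whatever compute has been done so far.
    print("shared FS: P(fail) =", round(p_transient_failure(rate, runtime), 4),
          "expected lost hours =", round(expected_lost_compute(rate, runtime), 2))

    # Staging first: exposed only while staging, and a failure happens before
    # any compute is spent, so the job can simply be retried.
    print("staged:    P(fail) =", round(p_transient_failure(rate, staging), 4),
          "expected lost hours = 0.0")
```

Under these assumptions the exposure window, not raw throughput, drives the difference, which is the reliability-over-performance trade-off described above.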
Brian
> On Aug 22, 2022, at 3:35 PM, Matthew T West via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
> Good evening Nick,
>
> I am well aware of the value of HTCondor's file-transfer mechanism and associated sandboxing. But that wasn't my issue.
>
> My question was:
>
> When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage?
>
> So not a grid or distributed campus pool or pulling from remote storage, but a single homogeneous compute cluster in one location that includes a networked file-system. I do apologize if everything after the question in my original email confused matters. Here might be a better way to put it:
>
> Under what conditions does a shared file server degrade such that it would be better, performance- and throughput-wise, to work from local scratch?
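One rough way to frame this question is a bandwidth-contention model. The sketch below is an illustration only, not something proposed in the thread: it assumes the file server's aggregate bandwidth is split evenly among concurrent jobs, each node is capped by its own network link, there is no client-side caching, and each job reads its input `passes` times. It also ignores metadata and small-file I/O costs. All names and cluster numbers are hypothetical.

```python
def per_job_bandwidth_gbps(server_gbps: float, link_gbps: float, jobs: int) -> float:
    """Bandwidth one job gets from the shared server, assuming the server's
    aggregate bandwidth is shared evenly and each node is capped by its link."""
    return min(link_gbps, server_gbps / jobs)


def read_times_s(data_gb: float, passes: int, server_gbps: float,
                 link_gbps: float, local_gbps: float, jobs: int):
    """Seconds of input I/O when working in place vs. staging first.

    In place: every pass reads `data_gb` over the network.
    Staged: one network copy, then every pass reads local scratch."""
    data_gbit = data_gb * 8  # gigabytes -> gigabits
    net = per_job_bandwidth_gbps(server_gbps, link_gbps, jobs)
    in_place = passes * data_gbit / net
    staged = data_gbit / net + passes * data_gbit / local_gbps
    return in_place, staged


if __name__ == "__main__":
    # Hypothetical cluster: 100 Gb/s file server, 25 Gb/s node links,
    # ~2 GB/s (16 Gb/s) local scratch, 10 GB of input per job.
    for jobs in (4, 200):
        for passes in (1, 5):
            in_place, staged = read_times_s(10, passes, 100, 25, 16, jobs)
            print(f"jobs={jobs:3d} passes={passes}: "
                  f"in place {in_place:6.1f}s, staged {staged:6.1f}s")
```

In this model, staging wins only when the per-job network share drops below local_gbps × (passes − 1) / passes. A single-pass read is never faster when staged, because staging still pulls every byte from the server; contention makes local scratch attractive only for data that is re-read.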
>
>
> Regards,
> Matt
>
> On 22/08/2022 20:37, Nick LeRoy wrote:
>> On Sat, Aug 20, 2022 at 8:44 AM Matthew T West via HTCondor-users
>> <htcondor-users@xxxxxxxxxxx> wrote:
>>> Hi All,
>>>
>>> When working on a single homogeneous compute cluster, are there any advantages to using HTCondor's file-transfer rather than working off shared network storage? I guess it would depend on the network and storage speeds.
>>>
> >>> It's just interesting that the "always work in local scratch" mindset I am used to is seen as a serious backward step, performance-wise:
>>>
>>> Scratch therefore only useful if your network storage or interconnects are slow or saturated ... copying bulk data to local storage / getting all users to copy to local/scratch storage is a quick way to saturate your storage infrastructure.
>>>
>>> I can find other instances of this HPC conventional wisdom and it intuitively makes sense. But I don't understand networked storage well, so I am asking the HTCondor hivemind for their thoughts.
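The quoted warning that bulk copying to scratch can saturate storage can be checked with simple byte counting. This is a hypothetical sketch, not taken from the thread: it assumes each of `jobs` jobs touches only `fraction_read` of a dataset, reads it `passes` times, and gets no help from client-side caching. Under those assumptions, staging moves fewer bytes off the server only when passes × fraction_read > 1.

```python
def server_gb(jobs: int, dataset_gb: float, passes: float,
              fraction_read: float, staged: bool) -> float:
    """Total GB the shared server must deliver under this toy model.

    Working in place: each job pulls only the bytes it touches, on every
    pass (assumes no client-side caching). Staging: each job copies the
    whole dataset once, and all passes then hit local scratch."""
    if staged:
        return jobs * dataset_gb
    return jobs * dataset_gb * passes * fraction_read


def staging_reduces_load(passes: float, fraction_read: float) -> bool:
    """In this model, staging is lighter on the server iff passes * fraction > 1."""
    return passes * fraction_read > 1.0


if __name__ == "__main__":
    # Hypothetical workload: 500 jobs over a 20 GB dataset.
    for passes, frac in ((1, 0.1), (3, 1.0)):
        print(f"passes={passes} fraction={frac}: "
              f"in place {server_gb(500, 20.0, passes, frac, False):8.0f} GB, "
              f"staged {server_gb(500, 20.0, passes, frac, True):8.0f} GB")
```

Jobs that read a small slice of a large input once (the first case) put far more load on the server when staged, which matches the quoted concern; jobs that re-read all of their input (the second case) favour staging.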
>> Matt,
>>
>> You need to remember that HTCondor can work in many different
>> environments, among these being WANs, campus-type structures, and
>> grids. For these types of scenarios, file transfer is preferable, if
>> not required.
>>
>> -Nick
>> _______________________________________________
>> HTCondor-users mailing list
>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>> subject: Unsubscribe
>> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>
>> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/htcondor-users/
>
> --
> Matthew T. West
> DevOps & HPC SysAdmin
> University of Exeter, Research IT
> www.exeter.ac.uk/research/researchcomputing/support/researchit
> 57 Laver Building, North Park Road, Exeter, EX4 4QE, United Kingdom
>
> Please note, I may send emails out of 'normal' working hours, as this fits my own work-life balance. I do not expect a response outside of your own working hours.