Hi all,

we have been observing several pool nodes crashing [1], and we assume that the crash pattern correlates with a node being populated overwhelmingly by jobs from particular users.

So we are thinking about how to foster the diversity of users per node. That is, how can we make a node preferably request jobs from other users/groups if its currently allocated job slots belong 'mostly' to one user/group, except when no other users/groups are waiting or the nominal share would become highly unfavourable?

Maybe somebody already has a recipe for something like that?

Cheers,
  Thomas

[1] The FUSE kernel module crashed, apparently correlated with the number of open file handles via FUSE.