Hi!
We are running a Windows Condor cluster configured with dynamic slots. Recently we added a new 16-core machine to the pool and immediately ran into a problem: Condor is unable to run more than 9 jobs on this new node! Here is what StarterLog.slot1_10 says (the same happens for all higher-numbered slots):
StarterLog.slot1_10
===========================
02/01/14 10:05:38 Communicating with shadow <###.###.###.###:61259>
02/01/14 10:05:38 Submitting machine is "###.###.###.###"
02/01/14 10:05:38 setting the orig job name in starter
02/01/14 10:05:38 setting the orig job iwd in starter
02/01/14 10:05:38 Account condor-reuse-slot1_10 creation failed! (err=2202)
02/01/14 10:05:38 update_psid() failed after account creation!
02/01/14 10:05:38 ERROR "Failed to create a user nobody" at line 610 in file c:\condor\execute\dir_29540\userdir\src\condor_utils\uids.cpp
02/01/14 10:05:38 ShutdownFast all jobs.
02/01/14 10:05:38 condor_read() failed: recv(fd=1460) returned -1, errno = 10054 , reading 5 bytes from <147.125.99.159:61298>.
02/01/14 10:05:38 IO: Failed to read packet header
02/01/14 10:05:38 Error disabling account condor-reuse-slot1_10 (INVALID PARAMETER)
The source of the problem is more or less clear. We are not using "run_as_owner" mode, so Condor creates a temporary account on the node that runs the job. The account name follows the template "condor-reuse-slot<X>". Windows limits account names to 20 characters, and "condor-reuse-slot1_10" is 21 characters long, so the account cannot be created. The names for slot1_1 through slot1_9 are exactly 20 characters, which explains why only 9 jobs can run. This seems to be a bug in Condor!
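To double-check the name-length arithmetic, here is a small Python sketch. It assumes the account name is simply "condor-reuse-" followed by the slot name (inferred from the log above) and uses the 20-character limit mentioned above. The function names are made up for illustration and are not part of Condor.

```python
# Maximum length of a Windows local user account name (limit as stated above).
MAX_ACCOUNT_NAME_LEN = 20


def account_name(slot_name):
    """Build the temporary account name for a slot.

    Assumption: the template is "condor-reuse-<slot name>", as seen in the log.
    """
    return "condor-reuse-" + slot_name


def too_long_accounts(n_slots, parent="slot1"):
    """Return the account names that exceed the limit for dynamic slots 1..n_slots."""
    names = [account_name("%s_%d" % (parent, i)) for i in range(1, n_slots + 1)]
    return [name for name in names if len(name) > MAX_ACCOUNT_NAME_LEN]


if __name__ == "__main__":
    # Check a 16-core machine: print every account name that is too long.
    for name in too_long_accounts(16):
        print(name, len(name))
```

For a 16-slot machine this lists the names for slot1_10 through slot1_16, which matches what we see: the first 9 dynamic slots work and the rest fail.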
Any ideas how to proceed?
Thanks,
Alexey