Hi!
We are running a Windows Condor cluster configured with
dynamic slots. We recently added a new 16-core machine to the
pool and immediately ran into a problem: Condor cannot run
more than 9 jobs on this node. Here is what
StarterLog.slot1_10 says (the logs for all higher-numbered
slots look the same):
StarterLog.slot1_10
===========================
02/01/14 10:05:38 Communicating with shadow
<###.###.###.###:61259>
02/01/14 10:05:38 Submitting machine is "###.###.###.###"
02/01/14 10:05:38 setting the orig job name in starter
02/01/14 10:05:38 setting the orig job iwd in starter
02/01/14 10:05:38 Account condor-reuse-slot1_10 creation
failed! (err=2202)
02/01/14 10:05:38 update_psid() failed after account
creation!
02/01/14 10:05:38 ERROR "Failed to create a user nobody"
at line 610 in file
c:\condor\execute\dir_29540\userdir\src\condor_utils\uids.cpp
02/01/14 10:05:38 ShutdownFast all jobs.
02/01/14 10:05:38 condor_read() failed: recv(fd=1460)
returned -1, errno = 10054 , reading 5 bytes from <
147.125.99.159:61298>.
02/01/14 10:05:38 IO: Failed to read packet header
02/01/14 10:05:38 Error disabling account
condor-reuse-slot1_10 (INVALID PARAMETER)
The cause seems fairly clear. We are not using
"run_as_owner" mode, so Condor creates a temporary account on
the execute node. The account name follows the template
"condor-reuse-slot<X>". Windows limits account names to 20
characters, so "condor-reuse-slot1_10" (21 characters) cannot
be created, while "condor-reuse-slot1_9" (exactly 20) still
fits. This looks like a bug in Condor!
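
To double-check the length arithmetic, here is a minimal Python sketch. It only reproduces the naming pattern seen in the log above and compares it to the 20-character limit; the helper names are hypothetical and nothing here is taken from the Condor source code.

```python
# Illustrative only: the prefix is read off the account name in the log,
# and the limit is the 20-character Windows account-name limit noted above.
ACCOUNT_PREFIX = "condor-reuse-"
MAX_ACCOUNT_NAME_LEN = 20


def account_name(slot_name):
    """Build the temporary account name for a slot (hypothetical helper)."""
    return ACCOUNT_PREFIX + slot_name


def too_long_slots(num_dynamic_slots):
    """Return the dynamic slot names whose account name exceeds the limit."""
    names = ["slot1_%d" % i for i in range(1, num_dynamic_slots + 1)]
    return [s for s in names if len(account_name(s)) > MAX_ACCOUNT_NAME_LEN]


if __name__ == "__main__":
    # Compare the last slot that works with the first one that fails.
    for i in (9, 10):
        name = account_name("slot1_%d" % i)
        print("%s: %d characters" % (name, len(name)))
    # On a 16-core node, list every dynamic slot that would hit the limit.
    print(too_long_slots(16))
```

If the template really is as shown in the log, every dynamic slot from slot1_10 upward on the 16-core node would be affected, which matches the 9-job ceiling we observe.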
Any ideas how to proceed?
Thanks,
Alexey