| This is normal for the parallel universe. The reason is that the execute nodes must be configured to respond to a single dedicated scheduler, so only jobs submitted to that scheduler will ever run.
 You would split your execute nodes up by configuring half of them to use schedd A as the dedicated scheduler and half to use schedd B as the dedicated scheduler. Then you could submit jobs to either schedd A or schedd B, but those jobs would never be able to use more than half of the execute nodes. This is the same whether your schedd and/or execute nodes are Windows or Linux.

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Sofya Urbaniec

I can run multi-processor jobs now, but in order to submit a multi-CPU parallel job I have to submit it from the dedicated scheduler, in this case the master. That means I have to log in to the remote machine and submit from there. Is this behavior expected? Could it be because I run it on Windows and it has some limitations?

Thank you.

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Sofya Urbaniec <Sofya.Urbaniec@xxxxxxxxxx>

Hello,

I'm trying to configure an HTCondor pool running on Windows to enable parallel jobs. I'm using Condor version 8.4.1.

My condor_config on master:

######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = C:\condor
LOCAL_DIR = $(RELEASE_DIR)
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
REQUIRE_LOCAL_CONFIG_FILE = TRUE
LOCAL_CONFIG_DIR = $(LOCAL_DIR)
#SETTABLE_ATTRS_CONFIG = *
SETTABLE_ATTRS_OWNER = TDVERS
STARTD_ATTRS = COLLECTOR_HOST_STRING, TDVERS
CONDOR_HOST = $(FULL_HOSTNAME)
COLLECTOR_NAME = thermal
UID_DOMAIN = domain.com
CONDOR_ADMIN = condor_admin_svc@xxxxxxxxxx
SMTP_SERVER = smtp.domain.com
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *.domain.com
ALLOW_ADMINISTRATOR = $(IP_ADDRESS), *.domain.com
JAVA = C:\PROGRA~2\Java\JRE18~1.0_6\bin\java.exe
START = FALSE
WANT_VACATE = FALSE
WANT_SUSPEND = TRUE

#  Dedicated Scheduler Config to enable Parallel Jobs.
DedicatedScheduler = "DedicatedScheduler@<FQDN of master>"
STARTD_ATTRS = $(STARTD_ATTRS),DedicatedScheduler
DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR

# Space X Additional Configuration
MAX_JOBS_RUNNING=225
START_SCHEDULER_UNIVERSE = TotalSchedulerJobsRunning < 225
START_LOCAL_UNIVERSE = TotalLocalJobsRunning < 225
CREDD_HOST = <FQDN of master>
CREDD_CACHE_LOCALLY = True
STARTER_ALLOW_RUNAS_OWNER = True
ALLOW_CONFIG = condor_admin_svc@*
HOSTALLOW_CONFIG = *.domain.com
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED

I did condor_reconfig -all and condor_restart.

I changed condor_config on two nodes out of 9 to see if it works. This is the condor_config from one of the nodes:

######################################################################
##
##  condor_config
##
##  This is the global configuration file for condor. This is where
##  you define where the local config file is. Any settings
##  made here may potentially be overridden in the local configuration
##  file.  KEEP THAT IN MIND!  To double-check that a variable is
##  getting set from the configuration file that you expect, use
##  condor_config_val -v <variable name>
##
##  condor_config.annotated is a more detailed sample config file
##
##  Unless otherwise specified, settings that are commented out show
##  the defaults that are used if you don't define a value.  Settings
##  that are defined here MUST BE DEFINED since they have no default
##  value.
##
######################################################################

##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = E:\condor

##  Where is the local condor directory for each host?  This is where the local config file(s), logs and
##  spool/execute directories are located. this is the default for Linux and Unix systems.
#LOCAL_DIR = $(TILDE)
##  this is the default on Windows systems
LOCAL_DIR = $(RELEASE_DIR)

##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = $(LOCAL_DIR)\condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)\etc\$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = FALSE

##  The normal way to do configuration with RPMs is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = $(LOCAL_DIR)\config
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

##  Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed
use SECURITY : HOST_BASED
##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
#ALLOW_WRITE = *.cs.wisc.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
FLOCK_TO = <FQDN of Master>

##--------------------------------------------------------------------
## Values set by the condor_configure script:
##--------------------------------------------------------------------
JAVA = C:\Program Files (x86)\Java\jre7\bin\java.exe
CONDOR_HOST = <FQDN of Master>
UID_DOMAIN = domain.com
CONDOR_ADMIN = condor_admin_svc@xxxxxxxxxx
SMTP_SERVER = smtp.domain.com
ALLOW_READ = *
ALLOW_WRITE = $(CONDOR_HOST), $(IP_ADDRESS), *.doamin.com
ALLOW_ADMINISTRATOR = $(IP_ADDRESS)
JAVA = C:\PROGRA~2\Java\JRE18~1.0_6\bin\java.exe
DAEMON_LIST = MASTER SCHEDD STARTD KBDD

# Dedicated Scheduler
DedicatedScheduler = "DedicatedScheduler@<FQDN of Master>"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
RANK_FACTOR = 10000
RANK = (Scheduler =?= $(DedicatedScheduler) * $(RANK_FACTOR))

# Space X Additional Configuration
CREDD_HOST = <FQDN of Master>
CREDD_CACHE_LOCALLY = True
STARTER_ALLOW_RUNAS_OWNER = True
ALLOW_CONFIG = condor_admin_svc@*
HOSTALLOW_CONFIG = $(IP_ADDRESS),*.domain.com
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
SLOTS_CONNECTED_TO_CONSOLE = 2
SLOTS_CONNECTED_TO_KEYBOARD = 2
NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
HighLoad = 1.0
BgndLoad = 0.3
CPU_Busy = ($(NonCondorLoadAvg) >= $(HighLoad))
CPU_Idle = ($(NonCondorLoadAvg) <= $(BgndLoad))
KeyboardBusy = (KeyboardIdle < 10)
MachineBusy = ($(CPU_Busy) || $(KeyboardBusy))
ActivityTimer = (CurrentTime - EnteredCurrentActivity)
START = $(CPU_Idle) && KeyboardIdle > 300
SUSPEND = $(MachineBusy)
CONTINUE = $(CPU_Idle) && KeyboardIdle > 120
PREEMPT = (Activity == "Suspended") && $(ActivityTimer) > 300
SUSPEND = Scheduler =!= $(DedicatedScheduler) && ($(SUSPEND))
PREEMPT = Scheduler =!= $(DedicatedScheduler) && ($(PREEMPT))
START = (Scheduler =?= $(DedicatedScheduler)) || ($(START))
KILL = $(ActivityTimer) > 300
SETTABLE_ATTRS_CONFIG = *
SETTABLE_ATTRS_OWNER = TDVERS
STARTD_ATTRS = COLLECTOR_HOST_STRING, TDVERS
TDVERS = "5.8"

I did condor_reconfig -all and condor_restart.

But if I submit a parallel job, it stays stuck in the idle state forever. This is an example of the job:

universe = parallel
should_transfer_files = Yes
when_to_transfer_output = ON_EXIT
notify_user = <email address>
machine_count = 1
request_cpus = 2
notification = Always
run_as_owner = true
getenv = true
log = sleep_log.txt
output = sleep_stdout.txt
error = sleep_stderr.txt
executable = sleep.bat
queue

Please advise.

Thank you. |
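
For illustration, the split described in the reply at the top of this thread might look roughly like the fragment below. This is only a sketch: the two submit hosts, schedd-a.example.com and schedd-b.example.com, are hypothetical names that do not appear in the original messages. Only the settings that tie an execute node to a dedicated scheduler are shown. They mirror the DedicatedScheduler, STARTD_ATTRS, and RANK lines from the node config in the original question, and the rest of the START/SUSPEND/PREEMPT policy is assumed to stay site-specific.

```
# Execute nodes in group 1: dedicated to schedd A (hostname is hypothetical)
DedicatedScheduler = "DedicatedScheduler@schedd-a.example.com"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
RANK = Scheduler =?= $(DedicatedScheduler)

# Execute nodes in group 2: dedicated to schedd B (hostname is hypothetical)
DedicatedScheduler = "DedicatedScheduler@schedd-b.example.com"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
RANK = Scheduler =?= $(DedicatedScheduler)
```

Each group goes into the config of a different set of machines, not into one file. With a layout like this, a parallel universe job submitted to schedd A could only be matched to the nodes in group 1, which is the half-pool limit the reply mentions. Running condor_config_val -v DedicatedScheduler and condor_config_val -v STARTD_ATTRS on a node is one way to check which file each value is actually being read from, since a later assignment in the same file replaces an earlier one.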