Hello,
I am trying to submit a job to a specific machine in my test pool. Both the code and the submit file have been tested with other nodes, and the only difference this time is the added requirement
Requirements = (Machine == "node2.netA.netB.netC")
When running condor_q I see that it is listed as I(dle). With
condor_q -analyze <job ID>
I see
  <ID>: Request has not yet been considered by the matchmaker.
  (...)
  <ID>: Run analysis summary. Of 23 machines,
  13 are rejected by your job's requirements
   0 reject your job because of their own requirements
   0 match but are serving other users
  10 are available to run your job
and indeed there are 10 slots in node2.netA.netB.netC (the remaining 13 are in the head node and another node). The only suggestion is to remove the machine specific requirement.
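The counts in that summary can be cross-checked against the total with a few lines of Python. This is only a sketch: it assumes the summary has the line shapes quoted above, and the name `parse_summary` is made up for illustration.

```python
import re

# Summary text as quoted above (job ID prefix omitted).
SUMMARY = """\
Run analysis summary. Of 23 machines,
13 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving other users
10 are available to run your job
"""


def parse_summary(text):
    """Return (total, [(count, description), ...]) from the summary text."""
    total_match = re.search(r"Of (\d+) machines", text)
    if total_match is None:
        raise ValueError("no 'Of N machines' line found")
    total = int(total_match.group(1))
    counts = []
    for line in text.splitlines():
        # Only lines that start with a number are per-category counts.
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if m:
            counts.append((int(m.group(1)), m.group(2).strip()))
    return total, counts


total, counts = parse_summary(SUMMARY)
accounted = sum(n for n, _ in counts)
print(f"{accounted} of {total} machines accounted for")
```

Here the four categories add up to the 23 slots reported, so every slot falls into exactly one category and the 10 available ones are the slots on node2.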
Looking at /var/log/condor/NegotiatorLog, I see
"Successfully matched with slot@xxxxxxxxxxxxxxxxxxxx"
In /var/log/condor/MatchLog, I see
"Matched <ID> <user> <IP for head:53694?addrs=IP for head-53694> preempting node2<IP for node:13698?address=IP for node2-13698> slot1@xxxxxxxxxxxxxxxxxxx"
Both of these messages recur every minute or so.
On node2, only MASTER and STARTD are running, and neither of their logs shows any mention of this job (I used tail -f to watch them at the moment of submission).
/etc/condor/condor_config is identical on node1 and node2. The only difference between the two nodes is that, despite having the same domain in their FQDN (netA.netB.netC), they are on different subnets (node2 is in ip2.ipB.ipC, whereas node1 and the head node are in ip1.ipB.ipC). On each of the three machines, /etc/hosts contains an <IP> <name> <FQDN> entry for all three. In all condor_configs, I use
FILESYSTEM_DOMAIN = netA.netB.netC
UID_DOMAIN = netA.netB.netC
DEFAULT_DOMAIN_NAME = netA.netB.netC
TRUST_UID_DOMAIN = netA.netB.netC
SOFT_UID_DOMAIN = netA.netB.netC
TRUST_UID_DOMAIN = TRUE
STARTER_ALLOW_RUNAS_OWNER = TRUE
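One thing that stands out in the list above is that TRUST_UID_DOMAIN is assigned twice, once to the domain name and once to TRUE. As far as I understand, a later assignment in the configuration overrides an earlier one, so the effective value would be TRUE. A minimal Python sketch for spotting such repeats is below. It assumes plain `NAME = value` lines with no continuations or includes, treats macro names as case-insensitive, and the helper name `find_reassigned` is hypothetical.

```python
# The settings exactly as listed above.
CONFIG = """\
FILESYSTEM_DOMAIN = netA.netB.netC
UID_DOMAIN = netA.netB.netC
DEFAULT_DOMAIN_NAME = netA.netB.netC
TRUST_UID_DOMAIN = netA.netB.netC
SOFT_UID_DOMAIN = netA.netB.netC
TRUST_UID_DOMAIN = TRUE
STARTER_ALLOW_RUNAS_OWNER = TRUE
"""


def find_reassigned(text):
    """Map each macro assigned more than once to its values, in file order."""
    seen = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        seen.setdefault(name.strip().upper(), []).append(value.strip())
    return {name: values for name, values in seen.items() if len(values) > 1}


reassigned = find_reassigned(CONFIG)
for name, values in reassigned.items():
    print(f"{name}: assigned {len(values)} times, last value is {values[-1]!r}")
```

Running `condor_config_val TRUST_UID_DOMAIN` on each node is the more reliable way to see the value the daemons actually use; the sketch only flags repeats in the text.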
All nodes use home directories exported from the head node via NFS, and have matching UIDs and GIDs.
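Since the subnet is the only known difference between the nodes, it may help to lay out the /etc/hosts entries by subnet and to confirm that the files really agree across machines. The Python sketch below does this for text in the `<IP> <name> <FQDN>` shape described above. The IP addresses are hypothetical values from documentation ranges (the real ones are not given here), the short names `head`, `node1`, and `node2` are placeholders, and the /24 prefix length is an assumption.

```python
import ipaddress

# Hypothetical stand-in for /etc/hosts; real addresses are not shown in this post.
HOSTS = """\
192.0.2.1     head   head.netA.netB.netC
192.0.2.11    node1  node1.netA.netB.netC
198.51.100.12 node2  node2.netA.netB.netC
"""


def parse_hosts(text):
    """Return {fqdn: ip} for lines shaped '<IP> <name> <FQDN>'."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 3:
            table[fields[2]] = ipaddress.ip_address(fields[0])
    return table


def group_by_subnet(table, prefix_len=24):
    """Group FQDNs by the network their address falls in (prefix length assumed)."""
    groups = {}
    for fqdn, ip in table.items():
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups.setdefault(str(net), []).append(fqdn)
    return groups


table = parse_hosts(HOSTS)
groups = group_by_subnet(table)
for net, names in groups.items():
    print(net, names)
```

Comparing `parse_hosts(...)` results for the file contents taken from each machine (a plain `==` on the dictionaries) would show whether any node resolves one of the others to a different address.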
Has anyone encountered this situation? Given the absence of any relevant information in the logs on node2, I am at a loss as to how to proceed...
Thank you for any help!
Francisco