Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] idle job and "Request has not yet been considered by the matchmaker"

Date: Mon, 17 Oct 2016 12:17:16 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] idle job and "Request has not yet been considered by the matchmaker"

On 10/15/2016 9:43 PM, Francisco Pereira wrote:


In /var/log/condor/MatchLog, I see

"Matched <ID> <user> <IP for head:53694?addrs=IP for head-53694>
preempting node2<IP for node:13698?address=IP for node2-13698>
slot1@xxxxxxxxxxxxxxxxxxx

Both of these messages recur every minute or so.

On node2, only MASTER and STARTD are running, and neither of the
respective logs shows any mention of this job (using tail -f to track at
the moment of submission).

/etc/condor/condor_config is precisely the same between node1 and node2.
The only difference between them is that, despite having the same domain
in their FQDN (netA.netB.netC), the actual subnets are different (node2
is in ip2.ipB.ipC, whereas node1 and the head node are in ip1.ipB.ipC).
/etc/hosts contains <IP> <name> <FQDN> for all three machines in each
one of them.


Hi Francisco,

Skimming your post, it looks like the job is being matched to the slot,
but the schedd on the submit machine is unable to claim the machine.
Just a quick thought - maybe this is due to your HTCondor authorization
settings. Do you see any permission denied messages in the node2
StartLog (e.g. grep -i "permission" StartLog)? Perhaps you are missing
one of the subnets in the config knobs ALLOW_WRITE or HOSTALLOW_WRITE.
If you are using FQDNs (e.g. *.wisc.edu) in your [HOST]ALLOW_WRITE, be
aware that the proper way to list your /etc/hosts on Linux is
"<IP> <FQDN> <name>", not "<IP> <name> <FQDN>". See https://is.gd/yXyiDG
for a discussion. Most of the time it doesn't matter if DNS is in use,
but maybe it is causing you grief; HTCondor is pretty sensitive to how
IPs are mapped back to FQDNs.
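
If it helps, here is a rough Python sketch for spotting /etc/hosts lines
that list the short name before the FQDN. It is only a heuristic, not
anything HTCondor ships: the function name misordered_hosts_entries and
the sample addresses and hostnames are made up for illustration, and it
assumes that a name containing a dot is the FQDN.

```python
def misordered_hosts_entries(text):
    """Return (ip, names) for hosts lines whose first name is a short
    name while a later name on the same line looks like an FQDN."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if len(names) < 2:
            continue  # a single name cannot be misordered
        # Assumption: any name with a dot in it is treated as the FQDN.
        if "." not in names[0] and any("." in n for n in names[1:]):
            flagged.append((ip, names))
    return flagged


# Hypothetical sample data; the addresses and names are placeholders.
sample = """\
127.0.0.1   localhost
10.0.1.10   head head.example.org
10.0.2.12   node2.example.org node2   # FQDN first, as recommended
"""

for ip, names in misordered_hosts_entries(sample):
    print(f"{ip}: short name listed before FQDN: {' '.join(names)}")
```

To check a real machine, pass it the file contents instead of the
sample, e.g. misordered_hosts_entries(open("/etc/hosts").read()), and
run it on the head node and on each execute node.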

Another thought is perhaps there is an issue preempting a previous job
on node2 - do you still have problems running on node2 even when node2
is completely idle?


Hope the above helps,
Todd

References:
- [HTCondor-users] idle job and "Request has not yet been considered by the matchmaker"
  - From: Francisco Pereira
