There must be something in the machine classad about the requirements
of what jobs it will start. Can you give a dump of condor_status -l
for one of the machines?
Steve Timm
On Wed, 31 Aug 2011, Mark Cafaro wrote:
Hi Garrett,
The job was successfully matched in the central manager's MatchLog (edited to
remove ip and port):
08/31/11 20:05:16 Matched 27.0 user@...washington.edu <ip:port>
preempting none <ip:port> slot1@...washington.edu
On the node's StartLog is where I see it being rejected:
08/31/11 20:05:16 slot1: match_info called
08/31/11 20:05:16 slot1: Received match <ip:port>#1314844613#28#...
08/31/11 20:05:16 slot1: State change: match notification protocol successful
08/31/11 20:05:16 slot1: Changing state: Unclaimed -> Matched
08/31/11 20:05:16 slot1: Job requirements not satisfied.
08/31/11 20:05:16 slot1: Request to claim resource refused.
08/31/11 20:05:16 slot1: State change: claiming protocol failed
08/31/11 20:05:16 slot1: Changing state: Matched -> Owner
08/31/11 20:05:16 slot1: State change: IS_OWNER is false
08/31/11 20:05:16 slot1: Changing state: Owner -> Unclaimed
condor_q -better-analyze returns:
027.000: Request has not yet been considered by the matchmaker.
I assume that is because it was successfully matched.
Unfortunately I have been through all of the logs and there is no indication
of a problem anywhere except for the line "Job requirements not satisfied."
On Aug 31, 2011, at 7:45 PM, Koller, Garrett wrote:
Mr. Cafaro,
I'm confused. I thought the problem was that the job kept being rejected
with the error "Job requirements not satisfied." If that is so, how could
it be matched in the MatchLog? Was it just considered in the MatchLog or
was it actually assigned to a specific slot on a specific computer? If the
MatchLog says it found a proper match and actually assigned it to that
computer, check out
http://servo.cs.wlu.edu/dokuwiki/doku.php/condor/submit/troubleshoot for a
possible reason and solution to this problem.
Also run 'condor_q -better-analyze' for a more in-depth look at why your
job is being rejected. If the job is being rejected because of its
requirements, this should tell you specifically which requirement is
failing.
Either way, let me know if this helps and what you find out.
Best Regards,
~ Garrett Heath Koller
kollerg14@xxxxxxxxxxxx
Computer Science Major
Member of the Fraternity
Washington and Lee University
Undergraduate Class of 2014
P.O. Box 970
Lexington, VA 24450
Cell: (918) 246-6374
On Aug 31, 2011, at 10:17 PM, Mark Cafaro wrote:
No luck there either. That should certainly evaluate to true.
I am just about out of ideas. The only thing I can gather from the logs is
"Job requirements not satisfied." and condor_q -analyze says "Request has
not yet been considered by the matchmaker." apparently because the match
was made (I can see it in the MatchLog).
I am desperately hoping this is not a platform-specific bug. We're on the
often-forgotten Macintosh.
On Aug 31, 2011, at 7:00 PM, Koller, Garrett wrote:
Mr. Cafaro,
Sure, that's easy. Just run 'condor_status -long | grep
^IsValidCheckpointPlatform' to see the expression that defines the value
for "IsValidCheckpointPlatform". The expression depends a lot on the job
being submitted. Because of this, note that in this expression "MY.*"
refers to a variable in the machine's ClassAd (will be listed in
'condor_status -long') and "TARGET.*" refers to a variable in the job's
ClassAd (will be listed in 'condor_q -long').
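
To make the MY/TARGET scoping concrete, here is a rough Python sketch. It
is only a toy model, not Condor's ClassAd evaluator: the resolve() helper
is hypothetical, and the attribute names and values are placeholders
chosen for illustration. It assumes the expression is evaluated from the
machine's side, so "MY" is the machine ad and "TARGET" is the job ad.

```python
# Toy model only: real ClassAd evaluation is done by Condor, not by this code.
# Note: real ClassAd attribute names are case-insensitive; this dict lookup is not.

def resolve(ref, my_ad, target_ad):
    """Resolve 'MY.Attr' or 'TARGET.Attr' against two ads; None if undefined."""
    scope, _, name = ref.partition(".")
    if scope.upper() == "MY":
        return my_ad.get(name)
    if scope.upper() == "TARGET":
        return target_ad.get(name)
    raise ValueError(f"reference has no MY./TARGET. scope: {ref}")

# Placeholder ads (illustrative values, not taken from a real pool).
machine_ad = {"CheckpointPlatform": "platform-A"}   # what 'condor_status -long' lists
job_ad = {"LastCheckpointPlatform": "platform-A"}   # what 'condor_q -long' lists

# Evaluated from the machine's side: MY is the machine ad, TARGET is the job ad.
mine = resolve("MY.CheckpointPlatform", machine_ad, job_ad)
theirs = resolve("TARGET.LastCheckpointPlatform", machine_ad, job_ad)
print(mine == theirs)
```

If the same expression were evaluated from the job's side, the two ads
would swap roles, which is why it matters which ad an expression lives in.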
Best Regards,
~ Garrett K.
Washington and Lee University
condor.cs.wlu.edu
On Aug 31, 2011, at 9:51 PM, Mark Cafaro wrote:
Hi Garrett,
I have investigated this possibility and found it is likely not causing
our problem. Requirements is appended, but I can overwrite the appended
requirements with condor_qedit. In either case, I would not expect a
match to be made if the manager wasn't able to match the requirements
with the node. The manager matches, but the node refuses.
I am wondering if this doesn't have to do with the fact that the node
has:
Requirements = ( START ) && ( IsValidCheckpointPlatform )
I can't be sure that IsValidCheckpointPlatform evaluates to true on my
platform. Is there any way to determine this?
On Aug 31, 2011, at 6:37 PM, Koller, Garrett wrote:
Mr. Cafaro,
The job's requirements expression is probably being appended to after
it is submitted. Usually, the requirements in the submission file are
logically and-ed (&&) with an expression that says what the job needs
from its execution machine in terms of file transfer. When the job is
in the queue, run something like 'condor_q -long <Job_Cluster_ID> |
grep -i ^Requirements', where <Job_Cluster_ID> is the ID for the job
you just submitted. There you will see the Requirements expression in
its entirety. Most likely, you are asking Condor to use a file transfer
mechanism that isn't supported by your environment. See Section 2.5.4,
"Submitting Jobs Without a Shared File System: Condor's File Transfer
Mechanism," in the Condor manual (7.6.1 for me) for more information
and note when it talks about "FileSystemDomain" and the like as this is
one of the things appended to the job's Requirements expression
depending on the type of file transfer desired.
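
If you would rather pull the expression out of a saved ad dump with a
script than with grep, something like the Python sketch below may help.
The find_attribute() helper is hypothetical, and the sample text is made
up to stand in for real 'condor_q -long' output; it only assumes that the
dump has one "Name = expression" pair per line.

```python
def find_attribute(ad_text, name):
    """Return the expression of the first 'name = expr' line, else None.

    The name comparison ignores case, similar in spirit to 'grep -i'.
    """
    for line in ad_text.splitlines():
        # Split on the first '=' only, so '==' inside the expression survives.
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == name.lower():
            return value.strip()
    return None

# Made-up sample standing in for real 'condor_q -long' output.
sample_ad = """\
ClusterId = 27
ProcId = 0
Requirements = ( TRUE ) && ( TARGET.HasFileTransfer )
"""

print(find_attribute(sample_ad, "requirements"))
```

Unlike 'grep -i ^Requirements', this matches the attribute name exactly,
so other attributes whose names merely start with "Requirements" are
skipped.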
Best Regards,
~ Garrett K.
Washington and Lee University
condor.cs.wlu.edu
On Aug 31, 2011, at 9:18 PM, Mark Cafaro wrote:
I am submitting sh_loop.cmd (from the condor examples) to my manager.
It matches with a node and sends the job off. The node, however,
refuses to accept the job, claiming "Job requirements not satisfied."
The job is set with Requirements = TRUE. How can requirements not be
satisfied and how can a match be made if the requirements were not
satisfied?
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/